APA Handbook of Behavior Analysis [1] 9781433811111, 7507307228, 0902895291, 0882797431

Behavior analysis emerged from the nonhuman laboratories of B. F. Skinner, Fred Keller, Nate Schoenfeld, Murray Sidman,

English Pages 557 Year 2016

Author / Uploaded
Gregory J. Madden

Categories
Psychology

Table of contents :
I. Overview

Single-Case Research Methods: An Overview
Iver H. Iversen
The Five Pillars of the Experimental Analysis of Behavior
Kennon A. Lattal
Translational Research in Behavior Analysis
William V. Dube
Applied Behavior Analysis
Dorothea C. Lerman, Brian A. Iwata, and Gregory P. Hanley

II. Single-Case Research Designs

Single-Case Experimental Designs
Michael Perone and Daniel E. Hursh
Observation and Measurement in Behavior Analysis
Raymond G. Miltenberger and Timothy M. Weil
Generality and Generalization of Research Findings
Marc N. Branch and Henry S. Pennypacker
Single-Case Research Designs and the Scientist-Practitioner Ideal in Applied Psychology
Neville M. Blampied
Visual Analysis in Single-Case Research
Jason C. Bourret and Cynthia J. Pietras
Quantitative Description of Environment–Behavior Relations
Jesse Dallery and Paul L. Soto
Time-Series Statistical Analysis of Single-Case Data
Jeffrey J. Borckardt, Michael R. Nash, Wendy Balliet, Sarah Galloway, and Alok Madan
New Methods for Sequential Behavior Analysis
Peter C. M. Molenaar and Tamara Goode

III. The Experimental Analysis of Behavior

Pavlovian Conditioning
K. Matthew Lattal
The Allocation of Operant Behavior
Randolph C. Grace and Andrew D. Hucks
Behavioral Neuroscience
David W. Schaal
Stimulus Control and Stimulus Class Formation
Peter J. Urcuioli
Attention and Conditioned Reinforcement
Timothy A. Shahan
Remembering and Forgetting
K. Geoffrey White
The Logic and Illogic of Human Reasoning
Edmund Fantino and Stephanie Stolarz-Fantino
Self-Control and Altruism
Matthew L. Locey, Bryan A. Jones, and Howard Rachlin
Behavior in Relation to Aversive Events: Punishment and Negative Reinforcement
Philip N. Hineline and Jesús Rosales-Ruiz
Operant Variability
Allen Neuringer and Greg Jensen
Behavioral Pharmacology
Gail Winger and James H. Woods

Chapter 1

Single-Case Research Methods: An Overview
Iver H. Iversen

My experiments had indeed gone well. I was getting data from a single rat that were more orderly and reproducible than the averages of large groups in mazes and discrimination boxes, and a few principles seemed to be covering a lot of ground. (Skinner, 1979, p. 114)

Replication is the essence of believability. (Baer, Wolf, & Risley, 1968, p. 95)

Single-case research methods refer to a vast collection of procedures for conducting behavioral research with individual subjects. Such methods are used in basic research and for improving behavioral problems with educational and therapeutic interventions. Analyses and interpretations of data collected with research methods for individual subjects have developed into procedures that are considerably different from those used in research with groups of subjects. In this chapter, I provide an overview of designs, analyses, and interpretations of research results and treatment outcomes using single-case research methods.

Background and History

The case in single-case research methods essentially refers to a unit of analysis for an individual, a few people in a group, or a large group with varying membership. Single-case research methods should be contrasted with single-case studies that ordinarily consist of anecdotal narrations of what happened to a given person. No treatments or manipulations of experimental variables take place in case studies.

Research Methods Using Single Subjects

The single-case research method involves repeated measures of one individual’s behavior before, during, and often after an experimental, educational, or therapeutic intervention. Data are collected repeatedly over several observation periods or sessions in what is customarily called a time series. The objective is to change a behavior to determine the variables that control that behavior. When environmental variables have been found that reliably change behavior, the method can be used to control behavior. The investigator, educator, or therapist can make the behavior start and stop and change frequency or duration at a specific time and place. Therefore, single-case methods provide a tool for a science of behavior at the level of the individual subject.

When an educator or therapist can effectively control the client’s behavior, then methods exist for helping that client acquire new behavior or overcome a problem with existing behavior. This ability to help a client by using methods to control the client’s behavior is exactly what often creates controversy around the use of such methods. The methods for controlling behavior and for helping the client are the same, but the verbs control and help are not synonymous. To control behavior means to be able to change behavior reliably, and in this context control has a technical meaning originating from the laboratory. The word control, however, also has political and societal meanings related to authoritative restrictions of behavior for the individual. The intended helping function of an applied behavior science is the opposite: to establish enrichment and expansion of the individual’s behavioral repertoire, not to restrict it. This complex issue has led to several considerations regarding the ethics involved in helping a client. For example, Skinner (1978) argued that a client who needs help to obtain essential goods for survival should have a right to acquire a behavior that can provide the goods rather than merely be provided the goods regardless of any behavior. Similarly, Van Houten et al. (1988) stated that

individuals who are recipients . . . of treatment designed to change their behavior have the right to a therapeutic environment, services whose overriding goal is personal welfare, treatment by a competent behavior analyst, programs that teach functional skills, behavioral assessment and ongoing evaluation, and the most effective treatment procedures available. (p. 381)

Because of the overall success of single-case research methods, a plethora of articles, book chapters, and entire books devoted to the topic have appeared over the past 40 years. Recent publications in this vast literature illustrate the wide use of single-case research methods: Specific topics are basic methodology (Barlow, Nock, & Hersen, 2009; J. M. Johnston & Pennypacker, 2009), educational research (Kennedy, 2005; Sulzer-Azaroff & Mayer, 1991), health sciences (Morgan & Morgan, 2009), clinical and applied settings (Kazdin, 2011), community settings (O’Neill, McDonnell, Billingsley, & Jenson, 2010), and medicine (Janosky, Leininger, Hoerger, & Libkuman, 2009).

The term single-case research methods is synonymous with a variety of related terms, the most common of which are single-subject designs and N = 1 designs. The last term should be avoided because it is misleading. N = 1 obviously means the use of only one subject. However, N has an entirely different meaning in statistics, where it stands for the number of data points collected, not for the number of subjects. In customary group research, each subject contributes exactly one data point. In contrast, single-case research methods generate a high number of data points for each individual because of the repeated observations of that individual’s behavior. Therefore, it is incorrect and misleading to refer to the single-case research method as an N = 1 design, falsely implying that only one data point has been collected and that the researcher is trying to promote a finding based on a single data point.

I thank Dominik Guess and Wendon Henton for valuable comments on earlier versions of this chapter.
DOI: 10.1037/13937-001
APA Handbook of Behavior Analysis: Vol. 1. Methods and Principles, G. J. Madden (Editor-in-Chief)
Copyright © 2013 by the American Psychological Association. All rights reserved.

Glimpses Into the History of Single-Case Research Methods

I. P. Pavlov’s (1927) Conditioned Reflexes had a major influence on B. F. Skinner’s decision to study psychology and on his choice of research methodology (Catania & Laties, 1999; Iversen, 1992; Skinner, 1966). Skinner was impressed with Pavlov’s precise, quantitative measures of behavior in one organism at a time. Thus, Skinner (1956) once wrote, “I had the clue from Pavlov: Control your conditions and you will see order” (p. 223). Pavlov described in great detail control conditions and recording of individual drops of saliva at a given time of day for a single animal. Figure 1.1 shows an early version of a single-case research method used by Pavlov (1928). The dog, Krasavets, had previously been conditioned to a positive tone (positive stimulus, or S+) with food powder. Over nine trials, Pavlov alternated the S+ with a negative tone (negative stimulus, or S−) that had not been conditioned to food. The S+ elicited several drops of saliva from two glands, and the S− elicited no saliva. The alternation was irregular by design so that Pavlov could examine what happened to the conditioned reflex to S+ after several presentations of S−; indeed, the elicitation was reduced, as can be seen for S+ at 2:32. Thus, at the level of the individual dog, Pavlov first demonstrated a baseline of reliable elicitation of saliva to S+ and then demonstrated that repeated, successive presentations of S− inhibited the flow of saliva on the next presentation of S+.

Figure 1.1. Illustration of an early single-case research method. Detailed data for a single dog from Pavlov’s experiments on classical conditioning. Positive tone is followed by food; negative tone is not. From Lectures on Conditioned Reflexes: Twenty-Five Years of Objective Study of the Higher Nervous Activity (Behaviour) of Animals (Vol. 1, p. 173), by I. P. Pavlov, 1928, New York, NY: International Publishers. In the public domain.

Pavlov’s work in Russia and contemporary work in Europe by Wundt on perception and by Weber and Fechner on psychophysics grew from physiology, in which the customary method was to investigate the effects of independent variables on individual organisms (Boring, 1929). In Europe, Ebbinghaus (1885/1913) became famous for his memory studies using a single subject (himself). In the United States, Thorndike’s (1911) research on the law of effect with individual cats quickly became well known (Boakes, 1984). Apparently, Thorndike was ahead of Pavlov by a few years. Thus, Pavlov (1928) wrote,

Some years after the beginning of the work with our new method I learned that somewhat similar experiments on animals had been performed in America, and indeed not by physiologists but by psychologists. Thereupon I studied in more detail the American publications, and now I must acknowledge that the honour of having made the first steps along this path belongs to E. L. Thorndike. By two or three years his experiments preceded ours. (pp. 39–40)

Pavlov’s work apparently did not appear in English until a paper in Science (Pavlov, 1906) and the translation of Conditioned Reflexes (Pavlov, 1927), long after Thorndike had finished his early animal research. In addition, Jacques Loeb, educated in the German tradition of physiology of individual animals (e.g., Loeb, 1900), had a major influence on Pavlov, Thorndike, Watson, and Skinner (Greenspan & Baars, 2005).

At the risk of digressing too far into the history of psychology, I cannot help mentioning the work of the French physiologist Claude Bernard (1865/1957). Nearly 150 years ago, and long before Loeb, Pavlov, Thorndike, Watson, and Skinner, he articulated in clear terms the need for what he called comparative experiments with individual, intact animals. When using several animals for comparison, he realized that the errors in his data stemmed from variations from animal to animal, and he wrote that “to remove this source of error, I was forced to make the whole experiment on the same animal . . . because in this respect two frogs are not always comparable” (p. 183). Bernard’s work appeared in Russian, and his work had a major influence on Russian physiologists while Pavlov was a young researcher. Of particular historical significance, Pavlov’s main professor, Botkin, had been a student of Bernard’s (see Paré, 1990), and Pavlov expressed the greatest admiration for Bernard’s experimental approaches (Todes, 2002). Indeed, according to Wood (2004), Pavlov was an apostle of Bernard’s.

As for single-case methods with humans, Watson and Rayner (1920) and Jones (1924) demonstrated conditioning and extinction of fear reactions in infants (i.e., “little Albert” and “little Peter”). The methods were crude, and the data were qualitative descriptions with scant operational definitions. However, multiple training conditions, repetition of test conditions, and tests for transfer made these studies influential in psychology (Harris, 1979). Thorndike (1927) reported an impressive laboratory study with adults that serves as a very early model of the A-B-A design (see below). Subjects received instructions such as “Draw an x-inch line,” where x was 3, 4, 5, or 6. Such instructions were presented in random order, and, being blindfolded, the subjects never saw what they drew. The consequence, or effect, of drawing was the experimenter saying either “right” or “wrong.” Each subject had one early session with no effect delivered, then seven training sessions with an effect, and then one more late session without an effect. Figure 1.2 shows average data for all subjects and individual data for two subjects. All subjects improved accuracy of line drawing during training but dropped in accuracy when the effect was removed in the late test; 16 of 24 subjects had a gain compared with their scores in the early test (e.g., Subject 29), whereas eight subjects had no gain and instead a drop (e.g., Subject 42) compared with the early test. Thorndike concluded that the effect was responsible for the gain in line-drawing accuracy. In the area of motor learning and control, this experiment is considered a classic method for demonstration of the effects of feedback on performance and learning (e.g., Schmidt & Lee, 2005).

With the advent of Skinner’s (1938) rigorous experimental methods featuring operationally defined measures of behavior and highly controlled conditions for individual animal subjects, single-case research methods developed rapidly and laid the foundation for successful application to humans, which began around the 1950s. For examples of early behavior research and therapy using humans (i.e., 1950–1965), see the collections of articles in Eysenck (1960) and Ullman and Krasner (1965). In addition, Wolpe (1958) developed single-case methods for treatment of phobias in humans. The methods of behavior analysis for individual subjects were laid out clearly in influential texts by Keller and Schoenfeld (1950), Skinner (1953), and Bijou and Baer (1961). The Journal of the Experimental Analysis of Behavior was founded in 1958 and published experiments based on single-case methodology using both human and nonhuman subjects. When the Journal of Applied Behavior Analysis appeared in 1968, single-case research methods were further established as important research tools that were not for animals only. For a more thorough history of behavior analysis and single-case research methods, see Barlow et al. (2009), Kazdin (2011), and Blampied (1999).

Figure 1.2. Example of a historically early A-B-A design for all subjects (top) and for two subjects (middle and bottom). Twenty-four blindfolded human subjects drew lines. Percentage correct expresses how many lines were drawn within criterion length (3, 4, 5, or 6 inches). In early and late tests (Thorndike’s terms), no consequence was presented to the subjects, whereas during training subjects were presented with the experimenter’s saying “right” or “wrong” depending on their performance. Data from Thorndike (1927).

Scientific Method

One important method, namely that of comparing collected data from at least two different conditions, is common across different research disciplines. Reaching this point has not come easily. The history of science is full of anecdotes that illustrate the struggle a particular scientist had in convincing contemporary scholars of new findings. Boorstin (1983) related how Galileo had to go through painstaking steps of comparing different conditions to demonstrate that through his telescope one could in fact see how things at a great distance looked. In about 1610, Galileo first aimed the telescope at buildings and other earthly objects and then made a drawing of what he saw. Then he walked to the location seen in the telescope and made a drawing of what he saw there. He compared such drawings over and over to demonstrate that the telescope worked. Using this method, Galileo could prepare his audience for an important inductive step. When the telescope was aimed at objects in the sky, the drawing made of what one saw through the telescope would reflect the structure of what existed at the faraway location. As is well known from the history of science, many of Galileo’s contemporaries said that what they saw in the telescope was inside the telescope and not far away. Yet, his method of repeated comparisons of different viewing conditions was an important step in convincing at least some scientists of his time that a telescope was an instrument that could be used to view distant, unreachable objects.

In a review of the history of experimental control, Boring (1954) gave a related example of how Pascal in 1648 tested whether a new instrument could measure barometric pressure. When a glass tube closed at one end was filled with mercury and then inverted with the open end immersed in a cup of mercury, a vacuum would form at the closed end of the tube. Pascal had the idea that the weight of air pressing on the mercury in the open cup influenced the height of the column of mercury. Hence, the length of the vacuum at the top of the tube should change at a higher altitude, where the weight of air was supposedly less than on the ground. Pascal sent family members with two such instruments made of glass tubes, cups, and mercury to take the measurements. Readings were first made for both instruments at the foot of a mountain. Then one person stayed with one instrument and took readings throughout the day. Another person carried the second instrument up the mountain and took measurements at different locations; when back at the foot of the mountain, measurements were taken again with both instruments. Clearly, the essence of the method is the comparison between one condition at the foot of the mountain, the control condition, and the elevated conditions. Because the control readings stayed the same throughout the day, whereas the elevated condition readings changed, proof existed that the instrument worked as intended. This experiment was a real-life demonstration by Pascal of a scientific principle as well as of a method of testing.

Although these anecdotes are amusing several hundred years later for their extreme and cumbersome methods of testing, the practicing scientist of today still on occasion faces opposition to conclusions from experiments. Thus, behavior analysts sometimes find themselves in a position in which it is difficult to convince psychologists with different training that changes in contingencies of reinforcement can bring about large and robust behavior changes. For example, I once presented to colleagues in different areas of psychology some data that demonstrated very reliable and precise stimulus control of behavior in rats. A colleague objected that such mechanistic, on–off control of behavior is not psychology. Others in the audience were suspicious of the extremely low variability in the data, which was believed to have come from an equipment error and not from the method of controlling behavior. The method of comparing two or more series of readings with the same measurement instrument under two or more different conditions is the hallmark of the scientific method (Boring, 1954). Whether the research design is a between-groups or a within-subject comparison of the effects of a manipulation of a variable on behavior, the shared element is always the comparison of different conditions using the same measurement (see also Baer, 1993).

Designs for Single-Case Research Methods

A variety of designs have been developed for research with animals and humans. Because this chapter is an overview, I can only describe the most common designs. Chapter 5 of this volume covers designs in considerably more detail. Experimental designs are specific to the problem the researcher seeks to investigate. However, an experimental design is also like an interface between the researcher and the subject because information passes in both directions. An essential aspect of single-case research methods is the interaction between subject and researcher, educator, or therapist. Pavlov, Thorndike, and Skinner, as originators of the single-case method, modified their apparatus and experimental procedures for individual animals depending on how the animal reacted to the experimental procedures (Skinner, 1956). The scientist modifies the subject’s behavior by experimental manipulations that, when reliable and replicable, constitute the foodstuff of science; equally important for the process of discovery is the subject’s influence on the researcher’s behavior. Sidman (1960) described how the researcher needs to “listen” to how the data change in accordance with experimental manipulations. To discover laws of behavior, the experimenter needs to adjust methods depending on how they affect the subject’s behavior.

In basic research, the experimental situation is often a dynamic exchange between researcher and subject that may lead to novel discoveries and designs. In applied behavior analysis, as successful experiments and treatments are replicated over time, designs tend to become relatively fixed and standardized. The strength of standardization is a high degree of internal validity and ease in communicating the design. However, the weakness of standardization is that researchers or therapists may primarily investigate problems that fit the standard designs (Sidman, 1981). Nonetheless, the standard designs covered here serve as basic building blocks in both basic research and application. My main focus is the underlying logic of each of the covered designs.

A-B Design

The essence of single-case research methods is the comparison of repeated measures of the same behavior for the same individual for at least two different experimental, educational, or treatment conditions. Figure 1.3 shows a schematic of an A-B design, using hypothetical data. Phase A is repeated baseline recording of behavior across several observation periods (sessions) before any intervention takes place. The intervention is a change in the individual’s environment, also across several sessions, usually in the form of a change in how reinforcement is provided contingent on behavior. The data in Phase A serve as a comparison with behavior changes in Phase B. Logically, however, the comparison is with an extension of the Phase A data into Phase B. Data in Phase A are used to forecast or predict what would have happened during the next sessions had the intervention in Phase B not been introduced. If the data in Phase A are without trends and of low variability and are taken across several sessions, then the experienced researcher or therapist predicts and assumes that this behavior will continue at that level if no interruptions in the environment occur. Thus, the dotted line in Phase B in Figure 1.3 indicates the projected trend of the behavior from Phase A to Phase B. The effect on behavior of introducing the change in Phase B is thus evaluated against the backdrop of the projected data from Phase A. Because forecast behavior cannot be measured, the difference between Phase A and Phase B data constitutes the experimental effect, as indicated by the bracket.

Figure 1.3. A-B design using hypothetical data.

The validity of the statement that the intervention in Phase B caused the change in behavior from A to B thus depends not only on the change of behavior in Phase B but also on how good the prediction is that the behavior would have remained the same had the intervention not been introduced in Phase B. Ideally, the behavior should not change in Phase B unless the intervention took place. The change should not occur by chance. Because other changes in the environment might take place at the same time as the intervention is introduced, the researcher or therapist can never be sure whether the intervention or some other factor produces the behavior change. Such unintended factors are customarily called confounding variables. For example, a child may have an undesirably low rate of smiling in the baseline phase (A). Then when social reinforcement is explicitly made contingent on smiling in the intervention phase (B), the rate of smiling may increase above that in the baseline (A). If the sessions stretch out over several weeks, with one session each day, then a host of other changes could happen at the same time as the intervention. For example, a parent could return from a trip away from home, the child could recover from the flu, a bully in class may be away for a few weeks, and so on. Such other factors could have made the rate of smiling increase by themselves or in addition to the effect of providing social reinforcement for smiling. For additional considerations regarding the role of confounding variables, see Kazdin (1973). Because of the potential difficulty in controlling such confounding variables, the A-B design by itself is uncommon in clinical and educational situations. However, the A-B design is very common in laboratory demonstrations, especially with animal subjects, where confounding variables can be controlled experimentally.
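
The projection logic of the A-B design can be made concrete with a small numerical sketch. The Python code below is an illustration only, not a procedure given in the chapter: it fits a straight line to hypothetical Phase A data, extends that line across the Phase B sessions (the counterpart of the dotted line in Figure 1.3), and reports the mean difference between observed and projected values. The function names, the choice of a least-squares line as the forecast, and all data values are assumptions introduced for this example.

```python
from statistics import mean


def fit_line(ys):
    """Least-squares slope and intercept for ys observed at sessions 0, 1, 2, ..."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two baseline sessions to fit a trend")
    xs = range(n)
    x_bar = mean(xs)
    y_bar = mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def ab_effect(baseline, intervention):
    """Project the Phase A trend into Phase B and compare it with observed data.

    Returns the projected values and the mean of (observed - projected).
    """
    slope, intercept = fit_line(baseline)
    start = len(baseline)  # index of the first Phase B session
    projected = [intercept + slope * (start + i) for i in range(len(intervention))]
    differences = [obs - proj for obs, proj in zip(intervention, projected)]
    return projected, mean(differences)


# Hypothetical responses per session (not data from the chapter).
phase_a = [4, 6, 5, 6, 4]
phase_b = [9, 11, 12, 11, 12]
projected, effect = ab_effect(phase_a, phase_b)
print("projected baseline:", projected)
print("mean difference from projection:", effect)
```

A difference computed this way only quantifies the comparison the design makes; it says nothing about whether a confounding variable, rather than the intervention, produced the change.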

A-B-A Design To demonstrate clear experimental control while reducing the potential influence of confounding variables, the experimenter can supplement the A-B design with a withdrawal of the change in Phase B and a return to the conditions of Phase A, as a backto-baseline control. This design is therefore often called a withdrawal design. When the baseline measured in the first Phase A can be recovered in the second Phase A, after the intervention has changed the behavior in Phase B, then the researcher has shown control of behavior and demonstrated high internal validity. The behavior is changed by the intervention and changed back when the intervention is removed. The researcher therefore has full control over the behavior of the individual subject or client, which means that the behavior can be changed at the researcher’s or therapist’s discretion. Figure 1.4 shows a cumulative record of lever pressing for a single rat during baseline, acquisition, and extinction of lever pressing within one session. The rapid increase in lever pressing when it produces food contrasts with the very low rate of lever pressing during the baseline phase when lever pressing did not produce food. The rapid decline in lever pressing when reinforcement is withdrawn demonstrates the control reinforcement had over the behavior. The experienced investigator comes to learn that when such clear control is obtained with

Figure 1.4. A-B-A design: a cumulative record of lever pressing by a single rat. A = baseline with response-independent reinforcement. B = continuous reinforcement; each lever press produced reinforcement. Second A = extinction. Data from Iversen (2010).

Iver H. Iversen

one individual, then it will also be found with other, similar individuals. In clinical and educational applications, the A-B-A design is useful to show that an intervention is effective and that a given behavior change is under the therapist’s control and not caused by other factors. However, a critical problem arises with this design regarding clinical or educational significance for the participant and for the caregivers. If the therapist follows an A-B-A design and changes behavior from a low level in Phase A to a high level in Phase B and then back again to a low level in the second Phase A to demonstrate control, in the end there is no gain for the client or for the caregivers. The educator or therapist can hardly convince the caregivers that it is a great step forward to know that the behavior can now be brought back to the same level as when the participant came in for treatment. The A-B-A design has shown gain in control of the behavior but has produced no clinical or educational gain. Therefore, the A-B-A design is not useful as a stand-alone treatment in clinical or educational situations. A concern with the A-B-A design is that behavior that changed in Phase B may not always show a reversal back to the level seen in the first Phase A when the intervention is withdrawn. For example, for motor skill acquisition such as cycling or walking, the acquired behavior may not drop back to the baseline level when reinforcement is removed in the second Phase A. The individual has acquired a new skill in Phase B that may lead to new contingent reinforcers that were not within reach before the skill was acquired. When behavior does not return to baseline level after the intervention is removed, the possibility also exists that the behavior change was produced by or aided by a confounding variable and not by the intervention. The educator and therapist therefore face a dilemma. 
They want to help the client acquire a new behavior that is lacking or to suppress an existing unwanted behavior. They also want, however, to be able to communicate to other therapists and educators that they have developed a method to control the behavior of interest. To help the client, they desire the behavior change in Phase B to remain intact when the intervention is removed. To show

control over the behavior, they desire the behavior change in Phase B to not remain intact when the intervention is removed.

A-B-A-B Design
To resolve some of these difficulties, an additional phase can be added to the A-B-A design in which the intervention (B) is repeated, thereby forming an A-B-A-B design, often called a reversal-replication design or a double-replication design. Figure 1.5 (top) shows an example (R. G. Fox, Copeland, Harris, Rieth, & Hall, 1975) in which the number of math problems completed by one eighth-grade underachieving student changed when the teacher paid attention to her when she worked on the assignments. The number of problems completed shows a gradual and very large change in the treatment condition. Then, when the baseline was reinstated, the number dropped to baseline levels only to increase again when the treatment was reintroduced. Because the number of problems completed is high in both treatment (B) phases and low in both baseline (A) phases, the data demonstrate control by the intervention of reinforcing completion of math problems in a single individual child. This design satisfies both the need to demonstrate control of behavior and the need to help the client because the behavior is at the improved level when the treatment ends after the last Phase B. This off–on–off–on logic of the A-B-A-B design mimics daily situations in which people determine whether something works or not by turning it on and off a few times. The A-B-A-B design shows that control of behavior can be repeated for the same individual (that is, intrasubject replication). This ability of the researcher to replicate the behavioral change in a single individual provides a tremendous source of knowledge about the conditions that control behavior. Such results are of practical significance for teachers and family or caregivers because as Baer et al. (1968) succinctly stated, “Replication is the essence of believability” (p. 95). The A-B-A-B design is customarily shown as a time series with several sessions in each phase. 
The underlying off–on–off–on logic also exists in other procedures in which two conditions alternate several times within a session, as in multiple schedules (e.g., two or more schedules of reinforcement

Single-Case Research Methods

Figure 1.5. Top: A-B-A-B design with follow-up (post checks). From “A Computerized System for Selecting Responsive Teaching Studies, Catalogued Along Twenty-Eight Important Dimensions,” by R. G. Fox, R. E. Copeland, J. W. Harris, H. J. Rieth, and R. V. Hall, in E. Ramp and G. Semb (Eds.), Behavior Analysis: Areas of Research and Application (p. 131), 1975, Englewood Cliffs, NJ: Prentice-Hall, Inc. Copyright 1975 by Prentice-Hall, Inc. Reprinted with permission. Bottom: Illustration of a within-session repeated A-B * N design. Event record showing onset of discriminative stimulus, first pen; response, second pen; reinforcement, third pen. Data are for one rat after discrimination training showing perfect discrimination performance. Data from Iversen (2010).

alternate after a few minutes, each under a separate stimulus). Such A-B-A-B-A-B-A-B . . . or A-B * N designs demonstrate powerful, repeated behavior control. The A-B * N design can also be implemented at the moment-to-moment level. Figure 1.5, bottom, shows a sample of an event record in which a rat promptly presses a lever each time a light turns on and almost never presses it when the light is off. The A-B * N design can itself serve as a baseline for other designs. For example, this discrimination procedure can serve as a baseline for assessment of the effects of other factors such as food deprivation, stimulus factors, drugs, and so forth. The method of repeated A-B * N changes within a session can also serve as a baseline for comparison with the outcome

on occasional test trials in which a stimulus is altered, such as, for example, in the determination of stimulus generalization. With various modifications, the repeated A-B changes within a session also form the basis of methods used in research and education, such as the matching-to-sample procedure (see Discrete-Trial Designs section, below).
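The off–on–off–on logic of the A-B-A-B design can be sketched as a simple check that the behavior is high in both B phases and low in both A phases. This is only an illustration under stated assumptions: the function name `replicated_reversal`, the rule "every B-phase mean exceeds every A-phase mean," and the data are all hypothetical, and published studies typically rely on visual inspection of the full time series rather than a single rule.

```python
from statistics import mean

def replicated_reversal(phases):
    """phases: list of (label, scores) in the order A, B, A, B.

    Returns True when every B-phase mean exceeds every A-phase mean,
    a hypothetical criterion for a replicated intervention effect.
    """
    labels = [label for label, _ in phases]
    if labels != ["A", "B", "A", "B"]:
        raise ValueError("expected phases in A-B-A-B order")
    a_means = [mean(scores) for label, scores in phases if label == "A"]
    b_means = [mean(scores) for label, scores in phases if label == "B"]
    return min(b_means) > max(a_means)

# Hypothetical problems completed per session.
reversed_ok = [("A", [1, 0, 2]), ("B", [8, 9, 10]),
               ("A", [2, 1, 1]), ("B", [9, 10, 11])]
no_reversal = [("A", [1, 0, 2]), ("B", [8, 9, 10]),
               ("A", [9, 10, 9]), ("B", [9, 10, 11])]
print(replicated_reversal(reversed_ok), replicated_reversal(no_reversal))
```

The second data set mimics the case in which behavior fails to return to baseline when the intervention is withdrawn, so the check fails.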

Multiple-Baseline Designs
A popular expansion of the A-B design is the multiple-baseline design. This design is used when reversing the behavior back to the baseline level is not desirable from an educational or therapeutic perspective. After an undesirable behavior has been changed to a desirable level, educators, therapists,


and caregivers are reluctant to bring the behavior back to its former undesirable level to prove that it was the intervention that changed the behavior to the acceptable level. Indeed, it may be considered unethical to force the removal of a desirable behavior acquired by a person with disability. With the multiple-baseline design, the possible influence of confounding variables is not assessed by withdrawing the intervention condition as with the A-B-A and A-B-A-B designs. Instead, data are compared with simultaneously running baselines for other behaviors, situations, or individuals. The multiple-baseline design probably had its formal origin in Baer et al. (1968). Multiple-baseline designs are not ordinarily used with animal subjects because reversals to baseline and replications are not undesirable or unethical. Figure 1.6 illustrates with hypothetical data the underlying logic of multiple-baseline designs. Two children in the same environment have the same behavioral deficit. In the top display, the target behavior is recorded concurrently for both children for several sessions as a baseline before intervention. For Peter, the intervention begins at Session 16 and continues for 10 sessions, and the target behavior shows a clear increase compared with baseline. For Allen, the behavior is still recorded throughout the intervention phase for Peter, but no intervention is scheduled for Allen. However, Allen also shows a similar large increase in the target behavior. Faced with such data, one would be forced to conclude either that the behavior change for Peter may not have been the result of the intervention but of other factors in the environment or that Allen imitated Peter’s new behavior change. Such data never find their way to publication because they do not demonstrate a clear effect of the intervention for Peter. 
The bottom display shows, also with hypothetical data, the more customary published data pattern from such a multiple-baseline design across subjects. The baseline for Allen continues undisturbed, whereas Peter’s behavior changes during the intervention. When the same intervention is then also introduced for Allen, his behavior shows an increase similar to that of Peter. The data show that the intended behavior change occurs only when the intervention is introduced. The data also show successful

Figure 1.6. Confounding variable in a multiple-baseline design. Top two graphs: Allen’s baseline behavior changes during the intervention for Peter, suggesting the influence of a confounding variable during intervention. Bottom two graphs: Allen’s behavior does not change during the intervention for Peter but changes when his intervention starts, suggesting control by the intervention. Data are hypothetical.

replication of the treatment effect. The benefit of this design is that when faced with such data, the therapist or educator can fairly safely conclude that the intervention caused the behavior change. An additional benefit for the client is that no forced withdrawal of treatment (return to baseline) occurs, which would have ruined any educational or therapeutic gain for that client. Figure 1.7 shows an empirical example of teaching words to an 11-year-old boy with autism using a


Figure 1.7. Multiple-baseline design across behaviors for one child. From “Increasing Spontaneous Language in Three Autistic Children,” by J. L. Matson, J. A. Sevin, D. Fridley, and S. R. Love, 1990, Journal of Applied Behavior Analysis, 23, p. 231. Copyright 1990 by the Society for the Experimental Analysis of Behavior, Inc., Lawrence, KS. Reprinted with permission.

multiple-baseline design across behaviors (Matson, Sevin, Fridley, & Love, 1990). Baselines of three words were recorded concurrently. Saying “please” was taught first while the baseline was continued for the other two target words. Saying “thank you” was then taught while the baseline was continued for the last target word, which was the last to be taught. The data show that teaching one word made saying that word increase in frequency, whereas saying the other words did not increase in frequency. In addition, the data show that saying a given word did not increase in frequency until teaching for that particular word started. Thus, the data generated by this design demonstrate clear control over the target behaviors. In addition, the data show that the control was established by the teaching methods and not by other confounding variables. The logic of the multiple-baseline designs dictates that the baselines run in parallel (concurrently). For example, with a multiple-baseline design across subjects, one would not run the first subject in January, the next subject in February, and the third subject in March. The point of the design is that the baseline for the second subject guards

against possible confounding variables that might occur simultaneously with the introduction of the intervention for the first subject; similarly, the baseline for the third subject guards against possible confounding variables associated with the introduction of the intervention for the first and second subjects. The multiple-baseline design offers several levels of comparison of effects and no effects of the intervention (Iversen, 2012). Figure 1.8 illustrates, with hypothetical data, the many comparisons that can be made in multiple-baseline designs, in this case a multiple-baseline design across subjects. For example, with four clients, the baselines of the same behavior are recorded concurrently for all clients, and the intervention is introduced successively across clients. Each client’s baseline serves as a comparison for the effect of the intervention on that client (as indicated with dotted arrows at A, B, C, and D). The intervention phase for one client is compared with the baseline for the next client (reading from the top of the chart) to determine whether possible confounding variables might produce concomitant changes in the behavior of other


Figure 1.8. Logic of the multiple-baseline across-subjects design. A, B, C, and D arrows refer to change in behavior during the intervention compared with the baseline for each individual; a, b, c, d, e, and f arrows refer to the absence of a change for one individual when the intervention has an effect at the same time for another individual. Data are hypothetical.

clients when the behavior is made to change for the previous client. Thus, the dotted lines (marked a, b, c, d, e, and f ) indicate the possible assessments of effects of confounding variables. In this hypothetical example, there are four demonstrations of an intervention effect because the behavior score increases each time the intervention is introduced. In addition, there are six demonstrations of the absence of the possible effect of confounding variables because the behavior score does not change for one client when the intervention takes effect for another client. Ideally, to provide maximal internal validity, the intervention should produce a change in behavior for the client for whom the intervention takes place


and should have no effect on the behavior of the other clients. Notice that the data for the client in the bottom display provides the most information, with a change when the intervention takes place for that client and three comparisons during the baseline to introduction of interventions for the other clients but without any change on the baseline for the last client. Baer (1975) noted that multiple-baseline designs offer an opportunity for systematic replication (across subjects, responses, or situations) but do not offer an opportunity for direct replication for the same subject, response, or situation (i.e., there is no return to baseline or subsequent return to intervention). In essence, the technique is repeated, but control over behavior is not. Thus, an essential component of functional behavior analysis is lost with the multiple-baseline design, yet the technique is well suited for applied behavior analysis. Successful, systematic replications of procedures across behaviors, subjects, and situations and across laboratories, classrooms, and clinics over time offer important evidence that the designs are indeed responsible for the behavior changes. In general, multiple-baseline designs are conducted across behaviors or situations for the same individual or across individuals (with the same behavior measured for all individuals). The designs have also been used for groups of individuals. Multiple-baseline designs are appealing to educators and therapists and have become so popular that they are often presented in textbooks as the golden example of modern behavior analysis tools. However, these designs do not quite live up to the original formulation of single-case research methods’ being an interaction between the investigator and the subject. The investigator should be able to change the procedure as needed depending on how the subject’s behavior changes as a function of the investigator’s experimental manipulations. 
To be successful, applied behavior analysis should guard against becoming a discipline in which participants are pushed through a rigid protocol with a predetermined number of sessions and fixed set of conditions regardless of how their behavior changes. In a recent interview, Sidman (as cited in Holth, 2010) pointed out that to be effective, both basic and applied research requires

a two-way interaction between experimenter and subject and therapist and client, respectively.
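The comparison logic of the multiple-baseline design across subjects described in this section (four intervention effects and six checks for confounding variables with four clients) can be enumerated with a short sketch. The function name `multiple_baseline_comparisons`, the client labels, and the start sessions are hypothetical; the sketch assumes that each client's intervention starts at a distinct session while all baselines run concurrently.

```python
def multiple_baseline_comparisons(start_sessions):
    """start_sessions: dict mapping client -> session at which intervention begins.

    Returns (effects, checks):
      effects: clients whose own baseline is compared with their intervention phase
      checks:  (treated, waiting) pairs in which the waiting client's baseline
               should stay unchanged when the treated client's intervention starts
    """
    if len(set(start_sessions.values())) != len(start_sessions):
        raise ValueError("interventions must start at distinct sessions")
    order = sorted(start_sessions, key=start_sessions.get)
    effects = list(order)
    checks = [(treated, waiting)
              for i, treated in enumerate(order)
              for waiting in order[i + 1:]]
    return effects, checks

# Hypothetical staggered start sessions for four clients.
starts = {"Client 1": 6, "Client 2": 11, "Client 3": 16, "Client 4": 21}
effects, checks = multiple_baseline_comparisons(starts)
print(len(effects), len(checks))
```

With n clients this gives n within-client comparisons and n(n - 1)/2 cross-client checks; the client treated last appears as the waiting baseline in three of the checks, which is why that client's data carry the most information.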

Gradual or Sudden Changes in A-B, A-B-A, A-B-A-B, and Multiple-Baseline Designs
The literature review for this chapter revealed that two distinctly different patterns of behavior change appear to be associated with the A-B, A-B-A, A-B-A-B, and multiple-baseline designs. Figure 1.9 exemplifies this issue for the A-B-A-B design using hypothetical data so as not to highlight or critique specific studies. In the top graph, the behavior change is a gradual increase in the B phase and a gradual decrease in the second A phase with a return to baseline conditions, and the last B phase also shows a gradual increase in the behavior, as in the first B phase. In the bottom graph, the behavior in the first B phase shifts up abruptly as soon as the intervention takes

Figure 1.9. Two different patterns of data in A-B-A-B designs. The top display shows an example of gradual acquisition in both B phases and gradual extinction in the second A phase. The bottom display shows abrupt changes in the B phases and in return to the A phase. Data are hypothetical.


place, stays at the same level, and then just as abruptly shifts down to the same level as in the first A phase with a similar abrupt shift up in the last B phase. With animals, data customarily look as illustrated in the top graph, because Phase B is usually contingent reinforcement and the change in Phase B can be considered behavior acquisition. Similarly, when reinforcement is removed in the second A phase, behavior will ordinarily extinguish gradually across sessions. In studies with humans, however, both of these data patterns appear in the literature, often without comment or clarification. When data show a large, abrupt change in the very first session of intervention and an equally large and abrupt change back to baseline in the first session of withdrawal, the participant either made contact with the changed reinforcement contingency immediately or responded immediately to an instruction regarding the change, such as “From now on you have to do x to earn the reinforcer” or “From now on you will no longer earn the reinforcer when you do x.” Thus, the two different behavior patterns observed in the A-B-A-B design with human clients, gradual versus abrupt changes, could conceivably reflect whether the behavior was under control by contingencies of reinforcement or by discriminative stimuli (instruction). Skinner (1969) drew a distinction between contingency-shaped behavior and rule-governed behavior. Behavior that is rule governed already exists in the individual’s repertoire and is therefore switched on and off by the instructions, and the experiment or treatment becomes an exercise in stimulus control, or rule following. However, contingency-shaped behavior may not exist before an intervention intended to control the behavior, and the experiment or treatment becomes a demonstration or study of acquisition. 
Thus, with human participants, two fundamentally different behavioral processes may underlie the different patterns of behavior seen with the use of the A-B-A-B design (and also with the A-B, A-B-A, and multiple-baseline designs). Unfortunately, authors do not always explain the procedure carefully enough that one can determine whether the participant was instructed about changes in procedure when a new phase was initiated (see also Kazdin, 1973). Perhaps future systematic examinations of existing literature can

evaluate the frequency and root of the different behavior patterns in these designs.

Alternating-Treatments or Multielement Designs
A variant of the A-B-A-B design is a more random alternation of two or more components of intervention or conditions for research. For example, the effects of three doses of a drug on a behavioral baseline of operant conditioning for an individual may be compared across sessions with the dose selected randomly for each session. Thus, a possible sequence of conditions might be B-A-A-C-B-C-C-A-B-A-C-B, and so on. This design allows for random presentation of each condition and can assess sequential effects in addition to determining the effect of several levels of an independent variable. The basic design has its origin in Sidman (1960) under the label multielement manipulation and has since been labeled multielement design or alternating treatments. This design is somewhat similar to the design used in functional assessment (see the section Functional Assessment later in this chapter). However, an important difference is that in functional assessment, each condition involves assessment of existing behavior under familiar conditions, whereas with the alternating-treatments design, each condition is an intervention seeking to change behavior. Alternating-treatments designs are considerably more complex than what can be covered here (see, e.g., Barlow et al., 2009).
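A session-by-session random selection of conditions, as in the three-dose example, might be sketched as follows. The function name `alternating_sequence`, the labels A, B, and C, and the seed are hypothetical. The sketch assumes unconstrained random choice for each session, so the conditions need not occur equally often; an actual study may impose further constraints that are not shown here.

```python
import random

def alternating_sequence(conditions, n_sessions, seed=None):
    """Pick one condition at random for each session (hypothetical helper)."""
    rng = random.Random(seed)  # a seeded generator makes the schedule reproducible
    return [rng.choice(conditions) for _ in range(n_sessions)]

# Twelve sessions drawn from three hypothetical dose conditions.
sequence = alternating_sequence(["A", "B", "C"], 12, seed=1)
print("-".join(sequence))
```

Fixing the seed lets the same schedule be regenerated later, which can help when documenting the order in which conditions were presented.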

Changing-Criterion Designs
When a behavior needs to be changed drastically in topography, duration, or frequency, an effective approach is to change the criterion for reinforcement in small steps. This approach is essentially the method of shaping by successive approximation (Cooper, Heron, & Heward, 2007). Concrete, measurable criteria are applied in a stepwise fashion in accordance with behavioral changes. Each step serves as a baseline for comparison to the next step. Withdrawals or reversals are rare because the goal is to establish a drastic behavior change. The method can be characterized as an A-B-C-D-E-F-G . . . design, although it is rarely written that way. For


example, as a laboratory demonstration, the duration of lever holding by a rat may be increased in small steps of first 200 milliseconds across sessions, then 500 milliseconds, then 1 second, and so forth. Eventually, the rat may steadily hold the lever down for an extended period of time, say up to 10 seconds or longer if the final criterion is set at 10 seconds (e.g., Brenagan & Iversen, 2012). The changing-criterion design may also be used in the form of stimulus-shaping methods in educational research in which stimuli are faded in or faded out or modified in topography. For example, the spoken word house is taught when a stimulus, say a schematic of a house, is presented to a child. Over time the schematic is modified in many small steps into the printed word HOUSE; at each step, the correct response to the stimulus remains the spoken word house. A variety of other schematics of objects, each with its own separate spoken word, are similarly modified into printed words over time. Eventually, a situation is created in which the child may produce spoken words that correspond to the printed words (e.g., Cooper et al., 2007). McDougall, Hawkins, Brady, and Jenkins (2006) described various implementations of the changing-criterion design in education and suggested that the changing-criterion design can profitably be combined with A-B-A-B or multiple-baseline designs to establish optimal designs for individuals in need of large-scale behavior changes. Actual laboratory experiments and educational or clinical interventions using variations of changing-criterion designs are customarily highly complex mixtures of different methods. For example, to generate visual guidance of motor behavior in chimpanzees, Iversen and Matsuzawa (1996) introduced an automated training protocol to bring line drawing with a finger on a touch screen under stimulus control. Finger movement left “electronic ink” on the screen surface. 
The top-left diagram in Figure 1.10 shows a sketch of the 10-step training procedure. Each session presented in mixed order four trial types as four orientations of an array of circles (as indicated). The chimpanzees had to touch the circles to produce reinforcement. Stimuli changed across and within sessions. Thus, within Steps 2 and 3, the circles were moved closer across trials, and

the number of circles increased as well. The objective was to enable a topography change from touch-lift to a continuous finger motion across circles, as in touch-drag. The lower diagram in Figure 1.10 shows the development of drawing for one subject for one of the four trial types on the monitor. Touch-lift is indicated by a dot and touch-drag (touch and then move the finger while it still has contact with the monitor) is indicated by a thick line that connects successive circles. The figure shows the interplay between procedure and results. The circles come closer together, the number of circles increases, and the chimpanzee’s behavior changes accordingly from touch-lift to touch-drag. Small specks of not lifting the finger between consecutive circles initially appear in Session 13, and the first full sweep across all circles without lifting appears already in Session 14 and is fully developed by Session 16. The time to complete each trial (vertical lines) shortened as the topography changed from touch-lift to touch-drag. Eventually, the chimpanzee swept over the circles in one movement for all four trial types. In the remaining steps, the stimuli were changed by fading techniques across and within sessions from an array of circles to just two dots, one where drawing should start and one where it should end. An additional aspect of the method was that the chimpanzees were also taught to end the trials themselves by pressing a trial termination key, which was introduced in Step 6. Thereby, the end of the drawn trace (lifting of the finger) came under control by the stimuli on the monitor and not by delivery of reinforcement. The final performance was a highly precise drawing behavior under visual guidance by the stimuli on the screen. 
Such stimulus control of complex motor performance was acquired with a completely automated method over a span of about 100 sessions (3–4 weeks) entirely by continuously rearranging reinforcement contingencies and stimulus-fading procedures in small steps without the use of verbal instruction. In the final performance, the chimpanzees would look at the dots, aim one finger at the start dot, rapidly move the finger across the monitor, lift the finger at the second dot, and then press the trial termination key to produce reinforcement, all in less than 1 second for each trial. The top right graph in Figure 1.10 shows the


Figure 1.10. Top left: Schematic of the experimental procedure. Top right: Frequency plot of angles of drawing, as illustrated in the top images. Bottom: Data are from one trial type, and all trials of that type are shown in six successive sessions. Number of circles and distance between circles changed within and across sessions. A dot indicates touch-lift, and a black line indicates touch-drag. From “Visually Guided Drawing in the Chimpanzee (Pan Troglodytes),” by I. H. Iversen and T. Matsuzawa, 1996, Japanese Psychological Research, 38, pp. 128, 131, 133. Copyright 1996 by John Wiley & Sons, Inc. Reprinted with permission.

resulting control of the angle of the drawn trace for each trial type. When behavior control techniques are successful and produce very reliable and smooth performance, spectators of the final performance may not quite believe that the subjects could at some point in time not do this. In fact, a renowned developmental psychologist happened to visit the laboratory while the drawing experiment was ongoing. On seeing one of the chimpanzees draw line after line smoothly and without hesitation (in Step 10), he exclaimed that the investigators had wasted their

time training the chimpanzees and added that they could “obviously” already draw because such smooth and precise motor performance could not at all have been acquired by simple shaping. Such comments may be somewhat amusing to behavior analysts. Yet, they are made by professionals in other areas of psychology and reveal a disturbing lack of understanding of and respect for effective behavior control techniques developed with the use of single-case research methods. Unfortunately, the commentaries also reveal a failure by behavior analysts to promote understanding about behavior


analysis, even to professionals in other areas of psychology.
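The stepwise logic of the changing-criterion design described in this section, in which each criterion for reinforcement serves as the baseline for the next, can be sketched with the lever-holding example (200 milliseconds, then 500 milliseconds, then 1 second). The function name `run_changing_criterion`, the hold durations, and the rule for advancing (three consecutive reinforced responses) are assumptions made for illustration; the chapter describes criterion changes across sessions and does not state an advancement rule.

```python
def run_changing_criterion(criteria_ms, hold_durations_ms, needed_in_a_row=3):
    """Step through reinforcement criteria for lever-hold duration.

    A hold is reinforced when it meets the criterion currently in effect.
    After `needed_in_a_row` consecutive reinforced holds (an assumed rule),
    the next, stricter criterion takes effect.
    Returns a list of (duration, criterion_in_effect, reinforced) tuples.
    """
    step = 0
    streak = 0
    log = []
    for duration in hold_durations_ms:
        criterion = criteria_ms[step]
        reinforced = duration >= criterion
        log.append((duration, criterion, reinforced))
        streak = streak + 1 if reinforced else 0
        if streak >= needed_in_a_row and step < len(criteria_ms) - 1:
            step += 1      # move to the next criterion
            streak = 0     # performance under the new criterion starts fresh
    return log

# Hypothetical hold durations in milliseconds.
holds = [250, 300, 220, 400, 600, 550, 700, 900, 1200]
log = run_changing_criterion([200, 500, 1000], holds)
for row in log:
    print(row)
```

Each block of trials under one criterion plays the role of a baseline for the block under the next criterion, which is the A-B-C-D . . . structure noted at the start of the section.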

Discrete-Trial Designs
A discrete trial is the presentation of some stimulus material and implementation of a response–reinforcer contingency that applies only in the presence of that stimulus. The discrete-trial design has long been a standard procedure for use with animals in all sorts of experiments within and outside of behavior analysis. Complex experiments and educational projects, for example, involving conditional discriminations (e.g., matching to sample), are often based on discrete-trial designs (see Volume 2, Chapter 6, this handbook). Historically, the discrete-trial method has become almost the hallmark of applied behavior analysis, especially for its use in education and treatment of children with intellectual disabilities (Baer, 2005). For example, the method may be as simple as presenting a picture of an animal, and the response that produces the reinforcer in this trial is the spoken name of the animal on the picture; in another trial, the picture may be of another animal, and the reinforced response is the name of that animal. Thus, the method is useful for teaching which stimuli (verbal or pictorial) should control which responses and when. Loosely speaking, the discrete-trial method teaches when a response is permitted and when it is not. The method may be used in a very informal way, as in training when normal activities should and should not occur. The method may also be presented very formally in an automated arrangement, as in the above example with chimpanzees. In applied behavior analysis, the discrete-trial method is useful for teaching single units of behavior such as acquisition of nouns but is less useful for teaching sequential behaviors, such as brushing teeth (Steege & Mace, 2007). Apparently, the very term discrete-trial teaching has of late, outside the field of applied behavior analysis, come to be considered as a simple procedure for simple behaviors. 
Attempts have been made, therefore, to present applied behavior analysis to audiences outside of behavior analysis as a considerably richer set of methods than just discrete-trial methods (Ghezzi, 2007; Steege & Mace, 2007).

Time Scales of Single-Case Research Methods
The time scale can vary considerably for single-case research methods. Laboratory demonstrations with animals using the A-B-A design to show acquisition and extinction of a simple operant can usually be accomplished in a matter of 20 to 30 minutes (e.g., Figure 1.4). Educational interventions using A-B-A-B designs may last for weeks. Therapeutic interventions for children with autism spectrum disorder may last a few years, with multiple changes in designs within this time period (e.g., Lovaas, 1987). Green, Brennan, and Fein (2002), for example, described a behavior analysis treatment project for one toddler with autism, Catherine. Treatment continued for 3 years with gradually increasing complexity, beginning with instruction in home settings, through other settings, and eventually to regular preschool settings with minimal instruction. Figure 1.11 shows the chronological order of skill introduction for the 1st year. The design is a gradual introduction of skill complexity in which previously acquired skills serve as prerequisites for new skills; the logic of the design is similar to that of the changing-criterion design, mentioned earlier, except that the criterion change is across topographically different behaviors and situations. There is no baseline for each skill other than the therapist’s knowledge that a given skill was absent or not sufficiently developed before it was explicitly targeted for acquisition treatment. There are no withdrawals because, for such real-life therapeutic interventions, they would force removal of an acquired, desirable skill. Effective withdrawals may even be impossible for many of the acquired skills such as eye contact, imitation, and speech. Instead, continued progress revealed that the overall program of intervention was successful. Besides, the study replicated previous similar studies. Green et al. 
concluded that “over the course of 3 years of intense, comprehensive treatment, Catherine progressed from exhibiting substantial delays in multiple skill domains to functioning at or above her age level in all domains” (p. 97). Multiple-baseline designs can on occasion last for years. D. K. Fox, Hopkins, and Anger (1987), for example, introduced a token reinforcement 19

Iver H. Iversen

Figure 1.11. Timeline for teaching various skills to a single child in an intense behavior analysis program. Arrows indicate that instruction continued beyond the time period indicated here. From “Intensive Behavioral Treatment for a Toddler at High Risk for Autism,” by G. Green, L. C. Brennan, and D. Fein, 2002, Behavior Modification, 26, p. 82. Copyright 2002 by Sage Publications. Reprinted with permission.

program for safety behaviors in two open-pit mines. Concurrent baseline recordings of safety were taken for both mines. After 2 years in one mine, the contingencies were changed for that mine, and the baseline was continued for the other mine for another 3 years before the contingencies were changed for that mine, too. For this 15-year project, at each mine the contingencies were applied at the level of both the individual worker and teams of workers. This example serves as a reminder that for

single-case research methods, the case may not necessarily be one individual but a group of individuals, and in this case, the group may even change members over time. The length of time a project lasts is not an essential aspect of single-case research designs. The essential aspects are that data consist of repeated observations or recordings and that such data are compared across different experimental or treatment conditions.

Single-Case Research Methods

Using Single-Case Designs for Assessment

A multitude of methods have been developed to assess behavior. The general methodology is similar to single-case designs, which can be used in testing which specific stimuli control a given client’s behavior.

Functional Assessment

Functional assessment seeks to ascertain the immediate causes of problem behavior by identifying antecedent events that initiate the behavior and consequent events that reinforce and maintain the behavior. The goal of assessment is to determine the most appropriate intervention for the given individual. For example, a child exhibiting self-injurious behavior may be placed in several different situations to determine which situations and possible consequences of the behavior in those situations affect the behavior frequency. The situations may be (a) alone, (b) alone with toys, (c) with one parent, (d) with the other parent, (e) with a sibling, and (f) with a teacher. The child is exposed to one situation each session with situations alternating across sessions. For determination of reliability, each situation is usually presented more than once. If the problem behavior in this example occurs most frequently when either parent is present, and it is observed that the parent interacts with the child when the problem behavior occurs, then the inference is drawn that the problem behavior may be maintained by parental interaction (i.e., positive reinforcement from attention) and that the parent’s entry is an antecedent for the behavior (i.e., parental entrance is an immediate cause of initiation of the behavior). The therapist will then design an intervention based on the information collected during assessment. Causes of the problem behavior are inferred from functional assessment methods, because no experimental analysis is performed in which the causes are manipulated systematically to determine how they influence behavior. Functional assessment of problem behaviors has become very prevalent; special issues and books have been published on how to conduct the assessments (e.g., Dunlap & Kincaid, 2001; Neef & Iwata, 1994). Recently, functional assessment appears to have taken on a life of its own, separate

from the goal of providing impetus for intervention, with publications presenting results from assessment alone without actual follow-up intervention. Thus, the reader is left wondering whether the causes of the problem behavior revealed by the assessment package were also causes that could be manipulated and whether such manipulations would in fact improve the problem behavior. Functional assessment serves as a useful clinical and educational tool to determine possible immediate causes of problem behavior. However, reliability of assessment outcome cannot be fully evaluated without a direct link between the inferred causes of problem behavior in assessment and the outcome of subsequent intervention using manipulations of these inferred causes for the same client.
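The descriptive comparison across assessment situations can be sketched in a few lines of code. This is a minimal illustration under stated assumptions, not a standard assessment protocol: the session records, the condition labels, and the function name mean_rate_by_condition are hypothetical, and the summary (mean responses per minute for each situation) is only one of several measures an assessor might choose.

```python
from collections import defaultdict


def mean_rate_by_condition(sessions):
    """Return the mean response rate (responses per minute) per condition.

    `sessions` is an iterable of (condition, response_count, minutes)
    tuples, one tuple per assessment session (hypothetical record format).
    """
    rates = defaultdict(list)
    for condition, count, minutes in sessions:
        rates[condition].append(count / minutes)
    return {cond: sum(vals) / len(vals) for cond, vals in rates.items()}


# Hypothetical data: each situation is presented twice, alternating
# across sessions, as described in the text.
sessions = [
    ("alone", 2, 10), ("with parent", 12, 10), ("with toys", 1, 10),
    ("alone", 4, 10), ("with parent", 14, 10), ("with toys", 3, 10),
]

means = mean_rate_by_condition(sessions)
highest = max(means, key=means.get)  # situation with the highest mean rate
```

A summary of this kind only points to a candidate maintaining variable. As the text notes, the cause remains an inference until it is manipulated in a subsequent intervention with the same client.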

Assessment Using Discrete-Trial Procedures

Many children and adults with intellectual disorders have not acquired expressive language, and communication with them can be difficult or impossible. A method called facilitated communication claims to enable communication with such clients by having a specially trained person, the facilitator, hold the client’s hand or arm while the client types with one finger on a keyboard. Because some typed messages have expressed advanced language use without the clients ever having shown other evidence of such language, facilitated communication has been questioned as a means of authentic communication. The question is whether the facilitator rather than the client could be the author of the typed messages. To test for such a possibility, several investigators have used single-case research methods to examine stimulus control of typing (e.g., Montee, Miltenberger, & Wittrock, 1995; Wheeler, Jacobson, Paglieri, & Schwartz, 1993). The most common test is to first present the client and the facilitator with a series of pictures in discrete trials and ask the client to type the name of the object shown in the picture. If the client types the correct object names, then the method is changed by adding test probes on some trials. On those test trials, the pictures are arranged such that the facilitator sees one picture, and the client sees another picture; an important added


methodological feature is that the client cannot see the picture the facilitator sees, and vice versa. These studies have shown with overwhelming reliability that the clients type the name of the picture that the facilitators see; if the clients see a different picture, then the clients do not type the name of the picture they see but instead type the name of the picture the facilitator sees. Thus, these studies have demonstrated, client by client, that it is the facilitator who types the messages and that the pictures do not control correct typing by the clients—the clients cannot communicate the names of the pictures they see. For commentaries and reviews of this literature, see Green (2005); Green and Shane (1994); and Jacobson, Mulick, and Schwartz (1995). As Green (2005) pointed out, the continued false beliefs by facilitators, family, caregivers, and news media that clients actually communicate with facilitated communication may in fact deprive them of an opportunity to receive effective, scientifically validated treatment.
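The scoring logic of the probe test described above can be sketched as follows. This is a hypothetical illustration, not the scoring procedure of any of the cited studies: the trial records, the field names, and the function tally_probe_trials are assumptions introduced here. Only probe trials, in which client and facilitator see different pictures, are informative about who controls the typing.

```python
def tally_probe_trials(trials):
    """Count, over probe trials only, whose picture the typed word names.

    Each trial is a dict with hypothetical keys 'client_picture',
    'facilitator_picture', and 'typed'. Trials in which both people see
    the same picture are skipped because they cannot separate the two
    possible sources of stimulus control.
    """
    counts = {"client": 0, "facilitator": 0, "neither": 0}
    for trial in trials:
        if trial["client_picture"] == trial["facilitator_picture"]:
            continue  # not a probe trial
        typed = trial["typed"].strip().lower()
        if typed == trial["client_picture"]:
            counts["client"] += 1
        elif typed == trial["facilitator_picture"]:
            counts["facilitator"] += 1
        else:
            counts["neither"] += 1
    return counts


# Hypothetical session: one ordinary trial followed by three probes.
trials = [
    {"client_picture": "cup", "facilitator_picture": "cup", "typed": "cup"},
    {"client_picture": "dog", "facilitator_picture": "hat", "typed": "hat"},
    {"client_picture": "key", "facilitator_picture": "car", "typed": "car"},
    {"client_picture": "shoe", "facilitator_picture": "ball", "typed": "bal"},
]

counts = tally_probe_trials(trials)
```

In this made-up session, the tally would attribute two probe responses to the facilitator’s picture and none to the client’s picture, which is the pattern the cited studies report.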

Real-Life Assessment Using Multistage Reversal-Replication Designs

Applications of single-case research methods with clients in real-life situations away from clinic or school may not always follow a pure, prearranged formula. Contextual factors, such as varying client health and family situations, are additional considerations. In such cases, the consistent replication of behavior patterns in similar conditions across multiple sessions becomes the indicator of whether a given treatment or assessment has the intended outcome. For example, completely paralyzed patients with amyotrophic lateral sclerosis were trained to communicate using only their brainwaves (via electroencephalogram) to control the movement of a cursor on a computer screen (Birbaumer et al., 1999). Letters were placed on the screen, and the patient could move the cursor toward them and thereby spell words to communicate. In additional experiments, abilities to distinguish verbs from nouns, odd from even numbers, and consonants from vowels and to perform simple computations were assessed in a matching-to-sample–type task (Iversen et al., 2008). The top part of Figure 1.12 shows a schematic of the events during one 5-second

Figure 1.12. Top: Schematic of the events in a single trial. The first 1.5 seconds is an observation period (in this case, the presession instruction was to always select the noun of a noun–verb choice; new words appeared on each trial, and the correct position varied randomly from trial to trial). A 0.5-second baseline of electroencephalogram (EEG) is then recorded, followed by an active phase in which the patient can control the cursor (ball) on the screen for 3 seconds. Bottom: Data from one patient with amyotrophic lateral sclerosis from one training day with several successive tasks. T = task; T1 = simple target; T5 = noun–verbs; T7 = color matching; T8 = addition or subtraction matching. From “A Brain–Computer Interface Tool to Assess Cognitive Functions in Completely Paralyzed Patients With Amyotrophic Lateral Sclerosis,” by I. H. Iversen, N. Ghanayim, A. Kübler, N. Neumann, N. Birbaumer, and J. Kaiser, 2008, Clinical Neurophysiology, 119, pp. 2217, 2220. Copyright 2008 by Elsevier. Reprinted with permission.

trial. The patient gets online visual feedback from the electroencephalogram in the form of cursor movement. If the cursor reaches the correct stimulus, then a smiley face appears on the screen. The


patients lived in their private homes with constant care and were assessed intermittently over several sessions spanning a few hours once or twice each week. The ideal scenario was always to test patients several times on a very simple task, such as steering the cursor to a filled box versus an open box, to make sure that the electrodes were attached correctly, equipment worked, and the patient could still use the electroencephalogram feedback to control the cursor. Once the patient reached at least 85% correct on a simple task, then tasks with assessment stimuli (e.g., nouns and verbs) were presented for one or several sessions. If the neurodegenerative amyotrophic lateral sclerosis destroys the patient’s ability to discriminate words or numbers, then the patient should show a deficit in such tasks compared with when the patient can solve a simple task. To determine whether a patient has a potential deficit in a given skill, such as odd–even discrimination, then it is necessary to know that the patient can still do the simple task of moving the cursor to the correct target when the target is just a filled box. Thus, to interpret the data, it was necessary to present the simple task several times at the beginning of, during, and end of a given day the patient was visited by the testing team. It proved challenging at times to convince family members that it was necessary to repeat the simple tasks several times because, as the family members said, “You already know that he can do that, so why do you keep wasting time repeating it?” The bottom part of Figure 1.12 shows the results for each of 16 consecutive sessions for a single test day for one patient. The task numbers on the x-axis refer to the type of training task, with Task 1 (T1) being the simplest, which was repeated several times. 
The overall design forms a phase sequence of A-B-C-A-C-D-A, in which the A phases present the simple task and serve as a baseline control, and the B, C, and D phases present more complex test material. The A phases thus serve as a control for equipment integrity and for the patient’s basic ability to move the cursor to a correct target. For example, had the training day ended after the session in Phase D, it would have been difficult to conclude that the patient had a deficit in this task because the deteriorating percentage correct could reflect a loose

electrode, equipment failure, or the lack of patient cooperation. That the patient scored high again on the simple task in the last A phase, immediately after the session in the D phase, demonstrates that the patient had some difficulties discriminating the stimuli presented in Task 8 in the D phase (i.e., addition and subtraction, such as 3 + 5 or 7 − 2). Among the many findings is that several warm-up sessions were necessary at the beginning of the day before the patient reached the usual 85% correct on the simple task. Several such training days with this very ill, speechless and motionless patient demonstrated that the patient had some deficits in basic skills. Thus, the data showed that the patient, a former banker, now had problems with very simple addition and subtraction. This example illustrates the use of single-case research methods in complex living situations with patients with extreme disability. To extract meaningful data from research with such patients in a varying environment, it is necessary to know repeatedly that the complex recording and control equipment works as intended and that patients’ basic skills are continuously intact because no verbal communication from the patient is possible (i.e., the patient cannot tell the trainer that he is tired or that something else may be the matter). Indeed, both trainers and family members had to be instructed in some detail, and often, as to why it was necessary to repeat the simple tasks several times on each visit to the patient’s home. The multiphase replication design with repeated presentation of phases of simple tasks in alternation with more complex tasks is a necessary component of single-case research or testing methods applied to complex living situations in which communication may be compromised or impossible. 
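The bracketing logic of this multiphase design can be sketched in code. The sketch is an illustration only and not the analysis used by Iversen et al. (2008): the session values, the function name interpretable_sessions, and the specific rule (a complex-task session counts as interpretable only when the nearest simple-task sessions before and after it both meet criterion) are assumptions. The 85% criterion and the task labels T1, T5, T7, and T8 come from the text and figure caption.

```python
CRITERION = 85.0  # percent correct required on the simple task (from the text)


def interpretable_sessions(sessions, criterion=CRITERION, simple_task="T1"):
    """Flag each complex-task session as interpretable or not.

    `sessions` is a chronological list of (task, percent_correct) tuples
    for one test day. A complex-task session is flagged True only if the
    nearest simple-task session before it AND the nearest one after it
    both reached the criterion (an assumed decision rule).
    Returns a list of (index, task, percent_correct, interpretable).
    """
    flagged = []
    for i, (task, pct) in enumerate(sessions):
        if task == simple_task:
            continue
        before = next(
            (p for t, p in reversed(sessions[:i]) if t == simple_task), None
        )
        after = next(
            (p for t, p in sessions[i + 1:] if t == simple_task), None
        )
        ok = (
            before is not None and after is not None
            and before >= criterion and after >= criterion
        )
        flagged.append((i, task, pct, ok))
    return flagged


# Hypothetical test day: warm-up on T1, complex tasks, T1 controls between.
day = [
    ("T1", 60), ("T1", 90), ("T5", 88), ("T7", 92),
    ("T1", 95), ("T7", 90), ("T8", 55), ("T1", 93),
]

full_day = interpretable_sessions(day)
# If the day had ended right after the T8 session, the low T8 score
# could not be separated from equipment or cooperation problems.
cut_short = interpretable_sessions(day[:-1])
```

With the final simple-task session removed, the last two complex sessions lose their closing control and are flagged as not interpretable, which mirrors the argument made in the text about ending the day after Phase D.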
Data Analysis Methods

A fundamental aspect of single-case designs is that behavior is recorded repeatedly and under the same set of methods and definitions across different experimental, educational, or treatment conditions. Chapter 6 of this volume covers these issues; see also J. M. Johnston and Pennypacker (2009) and Cooper et al. (2007).


Data Recording Methods

Automated recording techniques (customarily used with animal subjects) require frequent verification that monitoring devices (switches, photocells, touch screens, etc.) record behavior as intended and that these monitoring devices are calibrated correctly. With observational methods, calibration of criteria for response occurrence used by different observers is a crucial issue, and intraobserver as well as interobserver agreement calculations are essential for objective recording of behavior (see Chapter 6, this volume). Equally essential is intra- and interpersonnel consistency in the methods of delivering consequences to clients in educational and clinical settings. However, reports of agreement scores do not ordinarily include information about how consistently personnel follow described methods. Common measures of behavior are frequency of occurrence (often converted to response rate), response duration, and response topography (e.g., Barlow et al., 2009). A given behavior can also be analyzed in terms of its placement in time as a behavior pattern or its placement among other behaviors, as in analyses of the sequential properties of behavior (e.g., Henton & Iversen, 1978; Iversen, 1991; see also Chapter 12, this volume).
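As an illustration of an interobserver agreement calculation, the following sketch computes interval-by-interval agreement, one commonly used index; the chapter itself does not specify a formula, so the choice of index is an assumption. The observation records and the function name interval_agreement are hypothetical.

```python
def interval_agreement(observer_a, observer_b):
    """Return interval-by-interval agreement as a percentage.

    Each argument is a sequence with one entry per observation interval
    (1 = behavior scored as occurring, 0 = not occurring). Agreement is
    the number of intervals scored identically by both observers,
    divided by the total number of intervals, times 100.
    """
    if len(observer_a) != len(observer_b):
        raise ValueError("Both records must cover the same intervals.")
    if not observer_a:
        raise ValueError("At least one interval is required.")
    agreements = sum(a == b for a, b in zip(observer_a, observer_b))
    return 100.0 * agreements / len(observer_a)


# Hypothetical records from two observers over ten intervals;
# they disagree on the fourth interval only.
observer_a = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
observer_b = [1, 0, 1, 0, 0, 0, 1, 0, 1, 1]

agreement = interval_agreement(observer_a, observer_b)  # 9 of 10 intervals
```

The same function could be applied to two scorings of one recording by the same observer to obtain an intraobserver figure. It says nothing about how consistently personnel delivered consequences, which is the gap the text points out.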

Visual Data Analysis Methods

Behavior analysis has a long tradition (beginning with Pavlov, Thorndike, and Skinner) of focusing on nonstatistical, visual analyses of data from single subjects (Iversen, 1991). In Chapter 9 of this volume, Bourret and Pietras describe a variety of methods of visual data analysis. Such analyses are now fairly standard and are covered in most textbooks on single-case research designs (e.g., Kennedy, 2005; Morgan & Morgan, 2009) and in textbooks on behavior analysis in general (e.g., Cooper et al., 2007). Visual analyses of data are not particular to experimental or applied behavior analysis and permeate all sciences and other forms of communication, as exemplified in the books by Tufte (e.g., 1983, 1990; see also Iversen, 1988) on analyzing and presenting visual information. Within behavior analysis, a classic text on visual analysis of behavioral data is Parsonson and Baer (1978), which covers analysis of data from basic single-case research designs. Fundamental issues in visual analysis are evaluations of baseline stability and baseline trends. Baselines should ideally vary little, and experimenters should analyze any conditions responsible for unexplained variation (Sidman, 1960). Data from an intervention cannot always be interpreted if variability from the baseline carries over into the intervention phase. Trends in baselines can be problematic if they are in the direction of the expected experimental or therapeutic effect. For example, if a baseline rate of behavior gradually increases over several observation periods, and behavior increases further during the intervention, then it can be difficult or impossible to determine whether the intervention was responsible for the increase (e.g., Cooper et al., 2007). The expressions appropriate baseline and inappropriate baseline have appeared in the literature to emphasize this issue. An appropriate baseline is either a flat baseline without a trend or a baseline with a trend in the direction opposite to the expected effect. For example, if the rate of self-injury in a child is steadily increasing in baseline, and the intervention is expected to decrease the behavior, then an increasing baseline is appropriate for intervention because the behavior is expected to decrease in the intervention even against an increasing baseline. There is certainly no rationale for waiting for self-injury to stabilize before intervention starts. However, an inappropriate baseline is one that has a trend in the same direction as the expected outcome of the intervention (see also Cooper et al., 2007). Patterns of data are important in interpretations of behavior changes, in addition to descriptive statistical evaluations.
For example, the two trends previously noted here in data for A-B, A-B-A, A-B-A-B, and multiple-baseline designs (i.e., Figure 1.9) would not be apparent had data been presented only as averages for each phase of recorded behavior. Plotting data for each session captures trends, variability, and intervention outcomes. Without such data, the investigator easily misses important information that might lead to subsequent changes in


procedure. In fact, successful interaction between researcher and subject depends on visual data analysis concomitant with project progression (e.g., Sidman, 1960; Skinner, 1938).
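The distinction between appropriate and inappropriate baselines discussed in this section can be expressed as a simple check on the direction of the baseline trend. The sketch below is a rough numerical aid and not a substitute for visual inspection: the least-squares slope as the trend measure, the tolerance parameter, the function names, and the data are all assumptions introduced for illustration.

```python
def ols_slope(values):
    """Least-squares slope of `values` against session number 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        raise ValueError("At least two sessions are needed to estimate a trend.")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    denominator = sum((x - mean_x) ** 2 for x in range(n))
    return numerator / denominator


def baseline_is_appropriate(baseline, expected_direction, tolerance=0.0):
    """Return True if the baseline is flat or trends against the expected effect.

    `expected_direction` is +1 if the intervention is expected to increase
    the behavior and -1 if it is expected to decrease it. `tolerance` is a
    hypothetical allowance for a negligible slope in the expected direction.
    """
    return ols_slope(baseline) * expected_direction <= tolerance


# Hypothetical baseline: self-injury rate rising across five sessions.
rising = [4, 5, 7, 8, 10]

slope = ols_slope(rising)
ok_for_reduction = baseline_is_appropriate(rising, expected_direction=-1)
ok_for_increase = baseline_is_appropriate(rising, expected_direction=+1)
```

A rising baseline passes the check when the intervention is expected to reduce the behavior and fails it when the intervention is expected to increase the behavior, which matches the self-injury example in the text. Note that a phase average alone would carry none of this information.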

Statistical Data Analysis Methods

Statistical analyses of behavioral data in both basic research and application have been controversial since Skinner’s (1938) The Behavior of Organisms, which was devoid of traditional statistical analyses. Behavior analysis gathers information about the behavior of individual subjects, whereas the traditional statistical approach gathers information about groups of subjects and offers no information about data from individual subjects. However, visual analyses of behavioral data are not always sufficient, and statistical methods can be used to supplement the analysis. Several authors have analyzed the ongoing controversy regarding use of statistics in behavior analysis and psychology (e.g., Barlow et al., 2009; Kratochwill & Brody, 1978; see Chapter 7, this volume). Chapters 11 and 12 of this volume provide information on new statistical techniques, which may prove useful for behavior analysts conducting time-series single-case research designs. A major issue with the use of statistics in behavior analysis is the treatment of variability in data. Visual analyses take variability as informative data that can prompt an experimental analysis of the sources of this variability (e.g., Sidman, 1960). With statistical analysis, however, variability is artificially compressed to a more manageable single number (variance or standard deviation) without analysis of the source of the variability. Thus, visual analysis and statistical analysis can be seen as antithetical tools to uncover causes of variability in the behavior of the individual subject. Consider, for example, the use of a simple t test for evaluation of data from two successive phases of an A-B design (e.g., Figure 1.3) as an illustration of the problems one may face with the use of a statistical test created for entirely different experimental designs.
If the B phase shows acquisition of behavior at a gradually increasing rate compared with the baseline in the A phase, the behavior analyst has no problems visually identifying a large effect of the manipulation (given that confounding variables can

be ruled out). If a t test is applied, however, comparing the baseline data with the treatment data, the standard deviation for the B phase may be very large because data range from the baseline level to the highest rate when treatment is most effective. The result may be a nonsignificant effect. Besides, the data in the B phase showing a gradual acquisition may not be normally distributed and may not be independent measures (i.e., as a series of increasing values, data on session n influence data on session n + 1); the result is that the t test is not valid. However, the failure of a common statistical test to show an effect certainly does not mean that such data are unimportant or that there is no effect of the experimental manipulation. Single-case research methods are not designed for hypothesis testing and inferential statistics but for analysis of behavior of the individual subject and for development of methods that can serve to help individuals acquire appropriate behavior. The assumptions of inferential statistics, meant for between-group comparisons, with independent observations, random selection of subjects, random allocation to treatment, and random treatment onset and offset are obviously not fulfilled in single-case research methods. Eventually, however, statistical tests appropriate for single-case methods may evolve from further developments in analyses of interrupted time-series data (e.g., Barlow et al., 2009; Crosbie, 1993; see Chapters 11 and 12, this volume). Aggregation from individual data points through averages and standard deviations to statistical tests to p values and to the final binary statement “yes or no” is, of course, common in all sciences, including psychology. Quantitative reduction of complex behavior patterns to yes or no eases communication of theories and ideas through publications and presentations, in which actual data from individual subjects or individual trials may be omitted. 
In contrast, a focus on data linked more directly to experimental manipulations can lead to demonstrations of stunning control and prediction of behavior at the moment-to-moment level for the individual subject (e.g., Henton & Iversen, 1978; Sidman, 1960), which is much closer to everyday interactions between behavior and environment. In daily life, people respond promptly to single instances of


interpersonal and environmental cues. Such stimulus control of behavior is the essence of interhuman communication and conduct. By demonstrating control of behavior at this level for the individual subject, behavior analysis can be both a science of behavior and a tool for educational and therapeutic interventions.
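The t test problem raised earlier in this section, in which an A-B design shows gradual acquisition in the B phase, can be made concrete with a small numerical sketch. The data are invented, and the two diagnostics shown (the phase standard deviations and the lag-1 autocorrelation of the B-phase series) are chosen here only to illustrate why the test's assumptions are strained; no p value is computed and no claim is made about what a test would conclude for real data.

```python
from math import sqrt
from statistics import mean, stdev


def welch_t(phase_a, phase_b):
    """Welch's t statistic for the difference between two phase means."""
    var_a = stdev(phase_a) ** 2 / len(phase_a)
    var_b = stdev(phase_b) ** 2 / len(phase_b)
    return (mean(phase_b) - mean(phase_a)) / sqrt(var_a + var_b)


def lag1_autocorrelation(series):
    """Lag-1 autocorrelation: how strongly session n predicts session n + 1."""
    m = mean(series)
    numerator = sum(
        (series[i] - m) * (series[i + 1] - m) for i in range(len(series) - 1)
    )
    denominator = sum((v - m) ** 2 for v in series)
    return numerator / denominator


# Hypothetical responses per session: stable baseline, gradual acquisition.
baseline = [2, 3, 2, 3, 2, 3]
treatment = [3, 6, 10, 15, 21, 28]

sd_baseline = stdev(baseline)
sd_treatment = stdev(treatment)      # inflated by the acquisition curve itself
r1_treatment = lag1_autocorrelation(treatment)
t_statistic = welch_t(baseline, treatment)
```

In these made-up data the B-phase standard deviation is many times the baseline value simply because the behavior is changing, and successive B-phase sessions are positively correlated rather than independent. A plot of the same twelve points would show the effect at a glance.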

Quantitative Descriptions of Behavior–Environment Relations

Behavioral data from single-case designs have invited quantitative descriptions of the relationships between behavior and environmental variables. Such descriptions vary considerably from issue to issue and attract general interest in both basic research (Shull, 1991) and application (see Chapter 10, this volume).

Single-Case Designs and Group Studies Compared

“Operant methods make their own use of Grand Numbers; instead of studying a thousand rats for one hour each, or a hundred rats for ten hours each, the investigator is likely to study one rat for a thousand hours” (Skinner, 1969, p. 112). Research involving comparisons of groups of subjects, in which each group is exposed once to one level of a manipulation, is rare in behavior analysis, in particular with animal subjects. Yet, group research or studies with a large number of participants sometimes have relevance for behavior analysis. For example, in a recent interview (Holth, 2010), Murray Sidman remarked that large-scale implementation of behavior analysis techniques may require prior demonstration of the effectiveness of those techniques in large populations. Thus, studies using randomization and control groups may be necessary for promulgation of effective behavior control techniques. A positive outcome of a group study may not teach behavior analysts more about control of an individual client’s behavior, yet such an outcome may nonetheless make more people aware of behavior analysis. For example, the widely cited group study by Lovaas (1987) generated broad interest in behavior analysis methods for treatment of children with

autism. One group of children (n = 19) with the diagnosis of autism received intensive behavior analysis procedures (40 hours/week) for 2 years, and another group (n = 19) with the same diagnosis did not receive the same treatment. The differences between the groups were vast; for example, 47% of the children in the treatment group showed significant gains in IQ and other measures of behavioral functioning compared with the other group. This study helped generate respect among parents and professionals for behavioral methods in treatment of children with autism. Since the Lovaas (1987) study, several other studies have similarly compared two groups of children (sometimes randomly assigned) with similar levels of autism spectrum disorder, in which one group received intensive behavior analysis treatment for 1 or several years, and the other group received less treatment or no treatment other than what is ordinarily provided by the child’s community (so-called “eclectic” treatment). For example, Dawson et al. (2009) recently reported one such study with 48 children (18–30 months old) diagnosed with autism spectrum disorder in a randomized controlled trial. Statistical procedures demonstrated significant gains in a variety of behavioral measures for the group that received the treatment. For this study and the Lovaas study, each child went through complex and intense procedures based on single-case research methods for about 2 years. The group comparisons mainly used standard measures of the children’s performances. Enough group studies have, in fact, been conducted that several meta-analyses of the efficacy of behavior analysis treatment of children with autism spectrum disorder have now been performed on the basis of such studies. Group comparison methods have also increased the visibility of behavior analysis techniques in areas of application. For example, Taub et al.
(2006) used behavioral techniques in treatment of patients with paralysis of arms or legs resulting from stroke or brain injury. With the less affected or “normal” limb restrained to prevent its use, 21 patients who received massed practice of the affected limb (6 hours/day for 10 consecutive weekdays with shaping of movement and establishment of stimulus control of movement using social reinforcement) were


compared with a matched control group of 20 patients who received customary, standard physical therapy and general support. Treatment patients showed huge and clinically significant gains in motor control of the affected arm, whereas patients in the control group showed no such gains. A similar example is provided in research by Horne et al. (2004). These investigators had first obtained reliable single-case data that reinforcement techniques could be used effectively to increase the consumption of fruits and vegetables among schoolchildren. To promote the findings, Horne et al. conducted a large group study with 749 schoolchildren. The children were split into an experimental group and a control group. A baseline of fruit and vegetable consumption was taken first. Then at lunchtime, fruit and vegetable consumption was encouraged by having the children in the experimental group watch video adventures featuring heroic peers (the Food Dudes) who enjoy eating fruits and vegetables, and the children received reinforcers for eating fruit and vegetables. Children in the control group had free access to fruit and vegetables. Fruit and vegetable consumption was significantly higher among the children in the experimental group than among those in the control group. On the basis of such data, this program has now been implemented on a large scale in all schools in Ireland and in other places in England (Lowe, 2010). Such group-comparison studies, published in journals that do not ordinarily publish studies using single-case research methods, may be helpful in promoting general knowledge about behavior analysis to a much wider audience. In addition, when effective, evidence-based behavior-analytic treatments become broadly recognized from publications with a large number of participants and in renowned journals, then granting agencies, insurance companies, journalists, and maybe even university administrators and politicians start to pay attention to the findings.
For behavior analysts, group studies may not seem to add much basic knowledge beyond what is already known from studies using single-case research methods. Publication of group studies may, however, be a tactic for promotion of basic, important, and effective behavioral techniques beyond the readership of behavior analysts. The need for

informing the general public, therapists, scientists in other areas, educators, politicians, and so forth about behavior analysis techniques should, however, be balanced with a concern for ethical treatment of the participants involved. When a behavior analyst, based on experience with previous results, knows full well that a particular treatment using single-case methodology has proven to be effective over and over, then it is indeed an ethical dilemma to knowingly split a population of children in need of that treatment into two groups for comparison of treatment versus no treatment. Children who are in a control group for a few years for a valid statistical comparison with the treatment group may thus be deprived of an opportunity for known, effective treatment. Several so-called randomized controlled group studies and meta-analyses of such studies have been conducted over the past few decades to determine whether applied behavior modification actually works (e.g., Spreckley & Boyd, 2009). For elaborate comments and critiques of some meta-analysis studies of applied behavior analysis methods, see Kimball (2009). These time-consuming studies, with a presumed target audience of policymakers and insurance companies, would clearly not have been undertaken unless countless studies using single-case methods had already demonstrated that the behavior of an individual child can be modified and sustained with appropriate management of reinforcement contingencies and stimulus control techniques.
The sheer mass of already existing studies based on single-case methodology with successful outcomes, for literally thousands of individuals across a variety of behavior problems and treatment settings, poses the question, “How many more randomized controlled group studies and subsequent meta-analyses are necessary before single-case methods can be accepted in general as effective in treatment?” The use of inferential statistics as a method of proof leads to the very odd situation that such meta-analyses may explicitly exclude the results from application of single-case methods with individual clients (e.g., Spreckley & Boyd, 2009), even though the overall purpose of the meta-analyses is to decide whether such methods work for individual clients.

Iver H. Iversen

Profound misunderstandings of what can be accomplished by single-case research methods in general can on occasion be heard among pedagogues and critics of applied behavior analysis interventions for children with developmental disorders. The argument is that the trained behavior would have developed anyway given enough time (without training). For example, Spreckley and Boyd (2009) stated that “what is too often forgotten is that the overwhelming majority of children with [autism spectrum disorder] change over time as part of their development as opposed to change resulting from an intervention” (p. 343). Commentaries such as these negate the immense accumulation of experimental and applied hard evidence that individual behavior can indeed be effectively and reliably changed with the use of behavior control techniques. Yet such comments persist and slow the application of these techniques. Some behavior analysts appropriately react when professionals make such claims without supporting data (e.g., Kimball, 2009; Morris, 2009). Group studies may have their place in behavior analysis when intervention should not be withdrawn because of the social significance of the target behavior. Baer (1975) suggested combining methods of multiple-baseline designs and group comparisons, in which one group of subjects first serves as a comparison to another group who receives intervention; later, the comparison group also receives the same intervention.

Conclusion

Behavior analysts have developed designs and techniques that can increase or decrease target behavior on a certain occasion and time. These methods serve as tools for an experimental analysis of behavior. The same tools are used in applied behavior analysis to modify and maintain behavior for the purpose of helping a client. Single-case research designs offer a wide range of methods, and this overview has merely scratched the surface.
Because of their accumulated successes, single-case designs are now being adopted in areas outside of behavior analysis such as medicine (Janosky et al., 2009), occupational therapy (M. V. Johnston & Smith, 2010), and pain management (Onghena & Edgington, 2005). Indeed, Guyatt et al. (2000), in their review of evidence-based medicine, placed single-case research designs with randomization of treatment highest in their hierarchy of strength of evidence for treatment decisions.

Single-case research designs feature an important component of replication of behavioral phenomena for a single individual (J. M. Johnston & Pennypacker, 2009; Sidman, 1960). With direct replication, the intervention is repeated, and when behavior changes on each replication, the experimenter or therapist has identified one of the variables that control the behavior. With systematic replication, the same subject or different subjects or species may be exposed to a variation of an original procedure (Sidman, 1960), and a successful outcome fosters the accumulation of knowledge (see also Chapter 7, this volume). Sidman’s (1971) original demonstration of stimulus equivalence stands out as a golden example of the scientific value of replication as a method of proof because the remarkable results using a single child, in a carefully designed experiment, have been replicated numerous times and thereby spurred development of a whole new field of research and application.

Behavior analysts are sometimes wary of designs that compare groups of subjects because no behavioral phenomena are established at the level of the individual subject with such designs. However, to promote behavioral phenomena using single-case research designs, comparisons of experiments are common, and one thereby inevitably compares groups of subjects across experiments, often across species as well. Indeed, such comparisons often prove the generality of the basic principles discovered with one small set of subjects. Besides, establishing the efficiency of a behavioral procedure to be implemented on a large population scale may require certain types of controlled group studies. Group designs and multiple-baseline designs can also be profitably combined to produce powerful and socially relevant effects on a large scale (e.g., D. K. Fox et al., 1987).

With wide implementations in psychology, education, medicine, and rehabilitation, single-case methodology is now firmly established as a viable means for discovery as well as for application of basic behavioral mechanisms. Of late, behavior management interventions based on single-case methodology followed up with efficiency determinations from population studies have successfully demonstrated how the science of the individual can be a science for all.

Individual prediction is of tremendous importance, so long as the organism is to be treated scientifically. (Skinner, 1938, p. 444)

Man has at his disposal yet another powerful resource—natural science with its strictly objective methods. This science, as we all know, is making big headway every day. The facts and considerations I have placed before you are one of the numerous attempts to employ—in studying the mechanism of the highest vital manifestations in the dog, the representative of the animal kingdom which is man’s best friend—a consistent, purely scientific method of thinking. (From Pavlov’s acceptance speech on receiving the Nobel Prize in 1904; Pavlov, 1955, p. 148)

References

Baer, D. M. (1975). In the beginning there was the response. In E. Ramp & G. Semb (Eds.), Behavior analysis: Areas of research and application (pp. 16–30). Englewood Cliffs, NJ: Prentice-Hall.
Baer, D. M. (1993). Advising as if for research productivity. Clinical Psychologist, 46, 106–109.
Baer, D. M. (2005). Letters to a lawyer. In W. L. Heward, T. E. Heron, N. A. Neff, S. M. Peterson, D. M. Sainato, G. Cartledge, . . . J. C. Dardig (Eds.), Focus on behavior analysis in education: Achievements, challenges, and opportunities (pp. 3–30). Upper Saddle River, NJ: Pearson.
Baer, D. M., Wolf, M. M., & Risley, T. R. (1968). Some current dimensions of applied behavior analysis. Journal of Applied Behavior Analysis, 1, 91–97. doi:10.1901/jaba.1968.1-91
Barlow, D. H., Nock, M. K., & Hersen, M. (2009). Single case experimental designs: Strategies for studying behavior change (3rd ed.). New York, NY: Pearson Education.
Bernard, C. (1957). An introduction to the study of experimental medicine. New York, NY: Dover. (Original work published 1865)
Bijou, S. W., & Baer, D. M. (1961). Child development I: A systematic and empirical theory. New York, NY: Appleton-Century-Crofts. doi:10.1037/11139-000
Birbaumer, N., Ghanayim, N., Hinterberger, T., Iversen, I., Kotchoubey, B., Kübler, A., . . . Flor, H. (1999). A spelling device for the paralysed. Nature, 398, 297–298. doi:10.1038/18581
Blampied, N. M. (1999). A legacy neglected: Restating the case for single-case research in cognitive behavior therapy. Behaviour Change, 16, 89–104. doi:10.1375/bech.16.2.89
Boakes, R. (1984). From Darwin to behaviorism: Psychology and the minds of animals. New York, NY: Cambridge University Press.
Boorstin, D. J. (1983). The discoverers: A history of man’s search to know his world and himself. New York, NY: Vintage Books.
Boring, E. G. (1929). A history of experimental psychology. New York, NY: Appleton-Century-Crofts.
Boring, E. G. (1954). The nature and history of experimental control. American Journal of Psychology, 67, 573–589. doi:10.2307/1418483
Brenagan, W., & Iversen, I. H. (2012). Methods to differentially reinforce response duration in rats. Manuscript in preparation.
Catania, A. C., & Laties, V. G. (1999). Pavlov and Skinner: Two lives in science. Journal of the Experimental Analysis of Behavior, 72, 455–461. doi:10.1901/jeab.1999.72-455
Cooper, J. O., Heron, T. E., & Heward, W. L. (2007). Applied behavior analysis (2nd ed.). Upper Saddle River, NJ: Pearson Education.
Crosbie, J. (1993). Interrupted time-series analysis with brief single-subject data. Journal of Consulting and Clinical Psychology, 61, 966–974. doi:10.1037/0022-006X.61.6.966
Dawson, G., Rogers, S., Munson, J., Smith, M., Winter, J., Greenson, J., . . . Varley, J. (2009). Randomized, controlled trial of an intervention for toddlers with autism: The early start Denver model. Pediatrics, 125, e17–e23. doi:10.1542/peds.2009-958
Dunlap, G., & Kincaid, D. (2001). The widening world of functional assessment: Comments on four manuals and beyond. Journal of Applied Behavior Analysis, 34, 365–377. doi:10.1901/jaba.2001.34-365
Ebbinghaus, H. (1913). Memory (H. A. Rueger & C. E. Bussenius, Trans.). New York, NY: Teachers College. (Original work published 1885)
Eysenck, H. J. (1960). Behavior therapy and the neuroses. New York, NY: Pergamon Press.
Fox, D. K., Hopkins, B. L., & Anger, W. K. (1987). The long-term effects of a token economy on safety performance in open-pit mining. Journal of Applied Behavior Analysis, 20, 215–224. doi:10.1901/jaba.1987.20-215
Fox, R. G., Copeland, R. E., Harris, J. W., Rieth, H. J., & Hall, R. V. (1975). A computerized system for selecting responsive teaching studies, catalogued along twenty-eight important dimensions. In E. Ramp & G. Semb (Eds.), Behavior analysis: Areas of research and application (pp. 124–158). Englewood Cliffs, NJ: Prentice-Hall.
Ghezzi, P. M. (2007). Discrete trials teaching. Psychology in the Schools, 44, 667–679. doi:10.1002/pits.20256
Green, G. (2005). Division fellow, Gina Green, reacts to CNN program “Autism is a World” which focuses on facilitated communication. Psychology in Mental Retardation and Developmental Disabilities, 31, 7–10.
Green, G., Brennan, L. C., & Fein, D. (2002). Intensive behavioral treatment for a toddler at high risk for autism. Behavior Modification, 26, 69–102. doi:10.1177/0145445502026001005
Green, G., & Shane, H. C. (1994). Science, reason, and facilitated communication. Journal of the Association for Persons with Severe Handicaps, 19, 151–172.
Greenspan, R. J., & Baars, B. J. (2005). Consciousness eclipsed: Jacques Loeb, Ivan P. Pavlov, and the rise of reductionistic biology after 1900. Consciousness and Cognition, 14, 219–230. doi:10.1016/j.concog.2004.09.004
Guyatt, G. H., Haynes, R. B., Jaeschke, R. Z., Cook, D. J., Green, L., Naylor, C. D., . . . Richardson, W. S. (2000). Users’ guides to the medical literature: XXV. Evidence-based medicine: Principles for applying the users’ guides to patient care. JAMA, 284, 1290–1296. doi:10.1001/jama.284.10.1290
Harris, B. (1979). Whatever happened to little Albert? American Psychologist, 34, 151–160. doi:10.1037/0003-066X.34.2.151
Henton, W. W., & Iversen, I. H. (1978). Classical conditioning and operant conditioning: A response pattern analysis. New York, NY: Springer-Verlag.
Holth, P. (2010). A research pioneer’s wisdom: An interview with Dr. Murray Sidman. European Journal of Behavior Analysis, 11, 181–198.
Horne, P. J., Tapper, K., Lowe, C. F., Hardman, C. A., Jackson, M. C., & Woolner, J. (2004). Increasing children’s fruit and vegetable consumption: A peer-modeling and rewards-based intervention. European Journal of Clinical Nutrition, 58, 1649–1660. doi:10.1038/sj.ejcn.1602024
Iversen, I. H. (1988). Tactics of graphic design: A review of Tufte’s The Visual Display of Quantitative Information [Book review]. Journal of the Experimental Analysis of Behavior, 49, 171–189. doi:10.1901/jeab.1988.49-171
Iversen, I. H. (1991). Methods of analyzing behavior patterns. In I. H. Iversen & K. A. Lattal (Eds.), Techniques in the behavioral and neural sciences: Experimental analysis of behavior, Part 2 (pp. 193–242). New York, NY: Elsevier.
Iversen, I. H. (1992). Skinner’s early research: From reflexology to operant conditioning. American Psychologist, 47, 1318–1328. doi:10.1037/0003-066X.47.11.1318
Iversen, I. H. (2010). [Laboratory demonstration of acquisition of operant behavior]. Unpublished raw data.
Iversen, I. H. (2012). Tutorial: Multiple baseline designs. Manuscript in preparation.
Iversen, I. H., Ghanayim, N., Kübler, A., Neumann, N., Birbaumer, N., & Kaiser, J. (2008). A brain-computer interface tool to assess cognitive functions in completely paralyzed patients with amyotrophic lateral sclerosis. Clinical Neurophysiology, 119, 2214–2223. doi:10.1016/j.clinph.2008.07.001
Iversen, I. H., & Matsuzawa, T. (1996). Visually guided drawing in the chimpanzee (Pan troglodytes). Japanese Psychological Research, 38, 126–135. doi:10.1111/j.1468-5884.1996.tb00017.x
Jacobson, J. W., Mulick, J. W., & Schwartz, A. A. (1995). A history of facilitated communication: Science, pseudoscience, and antiscience. American Psychologist, 50, 750–765. doi:10.1037/0003-066X.50.9.750
Janosky, J. E., Leininger, S. L., Hoerger, M. P., & Libkuman, T. M. (2009). Single subject designs in biomedicine. New York, NY: Springer-Verlag. doi:10.1007/978-90-481-2444-2
Johnston, J. M., & Pennypacker, H. S. (2009). Strategies and tactics in behavioral research (3rd ed.). New York, NY: Routledge.
Johnston, M. V., & Smith, R. O. (2010). Single subject design: Current methodologies and future directions. OTJR: Occupation, Participation and Health, 30, 4–10. doi:10.3928/15394492-20091214-02
Jones, M. C. (1924). A laboratory study of fear: The case of Peter. Pedagogical Seminary, 31, 308–315. doi:10.1080/08856559.1924.9944851
Kazdin, A. E. (1973). Methodological and assessment considerations in evaluating reinforcement programs in applied settings. Journal of Applied Behavior Analysis, 6, 517–531. doi:10.1901/jaba.1973.6-517
Kazdin, A. E. (2011). Single-case research designs: Methods for clinical and applied settings (2nd ed.). New York, NY: Oxford University Press.
Keller, F. S., & Schoenfeld, W. N. (1950). Principles of psychology. New York, NY: Appleton-Century-Crofts.
Kennedy, C. H. (2005). Single-case designs for educational research. New York, NY: Pearson Allyn & Bacon.
Kimball, J. W. (2009). Comments on Spreckley and Boyd (2009). Science in Autism Treatment, 6, 3–19.
Kratochwill, T. R., & Brody, G. H. (1978). Single subject designs: A perspective on the controversy over employing statistical inference and implications for research and training in behavior modification. Behavior Modification, 2, 291–307. doi:10.1177/014544557823001
Loeb, J. (1900). Comparative physiology of the brain and comparative psychology. New York, NY: Putnam. doi:10.5962/bhl.title.1896
Lovaas, O. I. (1987). Behavioral treatment and normal educational and intellectual functioning in young autistic children. Journal of Consulting and Clinical Psychology, 55, 3–9. doi:10.1037/0022-006X.55.1.3
Lowe, F. (2010, September). Can behavior analysis change the world? Paper presented at the Ninth International Congress on Behavior Studies, Crete, Greece.
Matson, J. L., Sevin, J. A., Fridley, D., & Love, S. R. (1990). Increasing spontaneous language in three autistic children. Journal of Applied Behavior Analysis, 23, 227–233. doi:10.1901/jaba.1990.23-227
McDougall, D., Hawkins, J., Brady, M., & Jenkins, A. (2006). Recent innovations in the changing criterion design: Implications for research and practice in special education. Journal of Special Education, 40, 2–15. doi:10.1177/00224669060400010101
Montee, B. B., Miltenberger, R. G., & Wittrock, D. (1995). An experimental analysis of facilitated communication. Journal of Applied Behavior Analysis, 28, 189–200. doi:10.1901/jaba.1995.28-189
Morgan, D. L., & Morgan, R. K. (2009). Single-case research methods for the behavioral and health sciences. Los Angeles, CA: Sage.
Morris, E. K. (2009). A case study in the misrepresentation of applied behavior analysis in autism: The Gernsbacher lectures. Behavior Analyst, 32, 205–240.
Neef, N. A., & Iwata, B. A. (1994). Current research on functional analysis methodologies: An introduction. Journal of Applied Behavior Analysis, 27, 211–214. doi:10.1901/jaba.1994.27-211
O’Neill, R. E., McDonnell, J., Billingsly, F., & Jenson, W. (2010). Single case designs in educational and community settings. New York, NY: Merrill.
Onghena, P., & Edgington, E. S. (2005). Customization of pain treatment: Single-case design and analysis. Clinical Journal of Pain, 21, 56–68. doi:10.1097/00002508-200501000-00007
Paré, W. P. (1990). Pavlov as a psychophysiological scientist. Brain Research Bulletin, 24, 643–649. doi:10.1016/0361-9230(90)90002-H
Parsonson, B. S., & Baer, D. M. (1978). The analysis and presentation of graphic data. In T. R. Kratochwill (Ed.), Single-subject research: Strategies for evaluating change (pp. 101–165). New York, NY: Academic Press.
Pavlov, I. P. (1906). The scientific investigation of the psychical faculties or processes in higher animals. Science, 24, 613–619. doi:10.1126/science.24.620.613
Pavlov, I. P. (1927). Conditioned reflexes (G. V. Anrep, Trans.). London, England: Oxford University Press.
Pavlov, I. P. (1928). Lectures on conditioned reflexes: Twenty-five years of objective study of the higher nervous activity (behaviour) of animals (Vol. 1). New York, NY: International Publishers. doi:10.1037/11081-000
Pavlov, I. P. (1955). Nobel speech delivered in Stockholm on December 12, 1904. In K. S. Koshtoyants (Ed.), I. P. Pavlov: Selected works (pp. 129–148). Honolulu, HI: University Press of the Pacific.
Schmidt, R. A., & Lee, T. D. (2005). Motor control and learning: A behavioral emphasis. Champaign, IL: Human Kinetics.
Shull, R. L. (1991). Mathematical description of operant behavior: An introduction. In I. H. Iversen & K. A. Lattal (Eds.), Experimental analysis of behavior (Vol. 2, pp. 243–282). New York, NY: Elsevier.
Sidman, M. (1960). Tactics of scientific research: Evaluating experimental data in psychology. New York, NY: Basic Books.
Sidman, M. (1971). Reading and auditory-visual equivalences. Journal of Speech and Hearing Research, 14, 5–13.
Sidman, M. (1981). Remarks. Behaviorism, 9, 127–129.
Skinner, B. F. (1938). The behavior of organisms: An experimental analysis. New York, NY: Appleton-Century-Crofts.
Skinner, B. F. (1953). Science and human behavior. New York, NY: Macmillan.
Skinner, B. F. (1956). A case history in scientific method. American Psychologist, 11, 221–233. doi:10.1037/h0047662
Skinner, B. F. (1966). Some responses to the stimulus “Pavlov.” Conditional Reflex, 1, 74–78.
Skinner, B. F. (1969). Contingencies of reinforcement. New York, NY: Appleton-Century-Crofts.
Skinner, B. F. (1978). The ethics of helping people. In B. F. Skinner (Ed.), Reflections on behaviorism and society (pp. 33–47). Englewood Cliffs, NJ: Prentice Hall.
Skinner, B. F. (1979). The shaping of a behaviorist. New York, NY: Knopf.
Spreckley, M., & Boyd, R. (2009). Efficacy of applied behavioral intervention in preschool children with autism for improving cognitive, language, and adaptive behavior: A systematic review and meta analysis. Journal of Pediatrics, 154, 338–344. doi:10.1016/j.jpeds.2008.09.012
Steege, M. W., & Mace, F. C. (2007). Applied behavior analysis: Beyond discrete trial teaching. Psychology in the Schools, 44, 91–99. doi:10.1002/pits.20208
Sulzer-Azaroff, B., & Mayer, G. R. (1991). Behavior analysis for lasting change. Fort Worth, TX: Harcourt Brace.
Taub, E., Uswatte, G., King, D. K., Morris, D., Crago, J. E., & Chatterjee, A. (2006). A placebo-controlled trial of constraint induced movement therapy for upper extremity after stroke. Stroke, 37, 1045–1049. doi:10.1161/01.STR.0000206463.66461.97
Thorndike, E. L. (1911). Animal intelligence. New York, NY: Macmillan.
Thorndike, E. L. (1927). The law of effect. American Journal of Psychology, 39, 212–222. doi:10.2307/1415413
Todes, D. P. (2002). Pavlov’s physiological factory: Experiment, interpretation, laboratory enterprise. Baltimore, MD: Johns Hopkins University Press.
Tufte, E. R. (1983). The visual display of quantitative information. Cheshire, CT: Graphics Press.
Tufte, E. R. (1990). Envisioning information. Cheshire, CT: Graphics Press.
Ullman, L. P., & Krasner, L. (1965). Case studies in behavior modification. New York, NY: Holt, Rinehart & Winston.
Van Houten, R., Axelrod, S., Baily, J. S., Favell, J. F., Foxx, R. M., Iwata, B. A., & Lovaas, O. I. (1988). The right to effective behavioral treatment. Journal of Applied Behavior Analysis, 21, 381–384. doi:10.1901/jaba.1988.21-381
Watson, J. B., & Rayner, R. (1920). Conditioned emotional reactions. Journal of Experimental Psychology, 3, 1–14. doi:10.1037/h0069608
Wheeler, D. L., Jacobson, J. W., Paglieri, R. A., & Schwartz, A. A. (1993). An experimental assessment of facilitated communication. Mental Retardation, 31, 49–59.
Wolpe, J. (1958). Psychotherapy by reciprocal inhibition. Stanford, CA: Stanford University Press.
Wood, J. D. (2004). The first Nobel prize for integrated systems physiology: Ivan Petrovich Pavlov, 1904. Physiology, 19, 326–330. doi:10.1152/physiol.00034.2004

Chapter 2

The Five Pillars of the Experimental Analysis of Behavior

Kennon A. Lattal

“What is the experimental analysis of behavior?” Skinner (1966) famously asked in an address to Division 25 of the American Psychological Association, now the Division for Behavior Analysis (then the Division for the Experimental Analysis of Behavior). His answer included a set of methods and a subject matter, both of which originated with his research and conceptual analyses that began in the 1930s. Since those early days, what began as operant conditioning has evolved from its humble laboratory and nonhuman animal origins to encompass the breadth of contemporary psychology. This handbook, which is itself testimony to the preceding observation, is the impetus for revisiting Skinner’s question. The developments described in each of the handbook’s chapters are predicated on a few fundamental principles, considered here as the pillars that constitute the foundation of the experimental analysis of behavior (TEAB). These five pillars (research methods, reinforcement, punishment, control by stimuli correlated with reinforcers and punishers, and contextual and stimulus control) are the subject of this chapter. Together, they provide the conceptual and empirical framework for understanding the ways in which environmental events interact with behavior. The pillars, although discussed separately from one another here for didactic purposes, are inextricably linked: The methods of TEAB are pervasive in investigating the other pillars; punishment and stimulus control are not possible in the absence of the reinforcement that maintains the behavior being punished or under control of other stimuli; stimuli correlated with reinforcers and punishers also have their effects only in the context of reinforcement and are also closely related to other, more direct stimulus control processes; and punishment affects both reinforcement and stimulus control.

Pillar 1: Research Methods

Research methods in TEAB are more than a set of techniques for collecting and analyzing data. They certainly enable those activities, but, more important, they reflect the basic epistemological stance of behavior analysis: The determinants of behavior are to be found in the interactions between individuals and their environment. This stance led to the adoption and evolution of methods and concepts that emphasize the analysis of functional relations between features of that environment and the behavior of individual organisms. Skinner (1956) put it as follows:

We are within reach of a science of the individual. This will be achieved not by resorting to some special theory of knowledge in which intuition or understanding takes the place of observation and analysis, but through an increasing grasp of relevant conditions to produce order in the individual case. (p. 95)

This chapter is dedicated to Stephen B. Kendall, who, as my first instructor and, later, mentor in the experimental analysis of behavior, provided an environment that allowed me to learn about the experimental analysis of behavior by experimenting. I thank Karen Anderson, Liz Kyonka, Jack Marr, Mike Perone, and Claire St. Peter for helpful discussions on specific topics reviewed in this chapter, and Rogelio Escobar and Carlos Cançado for their valuable comments on an earlier version of the chapter. DOI: 10.1037/13937-002 APA Handbook of Behavior Analysis: Vol. 1. Methods and Principles, G. J. Madden (Editor-in-Chief) Copyright © 2013 by the American Psychological Association. All rights reserved.

Single-Case Procedures and Designs

Two distinguishing features of research methods in TEAB are emphases on what Bachrach (1960) called the informal theoretical approach and on examining the effects of independent variables on well-defined responses of individual subjects. Skinner’s (1956) review of his early research defined Bachrach’s label. The inductive tradition allows free rein in isolating the variables of which behavior is a function, unencumbered by many of the shoulds, oughts, and inflexibilities of research designs derived from inferential statistical research methods (Michael, 1974; see Chapters 5 and 7, this volume). The essence of the second feature, single-case experimental designs, is that by investigating the effects of the independent variable on individual subjects, each subject serves as its own control. Thus, effects of independent variables are compared, within individual subjects, with a baseline on which the independent variable is absent (or present at some other value). This methodological approach to analyzing the subject matter of psychology can be contrasted with an approach based on the inferential statistical analysis of the data generated across different groups of subjects exposed to the presence or absence or different values of the independent variable (Michael, 1974; see Chapters 7 and 8, this volume). A single-case analysis precludes a major source of variation inherent in all group designs: that variation resulting from between-subjects comparisons. It also minimizes other so-called threats to internal validity such as those associated with statistical regression toward the mean and subject selection biases (cf. Kazdin, 1982). Three central features of single-case research are selecting an appropriate design, establishing baseline performance, and selecting the number of subjects to study.
The most basic design involves first establishing a baseline, A; then introducing the independent variable, B; and finally returning to the baseline. From this basic A-B-A design, many variations spring (Kazdin, 1982; see Chapter 5, this volume). Within-subject designs in the tradition of

TEAB involve the repeated observation of the targeted response over multiple sessions until an appropriate level of stability is achieved. Without an appropriate design and an appropriate degree of stability in the baseline, attributing the changes in the dependent variable to the independent variable is not possible. Decisions about the criteria for the baseline begin with a definition of the response. Skinner (1966) noted that “an emphasis on rate of occurrence of repeated instances of an operant distinguishes the experimental analysis of behavior” (p. 213). Unless the response is systematically measured and repeatable, stability will be difficult, if not impossible, to achieve. The dimensions of baseline stability are the amount of variability or bounce in the data and the extent of trends. Baseline stability criteria are typically established relative to the anticipated effects of the independent variable. Thus, if a large effect of the independent variable is anticipated, more variation in the baseline is acceptable because, presumably, the effect will be outside the baseline range. Similarly, if a strong downward trend is expected when the independent variable is introduced, then an upward trend in the baseline data is more acceptable than if the independent variable were expected to increase the rate of responding. Some circumstances may not seem to lend themselves readily to single-case designs. An example is that in which behavioral effects cannot be reversed, and thus baselines cannot be recovered. Sidman (1960), however, observed that even in such cases, “the use of separate groups destroys the continuity of cause and effect that characterizes an irreversible behavioral process” (p. 53). In the case of irreversible effects, creative designs in the single-case tradition have been used to circumvent the problem. 
Boren and Devine (1968), for example, investigated the repeated acquisition of behavioral chains by arranging a task in which monkeys learned a sequence of 10 responses. Once that pattern was stable, the pattern was changed, and the monkeys had to learn a new sequence. This procedure allowed the study of repeated acquisition of behavioral chains in individual subjects across a relatively long period of time. In other cases either in which baselines are unlikely to be reversed or in which it is ethically


questionable to do so, a multiple-baseline design often can be used (Baer, Wolf, & Risley, 1968; see Chapter 5, this volume). Selecting the number of subjects is based on both practical concerns and experimenter judgment. A few studies in the Journal of the Experimental Analysis of Behavior were truly single subject in that they involved only one subject (e.g., de Lorge, 1971), but most have involved more. Decisions about numbers of subjects interact with decisions about the design, types of independent variables being studied, and range of values of these variables. Between-subjects direct replications and both between- and within-subject systematic replications involving different values of the independent variable increase the generality of the findings (see Chapter 7, this volume). Individual-subject research has emphasized experimental control over variability, as contrasted with group designs in which important sources of variability are isolated statistically at the conclusion of the experiment. Sidman (1960) noted that “acceptance of variability as unavoidable or, in some sense as representative of ‘the real world’ is a philosophy that leads to the ignoring of relevant factors” (p. 152). He methodically laid out the tactics of minimizing variability in experimental situations and identified several sources of such variability. Between-subjects variability already has been discussed. Another major source of variability discussed by Sidman is that resulting from weak experimental control. Inferential statistical analysis is sometimes used to supplement experimental analysis. Such analysis is not needed if the baseline and manipulation response distributions do not overlap, but sometimes they do, and in these cases some behavior analysts have argued for including inferential statistics (see, e.g., Baron [1999] and particularly Davison [1999] on the potential utility of nonparametric statistics in within-subject designs in which baseline and intervention distributions overlap).
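The question of whether baseline and intervention response distributions overlap can be made concrete with a small calculation. The following is only an illustrative sketch in Python, not a procedure taken from the chapter or from the sources it cites; the function names and the example response rates are hypothetical.

```python
def ranges_overlap(baseline, intervention):
    """Return True if the two phases share any part of their observed ranges."""
    return (min(intervention) <= max(baseline)
            and min(baseline) <= max(intervention))


def fraction_outside_baseline(baseline, intervention):
    """Fraction of intervention sessions that fall outside the baseline range."""
    lo, hi = min(baseline), max(baseline)
    outside = [x for x in intervention if x < lo or x > hi]
    return len(outside) / len(intervention)


# Hypothetical responses per minute for one subject.
baseline = [48, 52, 50, 51, 49]
clear_effect = [20, 18, 22, 19, 21]   # no session falls inside the baseline range
mixed_effect = [47, 50, 40, 38, 45]   # one session falls inside the baseline range

print(ranges_overlap(baseline, clear_effect))             # False
print(fraction_outside_baseline(baseline, mixed_effect))  # 0.8
```

When the ranges do not overlap, visual inspection alone usually suffices; when they do, a summary such as the fraction of sessions outside the baseline range is one simple way to describe how much separation remains.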
Others (e.g., Michael, 1974), however, have noted that statistical analysis of group data draws the focus away from an experimental analysis of effects demonstrable in individual subjects, removes the experimenter from the data, and substitutes statistical control for experimental control. A final point with respect to single-case designs relates to the previous discussion of the inductive

method. When individual subjects’ behavior is studied, it is not surprising that the same value of a variable may have different effects across subjects. For example, a relatively brief delay of reinforcement may markedly reduce the responding of one subject and not change the responding of another. Rather than averaging the two, the tactic in TEAB is to conduct a parametric analysis of delay duration with both subjects, to search for orderly and qualitatively similar functional relations across subjects even though, on the basis of intersubject comparisons, individuals may respond differently at any particular value. The achievement of these qualitatively similar functional relations contributes to the generality of the effect. The establishment of experimental control through the methods described in this section is a major theme of TEAB, exemplified in each of the other, empirical pillars of TEAB. Before reviewing those empirical pillars, however, examining how critical features of the environment are defined and used in TEAB is important.
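The two dimensions of baseline stability discussed in this section, bounce and trend, are often turned into explicit numerical criteria before an independent variable is introduced. The Python sketch below shows one possible way to do so. The window length, the two thresholds, and the function name are assumptions chosen for illustration; the chapter does not prescribe any particular values.

```python
from statistics import mean


def is_stable(rates, window=6, max_bounce=0.10, max_trend=0.05):
    """Check the last `window` sessions against two illustrative criteria.

    bounce: (max - min) of the window, as a proportion of the window mean
    trend:  absolute difference between the means of the second and first
            halves of the window, as a proportion of the window mean
    All thresholds are hypothetical defaults, not published standards.
    """
    if len(rates) < window:
        return False  # not enough sessions to judge stability
    recent = rates[-window:]
    m = mean(recent)
    if m == 0:
        return False  # relative criteria are undefined for a zero mean rate
    half = window // 2
    bounce = (max(recent) - min(recent)) / m
    trend = abs(mean(recent[half:]) - mean(recent[:half])) / m
    return bounce <= max_bounce and trend <= max_trend


# Hypothetical responses per minute across baseline sessions.
print(is_stable([52, 48, 50, 51, 49, 50]))  # True: little bounce, no trend
print(is_stable([40, 44, 48, 52, 56, 60]))  # False: steady upward trend
```

In keeping with the text, the thresholds would be loosened when a large effect is anticipated and tightened when the expected effect is small or runs in the same direction as any baseline trend.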

Defining Environmental Events

In the laboratory, responses often are defined operationally, as a matter of convenience, in terms of, for example, a switch closure. In principle, however, they are defined functionally, in terms of their effects. Baer (1981) observed that although every response or class of responses has form or structure, operant behavior has no necessary structure. Rather, its structure is determined by environmental circumstance:

The structure of operant behavior is limited primarily by our ability to arrange the environment into contingencies with that behavior; to the extent that we can wield the environment more and more completely, to that extent behavior has less and less necessary structure. This is tantamount to saying that it is mainly our current relatively low level of technological control over the environment that seems to leave behavior with apparent necessary structure and that such a limitation is trivial. (Baer, 1981, p. 220)

Kennon A. Lattal

Functional definitions in TEAB originated with Skinner’s (1935) concept of the operant. With that analysis, he organized fluid, unique individual responses into integrated units or classes (operants) whereby all the unique members have the same effect on the environment. Stimuli were similarly organized into functional classes on the basis of similarity of environmental (behavioral) effect. He also conceptualized reinforcement not in terms of its forms or features, but functionally, in terms of its effects on responses. Thus, any environmental event can function, in principle, as a reinforcer or as a punisher, or as neither, depending on how it affects behavior. On the question of circularity, Skinner (1938) simply noted that

a reinforcing stimulus is defined as such by its power to produce the resulting change. There is no circularity about this; some stimuli are found to produce the change, others not, and they are classified as reinforcing and non-reinforcing accordingly. (p. 62)

Another aspect of the contextual basis of reinforcers and punishers is what Keller and Schoenfeld (1950; cf. Michael, 1982) called the establishing operation. The establishing operation delineates the conditions necessary for some event or activity to function as a reinforcer or punisher. In most laboratory research, establishing a reinforcer involves restricted access, be it, for example, to food or to periods free of electric shock delivery. That is, reinforcers and punishers have to be established by constructing a specific context or history. Morse and Kelleher (1977) summarized several experiments in which they suggested that electric shock delivery sufficient to maintain avoidance behavior was established as a positive reinforcer. This establishment was accomplished by creating particular kinds of behavioral histories.
McKearney (1972), for example, first maintained responding by a free-operant shock-avoidance schedule and then concurrently superimposed a schedule of similar shocks delivered independently of responding. This schedule was in turn replaced by response-dependent shocks scheduled at the same rate as the previously response-independent ones, and the avoidance schedule was

eliminated. Responding then was maintained when its only consequence was to deliver an electric shock that had previously been avoided. As the research cited by Morse and Kelleher (1977) suggests, the type of event is less important than its behavioral effect (which in turn depends on the organism’s history of interaction with the event). Events that increase or maintain responding when made dependent on the response are categorized as reinforcers, and events that suppress or eliminate responding are categorized as punishers. Associating a valence, positive or negative, with these functionally defined reinforcers and punishers is conventional. The valence describes the operation whereby the behavioral effect of the consequence occurs, that is, whether the event is added to or subtracted from the environment, which yields a 2 × 2 contingency table in which valence is shown as a function of behavioral change (maintain or increase in the case of reinforcement, and decrease or eliminate in the case of punishment). Despite widespread adoption of this categorization system, the use of valences has been criticized on the grounds that they are arbitrary and ambiguous (Baron & Galizio, 2005; Michael, 1975). Baron and Galizio (2005) cited an experiment in which “rats kept in a cold chamber would press a lever that turned on a heat lamp” (p. 87) to make the point that in such cases it indeed is difficult to separate cold removal and heat presentation. Presenting food, it has been argued, may be tantamount to removing (or at least reducing) deprivation and removing electric shock may be tantamount to presenting a shock-free period (cf. Verhave, 1962). Although the Michael (1975) and Baron and Galizio position falls on sympathetic ears (e.g., Marr, 2006), the distinction continues. The continued use of the positive–negative distinction is a commentary on its utility in the general verbal community of behavior analysts as well as in application and teaching. 
Despite potential ambiguities in some circumstances, the operations are clear—events that experimenters present and remove are sufficiently straightforward to allow description. The question of valences, as with any question in a science, should have an empirical answer. Because the jury is still out on this question, the long-enduring practice

The Five Pillars of the Experimental Analysis of Behavior

of identifying valences on the basis of experimental operations is retained in this chapter.

Pillar 2: Reinforcement

An organism behaves in the context of an environment in which other events are constantly occurring, some as a result of its responses, and others independent of its responses. One outcome of some of these interactions is that the response becomes more likely than it would be in their absence. Such an outcome is particularly effective when a dependency exists between such environmental events and behavior, a two-term contingency involving responding and what will come to function as a reinforcer. This process of reinforcement is fundamental in understanding behavior and is thus a pillar of TEAB. Of the four empirical pillars, reinforcement may be considered the most basic because the other three cannot exist in the absence of reinforcement. Each of the other pillars adds another element to reinforced responding.

Establishing a Response

As noted, to establish an operant response, a reinforcer must be established and the target response specified precisely so that it is distinguished from other response forms. Several techniques may then be used to bring about the target response. One is to simply wait until it occurs (e.g., Neuringer, 1970); however, the target response may never occur without more direct intervention. Some responses can be elicited or evoked through a technique known colloquially as “baiting the operandum.” Spreading a little peanut butter on a lever, for example, evokes considerable exploration by the rat of the lever, typically resulting in its depression, which then can be reinforced conventionally. The difficulty is that such baiting sometimes results in atypical response topographies that later can be problematic. A particularly effective technique related to baiting is to elicit the response through a Pavlovian conditioning procedure known as autoshaping (Brown & Jenkins, 1968). Once elicited, the response then can be reinforced. With humans, instructions are often an efficient means of establishing an operant response (see Rules and Instructions section). As with all

techniques for establishing operant responses, the success of the instructions depends on their precision. An alternative form of instructional control is to physically guide the response (e.g., Gibson, 1966). Such guided practice may be considered a form of imitation, although imitation as a more general technique of establishing operant behavior does not involve direct physical contact with the learner. The gold-standard technique for establishing an operant response is the differential reinforcement of successive approximations, or shaping. Discovered by Skinner in the 1940s, shaping involves immediately reinforcing successively closer approximations to the target response (e.g., Eckerman, Hienz, Stern, & Kowlowitz, 1980; Pear & Legris, 1987). A sophisticated analysis of shaping is that of Platt (1973), who extensively studied the shaping of interresponse times (IRTs; the time between successive responses). Baron (1991) also described a procedure for shaping responding under a shock-avoidance contingency. Shaping is part of any organism’s day-to-day interactions with its environment. Whether one is hammering a nail or learning a new computer program, the natural and immediate consequences of an action play a critical role in determining whether a given response will be eliminated, repeated, or modified. Some researchers have suggested that shaping occurs when established reinforcers occur independently of responding. Skinner (1948; see also Neuringer, 1970), for example, provided food-deprived pigeons with brief access to food at 15-s intervals. Each pigeon developed repetitive stereotyped responses. Skinner attributed the outcome to accidental temporal contiguity between the response and food delivery. His interpretation, however, was challenged by Staddon and Simmelhag (1971) and Timberlake and Lucas (1985), who attributed the resulting behavior to biological and ecological processes rather than reinforcement.
Nonetheless, the notion of superstitious behavior resulting from accidental pairings of response and reinforcer remains an important methodological and interpretational concept in TEAB (e.g., the changeover delay used ubiquitously in concurrent schedules is predicated on its value in eliminating the adventitious reinforcement of changing between concurrently available operanda).

Positive Reinforcement

Positive reinforcement is the development or maintenance of a response resulting from the response-dependent, time-limited presentation of a stimulus or event (i.e., a positive reinforcer).

Schedules of positive reinforcement. A schedule is a prescription for arranging reinforcers in relation to time and responses (Zeiler, 1984). The simplest such arrangement is to deliver reinforcers independently of responding. Zeiler (1968), for example, first stabilized key pecking of pigeons on fixed-interval (FI) or variable-interval (VI) schedules. Then, the response–reinforcer dependency was eliminated so that reinforcers were delivered independently of key pecking at the end of fixed or variable time periods. This elimination generally reduced response rates, but the patterns of responding continued to be determined by the temporal distribution of reinforcers: Fixed-time schedules yielded positively accelerated responding across the interfood intervals, and variable-time (VT) schedules yielded more evenly distributed responding across those intervals. Zeiler’s (1968) experiment underlines the importance of response–reinforcer dependency in schedule-maintained responding. This dependency has been implemented in two ways in reinforcement schedules. In ratio schedules, either a fixed or a variable number of responses is the sole requirement for reinforcement. In interval schedules (as distinguished from time schedules, in which the response–reinforcer dependency is absent), a single response after a fixed or variable time period is the requirement for reinforcement. Each of these four schedules, fixed ratio (FR), variable ratio (VR), FI, and VI, controls well-known characteristic response patterns.
In addition, the distribution of reinforcers in VI and VR schedules affects the latency to the first response after a reinforcer and, with the VI schedule, the distribution of responses across the interreinforcer interval (Blakely & Schlinger, 1988; Catania & Reynolds, 1968; Lund, 1976). Other arrangements derive from these basic schedules. For example, reinforcing a sequence of two responses separated from one another by a relatively long or a relatively short time period results in, respectively, low and high rates of responding. The

former arrangement is described as a differential-reinforcement-of-low-rate (DRL), or an IRT > t, schedule, and the latter as a differential-reinforcement-of-high-rate, or an IRT < t, schedule. The latter in particular often is arranged such that the first IRT < t after the passage of a variable period of time is reinforced. The various individual schedules can be combined to yield more complex arrangements, suited for the analysis of particular behavioral processes. The taxonomic details of such schedules are beyond the scope of this chapter (see Ferster & Skinner, 1957; Lattal, 1991). Several of them, however, are described in other sections of this chapter in the context of particular behavioral processes. Schedules of reinforcement are important in TEAB because they provide useful baselines for the analysis of other behavioral phenomena. Their importance, however, goes much further than this. The ways in which consequences are scheduled are fundamental in determining behavior. This point resonates with the earlier Baer (1981) quotation in the Defining Environmental Events section about behavioral structure. The very form of behavior is a function of the organism’s history, of the ways in which reinforcement has been arranged (scheduled) in the past as well as in the present.

Parameters of positive reinforcement. The schedules described in the previous section have their effects on behavior as a function of the parameters of the reinforcers that they arrange. Four widely studied parameters of reinforcement are dependency, rate, delay, and amount. The importance of the response–reinforcer dependency in response maintenance has been described in the preceding section. Its significance is underscored by subsequent experiments showing that variations in the frequency with which this dependency is imposed or omitted modulate response rates (e.g., Lattal, 1974; Lattal & Bryan, 1976).
In addition, adding response-independent reinforcers when responding is maintained under different schedules changes both rates and patterns of responding (e.g., Lattal & Bryan, 1976; Lattal, Freeman, & Critchfield, 1989). Reinforcement rate is varied on interval schedules by changing the interreinforcer interval and on ratio schedules by varying the response requirement.

The effects of reinforcement rate depend on what is measured (e.g., response rate, latency to the first response after a reinforcer). Generally speaking, positively decelerated hyperbolic functions describe the relation between response rate and reinforcement rate (Blakely & Schlinger, 1988; Catania & Reynolds, 1968; Felton & Lyon, 1966; but see the Behavioral Economics section later in this chapter—if the economy is closed, a different relation may hold). Delaying a reinforcer from the response that produces it generally decreases response rates as a function of the delay duration (whether the delay is accompanied by a stimulus change) and the schedule on which it is imposed (Lattal, 2010). The effects of the delay also may be separated from the inevitable changes in reinforcement rate and distribution that accompany the introduction of a delay of reinforcement (Lattal, 1987). Amount of reinforcement includes both its form and its quantity. In terms of form, some reinforcers are substitutable for one another to differing degrees (e.g., root beer and lemon-lime soda), whereas other qualitatively different reinforcers do not substitute for one another, but may be complementary. Two complementary reinforcers covary with one another (e.g., food and water). Reinforcers that vary in concentration (e.g., a 10% sucrose solution vs. a 50% sucrose solution), magnitude (one vs. six food pellets), or duration (1 s versus 6 s of food access) often have variable effects on behavior (see review by Bonem & Crossman, 1988), with some investigators (e.g., Blakely & Schlinger, 1988) reporting systematic differences as a function of duration, but others not (Bonem & Crossman, 1988). One variable that affects these different outcomes is whether the quantitatively different reinforcers are arranged across successive conditions or within individual sessions (Catania, 1963). 
DeGrandpre, Bickel, Hughes, Layng, and Badger (1993) suggested that reinforcer amount effects are better predicted by taking into account both other reinforcement parameters and the schedule requirements (see Volume 2, Chapter 8, this handbook).
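The four basic schedules described in the Schedules of positive reinforcement section differ only in the rule that decides whether a given response produces the reinforcer. The following is a minimal Python sketch of those rules, offered as an illustration rather than as a description of any laboratory control program. The function names (`ratio_schedule`, `interval_schedule`), the representation of a session as a list of response times in seconds, and the choice to time each interval from the previous reinforcer are all assumptions made here for the example.

```python
from itertools import repeat
from typing import Iterable, List, Optional, Sequence


def ratio_schedule(response_times: Sequence[float],
                   requirements: Iterable[int]) -> List[float]:
    """Times of reinforced responses under a ratio schedule.

    `requirements` supplies the response count needed for each successive
    reinforcer: a constant value gives FR, varying values give VR.
    """
    reinforced: List[float] = []
    pending = iter(requirements)
    required: Optional[int] = next(pending, None)
    count = 0
    for t in response_times:
        if required is None:      # no further reinforcers are scheduled
            break
        count += 1
        if count >= required:     # response requirement met
            reinforced.append(t)
            count = 0
            required = next(pending, None)
    return reinforced


def interval_schedule(response_times: Sequence[float],
                      intervals: Iterable[float]) -> List[float]:
    """Times of reinforced responses under an interval schedule.

    The first response made at least `interval` seconds after the previous
    reinforcer (or after session start) is reinforced. A constant interval
    gives FI, varying intervals give VI. Timing each interval from the
    previous reinforcer is an assumption of this sketch.
    """
    reinforced: List[float] = []
    pending = iter(intervals)
    interval: Optional[float] = next(pending, None)
    interval_start = 0.0
    for t in response_times:
        if interval is None:
            break
        if t - interval_start >= interval:   # interval elapsed: reinforce
            reinforced.append(t)
            interval_start = t
            interval = next(pending, None)
    return reinforced


# Hypothetical session: one response per second for 10 s.
responses = [float(s) for s in range(1, 11)]
print(ratio_schedule(responses, repeat(5)))        # FR 5 -> [5.0, 10.0]
print(ratio_schedule(responses, [2, 3, 4]))        # VR-like -> [2.0, 5.0, 9.0]
print(interval_schedule(responses, repeat(4.0)))   # FI 4 s -> [4.0, 8.0]
```

In this sketch, responding faster under the ratio rule produces reinforcers sooner, whereas under the interval rule additional responses within an interval have no programmed effect. That difference in the response–reinforcer dependency is what distinguishes the two schedule families.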

Negative Reinforcement

Negative reinforcement is the development or maintenance of a response resulting from the

response-dependent, time-limited removal of some stimulus or event (i.e., a negative reinforcer).

Schedules of negative reinforcement. Schedules of negative reinforcement involve contingencies in which situations are either terminated or postponed as a consequence of the response. The prototypical stimulus used as the negative reinforcer in laboratory investigations of negative reinforcement is electrical stimulation, because of both its reliability and its specifiability in physical terms, although examples of negative reinforcement involving other types of events abound.

Escape. Responding according to some schedule intermittently terminates the delivery of electric shocks for short periods. Azrin, Holz, Hake, and Ayllon (1963) delivered to squirrel monkeys response-independent shocks according to a VT schedule. A fixed number of lever presses suspended shock delivery and changed the stimulus conditions (turning on a tone and dimming the chamber lights) for a fixed time period. Lever pressing was a function of both the duration of the timeout and shock intensity, but the data were in the form of cumulative records, precluding a quantitative analysis of the functional relations between responding and these variables. This and an earlier experiment on VI escape schedules with rats (Dinsmoor, 1962) are among the few studies reporting the effects of schedules of negative reinforcement based on shock termination, thereby limiting the generality of the findings. One problem with using escape from electric shock is that shock can elicit responses, such as freezing or emotional reactions, that are incompatible with the operant escape response. An alternative method of studying escape that circumvents this problem is a timeout from the avoidance procedure first described by Verhave (1962). Perone and Galizio (1987, Experiment 1) trained rats to lever press when this response postponed the delivery of scheduled shocks.
At the same time, a multiple schedule was in effect for a second lever in the chamber. During one of the multiple-schedule components, pressing the second lever produced timeouts from avoidance (i.e., escape from the avoidance contingency and the stimuli associated with it) according

to a VI schedule. Presses on the second lever had no effect during the other multiple-schedule component (escape extinction). During the VI escape component, responding on the second lever was of moderate rate and constant over time, but it was infrequent in the escape-extinction component. Other parameters of timeout from avoidance largely have been unexplored.

Avoidance. The difference between escape and avoidance is one of degree rather than kind. When the escape procedure is conceptualized conventionally as allowing response-produced termination of a currently present stimulus, avoidance procedures allow responses to preclude, cancel, or postpone stimuli that, in the absence of the response, will occur. The presentation of an electric shock, for example, is preceded by a warning stimulus, during which a response terminates the stimulus and cancels the impending shock. Thus, there is escape from a stimulus associated with a negative reinforcer as well as avoidance of the negative reinforcer itself. Although a good bit of research has been conducted on discriminated avoidance (in which a stimulus change precedes an impending negative reinforcer, e.g., Hoffman, 1966), avoidance unaccompanied by stimulus change is more commonly investigated in TEAB. Free-operant avoidance, sometimes labeled nondiscriminated or unsignaled avoidance, is characterized procedurally by the absence of an exteroceptive stimulus change after the response that postpones or deletes a forthcoming stimulus, such as electric shock. The original free-operant avoidance procedure was described by Sidman (1953) and often bears his name. Each response postponed for a fixed period an otherwise-scheduled electric shock. If a shock was delivered, subsequent shocks were delivered at fixed intervals until the response occurred. These two temporal parameters, labeled, respectively, the response–shock (R-S) and shock–shock (S-S) intervals, together determine response rates.
Deletion and fixed- and variable-cycle avoidance schedules arrange the cancellation of otherwise unsignaled, scheduled shocks as a function of responding, with effects on responding similar to those of Sidman avoidance (see Baron, 1991).
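The interplay of the R-S and S-S intervals in Sidman avoidance can be made concrete with a short simulation. The sketch below is an illustration under stated assumptions, not a reconstruction of Sidman's (1953) apparatus: the function name `sidman_avoidance` is hypothetical, response times are given in seconds, and the session is assumed to open on the S-S timer.

```python
from typing import List, Sequence


def sidman_avoidance(response_times: Sequence[float],
                     rs_interval: float,
                     ss_interval: float,
                     session_length: float) -> List[float]:
    """Shock delivery times under free-operant (Sidman) avoidance."""
    responses = sorted(t for t in response_times if t <= session_length)
    shocks: List[float] = []
    # Assumption: the session opens on the S-S timer.
    next_shock = ss_interval
    i = 0
    while True:
        if i < len(responses) and responses[i] < next_shock:
            # A response postpones the next shock by the R-S interval.
            next_shock = responses[i] + rs_interval
            i += 1
        elif next_shock <= session_length:
            # No response intervened: deliver the shock and run the S-S timer.
            shocks.append(next_shock)
            next_shock += ss_interval
        else:
            break
    return shocks


# Hypothetical 60-s session with R-S = 20 s and S-S = 5 s.
steady = [4.0, 14.0, 24.0, 34.0, 44.0, 54.0]   # one response every 10 s
lapse = [3.0, 10.0, 25.0, 52.0]                # a 27-s pause after 25 s
print(sidman_avoidance(steady, 20.0, 5.0, 60.0))   # []
print(sidman_avoidance(lapse, 20.0, 5.0, 60.0))    # [45.0, 50.0]
```

With these made-up values, responding at least once per R-S interval avoids every shock, and a pause longer than the R-S interval produces shocks at the S-S spacing until the next response.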

Parameters of negative reinforcement. Two variables that affect the rate of responding under schedules of negative reinforcement are the parameters (e.g., type, frequency [S-S interval in the case of free-operant avoidance], intensity, duration) of the stimulus that is to be escaped or avoided and the duration of the period of stimulus avoidance or elimination yielded by each response (e.g., the R-S interval in Sidman avoidance). Leander (1973) found that response rates on free-operant avoidance schedules were an increasing function of the interaction between electric shock intensity and duration (cf. Das Graças de Souza, de Moraes, & Todorov, 1984). Shock frequency can be manipulated by changing either the S-S or the R-S interval. With other parameters of the avoidance contingency held constant, Sidman (1953) showed that response rates increased with shorter S-S intervals. Response rates also vary inversely with the duration of the R-S interval during free-operant avoidance, such that shorter R-S intervals control higher response rates than longer R-S intervals (Sidman, 1953). Furthermore, Logue and de Villiers (1978) used concurrent variable-cycle avoidance schedules to show that response rates on operanda associated with these schedules were proportional to the frequency of scheduled shocks (rate of negative reinforcement) arranged on the two alternatives. That is, more frequently scheduled shocks controlled higher response rates than did less frequently scheduled shocks.

Extinction

Extinction is functionally a reduction or elimination of responding brought about by either of two general operations: by removing the positive or negative reinforcer or by rendering the reinforcer ineffective by eliminating the establishing operation. The former is described hereafter as conventional extinction, because these operations are the ones more commonly used in TEAB when analyzing extinction. With positive reinforcement, the latter is accomplished by either providing continuous access to the reinforcer (satiation) or by removing the response–reinforcer dependency (Rescorla & Skucy, 1969; see Schedules of Positive Reinforcement section earlier in this chapter for the effects of this operation on responding). With

negative reinforcement, extinction is accomplished by making the negative reinforcer inescapable. The rapidity of extinction depends both on the organism’s history of reinforcement and probably (although experimental analyses are lacking) on which of the aforementioned procedures are used to arrange extinction (Shnidman, 1968). Herrnstein (1969) suggested that the speed of extinction of avoidance is related to the discriminability of the extinction contingency, an observation that holds as well in the case of positive reinforcement. Extinction effects are rarely permanent once reinforcement is reinstated. Permanent effects of extinction are likely the result of the alternative reinforcement of other responses while extinction of the targeted response is in effect. Extinction also can generate other responses, some of which may be generalized or induced from the extinguished response itself and others of which depend on other stimuli in the environment in which extinction occurs (see Volume 2, Chapter 4, this handbook). Some instances of such behavior are described as schedule induced and are perhaps more accurately labeled extinction induced because they typically occur during those parts of a schedule associated with nonreinforcement (local extinction). For example, such responding is observed during the period after reinforcement under FR or FI schedules, in which the probability of a reinforcer is zero. Azrin, Hutchinson, and Hake (1966; see also Kupfer, Allen, & Malagodi, 2008) found that pigeons attack conspecifics when a previously reinforced key peck response is extinguished. Another example of the generative effects of extinction is resurgence. If a response is reinforced and then extinguished while a second response is concurrently reinforced, extinguishing that second response leads to a resurgence of the first response. 
The effect occurs whether the first response is or is not extinguished before concurrently reinforcing the second, and the effect depends on parameters of both the first and the second reinforced response (e.g., Bruzek, Thompson, & Peters, 2009; Lieving & Lattal, 2003).

Frameworks

Different frameworks have evolved that summarize and integrate the empirical findings deriving from

analyses of the reinforcement of operant behavior. All begin with description. Many involve quantitative analysis and extrapolation through modeling. Others, although also quantitative in the sense of reducing measurement down to numerical representation, are less abstract, remaining closer to observed functional relations. Each has been successful in accounting for numerous aspects of behavioral phenomena. Each also has limitations. None has achieved universal acceptance. The result is that instead of representing a progression with one framework leading to another, these frameworks together make up a web of interrelated observations, each contributing something to the general understanding of how reinforcement has its effects on behavior. The following sections provide an overview of some of these contributions to this web.

Levels of influence. Thorndike (1911) observed that

of several responses made to the same situation, those which are accompanied or closely followed by satisfaction to the animal will, other things being equal, be more firmly connected with the situation, so that, when it recurs, they will be more likely to recur. (p. 244)

The temporal relation between a response and the reinforcer that follows has been a hallmark of the reinforcement process. Skinner’s (1948) “superstition” demonstration underscored the importance of this relation by suggesting that even in the absence of a programmed consequence of responding, the environment will strengthen whatever response occurs contiguously with the reinforcer. Ferster and Skinner (1957) frequently assigned response–reinforcer temporal contiguity a primary role in accounting for the effects of reinforcement schedules. To say that all were not content with response–reinforcer temporal contiguity as the central mechanism for reinforcement is an understatement.
In a seminal experiment, Herrnstein and Hineline (1966) exposed rats to a schedule consisting of two response-independent shock distributions, one frequent, the other less so. Each session started in the

frequent-shock distribution, and a lever press shifted the distribution of shocks to the leaner one, at which the rat remained until a shock was delivered. At this point, the frequent-shock distribution was reinstated and remained in effect until the next response, at which point the above-described cycle repeated. Responses did not eliminate shocks; they only reduced shock frequency. Furthermore, because shocks were distributed randomly in time, temporal discriminations were precluded. Herrnstein and Hineline reasoned that if responding were maintained under this schedule, it would be because of an aggregated effect of reductions in shock frequency over noninstantaneous time periods (cf. Sidman, 1966). Consistent with Herrnstein and Hineline’s findings, Baum’s (1973) description of the correlation-based law of effect remains a cogent summary of a molar framework for reinforcement (see also Baum, 1989; Williams [1983] critiqued the correlational framework). TEAB makes frequent reference to level of analysis. This phrase refers to both the description of data and the framework for accounting for those data. Molecular descriptions are of individual or groups of responses, often in relation to reinforcement. Molar descriptions are of aggregated responses across time and the allocation of time to differing activities. Molecular accounts of reinforcement effects emphasize the role of events occurring at the time of reinforcement (e.g., Peele, Casey, & Silberberg, 1984), and molar accounts emphasize the role of aggregated effects of reinforcers integrated over noninstantaneous time periods (e.g., Baum, 1989). Proponents of each framework have at various times claimed primacy in accounting for the effects of reinforcement, but the isolation of an irrefutable single mechanism at one level or another seems remote.
The issue is not unlike others concerning levels of analysis in other disciplines, for example, punctuated equilibrium versus continuous evolution and wave and particle theories of light. The “resolution” of the molar versus molecular issue may ultimately be pragmatic: The appropriate level is that at which behavior is predicted and controlled for the purposes at hand.

Relational reinforcement theory. Reinforcers necessarily involve activities related to their access,

consumption, or use. Premack (1959) proposed reinforcement to be access to a (relatively) preferred activity, such as eating. For Premack, the first step in assessing reinforcement was to create a preference hierarchy. Next, highly preferred activities were restricted and made accessible contingent on engagement in a nonpreferred activity, with the outcome that the low-probability response increased in frequency. Timberlake and Allison (1974) suggested that the Premack principle was a corollary of a more general response deprivation principle whereby any response constrained below its baseline level can function as a reinforcer for another response that allows the constrained response to rise to its baseline level. Premack’s analysis foreshadowed other conceptualizations of reinforcement contingencies in terms of constraints on behavioral output (e.g., Staddon, 1979).

Choice and matching. Perhaps the contribution to modern behavior analysis with the greatest impact is the matching law (Herrnstein, 1970; see Chapter 10, this volume). The matching law is both a summary of a number of empirical reinforcement effects and a framework for integrating those effects. Herrnstein’s (1961) original proposal was a simple quantitative statement that relative responding is distributed proportionally among concurrently available alternatives as a function of the relative reinforcement proportions associated with each alternative. He and others thereafter developed it into its more generalized form, expressed by Baum (1974; see also Staddon, 1968; McDowell, 1989) as

$$\frac{R_1}{R_2} = b\left(\frac{r_1}{r_2}\right)^{a},$$

where R1 and R2 are response rates to the two alternatives and r1 and r2 are reinforcement rates associated with those alternatives. The parameters a and b are indices of, respectively, sensitivity (also sometimes labeled the discriminability of the alternatives) and bias (e.g., a preexisting preference for one operandum over the other). When restated in logarithmic form, a and b describe the slope and intercept of a straight line fitted to the plot of the two ratios on either axis of a graph. A rather typical, but not universal, finding in many experiments is undermatching, that is, preferences for the richer alternative are less

The Five Pillars of the Experimental Analysis of Behavior

than is predicted by a strict proportionality between response and reinforcement ratios. Undermatching, overmatching (a greater-than-predicted preference for the richer alternative), and bias were the impetus for developing the generalized matching law. One of Herrnstein’s (1970) insights was that all behavior should be considered in the framework of choice. Even when only one response alternative is being measured, there is a choice between that response and engaging in other behavior. This observation led to two further conclusions: The total amount of behavior in a situation is constant or fixed (but see McDowell, 1986); it is simply distributed differently depending on the circumstances, and there are unmeasured sources of reinforcement (originally labeled ro, but later re). Thus, deviations from the strict proportionality rule, such as undermatching, are taken by some to reflect changes in re as well as perhaps bias or sensitivity changes. Davison and Tustin (1978) considered the matching law in the broader context of decision theory, in particular, signal detection theory (D. M. Green & Swets, 1966). Signal detection theory was originally developed to distinguish sensitivity and bias effects in psychophysical data; Davison and Tustin used it to describe the discriminative function of reinforcement in maintaining operant behavior. Their analysis thus describes the mathematical separation of the biasing and discriminative stimulus effects of the reinforcer. More generally, the analysis of choice has been extended across TEAB, from simple reinforcement schedules to foraging, social behavior, and applied behavior analysis (see Volume 2, Chapter 7, this handbook).

Response strength and behavioral momentum. Reinforcement makes a response more likely, and this increase in response probability or rate has conventionally been taken by many as evidence of the strength of that response.
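As an illustrative aside on the generalized matching law discussed above, the logarithmic form can be sketched in a few lines of Python. This is only a minimal sketch: the function names and the reinforcer ratios are hypothetical, and the response ratios are generated from the equation itself (with a = 0.8 and b = 1.2) rather than taken from any experiment. It shows how the slope and intercept of the straight line on log-log axes recover sensitivity (a) and bias (b), with a < 1 corresponding to undermatching.

```python
import math


def generalized_matching_ratio(r1, r2, a=1.0, b=1.0):
    """Predicted response ratio R1/R2 = b * (r1 / r2) ** a."""
    return b * (r1 / r2) ** a


def fit_log_matching(reinforcer_ratios, response_ratios):
    """Ordinary least-squares line through (log r1/r2, log R1/R2).

    Returns (a, b): the slope is the sensitivity a, and the
    antilog of the intercept is the bias b.
    """
    xs = [math.log10(x) for x in reinforcer_ratios]
    ys = [math.log10(y) for y in response_ratios]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, 10 ** intercept


# Hypothetical reinforcer ratios (r1/r2) across five conditions.
reinforcer_ratios = [0.125, 0.5, 1.0, 2.0, 8.0]

# Synthetic response ratios generated from the model itself
# (undermatching, a = 0.8, with a bias toward alternative 1, b = 1.2).
response_ratios = [
    generalized_matching_ratio(x, 1.0, a=0.8, b=1.2) for x in reinforcer_ratios
]

a_hat, b_hat = fit_log_matching(reinforcer_ratios, response_ratios)
print(f"sensitivity a = {a_hat:.3f}, bias b = {b_hat:.3f}")
```

With real data the points would scatter around the fitted line; the sketch only demonstrates how the two parameters are read off the logarithmic plot.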
One difficulty with such a view of response strength is that response rate is not determined simply by whether the response has been reinforced but by how the contingencies by which the response is reinforced are arranged. Thus, the absence of the target response may index its strength in the case of a differential-reinforcement-of-other-behavior (DRO) schedule, in which

reinforcement occurs only if the target response does not occur for the specified period. Nevin (1974; see Volume 2, Chapter 5, this handbook) conceptualized response strength as resistance to change of a response when competing reinforcement contingencies impinge on that response. The relation between response strength and resistance to change originated early in the psychology of learning (e.g., Hull, 1943), but Nevin expanded it to schedule-maintained responding. He arranged multiple schedules of reinforcement in which the parameters of the reinforcer (rate, magnitude, and delay, in different experiments) differed in the two components. The reinforcer maintaining the response was made less effective by, in different conditions, removing it (extinction), prefeeding the food-restricted animals (satiation), or providing response-independent reinforcers at different rates during the chamber blackout that separated the two components. Each of these disrupting operations had similar disruptive (response rate–lowering) effects such that the responding maintained by more frequent, larger, or less delayed reinforcers was less reduced than was responding in the other component in which the reinforcers were less frequent, smaller, or more delayed. In considering the resistance of behavior to change, Nevin, Mandell, and Atak (1983) proposed a behavioral analogy to physical momentum, that is, the product of mass and velocity:

When responding occurs at the same rate in two different schedule components, but one is less affected by an external variable than is the other, we suggest that the performance exhibiting greater resistance to change be construed as having greater mass. (p. 50)

Nevin et al. have shown that reinforcement rate, regardless of the contingency between responses and reinforcers, is a primary determinant of momentum: More frequently reinforced responses are more resistant to change.
For example, Nevin, Tota, Torquato, and Shull (1990, Experiment 1) found that responding maintained by a combination of response-dependent and response-independent food deliveries (a VI schedule + a VT schedule) was

Kennon A. Lattal

more resistant to disruption than was responding maintained by response-dependent food delivery only. This was the case because the reinforcement rate in the VI + VT component was higher, even though response rate was lower in this component. Nevin et al. (1990) observed that “as a result of Pavlovian contingencies, nonspecific effects underlying Pavlovian contingencies resulting from the association of discriminative stimuli with different rates of reinforcement may have nonspecific effects that ‘arouse or motivate operant behavior’” (p. 374). Another consideration in behavioral momentum may be response rate. Different reinforcement rates result in different response rates. When reinforcement rates are held constant and response rates are varied, the lower response rates are often more resistant to change (Blackman, 1968; Lattal, 1989).

Behavioral economics. Skinner (1953) observed that “statements about goods, money, prices, wages, and so on, are often made without mentioning human behavior directly, and many important generalizations in economics appear to be relatively independent of the behavior of the individual” (p. 398). The chasm described by Skinner has long since been bridged, both in TEAB (Madden, 2000; see Volume 2, Chapter 8, this handbook) and in the discipline of economics. This bridging has come about through the mutual concern of the two disciplines with the behavior of consumption of goods and services as a function of environmental circumstances. The behavioral–economic framework has been a useful heuristic for generating experimental analyses that have expanded the understanding of reinforcement in several ways. In a token economy on a psychiatric ward (Ayllon & Azrin, 1968), for example, consumption of a nominal reinforcer was reduced if the same item was available at a lower cost, or no cost, elsewhere.
Thus, for example, if visitors brought desirable food items onto the ward from outside, demand for those food items within the token economy diminished. Hursh (1980) captured the essential features of this scenario by distinguishing open economies, in which items are available from multiple sources, from closed economies, in which those items are only available in a defined context. Hall and Lattal (1990)

directly compared the effects of reinforcement rate on VI schedule performance in open and closed economies. In the open economy, sessions were terminated before the pigeon earned its daily food allotment; postsession feeding was provided so the animal was maintained at a target weight. In the closed economy, the pigeons earned all of their food by key pecking. The functions relating response rate to reinforcement rate differed for the two economic contexts. In the open economy, response rates decreased with decreasing reinforcement rate, whereas in the closed economy, response rates increased with decreasing reinforcement rate. Such findings suggest that the functional relations that obtain between reinforcement parameters and behavior are not universal, but depend on the context in which they occur. Consumption also differs as a function of the reinforcer context. The interaction between reinforcers lies on a continuum. At one extreme, one reinforcer is just as effective as another in maintaining behavior (perfect substitutes). At the other extreme, reinforcers do not substitute for one another at all. Instead, as consumption of one increases (decreases) with price changes, so does the other despite its price being unchanged. Such a relation reveals that the reinforcers function as perfect complements. Between the extremes, reinforcers substitute for one another to varying degrees (for a review, see L. Green & Freed, 1998). Behavioral–economic analyses have shown that qualitatively different reinforcers may vary differentially in the extent to which they sustain behavior in the context of different environmental challenges, often expressed as cost in economic analyses. Some reinforcers continue to sustain behavior even as cost, measured, for example, by the number of responses required for reinforcement, increases, and the behavior sustained by others diminishes with such increased cost. 
This difference in response sustainability across increasing cost distinguishes inelastic (fixed sustainability regardless of cost) from elastic (sustainability varies with cost) reinforcers. Other ways of increasing the cost of a reinforcer are by increasing the delay between the reinforcer and the response that produced it or by reinforcing a

response with decreasing probability. These two techniques of changing reinforcer cost have been combined with different magnitudes of reinforcement to yield what first was called a self-control paradigm (Rachlin & Green, 1972) but now generally is described as delay (or probability, as appropriate) discounting. A choice is arranged between a small or a large (or a less or more probable) reinforcer, delivered immediately after the choice response. Not surprisingly, the larger (or more probable) reinforcer almost always is selected. Next, a delay is imposed between the response and delivery of the larger reinforcer (or the probability of the larger reinforcer is decreased), and the magnitude of the small, immediate (or small but sure thing) reinforcer is varied systematically until an indifference point is reached at which either choice is equally likely (e.g., Mazur, 1986; Richards, Mitchell, de Wit, & Seiden, 1997). Using the delay or probability discounting procedure, a function can be created relating indifference points to the changing cost. Steep discounting of delayed reinforcers correlates with addictions such as substance use disorder and pathological gambling (see Madden & Bickel, 2010, for a review). A final, cautionary note applies to economic concepts and terms as well as, more generally, to verbal labels commonly used in TEAB, such as contrast or even reinforcement: Terms such as elastic or inelastic or complementary or substitutable are descriptive labels, not explanations. Reinforcers do not have their effects because they are inelastic, substitutable, or discounted.

Behavior dynamics. The emphasis in TEAB on steady-state performance sometimes obscures the centrality of the environment–behavior dynamic that characterizes virtually every contingency of reinforcement.
Dynamics implies change, and change is most immediately apparent when behavior is in transition, as in the acquisition of a response previously in the repertoire in only primitive form, transitions from one set of reinforcement conditions to another, or behavioral change from reinforcement to extinction. Steady-state performance, however, also reveals a dynamic system. As Marr (personal communication, November 2010) observed, “[All] contingencies engender and manifest systems

dynamics.” The imposed contingency is not necessarily the effective one because that imposed contingency constantly interacts with responding, and the resulting dynamic is what is ultimately responsible for behavioral modulation and control. This distinction was captured by Zeiler (1977b), who distinguished between direct and indirect variables operating in reinforcement schedules. In a ratio schedule, for example, the response requirement, n, is a direct, specified variable (e.g., as in FR n), but responding takes time, so as one varies the ratio requirement, one is also indirectly varying the time between reinforcer deliveries. Another way of expressing this relation is by the feedback function of a ratio schedule: Reinforcement rate is determined directly by response rate. The feedback function is a quantitative description of the contingency specifying the dynamic interplay between responding and reinforcement. Using the methods developed by quantitative analysis of dynamical systems, a few behavior analysts have explored some properties of reinforcement contingencies (see Marr, 1992). For example, in an elegant analysis Palya (1992) examined the dynamic structure among successive IRTs in interval schedules, and Hoyert (1992) applied nonlinear dynamical systems (chaos) theory in an attempt to describe the cyclical interval-to-interval changes in response output that characterize steady-state FI schedule performance. Indeed, much of the response variability that is often regarded as a nuisance to be controlled (e.g., Sidman, 1960) may, from a dynamic systems perspective, be the inevitable outcome of the dynamic nature of any reinforcement contingency.

Reinforcement in biological context. Skinner (1981; see also Donahoe, 2003; Staddon & Simmelhag, 1971) noted the parallels between the selection of traits in evolutionary time and the selection of behavior over the organism’s lifetime (i.e., ontogeny).
Phylogeny underlies the selection of behavior by reinforcement in at least two ways. First, certain kinds of events may come to function as reinforcers or punishers in part because of phylogeny. Food, water, and drugs of various sorts, for example, may function as reinforcers at

least in part because of their relation to the organism’s evolved physiology. It is both retrograde and false, however, to suggest that reinforcers reduce to physiological needs. Reinforcers have been more productively viewed functionally; however, the organism’s phylogeny certainly cannot be ignored in discussions of reinforcement (see also Breland & Breland, 1961). Second, the mere fact that an organism’s behavior is determined to a considerable extent by consequences is prima facie evidence of evolutionary processes at work. Thus, Skinner (1981) proposed that some responses are selected (reinforced) and therefore tend to recur, and those that are not selected or are selected against tend to disappear from the repertoire. The research of Neuringer (2002; see Chapter 22, this volume) on variability as an operant sheds additional light on the interplay between behavioral variation and behavioral selection. The analysis of foraging also has been a focal point of research attempting to place reinforcement in biological perspective. Foraging, whether it is for food, mates, or other commodities such as new spring dresses, involves choice. Lea (1979; see also Fantino, 1991) described an operant model for foraging based on concurrent chained schedules (see Response-Dependent Stimuli Correlated With Previously Established Reinforcers section later in this chapter) in which foraging is viewed as consisting of several elements, beginning with search and ending in consumption (including buying the new spring dress). Such an approach holds out the possibility of integrating ecology and TEAB (Fantino, 1991). A related integrative approach is found in parallels between foraging in natural environments and choice as described by the matching law and its variants (see Choice and Matching section earlier in this chapter).
Optimal foraging theory, for example, posits that organisms select those alternatives in such a way that costs and benefits of the alternatives are weighed in determining choices (as contrasted, e.g., with maximizing theory, which posits that choices are made such that reinforcement opportunities are maximum). Optimal foraging theory is not, however, without its critics. Zeiler (1992), for example, has suggested that “optimality theory ignores the

fact that natural selection works on what it has to work with, not on ideals” (p. 420) and

optimizing means to do the best conceivable. However, natural selection need not maximize returns. . . . What selection must do is follow a satisficing principle. . . . To satisfice means to do well enough to get by, not necessarily to do the best possible. (p. 420)

Zeiler thus concluded that optimization in fact may be rare in natural settings. He distinguished evolutionary and immediate function of behavior, noting that the former relates to the fitness enhancement of behavior and the latter to its more immediate effects. In his view, optimal foraging theory errs in using the methods of immediate function to address questions of evolutionary function.

Pillar 3: Punishment

An outcome of some interactions between an organism’s responses and environmental events is that the responses become less likely than they would be in the absence of those events. As with reinforcement, this outcome is particularly effective when there is a dependency between such environmental events and behavior, a two-term contingency involving responding and what comes to function as a punisher. Such a process of punishment constitutes the third pillar of TEAB.

Positive Punishment

Positive punishment is the suppression or elimination of a response resulting from the response-dependent, time-limited presentation of a stimulus or event (i.e., a negative reinforcer). In the laboratory, electric shock is a prototypical positive punisher because, at the parameters used, it produces no injury to the organism, it is precisely initiated and terminated, and it is easily specified in physical terms (e.g., its intensity, frequency, and duration). Furthermore, parameters of shock can be selected that minimize sensitization (overreactivity to a stimulus) and habituation (adaptation or underreactivity to a stimulus). Punishment always is investigated in the context of reinforcement because responding must be

maintained before it can be punished. As a result, the effects of punishers always are relative to the prevailing reinforcement conditions. Perhaps the most important of these is the schedule of reinforcement. Punishment exaggerates postreinforcement pausing on FR and FI schedules, and it decreases response rates on VI schedules (Azrin & Holz, 1966). Because responding on DRL schedules is relatively inefficient (i.e., responses are frequently made before the IRT > t criterion has elapsed), by suppressing responding punishment actually increases the rate of reinforcement. Even so, pigeons will escape from punishment of DRL responding to a situation in which the DRL schedule is in effect without punishment (Azrin, Hake, Holz, & Hutchinson, 1965). Although most investigations of punishment have been conducted using baselines involving positive reinforcement, negative reinforcement schedules also are effective baselines for the study of punishment (e.g., Lattal & Griffin, 1972). Punishment effects vary as a function of parameters of both the reinforcer and the punisher. With respect to reinforcement, punishment is less effective when the organism is more deprived of the reinforcer (Azrin, Holz, & Hake, 1963). The effects of reinforcement rate on the efficacy of punishment are less clear. Church and Raymond (1967) reported that punishment efficacy increased as the rate of reinforcement decreased. When, however, Holz (1968) punished responding on each of two concurrently available VI schedules arranging different rates of reinforcement, the functions relating punishment intensity and the percentage of response reduction from a no-punishment baseline were virtually identical. Holz’s results suggest a similar relative effect of punishment independent of the rate of reinforcement. 
Perhaps other tests, such as those suggested by behavioral momentum theory (see Response Strength and Behavioral Momentum section earlier in this chapter) could prove useful in resolving these seemingly different results. Parameters of punishment include its immediacy with respect to the target response, intensity, duration, and frequency. Azrin (1956) showed that punishers dependent on a response were more suppressive of responding than were otherwise equivalent punishers delivered independently of

responding at the same rate. Punishers that are more intense (e.g., higher amperage in the case of electric shock) and more frequent have greater suppressive effects, assuming the conditions of reinforcement are held constant (see Azrin & Holz, 1966, for a review). The effects of punisher duration are complicated by the fact that longer duration punishers may adventitiously reinforce responses contiguous with their offset, thereby potentially confounding the effect of the response-dependent presentation of the punisher.

Negative Punishment

Negative punishment is the suppression or elimination of a response resulting from the response-dependent, time-limited removal of a stimulus or event. Both negative punishment and conventional extinction involve removing the opportunity for reinforcement. Negative punishment differs from extinction in three critical ways. In conventional extinction, the removal of the opportunity for reinforcement occurs independently of responding, is relatively permanent (or at least indefinite), and is not correlated with a stimulus change. In negative punishment, the removal of the opportunity for reinforcement is response dependent, time limited, and sometimes (but not necessarily) correlated with a distinct stimulus. These latter three characteristics are shared by three procedures: DRO schedules (sometimes also called differential reinforcement of pausing or omission training), timeout from positive reinforcement, and response cost. Under a DRO schedule, reinforcers depend on the nonoccurrence of the target response for a predetermined interval. Responses during the interreinforcer interval produce no stimulus change, but each response resets the interreinforcer interval. Despite the label of reinforcement, the response-dependent, time-limited removal of the opportunity for reinforcement typically results in substantial, if not total, response suppression, that is, punishment. With DROs, both the amount of time that each response delays the reinforcer and the time between successive reinforcers in the absence of intervening responding can be varied. Neither parameter seems to make much difference once responding is reduced. They may, however, affect the speed with which responding

is reduced and the recovery after termination of the contingency (Uhl & Garcia, 1969). As with other punishment procedures, a DRO contingency may be superimposed on a reinforcement schedule maintaining responding (Zeiler, 1976, 1977a), allowing examination of the effects of punishers, positive or negative, on steady-state responding. Lattal and Boyer (1980), for example, reinforced key pecking according to an FI 5-min schedule. At the same time, reinforcers were available according to a VI schedule for pauses in pecking of x s or more. Pecking thus postponed any reinforcers that were made available under the VI schedule. No systematic relation was obtained between required pause duration and response rate. With a constant 5-s pause required for reinforcement of not pecking, however, the rate of key pecking was a negative function of the frequency of DRO reinforcement. That is, the more often a key peck postponed food delivery, the lower the response rates were and thus the greater the punishment effect was.

Timeouts are similar to DROs in that, when used as punishers, they occur as response-dependent, relatively short-term periods of nonreinforcement. They differ from DROs because the periods of nonreinforcement are not necessarily resetting with successive responses, and they are accompanied by a stimulus change. Timeout effects are relative to the prevailing conditions of reinforcement. As was described earlier, periods of timeout from negative reinforcement function as reinforcers, as do periods of timeout from extinction (e.g., Azrin, 1961). Timeouts from situations correlated with reinforcement, however, suppress responding when they are response dependent (Kaufman & Baron, 1968).

With response cost, each response or some portion of responses, depending on the punishment schedule, results in the immediate loss of reinforcers, or some portion of reinforcers.
Response cost is most commonly used in laboratory and applied settings in which humans earn points or tokens according to some schedule of reinforcement. Weiner (1962), for example, subtracted one previously earned point for each response made by adult human participants earning points by responding under VI schedules. The effect was considerable suppression of responding. Response cost, similar to

timeout, can entail a concurrent loss of reinforcement as responding is suppressed. Pietras and Hackenberg (2005), however, showed that response cost has a direct suppressive effect on responding, independent of changes in reinforcement rate.
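The resetting DRO contingency described earlier in this section can be sketched as a small Python function. This is a minimal sketch under stated assumptions, not a description of any particular apparatus: the function and parameter names are hypothetical, the first reinforcer is assumed to come due one reinforcer-to-reinforcer interval after session start, and the response times are made up. The two parameters mirror the two that the text says can be varied: how long each response postpones the next reinforcer (`response_delay`) and the time between successive reinforcers when no response intervenes (`reinforcer_interval`).

```python
def dro_reinforcer_times(response_times, response_delay,
                         reinforcer_interval, session_length):
    """Return reinforcer delivery times under a resetting DRO schedule.

    response_delay: time by which each target response postpones the
        next reinforcer (hypothetical parameter name).
    reinforcer_interval: time between successive reinforcers when no
        target response intervenes (hypothetical parameter name).
    All times are in the same unit (e.g., seconds).
    """
    if response_delay <= 0 or reinforcer_interval <= 0:
        raise ValueError("both intervals must be positive")

    responses = sorted(t for t in response_times if 0 <= t <= session_length)
    reinforcers = []
    next_due = reinforcer_interval  # assumption: timer starts at session onset
    i = 0
    while True:
        if i < len(responses) and responses[i] < next_due:
            # A target response before the reinforcer is due resets the timer.
            next_due = responses[i] + response_delay
            i += 1
        elif next_due <= session_length:
            # The pause requirement was met: deliver and restart the timer.
            reinforcers.append(next_due)
            next_due += reinforcer_interval
        else:
            break
    return reinforcers


# Made-up example: three responses in a 38-s session.
with_responses = dro_reinforcer_times([3, 4, 20], response_delay=10,
                                      reinforcer_interval=5, session_length=38)
without_responses = dro_reinforcer_times([], response_delay=10,
                                         reinforcer_interval=5, session_length=38)
print(with_responses)          # reinforcers obtained despite responding
print(len(without_responses))  # reinforcers obtained with no responding
```

In the example, each response pushes the next reinforcer further away, so responding lowers the number of reinforcers obtained relative to not responding at all, which is the response-dependent, time-limited loss of reinforcement opportunity that defines negative punishment here.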

Frameworks

Punishment has been conceptualized by different investigators as either a primary or a secondary process.

Punishment as a primary or direct process. Thorndike (1911) proposed that punishment effects are equivalent and parallel to those of reinforcement but opposite in direction. Thus, the response-strengthening effects defining reinforcement were mirror-image effects of the response-suppressing effects defining punishment. Schuster and Rachlin (1968) suggested three examples: (a) the suppressive and facilitative effects of following a conditioned stimulus (CS) with, respectively, shock or food; (b) mirror-image stimulus generalization gradients around stimuli associated with reinforcement and punishment; and (c) similar indifference when given a choice between response-dependent and response-independent food or between response-dependent shock and response-independent shock (Schuster & Rachlin, 1968). Although (c) has held up to experimental analysis (Brinker & Treadway, 1975; Moore & Fantino, 1975; Schuster & Rachlin, 1968), (a) and (b) have proven more difficult to confirm. For example, precise comparisons of generalization gradients based on punishment and reinforcement are challenging to interpret because of the complexities of equating the food and shock stimuli on which the gradients are based. The research described in the Response-Independent Stimuli Correlated With Reinforcers and Punishers section later in this chapter illustrates the complexities of interpreting (a).

Punishment as a secondary or indirect process. Considered a secondary process, the response suppression obtained when responding is punished comes about indirectly as the result of negative reinforcement of other responses. Thus, punishment is interpreted as a two-stage (factor) process whereby, first, the stimulus becomes aversive and, second, responses that result in its avoidance are then

negatively reinforced. Hence, as unpunished responses are reinforced because they escape or avoid punishers, punished ones decrease, resulting in what appears as target-response suppression (cf. Arbuckle & Lattal, 1987). A variation of punishment as a secondary process is the competitive suppressive view that responding decreases because punishment degrades or devalues the reinforcer (e.g., Deluty, 1976). Thus, the suppressive effect of punishment is seen as an indirect effect of a less potent reinforcer for punished responses, thereby increasing the potency of reinforcers for nonpunished responses. Contrary to Deluty (1976), Farley’s (1980) results, however, supported a direct suppressive interpretation of punishment. Critchfield, Paletz, MacAleese, and Newland (2003) compared the direct and competitive suppression interpretations of punishment. Using human subjects in a task in which responding was reinforced with points and punished by the loss of a portion of those same points, a quantitative model based on the direct suppression interpretation yielded better fits to the data. Furthermore, Rasmussen and Newland (2008) suggested that the negative law of effect may not be symmetrical. Using a procedure similar to Critchfield et al.’s, they showed that single punishers subtract more value than single reinforcers add.

Pillar 4: Control by Stimuli Correlated with Reinforcers and Punishers

Reinforcers and punishers often are presented in the context of other stimuli that are initially without discernable effect on behavior. Over time and with continued correlation with established reinforcers or punishers, these other events come to have behavioral effects similar to the events with which they have been correlated. Such behavioral control by these other events is what places them as the fourth pillar of TEAB.

Response-Independent Stimuli Correlated With Previously Established Reinforcers and Punishers

In the typical operant arrangement for studying the effects of conditioned stimuli, responding is maintained according to some schedule of reinforcement, onto which the stimuli and their correlated events are superimposed. In the first such study, Estes and Skinner (1941) trained rats’ lever pressing on an FI food reinforcement schedule and periodically imposed a 3-min tone (a warning stimulus or conditional stimulus [CS]) followed by a brief electric shock. Both the CS and the shock occurred independently of responding, and the FI continued to operate during the CS (otherwise, the response would simply extinguish during the CS). Lever pressing was suppressed during the tone relative to no-tone periods. This conditioned suppression effect occurs under a variety of parameters of the reinforcement schedule, the stimulus at the end of the warning stimulus, and the warning stimulus itself (see Blackman, 1977, for a review). When warning stimuli that precede an unavoidable shock are superimposed during avoidance-maintained responding (see the earlier Avoidance section), responses during the CS may either increase or decrease relative to those during the no-CS periods. The effect appears to depend on whether the shock at the end of the CS period is discriminable from those used to maintain avoidance. If the same shocks are used, responding is facilitated during the CS; if the shocks are distinct, the outcome is often suppression during the warning stimulus (Blackman, 1977). A similar arrangement has been studied with positive reinforcement–maintained responding, in which a reinforcer is delivered instead of a shock at the end of the CS. The effects of reinforcers at the end of the CS that are the same as or different from that arranged by the baseline schedule have been investigated. Azrin and Hake (1969) labeled this procedure positive conditioned suppression when they found that VI-maintained lever pressing of rats during the CS was generally suppressed relative to the no-CS periods.
Similar suppression occurred whether the event at the end of the CS was the same as or different from the reinforcer used to maintain responding. LoLordo (1971), however, found that pigeons’ responding increased when the CS ended with the same reinforcer arranged by the background schedule, a result he related to autoshaping (Brown & Jenkins, 1968). Facilitation or suppression during a CS followed by a positive reinforcer

Kennon A. Lattal

seems to depend on both CS duration and the schedule of reinforcement. For example, Kelly (1973) observed both suppression and facilitation during a CS as a function of whether the baseline schedule was DRL or VR.

Response-Dependent Stimuli Correlated With Previously Established Punishers

Despite being commonplace in everyday life, conditioned punishment has only rarely been studied in the laboratory. In an investigation of positive conditioned punishment, Hake and Azrin (1965) first established responding on a VI 120-s schedule of positive reinforcement in the presence of a white key light. The key-light color irregularly changed from white to red for 30 s. Reinforcement continued to be arranged according to the VI 120-s schedule during the red key light; however, at the end of the red-key-light period, a 500-ms electric shock was delivered independently of responding. This shock suppressed but did not eliminate responding when the key light was red. To this conditioned suppression procedure, Hake and Azrin then added the following contingency. When the white key light was on, each response produced a 500-ms flash of the red key light. Responding in the presence of the white key light was suppressed. When white-key-light responses instead produced a 500-ms flash of a green key light that previously had not been paired with shock, responding during the white key light was not suppressed. In addition, the degree of response suppression was a function of the intensity of the shock after the red key light.

A parallel demonstration of negative conditioned punishment based on stimuli correlated with the onset of a timeout was conducted by Gibson (1968). Gibson used Hake and Azrin’s (1965) procedure, except that instead of terminating the red key light with an electric shock, it terminated with a timeout during which the chamber was dark and reinforcement was precluded. The effect was to facilitate key pecking by pigeons during the red key light relative to rates during the white key light. When, however, responses during the white key light produced 500-ms flashes of the red key light, as in Hake and Azrin, responding during the white key light was suppressed. This demonstration of conditioned negative punishment is of particular interest because it occurred despite the fact that stimuli preceding timeouts typically facilitate rather than suppress responding. Thus, the results cannot be interpreted in terms of the red key light serving as a discriminative stimulus for a lower rate of reinforcement (as they could be in Hake and Azrin’s experiment).

Response-Dependent Stimuli Correlated With Previously Established Reinforcers

Stimuli correlated with an already-established reinforcer maintain responses that produce them (Williams, 1994). This is the case for stimuli correlated with either the termination or postponement of a negative reinforcer (Siegel & Milby, 1969) or with the onset of a positive reinforcer. In the latter case, early research on conditioned reinforcement used chained schedules (e.g., Kelleher & Gollub, 1962), higher order schedules (Kelleher, 1966), and a two-response-key procedure (Zimmerman, 1969). Interpretational limitations of each of these methods (Williams, 1994) led to the use of two other procedures in the analysis of conditioned reinforcement.

The observing–response procedure (Wyckoff, 1952) involves first establishing a discrimination between two stimuli by using a multiple schedule in which one stimulus is correlated with a schedule of reinforcement and the other with extinction (or a schedule arranging a different reinforcement rate). Once discriminative control is established, a third stimulus is correlated with both components to yield a mixed schedule. An observing response on a second operandum converts the mixed schedule to a multiple schedule for a short interval (typically 10–30 s with pigeons). Observing responses are maintained on the second operandum. These responses, of nonhumans at least, are maintained primarily, if not exclusively, by the positive stimulus (S+) and not by the stimulus correlated with extinction (see Fantino & Silberberg, 2010; Perone & Baron, 1980; for an alternative interpretation of the role of the negative stimulus [or S−], see Escobar & Bruner, 2009). The general interpretation has been that the stimulus correlated with reinforcement functions as a conditioned reinforcer maintaining the observing response.


Fantino (1969, 1977) proposed that stimuli function as conditioned reinforcers to the extent that they represent a reduction in the delay (time) to reinforcement relative to the delay operative in their absence. Consider the observing procedure described earlier, assuming that the two components alternate randomly every 2 min. Observing responses convert, for 15-s periods, a mixed VI 2-min extinction schedule to a multiple VI 2-min extinction schedule. Because components are 2 min each, the mixed-schedule stimulus is correlated with a 4-min period (delay) between successive reinforcers. In the presence of the stimulus correlated with the VI schedule, the delay between reinforcers is 2 min, a 50% reduction in delay time relative to that in the mixed-schedule stimulus. Fantino’s delay reduction hypothesis asserts that this signaled reduction in delay to reinforcement maintains observing responses.

In other tests of the delay reduction hypothesis, concurrent chained schedules have been used (in a chained schedule, distinct stimuli accompany the different links). In such tests, two concurrently available identical VI schedules serve as the initial links of the chained schedules. The equivalent VI initial links ensure that the two terminal links are accessed approximately equally often. The terminal links are mutually exclusive: The first initial link requirement met leads to its terminal link and simultaneously cancels the alternative chained schedule for that cycle. When the terminal link requirement is met, the response is reinforced, and the concurrent initial links recur. Thus, for example, if the two terminal links are VI 1 min and VI 5 min and the initial links are both VI 1 min, the average time to reinforcement achieved by responding on both alternatives is 3.5 min (0.5 min in the initial link + 3 min in the terminal link).
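The arithmetic behind these two examples can be sketched in a few lines of Python. This is only an illustration of the calculations given in the text, not Fantino's formal model. The function names are hypothetical, and the sketch assumes, following the 0.5-min figure above, that equal, concurrently running VI initial links divide the mean time to terminal-link entry by the number of alternatives.

```python
def observing_delay_reduction(vi_min, proportion_of_time_in_vi):
    """Proportional delay reduction signaled by the S+ in the observing example."""
    # Under the mixed-schedule stimulus, reinforcers are available only part of
    # the time, so the average spacing between reinforcers is stretched.
    mixed_delay = vi_min / proportion_of_time_in_vi
    s_plus_delay = vi_min
    return (mixed_delay - s_plus_delay) / mixed_delay


def concurrent_chains(initial_vi_min, terminal_vi_mins):
    """Mean times to reinforcement (min) in the concurrent-chains example.

    Returns (time when responding on all alternatives,
             times when responding exclusively on each alternative,
             delay reduction for each alternative; negative = increase).
    """
    n = len(terminal_vi_mins)
    # Assumption for this sketch: n equal concurrent initial links give a mean
    # entry time of initial_vi_min / n, and each terminal link is entered
    # equally often.
    both_total = initial_vi_min / n + sum(terminal_vi_mins) / n
    exclusive_totals = [initial_vi_min + t for t in terminal_vi_mins]
    reductions = [both_total - e for e in exclusive_totals]
    return both_total, exclusive_totals, reductions


if __name__ == "__main__":
    print(observing_delay_reduction(2.0, 0.5))
    print(concurrent_chains(1.0, [1.0, 5.0]))
```

With the values used in the text (VI 2 min in effect half the time; VI 1-min initial links with VI 1-min and VI 5-min terminal links), the sketch yields a proportional reduction of 0.5 for the observing example and, for concurrent chains, 3.5 min overall, 2 and 6 min for exclusive responding, and delay changes of 1.5 and -2.5 min.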
Responding exclusively on the operandum leading to the VI 1-min terminal link produces a reinforcer on average every 2 min, a reinforcement delay reduction of 1.5 min from the average for responding on both. Responding exclusively on the operandum leading to the VI 5-min terminal link produces a reinforcer once every 6 min, yielding a reinforcement delay increase of 2.5 min relative to the average for responding on both. The greater delay reduction for responding on the operandum leading to the VI 1-min terminal link predicts an exclusive preference for this alternative, a prediction confirmed by experimental analysis.

Before leaving the topic of conditioned reinforcement, it should be noted that there is not uniform agreement as to the significance of conditioned reinforcement (and, by extrapolation, conditioned punishment) as a concept. Although it has strong proponents (Dinsmoor, 1983; Fantino, 1977; Williams, 1994), among its critics are Davison and Baum (2006), who suggested that the concept had outlived its usefulness and that conditioned reinforcement is more usefully considered in terms of discriminative stimulus control. This suggestion harkens back to an earlier one that conditioned reinforcers must first be established as discriminative stimuli (e.g., Keller & Schoenfeld, 1950), but Davison and Baum called for the abandonment of the concept (see also Chapter 17, this volume).

Pillar 5: Contextual and Stimulus Control

Interactions between responding and reinforcers and punishers occur in broader environmental contexts, both distal or historical and proximal or contemporary. These contexts define what is sometimes called antecedent control of behavior. The term is something of a misnomer because under such circumstances, the behavior is controlled by the two-term contingency in the context of the third, antecedent event. Such joint control by reinforcement contingencies in context is what constitutes the final pillar of TEAB.

Behavioral History

Perhaps the broadest context for the two-term contingency is historical. Although the research is not extensive, several experiments have examined with some precision functional relations between past experiences and present behavior. These analyses began with the work of Weiner (e.g., 1969), who showed that different individuals responded differently on contemporary reinforcement schedules as a function of their previous experience responding on other schedules. Freeman and Lattal (1992) investigated the effects of different histories under stimulus control within individual pigeons. The pigeons were first trained on FR and DRL schedules, equated for reinforcement rate and designed to generate disparate response rates, in the presence of distinct stimuli. When subsequently exposed to FI schedules in the presence of both stimuli, they responded for many sessions at higher rates in the presence of the stimuli previously associated with the FR (high response rate) schedule. This effect not only replicated Weiner’s human research, but it showed within individual subjects that an organism’s behavioral history could be controlled by the stimuli with which that history was correlated. Other experiments have elaborated such behavioral history effects. Ono (2004), for example, showed how preferences for forced versus free choices were determined by the organism’s past experiences with the two alternatives.

Discriminative Stimulus Control

When distinct stimuli are correlated with different conditions of reinforcement or punishment, responding typically comes to be controlled by those stimuli. Such discriminative stimulus control occurs when experimental variations in the stimuli lead to correlated variations in behavior. Discriminative stimulus control can be established with both reinforcement and punishment. The prototypical example of discriminative stimulus control is one in which responding is reinforced in the presence of one stimulus, the S+ or SD, and not in the presence of another, the S− or SΔ. Stimulus control, however, can involve different stimuli correlated with different conditions of reinforcement, in which case there would be two positive stimuli, or it can involve conditions in which punishment is present or absent in the presence of different stimuli.

The lack of overriding importance of the form of the stimulus in establishing positive discriminative stimulus control was illustrated by Holz and Azrin (1961). They first punished each of a pigeon’s key responses otherwise maintained by a VI schedule of reinforcement. Then, both punishment and reinforcement were discontinued, allowing responding to drop to low, but nonzero, levels. At that point, each response again was punished, but reinforcement was not reinstated. Because of its prior correlation with reinforcement, punishment functioned as an S+, thereby resulting in considerable responding, at least in the short term. This result underlines the functional definition of discriminative stimuli: They are defined in terms of their effect, not by their form.

Two prerequisites for stimulus control are (a) that the stimuli be distinguishable from the absence of stimulation (i.e., above the absolute threshold) and discriminable from one another (i.e., above the difference threshold) and (b) that the stimuli be correlated with different reinforcement or punishment contingencies. Thresholds are in part physiological and phylogenic. Human responding, for example, cannot be brought under control of visual stimuli outside the range of physical detection of the human eye. Signal detection theory (D. M. Green & Swets, 1966) posits that the discriminable dimension of a stimulus can be separated from the reinforcement contingencies that bias choices in assessments of threshold measurements. Researchers in TEAB have used signal detection methods not only to complement other methods of assessing control by conventional sensory modality stimuli (i.e., visual and auditory stimuli; cf. Nevin, 1969) but also to isolate the discriminative and reinforcing properties of a host of other environmental events, including reinforcement contingencies themselves (e.g., Davison & Tustin, 1978; Lattal, 1979).

Another issue that receives considerable attention in the analysis of stimulus control is attention (see Chapter 17, this volume). Although from some perspectives, attention is considered a prerequisite to stimulus control, in TEAB attention is stimulus control (e.g., Ray, 1969). The behavioral index of attention is whether the organism is responding to the nominal stimuli being presented.
Thus, attending to a stimulus means responding in its presence and not in its absence, and such differential responding also defines stimulus control. According to this analysis, an instructor does not get the class’s attention to start the day’s activities; responding to the day’s activities is what having the class’s attention means.

Correlating stimuli with different reinforcement or punishment contingencies establishes discriminative stimulus control. It sometimes is labeled discrimination, although care is taken in TEAB to ensure that the term describes an environment–behavior relation and not an action initiated by the organism.

Discriminations have been established in two ways. The conventional technique is simply to expose the organism to the discrimination task until the behavior comes under the control of the different discriminative stimuli. Terrace (1963) reported differences in the number of responses made to a stimulus correlated with extinction (S−) as a function of how the discrimination was trained, specifically, how the S− and correlated period of nonreinforcement were introduced. The typical, sudden introduction of the S− after responding had been well established in the presence of another stimulus resulted in many (unreinforced) responses during the S− presentations, responses that Terrace labeled as errors. Introducing the S− in a different way changes the behavior it controls. Terrace introduced the S− simultaneously with the commencement of S+ training, but at low intensity (the S− was a colored light transilluminating the response key) and initially for very brief time periods. Over successive sessions, both the intensity and the duration of the S− were increased gradually as a function of the pigeon’s behavior in the presence of the S−. This procedure yielded few responses to the S− throughout training and during the steady-state S+–S− discriminative performance. Terrace (1966) suggested that the S− functioned differently when established with, as opposed to without, errors; however, it was unclear whether the fading procedure or the absence of responses to the S− was responsible for these differences. Subsequent research qualified some of Terrace’s suggestions (e.g., Rilling, 1977).

Stimulus Generalization

Stimulus generalization refers to changes or gradations in responding as a function of changes or gradations in the stimulus with which the reinforcement or punishment contingency originally was correlated. In a typical procedure, a discrimination is established (generalization gradients are more reliable when a discriminative training procedure is used) between S+ (e.g., a horizontal line projected on a response key) and S− (e.g., a vertical line). In a test of stimulus generalization, typically conducted in the absence of reinforcement, lines differing in degree of tilt are presented in mixed order, and responding to each is recorded. The result is a gradient, with responding relatively high in the presence of stimuli most like the S+ in training and lowest in the presence of stimuli most like the S−.

The shape of the gradient indexes stimulus generalization. A flat gradient suggests all stimuli are responded to similarly, that is, significant generalization or minimal discrimination. A steep gradient (e.g., one that drops sharply between the S+ and the next-most-similar stimuli) indicates that the stimuli differentially control responding, that is, significant discrimination or minimal generalization. The peak, that is, the highest point, of the gradient is often not at the original training stimulus but rather shifted to the next stimulus in the direction opposite the S−. This peak shift, as it is labeled, has been suggested to reflect aversive properties of the S− in that it did not occur when discriminations were trained without errors (Terrace, 1966).

Stimulus generalization gradients also can be observed around the S−. Their assessment poses a difficulty if the S+ and S− are on the same continuum: The gradients around both the S+ and the S− are confounded by the fact that movement away from the S− constitutes movement toward the S+ and vice versa. The solution is to use as the S+ and S− orthogonal stimuli, that is, stimuli that are on different stimulus dimensions, for example, color and line tilt. This way, changes away from, for example, the line tilt correlated with S− are not changes toward the key color correlated with the S+. Gradients around the S− typically are V shaped, with the lowest responding in the presence of the S− and responding increasing with increasing disparity between the test stimulus and the S−. These gradients, sometimes labeled inhibitory generalization gradients, have been interpreted to indicate that extinction, or nonreinforcement, involves the learning of other behavior rather than simply eliminating nonreinforced responding (Hearst, Besley, & Farthing, 1970).
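As a rough illustration of how a generalization gradient might be summarized, the following Python sketch converts response counts from a generalization test into a relative gradient and checks whether the peak lies on the side of the S+ opposite the S−. The response counts, the stimulus values (arbitrary units on a single dimension), and the function names are all hypothetical and chosen only for illustration; they are not data from any study cited here.

```python
def relative_gradient(responses):
    """Proportion of all test responses emitted to each test stimulus."""
    total = sum(responses.values())
    return {stimulus: count / total for stimulus, count in responses.items()}


def peak_stimulus(responses):
    """Test stimulus that controlled the most responding."""
    return max(responses, key=responses.get)


def shows_peak_shift(responses, s_plus, s_minus):
    """True if the peak is displaced from the S+ in the direction away from the S-."""
    peak = peak_stimulus(responses)
    # (peak - s_plus) and (s_plus - s_minus) share a sign only when the peak
    # sits on the far side of the S+ relative to the S-.
    return (peak - s_plus) * (s_plus - s_minus) > 0


if __name__ == "__main__":
    # Hypothetical counts: S+ = 50, S- = 60 (arbitrary units).
    counts = {30: 60, 40: 140, 50: 110, 60: 20, 70: 5}
    print(relative_gradient(counts))
    print(peak_stimulus(counts), shows_peak_shift(counts, s_plus=50, s_minus=60))
```

In this made-up data set the peak falls at 40 rather than at the S+ of 50, that is, one step away from the S−, which is the pattern the text labels peak shift. A flat gradient would show roughly equal proportions across all test stimuli.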

Conditional Stimulus Control

The three-term contingency discussed in the preceding sections can itself be brought under stimulus control, giving rise to a four-term contingency. Here, the stimuli defining the three-term contingency are conditional on another, superordinate set of stimuli, defining the fourth term. Conditional stimulus control has been studied widely using a procedure sometimes called matching to sample (see Sidman & Tailby, 1982). The procedure consists of three-element trials separated from one another by an intertrial interval (ITI). In a typical arrangement, a pigeon is confronted by a three-response-key array. In the presence of a preparatory stimulus, a response turns on a sample stimulus, say a red or green key light, each with a probability of 0.5 on a given trial. A response to the transilluminated key turns on the two side stimuli (comparison component), one red and the other green. A peck to the key colored the same as the sample stimulus results in food access for 3 s, whereas a peck to the other key terminates the trial. After an ITI, the cycle repeats. Thus, the red and green lights in the comparison component can be either an S+ or an S− conditional on the stimulus in the sample component. The percentage of choices of colors corresponding to the sample increases with exposure, reaching an asymptote near 100% correct.

Variations on the procedure include (a) turning off the sample light during the choice component (zero-delay matching), (b) using sample and choice stimuli that differ in dimension (symbolic matching to sample), (c) using topographically different responses to the different sample stimuli (e.g., a key peck to one and a treadle press to the other), (d) using qualitatively different reinforcers for correct responses to either of the stimuli (differential outcomes procedure), and (e) imposing delays between the response to the sample and onset of the choice component (delayed matching to sample; see MacKay, 1991, for a review). There are myriad possibilities for sample stimuli. Everything from simple colors to astonishingly complex visual arrays has been used to establish conditional stimulus control of responding.

A particularly fruitful area of research involving conditional stimulus control is that of delayed matching to sample (see Chapter 18, this volume). Generally speaking, choice accuracy declines as delays increase. Indifference between the choices is reached with pigeons at around 30-s delays. The appropriate description of these gradients is a matter of interpretation. Those who are more cognitively oriented consider the gradients to reflect changes in memory, whereas those favoring a behavior-analytic interpretation generally describe them using action terms such as remembering or forgetting. The conditional discrimination procedure involving both delayed presentations of choice components and the use of complex visual arrays as samples has given rise in part to the study of animal cognition, which has in turn led to often unfounded, and sometimes inexplicable, speculations about cognitive mechanisms underlying conditional stimulus control (and other behavioral phenomena) in nonhuman animals. The conceptual issues related to the interpretation of behavioral processes involving stimulus control in terms of memory or other cognitive mechanisms are beyond the scope of this chapter. Branch (1977), Watkins (1990), and many others have offered perspectives on these issues that are consistent with TEAB.
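The contingencies of the basic matching-to-sample trial described earlier in this section can be made concrete with a minimal Python sketch. It encodes only the trial logic (sample, two comparisons, and the consequence of a peck) and scores percentage correct; the class and function names are hypothetical, and no claim is made that any laboratory software is organized this way.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    sample: str  # color shown on the center (sample) key
    left: str    # color on the left comparison key
    right: str   # color on the right comparison key


def outcome(trial, pecked_side):
    """Consequence of a comparison peck under identity matching to sample."""
    if pecked_side not in ("left", "right"):
        raise ValueError("pecked_side must be 'left' or 'right'")
    chosen_color = trial.left if pecked_side == "left" else trial.right
    # The same comparison color is an S+ on some trials and an S- on others,
    # conditional on the sample: this is the fourth term of the contingency.
    return "food" if chosen_color == trial.sample else "trial ends"


def percent_correct(trials, pecks):
    """Percentage of trials on which the comparison matching the sample was pecked."""
    hits = sum(outcome(t, p) == "food" for t, p in zip(trials, pecks))
    return 100.0 * hits / len(trials)


if __name__ == "__main__":
    trials = [Trial("red", "red", "green"), Trial("green", "red", "green")]
    print([outcome(t, "left") for t in trials])
    print(percent_correct(trials, ["left", "left"]))
```

Pecking the left (red) key is reinforced on the first trial and ends the second, even though the comparison array is identical on both, which illustrates why the function of each comparison stimulus is conditional on the sample.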

Temporal Discriminative Stimulus Control of Responding and Timing

Every schedule of reinforcement, positive or negative, involves time. It is a direct variable in interval schedules and an indirect one in ratio schedules. Early research on FI schedules suggested that the passage of time functioned as an S+ (e.g., Dews, 1970). The discriminative properties of time were also borne out in conditional discrimination experiments in which the reinforced response was conditional on the passage of one time interval versus another (e.g., Stubbs, 1968) and in experiments involving the peak interval procedure, in which occasional reinforcers are deleted from a series of FIs to reveal where responding peaks before waning. Research on temporal control in turn has given rise to different quantitative theories of timing, notably scalar expectancy theory (Gibbon, 1977) and the behavioral theory of timing (Killeen & Fetterman, 1988). Both theories integrate significant amounts of data generated using the aforementioned procedures, and both have had considerable heuristic value. The behavioral theory of timing focuses more directly on environmental and behavioral events in accounting for the discriminative properties of time.


Concept Learning

One definition of a concept is in terms of stimulus control. A concept may be said to exist when a similar response is controlled by common elements of otherwise dissimilar stimuli. Human concepts often are verbal in nature, for example, abstract configurations of stimuli that evoke words such as love, esoteric, liberating, and so forth. Despite their complexity, concepts are considered in TEAB to be on a continuum with other types of stimulus control of behavior.

The classic demonstration of stimulus control of responding by an abstract stimulus was that of Herrnstein, Loveland, and Cable (1976). An S+–S− discrimination was established using a multiple schedule in which responses in one component were reinforced according to a VI 30-s schedule and extinguished in the other. The S+ in each of three experiments was, respectively, one of a variety of slides (more than 1,500 different ones, half positive and half negative in terms of containing the concept under study, were used in each of the three experiments) that were pictures of trees, water, or a particular person. In each experiment, the S− was the absence of these features in an otherwise parallel set of slides. Response rates were higher in the presence of the concept under investigation than in its absence. The basic results of Herrnstein et al. have been replicated systematically many times, using a variety of types of visual stimuli (e.g., Wasserman, Young, & Peissig, 2002).

The general topic of concepts and concept learning as instances of stimulus control has been approached in a different, but equally fruitful, way by Sidman (e.g., 1986; Sidman & Tailby, 1982). Consider three groups of unrelated stimuli, A, B, and C, presented on a computer screen. Different patterns make up A; different shapes, B; and nonsense syllables, C. The question posed by Sidman and Tailby (1982) was how these structurally different groups of stimuli might all come to control similar responses to them, that is, become equivalent to one another, functioning as a stimulus controlling the same response, that is, as a concept. Sidman and Tailby (1982) turned to mathematics for a definition of equivalence and to the conditional discrimination procedure (outlined earlier) for its analysis. An equivalence relation in mathematics requires a demonstration of three properties: reflexivity, symmetry, and transitivity. Reflexivity is established by showing, in the absence of reinforcement, generalized identity matching (i.e., selecting the comparison stimulus that is identical to the sample). In normally developing humans, the tests for symmetry and transitivity often are combined. One such test consists of teaching the relation between A and B and that between A and C, using the conditional discrimination procedure described previously (e.g., given Sample A, select B from among the available comparison stimuli). In subsequent no-feedback test trials, if C is selected after a B sample and B is selected after a C sample, then these emergent (untrained) transitive relations require that the trained A–B and A–C relations be symmetric (B–A and C–A, respectively).

Stimulus equivalence suggests a mechanism whereby new stimulus relations can develop in the absence of direct reinforcement. Sidman (1986) suggested that these emergent relations could address criticisms of the inflexibility of a behavior-analytic approach that relies on direct reinforcement to establish new responses. In addition, if different sets of equivalence relations are themselves equated through training a connecting relation (Sidman, Kirk, & Wilson-Morris, 1985), then the number of equivalence relations established without training increases exponentially. As Sidman observed, both of these outcomes of stimulus equivalence are important advancements in accounting for the acquisition of verbal behavior within a behavior-analytic framework. By expanding the analysis of stimulus equivalence to a five-term contingency, Sidman also attempted to account for meaning in context (see Volume 2, Chapters 1, 6, and 18, this handbook).
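The mathematical side of this analysis can be illustrated with a short Python sketch that takes the trained relations (A–B and A–C) and computes the relations an equivalence relation would additionally require, that is, the reflexive, symmetric, and transitive closure. The sketch describes the mathematical definition only; whether a given organism's behavior actually shows these emergent relations is an empirical matter. The function names are hypothetical.

```python
def equivalence_closure(stimuli, trained):
    """Smallest equivalence relation on `stimuli` containing the trained pairs."""
    relations = set(trained)
    relations |= {(s, s) for s in stimuli}  # reflexivity: A-A, B-B, C-C
    changed = True
    while changed:
        # symmetry: if (a, b) holds, so must (b, a)
        new = {(b, a) for a, b in relations}
        # transitivity: if (a, b) and (b, d) hold, so must (a, d)
        new |= {(a, d) for a, b in relations for c, d in relations if b == c}
        changed = not new <= relations
        relations |= new
    return relations


def emergent_relations(stimuli, trained):
    """Relations required by equivalence that were never directly trained."""
    return equivalence_closure(stimuli, trained) - set(trained)


if __name__ == "__main__":
    stimuli = ["A", "B", "C"]
    trained = [("A", "B"), ("A", "C")]  # (sample, correct comparison)
    print(sorted(emergent_relations(stimuli, trained)))
```

For the two trained relations in the text, the closure contains all nine sample–comparison pairs, so seven are untrained: the three identity relations, the symmetric relations B–A and C–A, and the combined symmetry-and-transitivity relations B–C and C–B that serve as the test described above.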

Rules and Instructions

An important source of discriminative stimulus control of human behavior is verbal behavior (Skinner, 1957). The analysis of this discriminative function of verbal behavior has most frequently taken the form in TEAB of an analysis of how rules and instructions (the two terms are used interchangeably here, but see also Catania [1998] for suggestions concerning the rules for describing such control of behavior) act in concert with contingencies to control human behavior. The control by instructions or rules is widely, but not universally, considered a type of discriminative control over responding (e.g., Blakely & Schlinger, 1987).

In human interactions, instructions may be spoken, written, or both. The effectiveness of instructions in controlling behavior varies in part as a function of their specificity and their congruency with the contingencies to which they refer. Galizio (1979), for example, elegantly showed how congruent instructions complement contingencies and incongruent ones can lead to ignoring the instruction. Incongruence does not universally have this effect, however. In some experiments, inaccurate instructions have been found to exert control over responding at the expense of the actual contingencies in effect. Galizio’s (1979) results, however, may be more a qualification than the rule in explicating the role of instructions in controlling human behavior. In many situations in which explicit losses are not incurred for following rules, humans often behave in stereotyped ways that suggest they are following either an instruction or their interpretation of an instruction. This outcome is not surprising given the long extraexperimental history of reinforced rule following.

Even in the absence of explicit rules, some observers have postulated that humans construct their own rules. Such an analysis, however, is a quagmire: Once private events such as self-generated rules are postulated to control responding in some situations, it becomes difficult to exclude their possible role in every situation. Nonetheless, in one study of how self-generated rules might control behavior, Catania, Matthews, and Shimoff (1982) had college students respond on two buttons; one reinforced high rate responding, and the other reinforced lower rate responding.
The task was interrupted from time to time, and the students were asked to guess (by completing a series of structured sentences) what the schedule requirements were on the buttons. Stating the correct rule was shaped by reinforcing approximations to the correct description with points. Of interest was the relation between the shaped rule and responding in accord with the schedule in effect on either button. In general, the shaped rules functioned as a discriminative stimulus controlling responding under the two schedules.

The Five Pillars Redux

Methods, reinforcement, punishment, control by stimuli correlated with reinforcers and punishers, and contextual and stimulus control: These are the five pillars of TEAB, the foundation on which the analyses and findings described in other chapters of this handbook are constructed. A review of these pillars balanced several factors with one another. The first was differing views as to what is fundamental. As with any science, inconsistencies in findings and differences in interpretation are commonplace. They are, however, the fodder for further growth in the science. The second was depth versus breadth of the topics. The relative space devoted to the four pillars representing empirical findings in TEAB reflects, more or less, the relative research activity making up each of those pillars. Each pillar is, of course, deeper than can be developed within the space constraints assigned. Important material was truncated to attain the breadth of coverage expected of an overview. The third was classic and contemporary research. Both have a role in defining foundations; the former laid the groundwork for contemporary developments, which in turn herald the future of TEAB.

Finally, the metaphorical nature of the five pillars needs to be taken a step further, for these are not pillars of stone. Rather, the material of these pillars is organic, subject to the same contingencies that they seek to describe. Research areas and problems come and go for a host of reasons. They are subject to the vicissitudes of life: Researchers come into their own, move, change, retire, or die (physically or metaphorically; sometimes both, but sometimes at different times); agency funding and university priorities change. Dead ends are reached. Marvelous discoveries captivate entire generations of scientists, or maybe just one scientist. Changes in TEAB will change both the content and, over time, perhaps the very pillars themselves. Indeed, it is highly likely that the research described in this handbook eventually will rewrite this chapter. As TEAB and the pillars that support it continue to evolve, TEAB will contribute even more to an understanding of the behavior of organisms.

References Allyon, T., & Azrin, N. H. (1968). The token economy. New York, NY: Appleton-Century-Crofts. Arbuckle, J. L., & Lattal, K. A. (1987). A role for negative reinforcement of response omission in punishment? Journal of the Experimental Analysis of Behavior, 48, 407–416. doi:10.1901/jeab.1987.48-407 Azrin, N. H. (1956). Some effects of two intermittent schedules of immediate and non-immediate punishment. Journal of Psychology, 42, 3–21. doi:10.1080/00 223980.1956.9713020 Azrin, N. H. (1961). Time-out from positive reinforcement. Science, 133, 382–383. doi:10.1126/science. 133.3450.382 Azrin, N. H., & Hake, D. F. (1969). Positive conditioned suppression: Conditioned suppression using positive reinforcers as the unconditioned stimuli. Journal of the Experimental Analysis of Behavior, 12, 167–173. doi:10.1901/jeab.1969.12-167 Azrin, N. H., Hake, D. F., Holz, W. C., & Hutchinson, R. R. (1965). Motivational aspects of escape from punishment. Journal of the Experimental Analysis of Behavior, 8, 31–44. doi:10.1901/jeab.1965.8-31 Azrin, N. H., & Holz, W. C. (1966). Punishment. In W. K. Honig (Ed.), Operantbehavior: Areas of research and application (pp. 380–447). New York, NY: Appleton-Century-Crofts. Azrin, N. H., Holz, W. C., & Hake, D. F. (1963). Fixedratio punishment. Journal of the Experimental Analysis of Behavior, 6, 141–148. doi:10.1901/ jeab.1963.6-141 Azrin, N. H., Holz, W. C., Hake, D. F., & Allyon, T. (1963). Fixed-ratio escape reinforcement. Journal of the Experimental Analysis of Behavior, 6, 449–456. doi:10.1901/jeab.1963.6-449 Azrin, N. H., Hutchinson, R. R., & Hake, D. F. (1966). Extinction-induced aggression. Journal of the Experimental Analysis of Behavior, 9, 191–204. doi:10.1901/jeab.1966.9-191 Bachrach, A. (1960). Psychological research: An introduction. New York, NY: Random House. Baer, D. M. (1981). The imposition of structure on behavior and the demolition of behavioral structures. In D. J. 
Bernstein (Ed.), Response structure and organization (pp. 217–254). Lincoln: University of Nebraska Press. Baer, D. M., Wolf, M. M., & Risley, T. R. (1968). Some current dimensions of applied behavior analysis. Journal of Applied Behavior Analysis, 1, 91–97. doi:10.1901/jaba.1968.1-91

Baron, A. (1991). Avoidance and punishment. In I. Iversen & K. A. Lattal (Eds.), Experimental analysis of behavior: Part1 (pp. 173–217). Amsterdam, the Netherlands: Elsevier. Baron, A. (1999). Statistical inference in behavior analysis: Friend or foe? Behavior Analyst, 22, 83–85. Baron, A., & Galizio, M. (2005). Positive and negative reinforcement: Should the distinction be preserved? Behavior Analyst, 28, 85–98. Baum, W. M. (1973). The correlation-based law of effect. Journal of the Experimental Analysis of Behavior, 20, 137–153. doi:10.1901/jeab.1973.20-137 Baum, W. M. (1974). On two types of deviation from the matching law: Bias and undermatching. Journal of the Experimental Analysis of Behavior, 22, 231–242. doi:10.1901/jeab.1974.22-231 Baum, W. M. (1989). Quantitative description and molar description of the environment. Behavior Analyst, 12, 167–176. Blackman, D. (1968). Conditioned suppression or facilitation as a function of the behavioral baseline. Journal of the Experimental Analysis of Behavior, 11, 53–61. doi:10.1901/jeab.1968.11-53 Blackman, D. E. (1977). Conditioned suppression and the effects of classical conditioning on operant behavior. In W. K. Honig & J. E. R. Staddon (Eds.), Handbook of operant behavior (pp. 340–363). New York, NY: Prentice Hall. Blakely, E., & Schlinger, H. (1987). Rules: Functionaltering contingency-specifying stimuli. Behavior Analyst, 10, 183–187. Blakely, E., & Schlinger, H. (1988). Determinants of pausing under variable-ratio schedules: Reinforcer magnitude, ratio size, and schedule configuration. Journal of the Experimental Analysis of Behavior, 50, 65–73. doi:10.1901/jeab.1988.50-65 Bonem, M., & Crossman, E. K. (1988). Elucidating the effects of reinforcer magnitude. Psychological Bulletin, 104, 348–362. doi:10.1037/00332909.104.3.348 Boren, J. J., & Devine, D. D. (1968). The repeated acquisition of behavioral chains. Journal of the Experimental Analysis of Behavior, 11, 651–660. doi:10.1901/ jeab.1968.11-651 Branch, M. 
N. (1977). On the role of “memory” in behavior analysis. Journal of the Experimental Analysis of Behavior, 28, 171–179. doi:10.1901/jeab.1977.28-171 Breland, K., & Breland, M. (1961). The misbehavior of organisms. American Psychologist, 16, 681–684. doi:10.1037/h0040090 Brinker, R. P., & Treadway, J. T. (1975). Preference and discrimination between response-dependent and response-independent schedules of reinforcement. 57

Kennon A. Lattal

Journal of the Experimental Analysis of Behavior, 24, 73–77. doi:10.1901/jeab.1975.24-73 Brown, P. L., & Jenkins, H. M. (1968). Autoshaping the pigeon’s key-peck. Journal of the Experimental Analysis of Behavior, 11, 1–8. doi:10.1901/ jeab.1968.11-1 Bruzek, J. L., Thompson, R. H., & Peters, L. C. (2009). Resurgence of infant caregiving responses. Journal of the Experimental Analysis of Behavior, 92, 327–343. doi:10.1901/jeab.2009-92-327 Catania, A. C. (1963). Concurrent performances: A baseline for the study of reinforcement magnitude. Journal of the Experimental Analysis of Behavior, 6, 299–300. doi:10.1901/jeab.1963.6-299 Catania, A. C. (1998). The taxonomy of verbal behavior. In K. A. Lattal & M. Perone (Eds.), Handbook of research methods in human operant behavior (pp. 405–433). New York, NY: Plenum Press. Catania, A. C., Matthews, B. A., & Shimoff, E. (1982). Instructed versus shaped human verbal behavior: Interactions with nonverbal responding. Journal of the Experimental Analysis of Behavior, 38, 233–248. doi:10.1901/jeab.1982.38-233 Catania, A. C., & Reynolds, G. S. (1968). A quantitative analysis of responding maintained by interval schedules of reinforcement. Journal of the Experimental Analysis of Behavior, 11, 327–383. doi:10.1901/ jeab.1968.11-s327 Church, R. M., & Raymond, G. A. (1967). Influence of the schedule of positive reinforcement on punished behavior. Journal of Comparative and Physiological Psychology, 63, 329–332. doi:10.1037/h0024382 Critchfield, T. S., Paletz, E. M., MacAleese, K. R., & Newland, M. C. (2003). Punishment in human choice: Direct or competitive suppression? Journal of the Experimental Analysis of Behavior, 80, 1–27. doi:10.1901/jeab.2003.80-1 Das Graças de Souza, D. D., de Moraes, A. B. A., & Todorov, J. C. (1984). Shock intensity and signaled avoidance responding. Journal of the Experimental Analysis of Behavior, 42, 67–74. doi:10.1901/ jeab.1984.42-67 Davison, M. (1999). 
Statistical inference in behavior analysis: Having my cake and eating it. Behavior Analyst, 22, 99–103. Davison, M., & Baum, W. M. (2006). Do conditional reinforcers count? Journal of the Experimental Analysis of Behavior, 86, 269–283. doi:10.1901/jeab.2006.56-05 Davison, M. C., & Tustin, R. D. (1978). The relation between the generalized matching law and signal detection theory. Journal of the Experimental Analysis of Behavior, 29, 331–336. doi:10.1901/jeab.1978.29-331 DeGrandpre, R. J., Bickel, W. K., Hughes, J. R., Layng, M. P., & Badger, G. (1993). Unit price as a useful metric in 58

analyzing effects of reinforcer magnitude. Journal of the Experimental Analysis of Behavior, 60, 641–666. doi:10.1901/jeab.1993.60-641 de Lorge, J. (1971). The effects of brief stimuli presented under a multiple schedule of second-order schedules. Journal of the Experimental Analysis of Behavior, 15, 19–25. doi:10.1901/jeab.1971.15-19 Deluty, M. Z. (1976). Choice and the rate of punishment in concurrent schedules. Journal of the Experimental Analysis of Behavior, 25, 75–80. doi:10.1901/ jeab.1976.25-75 Dews, P. B. (1970). The theory of fixed-interval responding. In W. N. Schoenfeld (Ed.), The theory of reinforcement schedules (pp. 43–61). New York, NY: Appleton-Century-Crofts. Dinsmoor, J. A. (1962). Variable-interval escape from stimuli accompanied by shocks. Journal of the Experimental Analysis of Behavior, 5, 41–47. doi:10.1901/jeab. 1962.5-41 Dinsmoor, J. A. (1983). Observing and conditioned reinforcement. Behavioral and Brain Sciences, 6, 693–728. doi:10.1017/S0140525X00017969 Donahoe, J. W. (2003). Selectionism. In K. A. Lattal & P. N. Chase (Eds.), Behavior theory and philosophy (pp. 103–128). New York, NY: Kluwer Academic. Eckerman, D. A., Hienz, R. D., Stern, S., & Kowlowitz, V. (1980). Shaping the location of a pigeon’s peck: Effect of rate and size of shaping steps. Journal of the Experimental Analysis of Behavior, 33, 299–310. doi:10.1901/jeab.1980.33-299 Escobar, R., & Bruner, C. A. (2009). Observing responses and serial stimuli: Searching for the reinforcing properties of the S−. Journal of the Experimental Analysis of Behavior, 92, 215–231. doi:10.1901/jeab.2009.92-215 Estes, W. K., & Skinner, B. F. (1941). Some quantitative properties of anxiety. Journal of Experimental Psychology, 29, 390–400. doi:10.1037/h0062283 Fantino, E. (1969). Conditioned reinforcement, choice, and the psychological distance to reward. In D. P. Hendry (Ed.), Conditioned reinforcement (pp. 163–191). Homewood, IL: Dorsey Press. Fantino, E. (1977). 
Conditioned reinforcement: Choice and information. In W. K. Honig & J. E. R. Staddon (Eds.), Handbook of operant behavior (pp. 313–339). New York, NY: Prentice Hall. Fantino, E. (1991). Behavioral ecology. In I. Iversen & K. A. Lattal (Eds.), Experimental analysis of behavior: Part 2 (pp. 117–153). Amsterdam, the Netherlands: Elsevier. Fantino, E., & Silberberg, A. (2010). Revisiting the role of bad news in maintaining human observing behavior. Journal of the Experimental Analysis of Behavior, 93, 157–170. doi:10.1901/jeab.2010.93-157

The Five Pillars of the Experimental Analysis of Behavior

Farley, J. (1980). Reinforcement and punishment effects in concurrent schedules: A test of two models. Journal of the Experimental Analysis of Behavior, 33, 311–326. doi:10.1901/jeab.1980.33-311

Herrnstein, R. J., & Hineline, P. N. (1966). Negative reinforcement as shock-frequency reduction. Journal of the Experimental Analysis of Behavior, 9, 421–430. doi:10.1901/jeab.1966.9-421

Felton, M., & Lyon, D. O. (1966). The post-reinforcement pause. Journal of the Experimental Analysis of Behavior, 9, 131–134. doi:10.1901/jeab.1966.9-131

Herrnstein, R. J., Loveland, D. H., & Cable, C. (1976). Natural concepts in pigeons. Journal of Experimental Psychology: Animal Behavior Processes, 2, 285–302. doi:10.1037/0097-7403.2.4.285

Ferster, C. B., & Skinner, B. F. (1957). Schedules of reinforcement. New York, NY: Appleton-Century-Crofts. doi:10.1037/10627-000 Freeman, T. J., & Lattal, K. A. (1992). Stimulus control of behavioral history. Journal of the Experimental Analysis of Behavior, 57, 5–15. doi:10.1901/jeab. 1992.57-5 Galizio, M. (1979). Contingency-shaped and rulegoverned behavior: Instructional control of human loss avoidance. Journal of the Experimental Analysis of Behavior, 31, 53–70. doi:10.1901/jeab.1979.31-53 Gibbon, J. (1977). Scalar expectancy theory and Weber’s law in animal timing. Psychological Review, 84, 279–325. doi:10.1037/0033-295X.84.3.279 Gibson, D. A. (1966). A quick and simple method for magazine training the pigeon. Perceptual and Motor Skills, 23, 1230. doi:10.2466/pms.1966.23.3f.1230 Gibson, D. A. (1968). Conditioned punishment by stimuli signalling time out from positive reinforcement. Unpublished doctoral dissertation, University of Alabama, Tuscaloosa. Green, D. M., & Swets, J. A. (1966). Signal detection theory and psychophysics. New York, NY: Wiley. Green, L., & Freed, D. E. (1998). Behavioral economics. In W. O’Donohue (Ed.), Learning and behavior therapy (pp. 274–300). Needham Heights, MA: Allyn & Bacon. Hake, D. F., & Azrin, N. H. (1965). Conditioned punishment. Journal of the Experimental Analysis of Behavior, 8, 279–293. doi:10.1901/jeab.1965.8-279 Hall, G. A., & Lattal, K. A. (1990). Variable-interval schedule performance under open and closed economies. Journal of the Experimental Analysis of Behavior, 54, 13–22. doi:10.1901/jeab.1990.54-13 Hearst, E., Besley, S., & Farthing, G. W. (1970). Inhibition and the stimulus control of operant behavior. Journal of the Experimental Analysis of Behavior, 14, 373–409. doi:10.1901/jeab.1970.14-s373 Herrnstein, R. J. (1961). Relative and absolute strength of response as a function of frequency of reinforcement. Journal of the Experimental Analysis of Behavior, 4, 267–272. doi:10.1901/jeab.1961.4-267

Hoffman, H. S. (1966). The analysis of discriminated avoidance. In W. K. Honig (Ed.), Operant behavior: Areas of research and application (pp. 499–530). New York, NY: Appleton-Century-Crofts. Holz, W. C. (1968). Punishment and rate of positive reinforcement. Journal of the Experimental Analysis of Behavior, 11, 285–292. doi:10.1901/jeab.1968.11-285 Holz, W. C., & Azrin, N. H. (1961). Discriminative properties of punishment. Journal of the Experimental Analysis of Behavior, 4, 225–232. doi:10.1901/ jeab.1961.4-225 Hoyert, M. S. (1992). Order and chaos in fixed-interval schedules of reinforcement. Journal of the Experimental Analysis of Behavior, 57, 339–363. doi:10.1901/ jeab.1992.57-339 Hull, C. L. (1943). Principles of psychology. New York, NY: Appleton-Century-Crofts. Hursh, S. R. (1980). Economic concepts for the analysis of behavior. Journal of the Experimental Analysis of Behavior, 34, 219–238. doi:10.1901/jeab.1980.34-219 Kaufman, A., & Baron, A. (1968). Suppression of behavior by timeout punishment when suppression results in loss of positive reinforcement. Journal of the Experimental Analysis of Behavior, 11, 595–607. doi:10.1901/jeab.1968.11-595 Kazdin, A. E. (1982). Single-case research designs: Methods for clinical and applied settings. New York, NY: Oxford University Press. Kelleher, R. T. (1966). Chaining and conditioned reinforcement. In W. K. Honig (Ed.), Operant behavior: Areas of research and application (pp. 160–212). New York, NY: Appleton-Century-Crofts. Kelleher, R. T., & Gollub, L. R. (1962). A review of conditioned reinforcement. Journal of the Experimental Analysis of Behavior, 5, 543–597. doi:10.1901/ jeab.1962.5-s543 Keller, F. S., & Schoenfeld, W. N. (1950). Principles of psychology. New York, NY: Appleton-Century-Crofts.

Herrnstein, R. J. (1969). Method and theory in the study of avoidance. Psychological Review, 76, 49–69.

Kelly, D. D. (1973). Suppression of random-ratio and acceleration of temporally spaced responding by the same prereward stimulus in monkeys. Journal of the Experimental Analysis of Behavior, 20, 363–373. doi:10.1901/jeab.1973.20-363

Herrnstein, R. J. (1970). On the law of effect. Journal of the Experimental Analysis of Behavior, 13, 243–266. doi:10.1901/jeab.1970.13-243

Killeen, P. R., & Fetterman, J. G. (1988). A behavioral theory of timing. Psychological Review, 95, 274–295. doi:10.1037/0033-295X.95.2.274 59

Kennon A. Lattal

Kupfer, A. S., Allen, R., & Malagodi, E. F. (2008). Induced attack during fixed-ratio and matchedtime schedules of food presentation. Journal of the Experimental Analysis of Behavior, 89, 31–48. doi:10.1901/jeab.2008.89-31

Leander, J. D. (1973). Shock intensity and duration interactions on free-operant avoidance behavior. Journal of the Experimental Analysis of Behavior, 19, 481–490. doi:10.1901/jeab.1973.19-481

Lattal, K. A. (1974). Combinations of responsereinforcer dependence and independence. Journal of the Experimental Analysis of Behavior, 22, 357–362. doi:10.1901/jeab.1974.22-357

Lieving, G. A., & Lattal, K. A. (2003). Recency, repeatability, and reinforcer retrenchment: An experimental analysis of resurgence. Journal of the Experimental Analysis of Behavior, 80, 217–233. doi:10.1901/ jeab.2003.80-217

Lattal, K. A. (1975). Reinforcement contingencies as discriminative stimuli. Journal of the Experimental Analysis of Behavior, 23, 241–246. doi:10.1901/ jeab.1975.23-241

Logue, A. W., & de Villiers, P. A. (1978). Matching in concurrent variable-interval avoidance schedules. Journal of the Experimental Analysis of Behavior, 29, 61–66. doi:10.1901/jeab.1978.29-61

Lattal, K. A. (1979). Reinforcement contingencies as discriminative stimuli: II. Effects of changes in stimulus probability. Journal of the Experimental Analysis of Behavior, 31, 51–22.

LoLordo, V. M. (1971). Facilitation of food-reinforced responding by a signal for response-independent food. Journal of the Experimental Analysis of Behavior, 15, 49–55. doi:10.1901/jeab.1971.15-49

Lattal, K. A. (1987). Considerations in the experimental analysis of reinforcement delay. In M. L. Commons, J. Mazur, J. A. Nevin, & H. Rachlin (Eds.), Quantitative studies of operant behavior: The effect of delay and of intervening events on reinforcement value (pp. 107–123). New York, NY: Erlbaum.

Lund, C. A. (1976). Effects of variations in the temporal distribution of reinforcements on interval schedule performance. Journal of the Experimental Analysis of Behavior, 26, 155–164. doi:10.1901/jeab.1976.26-155

Lattal, K. A. (1989). Contingencies on response rate and resistance to change. Learning and Motivation, 20, 191–203. doi:10.1016/0023-9690(89)90017-9 Lattal, K. A. (1991). Scheduling positive reinforcers. In I. H. Iversen & K. A. Lattal (Eds.), Experimental analysis of behavior: Part I (pp. 87–134). Amsterdam, the Netherlands: Elsevier. Lattal, K. A. (2010). Delayed reinforcement of operant behavior. Journal of the Experimental Analysis of Behavior, 93, 129–139. doi:10.1901/jeab.2010.93-129 Lattal, K. A., & Boyer, S. S. (1980). Alternative reinforcement effects on fixed-interval responding. Journal of the Experimental Analysis of Behavior, 34, 285–296. doi:10.1901/jeab.1980.34-285 Lattal, K. A., & Bryan, A. J. (1976). Effects of concurrent response-independent reinforcement on fixedinterval schedule performance. Journal of the Experimental Analysis of Behavior, 26, 495–504. doi:10.1901/jeab.1976.26-495 Lattal, K. A., Freeman, T. J., & Critchfield, T. (1989). Dependency location in interval schedules of reinforcement. Journal of the Experimental Analysis of Behavior, 51, 101–117. doi:10.1901/jeab.1989.51-101 Lattal, K. A., & Griffin, M. A. (1972). Punishment contrast during free-operant avoidance. Journal of the Experimental Analysis of Behavior, 18, 509–516. doi:10.1901/jeab.1972.18-509 Lea, S. E. G. (1986). Foraging and reinforcement schedules in the pigeon: Optimal and non-optimal aspects of choice. Animal Behaviour, 34, 1759–1768. doi:10.1016/S0003-3472(86)80262-7 60

Mackay, H. A. (1991). Conditional stimulus control. In I. Iversen & K. A. Lattal (Eds.), Experimental analysis of behavior: Part 1 (pp. 301–350). Amsterdam, the Netherlands: Elsevier. Madden, G. J. (2000). A behavioral economics primer. In W. Bickel & R. K. Vuchinich (Eds.), Reframing health behavior change with behavioral economics (pp. 3–26). Mahwah, NJ: Erlbaum. Madden, G. J., & Bickel, W. K. (Eds.). (2010). Impulsivity: The behavioral and neurological science of discounting. Washington, DC: American Psychological Association. doi:10.1037/12069-000 Marr, M. J. (1992). Behavior dynamics: One perspective. Journal of the Experimental Analysis of Behavior, 57, 249–266. doi:10.1901/jeab.1992.57-249 Marr, M. J. (2006). Through the looking glass: Symmetry in behavioral principles? Behavior Analyst, 29, 125–128. Mazur, J. E. (1986). Choice between single and multiple delayed reinforcers. Journal of the Experimental Analysis of Behavior, 46, 67–77. doi:10.1901/ jeab.1986.46-67 McDowell, J. J. (1986). On the falsifiability of matching theory. Journal of the Experimental Analysis of Behavior, 45, 63–74. doi:10.1901/jeab.1986.45-63 McDowell, J. J. (1989). Two modern developments in matching theory. Behavior Analyst, 12, 153–166. McKearney, J. W. (1972). Maintenance and suppression of responding under schedules of electric shock presentation. Journal of the Experimental Analysis of Behavior, 17, 425–432. doi:10.1901/jeab.1972.17-425 Michael, J. (1974). Statistical inference for individual organism research: Mixed blessing or curse? Journal

The Five Pillars of the Experimental Analysis of Behavior

of Applied Behavior Analysis, 7, 647–653. doi:10.1901/ jaba.1974.7-647 Michael, J. (1975). Positive and negative reinforcement: A distinction that is no longer necessary; or a better way to talk about bad things. Behaviorism, 3, 33–44. Michael, J. (1982). Distinguishing between discriminative and motivational functions of stimuli. Journal of the Experimental Analysis of Behavior, 37, 149–155. doi:10.1901/jeab.1982.37-149 Moore, J., & Fantino, E. (1975). Choice and response contingencies. Journal of the Experimental Analysis of Behavior, 23, 339–347. doi:10.1901/jeab.1975. 23-339 Morse, W. H., & Kelleher, R. T. (1977). Determinants of reinforcement and punishment. In W. K. Honig & J. E. R. Staddon (Eds.), Handbook of operant behavior (pp. 98–124). New York, NY: Prentice Hall. Neuringer, A. J. (1970). Superstitious key pecking after three peck-produced reinforcements. Journal of the Experimental Analysis of Behavior, 13, 127–134. doi:10.1901/jeab.1970.13-127 Neuringer, A. (2002). Operant variability: Evidence, functions, and theory. Psychonomic Bulletin and Review, 9, 672–705. doi:10.3758/BF03196324 Nevin, J. A. (1969). Signal detection theory and operant behavior: A review of David M. Green and John A. Swets’Signal detection theory and psychophysics. Journal of the Experimental Analysis of Behavior, 12, 475–480. doi:10.1901/jeab.1969.12-475 Nevin, J. A. (1974). Response strength in multiple schedules. Journal of the Experimental Analysis of Behavior, 21, 389–408. doi:10.1901/jeab.1974.21-389 Nevin, J. A., Mandell, C., & Atak, J. R. (1983). The analysis of behavioral momentum. Journal of the Experimental Analysis of Behavior, 39, 49–59. doi:10.1901/jeab.1983.39-49 Nevin, J. A., Tota, M. E., Torquato, R. D., & Shull, R. L. (1990). Alternative reinforcement increases resistance to change: Pavlovian or operant contingencies? Journal of the Experimental Analysis of Behavior, 53, 359–379. doi:10.1901/jeab.1990.53-359 Ono, K. (2004). 
Effects of experience on preference between forced and free choice. Journal of the Experimental Analysis of Behavior, 81, 27–37. doi:10.1901/jeab.2004.81-27 Palya, W. L. (1992). Dynamics in the fine structure of schedule-controlled behavior. Journal of the Experimental Analysis of Behavior, 57, 267–287. doi:10.1901/jeab.1992.57-267 Pear, J. J., & Legris, J. A. (1987). Shaping by automated tracking of an arbitrary operant response. Journal of the Experimental Analysis of Behavior, 47, 241–247. doi:10.1901/jeab.1987.47-241

Peele, D. B., Casey, J., & Silberberg, A. (1984). Primacy of interresponse time reinforcement in accounting for rate differences under variable-ratio and variable-interval schedules. Journal of Experimental Psychology: Animal Behavior Processes, 10, 149–167. doi:10.1037/0097-7403.10.2.149 Perone, M., & Baron, A. (1980). Reinforcement of human observing behavior by a stimulus correlated with extinction or increased effort. Journal of the Experimental Analysis of Behavior, 34, 239–261. doi:10.1901/jeab.1980.34-239 Perone, M., & Galizio, M. (1987). Variable-interval schedules of time out from avoidance. Journal of the Experimental Analysis of Behavior, 47, 97–113. doi:10.1901/jeab.1987.47-97 Pietras, C. J., & Hackenberg, T. D. (2005). Responsecost punishment via token loss with pigeons. Behavioural Processes, 69, 343–356. doi:10.1016/j. beproc.2005.02.026 Platt, J. R. (1973). Percentile reinforcement: Paradigms for experimental analysis of response shaping. In G. H. Bower (Ed.), The psychology of learning and motivation: Advances in theory and research (Vol. 7, pp. 271–296). New York, NY: Academic Press. Premack, D. (1959). Toward empirical behavior laws: 1. Positive reinforcement. Psychological Review, 66, 219–233. doi:10.1037/h0040891 Rachlin, H., & Green, L. (1972). Commitment, choice and self-control. Journal of the Experimental Analysis of Behavior, 17, 15–22. doi:10.1901/jeab.1972.17-15 Rasmussen, E. B, & Newlin, C. (2008). Asymmetry of reinforcement and punishment in human choice. Journal of the Experimental Analysis of Behavior, 89, 157–167. doi:10.1901/jeab.2008.89-157 Ray, B. A. (1969). Selective attention: The effects of combining stimuli which control incompatible behavior. Journal of the Experimental Analysis of Behavior, 12, 539–550. doi:10.1901/jeab.1969.12-539 Rescorla, R. A., & Skucy, J. C. (1969). Effect of responseindependent reinforcers during extinction. Journal of Comparative and Physiological Psychology, 67, 381–389. 
doi:10.1037/h0026793 Richards, J. B., Mitchell, S. H., de Wit, H., & Seiden, L. S. (1997). Determination of discount functions in rats with an adjusting-amount procedure. Journal of the Experimental Analysis of Behavior, 67, 353–366. doi:10.1901/jeab.1997.67-353 Rilling, M. (1977). Stimulus control and inhibitory processes. In W. K. Honing & J. E. R. Staddon (Eds.), Handbook of operant behavior (pp. 432–480). Englewood Cliffs, NJ: Prentice-Hall. Schuster, R., & Rachlin, H. (1968). Indifference between punishment and free shock: Evidence for the negative 61

Kennon A. Lattal

law of effect. Journal of the Experimental Analysis of Behavior, 11, 777–786. doi:10.1901/jeab.1968.11-777 Shnidman, S. R. (1968). Extinction of Sidman avoidance behavior. Journal of the Experimental Analysis of Behavior, 11, 153–156. doi:10.1901/jeab.1968.11-153 Sidman, M. (1953). Two temporal parameters of the maintenance of avoidance behavior by the white rat. Journal of Comparative and Physiological Psychology, 46, 253–261. doi:10.1037/h0060730 Sidman, M. (1960). Tactics of scientific research. New York, NY: Basic Books. Sidman, M. (1966). Avoidance behavior. In W. Honig (Ed.), Operant behavior: Areas of research and application (pp. 448–498). New York, NY: AppletonCentury-Crofts. Sidman, M. (1986). Functional analysis of emergent verbal classes. In T. Thompson & M. D. Zeiler (Eds.), Analysis and integration of behavioral units (pp. 213–245). Hillsdale, NJ: Erlbaum. Sidman, M., Kirk, B., & Wilson-Morris, M. (1985). Six-member stimulus classes generated by conditional-discrimination procedures. Journal of the Experimental Analysis of Behavior, 43, 21–42. doi:10.1901/jeab.1985.43-21 Sidman, M., & Tailby, W. (1982). Conditional discrimination vs. matching to sample: An expansion of the testing paradigm. Journal of the Experimental Analysis of Behavior, 37, 5–22. doi:10.1901/jeab.1982.37-5 Siegel, P. S., & Milby, J. B. (1969). Secondary reinforcement in relation to shock termination: Second chapter. Psychological Bulletin, 72, 146–156. doi:10.1037/ h0027781 Skinner, B. F. (1935). The generic nature of the concepts of stimulus and response. Journal of General Psychology, 12, 40–65. doi:10.1080/00221309.1935.9920087 Skinner, B. F. (1938). Behavior of organisms. New York, NY: Appleton-Century-Crofts. Skinner, B. F. (1948). “Superstition” in the pigeon. Journal of Experimental Psychology, 38, 168–172. doi:10.1037/h0055873 Skinner, B. F. (1953). Science and human behavior. New York, NY: Macmillan. Skinner, B. F. (1956). A case history in scientific method. 
American Psychologist, 11, 221–233. doi:10.1037/ h0047662 Skinner, B. F. (1957). Verbal behavior. New York, NY: Appleton-Century-Crofts. doi:10.1037/11256-000

Staddon, J. E. R. (1968). Spaced responding and choice: A preliminary analysis. Journal of the Experimental Analysis of Behavior, 11, 669–682. doi:10.1901/jeab.1968.11-669 Staddon, J. E. R. (1979). Operant behavior as adaptation to constraint. Journal of Experimental Psychology: General, 108, 48–67. doi:10.1037/0096-3445.108.1.48 Staddon, J. E. R., & Simmelhag, V. (1971). The “superstition” experiment: A re-examination of its implications for the principles of adaptive behavior. Psychological Review, 78, 3–43. doi:10.1037/h0030305 Stubbs, A. (1968). The discrimination of stimulus duration by pigeons. Journal of the Experimental Analysis of Behavior, 11, 223–238. doi:10.1901/jeab.1968.11-223 Terrace, H. S. (1963). Discrimination learning with and without “errors.” Journal of the Experimental Analysis of Behavior, 6, 1–27. doi:10.1901/jeab.1963.6-1 Terrace, H. S. (1966). Stimulus control. In W. Honig (Ed.), Operant behavior: Areas of research and application (pp. 271–344). New York, NY: AppletonCentury-Crofts. Thorndike, E. L. (1911). Animal intelligence. New York, NY: Macmillan. Timberlake, W., & Allison, J. (1974). Response deprivation: An empirical approach to instrumental performance. Psychological Review, 81, 146–164. doi:10.1037/h0036101 Timberlake, W., & Lucas, G. A. (1985). The basis of superstitious behavior: Chance contingency, stimulus substitution, or appetitive behavior? Journal of the Experimental Analysis of Behavior, 44, 279–299. doi:10.1901/jeab.1985.44-279 Uhl, C. N., & Garcia, E. E. (1969). Comparison of omission with extinction in response elimination in rats. Journal of Comparative and Physiological Psychology, 69, 554–562. doi:10.1037/h0028243 Verhave, T. (1962). The functional properties of a time out from an avoidance schedule. Journal of the Experimental Analysis of Behavior, 5, 391–422. doi:10.1901/jeab.1962.5-391 Wasserman, E. A., Young, M. E., & Peissig, J. J. (2002). 
Brief presentations are sufficient for pigeons to discriminate arrays of same and different stimuli. Journal of the Experimental Analysis of Behavior, 78, 365–373. doi:10.1901/jeab.2002.78-365 Watkins, M. J. (1990). Mediationism and the obfuscation of memory. American Psychologist, 45, 328–335. doi:10.1037/0003-066X.45.3.328

Skinner, B. F. (1966). What is the experimental analysis of behavior? Journal of the Experimental Analysis of Behavior, 9, 213–218. doi:10.1901/jeab.1966.9-213

Weiner, H. (1962). Some effects of response cost upon human operant behavior. Journal of the Experimental Analysis of Behavior, 5, 201–208. doi:10.1901/ jeab.1962.5-201

Skinner, B. F. (1981). Selection by consequences. Science, 213, 501–504. doi:10.1126/science.7244649

Weiner, H. (1969). Conditioning history and the control of human avoidance and escape responding. Journal

62

The Five Pillars of the Experimental Analysis of Behavior

of the Experimental Analysis of Behavior, 12, 1039–1043. doi:10.1901/jeab.1969.12-1039 Williams, B. A. (1983). Revising the principle of reinforcement. Behaviorism, 11, 63–88.

Zeiler, M. D. (1977a). Elimination of reinforced behavior: Intermittent schedules of not-responding. Journal of the Experimental Analysis of Behavior, 27, 23–32. doi:10.1901/jeab.1977.27-23

Williams, B. A. (1994). Conditioned reinforcement: Experimental and theoretical issues. Behavior Analyst, 17, 261–285.

Zeiler, M. D. (1977b). Schedules of reinforcement: The controlling variables. In W. K. Honig & J. E. R. Staddon (Eds.), Handbook of operant behavior (pp. 201–232). New York, NY: Prentice Hall.

Wyckoff, L. B. (1952). The role of observing responses in discrimination learning. Psychological Review, 59, 431–442. doi:10.1037/h0053932

Zeiler, M. D. (1984). Reinforcement schedules: The sleeping giant. Journal of the Experimental Analysis of Behavior, 42, 485–493. doi:10.1901/jeab.1984.42-485

Zeiler, M. D. (1968). Fixed and variable schedules of response-independent reinforcement. Journal of the Experimental Analysis of Behavior, 11, 405–414. doi:10.1901/jeab.1968.11-405

Zeiler, M. D. (1992). On immediate function. Journal of the Experimental Analysis of Behavior, 57, 417–427. doi:10.1901/jeab.1992.57-417

Zeiler, M. D. (1976). Positive reinforcement and the elimination of reinforced responses. Journal of the Experimental Analysis of Behavior, 26, 37–44. doi:10.1901/jeab.1976.26-37

Zimmerman, J. (1969). Meanwhile . . . back at the key: Maintenance of behavior by conditioned reinforcement and response-independent primary reinforcement. In D. P. Hendry (Ed.), Conditioned reinforcement (pp. 91–124). Homewood, IL: Dorsey Press.

63

Chapter 3

Translational Research in Behavior Analysis

William V. Dube

As the 21st century began, the National Institutes of Health delineated a road map for accelerating biomedical research progress with increased attention to more quickly translating basic research into human studies and then into tests and treatments that improve clinical practice with direct benefits to patients (Zerhouni, 2003). A consensus definition of the term translational research adopted by several of the institutes and other organizations is “the process of applying ideas, insights, and discoveries generated through basic scientific inquiry to the treatment or prevention of human disease” (World Health Organization, 2004, p. 141). The National Institutes of Health recognized that a reengineering effort was needed to support the development of translational science. The National Institutes of Health established the Clinical and Translational Science Awards Consortium in October 2006 to assist institutions in creating integrated academic homes for multi- and interdisciplinary research teams to apply new knowledge and techniques to patient care. This consortium began with 12 academic health centers located throughout the nation, has increased to 55 centers as of this writing, and is expected to expand to approximately 60 institutions by 2012. Woolf (2008) noted that two definitions of translational research exist. The term most commonly refers to the bench-to-bedside activity of using the knowledge gained from basic biological sciences to

produce new drugs, devices, and treatments, which has been described as from bench to bedside, from Petri dish to people, from animal to human. This is what has been typically meant by translational research. Scientists work at the molecular, then cellular level to test “basic research,” then proceed to applications for animals, and on to humans. (upFRONT, 2006, p. 8) The end point for this first stage of translational research (often referred to as T1) is the production of a promising new treatment with clinical and commercial potential. The second stage of translational research (T2) addresses the gap between basic science and clinical medicine by improving access to treatment, systems of care, point-of-care decision support tools, and so forth. “The ‘laboratory’ for T2 research is the community and ambulatory care settings, where population-based interventions . . . bring the results of T1 research to the public . . . [an] ‘implementation science’ of fielding and evaluating interventions in real world settings” (Woolf, 2008, p. 211). Westfall, Mold, and Fagnan (2007) noted that the design of implementation systems requires a different skill set than that of the typical practicing physician. For this reason, they proposed that the final stage of translation

Preparation of this chapter was supported in part by the Eunice Kennedy Shriver National Institute of Child Health and Human Development, primarily Grants HD004147, HD046666, and HD055456. The contents of this chapter are solely the responsibility of the author and do not necessarily represent the official views of the National Institute of Child Health and Human Development. I acknowledge the truly impressive efforts of the authors of the chapters I discuss. It was a pleasure working with them. DOI: 10.1037/13937-003 APA Handbook of Behavior Analysis: Vol. 1. Methods and Principles, G. J. Madden (Editor-in-Chief) Copyright © 2013 by the American Psychological Association. All rights reserved.


involves a distinct third step (T3) with a focus on solving the problems encountered by primary care physicians as they attempt to incorporate new discoveries into clinical practice. The distinction between T2 and T3 is that between translation to patients and translation to practice. To this end, Mold and Peterson (2005) described emerging “primary care practice-based research networks [that] are challenging traditional distinctions between research and quality improvement” (p. S12). To summarize the biomedical perspective, T1 is the translation from bench to bedside, or dish to human; T2, from bedside to community; and T3, from dissemination to practice. To most behavior analysts, this process sounds very familiar. From the beginning, modern behavior analysis has sought not only to treat behavior as the subject matter of a natural science, but also to “apply [the methods of science] to human affairs” (Skinner, 1953, p. 5). The goal of translational research in biomedical science is the treatment or prevention of human disease. From the perspective of behavior analysis, human disease encompasses a range of maladaptive behavior. Examples include educational failures; behavior with high risk for health problems such as substance abuse or unsafe sex; disruptive, aggressive, and self-injurious behavior in individuals with developmental disabilities; and unhappiness and loss of functioning in daily life related to depression, anxiety, or similar conditions. As behavior analysts look at human behavioral diseases, the Petri dish is the pigeon in the operant chamber and similar arrangements; the bedside includes the human operant laboratory, analogue environments of functional analysis (Iwata, Dorsey, Slifer, Bauman, & Richman, 1982/1994), and so forth; and practice settings abound for behavior analytically inspired treatments in schools, clinics, businesses, and virtually anywhere where significant human activity occurs. 
The two volumes of this handbook describe much of this basic, translational, and applied research.

Translational Behavior Analysis

Basic and applied behavior-analytic research have proceeded in parallel for many years, with varying degrees of cross-fertilization; for a recent summary of the basic–applied interface, see Mace and Critchfield (2010). One question that arises when considering the basic–applied distinction is whether the two classifications provide sufficient description of activity in the field. Can a useful distinction be made between translational and applied behavior analysis? McIlvane et al. (2011) pointed out several dimensions along which the two areas differ. For example, the participants in applied research are selected because they have ongoing behavioral issues and will derive immediate benefit from participation, usually as the resolution of a clinical problem. In contrast, the participants in translational research are selected because they are representatives of clinical or social groups; they may receive some immediate benefit from participation, but the primary goal of the research is a longer term search for processes, principles, and procedures that apply beyond the individual case. Other distinctions include differences in typical research environments, publication outlets, and funding models. At the T1 stage of translational behavior analysis, an important goal is to provide support for applications by validating underlying principles. Every chapter in Volume 2, Part I, of this handbook includes relevant T1 research findings that document continuity in underlying behavioral processes along the continuum from the behavioral bench to bedside. Why is this important? One reason is that behavioral interventions do not always produce the expected results, and a complete understanding of relevant underlying processes may improve the clinician’s or teacher’s ability to make informed procedural adjustments.
In terms that Overmier and Burke (1992) used to describe animal models for human diseases, a translational research pathway builds confidence that the relation between a set of basic research findings and a set of applied procedures is one of true material equivalence (i.e., homology of behavioral processes) rather than mere conceptual equivalence (behavioral similarity). I adapt one example from Volume 2, Chapter 7, this handbook, by Jacobs, Borrero, and Vollmer. Suppose that the results of a reinforcer preference test for a student receiving special education services showed that Reinforcer A was highly preferred to several others. Yet when Reinforcer A was provided for completing increasing numbers of arithmetic problems, it became

ineffective at sustaining the behavior. A teacher might not even consider trying other reinforcers that were less preferred, based on the preference assessment results. An understanding of translational research in behavioral economics, however, might lead the teacher to question whether the problem was related to elasticity of demand—the degree of decrease in the value of a reinforcer as its price (the amount of work required to obtain it) increases (for additional information, see the discussion of essential value in Volume 2, Chapter 8, this handbook). The typical reinforcer assessment evaluates relative preference at very low behavioral prices; the required response is usually merely reaching or pointing. Translational research in behavioral economics may suggest a reevaluation of relative reinforcer preferences with response requirements that more closely match those of the arithmetic task. Reinforcers less preferred than Reinforcer A at low prices may become more preferred at higher prices. A second example may be drawn from Volume 2, Chapter 6, this handbook, on discrimination learning and stimulus control. Relatively early research on stimulus equivalence called into question whether language ability was necessary to show positive results on equivalence tests (e.g., Devany, Hayes, & Nelson, 1986; for an introduction to stimulus equivalence, see Chapter 16, this volume and Volume 2, Chapter 1, this handbook). The translational research reviewed in Volume 2, Chapter 6, this handbook, by McIlvane has shown, however, that the typical conditional discrimination procedures used in equivalence research may engender unsuspected forms of stimulus control that meet the reinforcement contingencies (and thus produce high accuracy scores during initial training) but are nevertheless incompatible with equivalence relations because they do not require discrimination of all of the stimuli presented (for details, see Volume 2, Chapter 6, this handbook). 
The susceptibility to these unsuspected forms of stimulus control may be more prevalent in individuals at lower developmental levels. With procedural enhancements and variations designed to reveal or control for these possibilities, equivalence relations have been shown in humans with minimal language (D. Carr, Wilkinson, Blackman, & McIlvane, 2000; Lionello-DeNolf, McIlvane, Canovas, & Barros, 2008).

Behavior analysis has also addressed issues analogous to those of the T2 and T3 stages of biomedical translational research, and the chapters in Volume 2 of this handbook include descriptions of these efforts. Particular emphasis on T2 and T3 stages is found in Volume 2, Chapters 1 and 18, this handbook. Volume 2, Chapter 10, this handbook, with its emphasis on the application of known behavioral principles to bring about population-wide changes in behavior and cultural evolution, is a particularly good example of the T3 stage of behavior analysis.

Bidirectional Translation

As McIlvane et al. (2011) noted, translational behavior analysis need not be thought of as a one-way street. Topics, methods, and strategies for basic research may be selected expressly to foster translation, and translational goals may be influenced by the practical realities of application research and intervention practice. To list a few examples, in Volume 2, Chapter 5, this handbook, Nevin and Wacker explicitly call for reverse translation from application to basic research to provide an empirical base to aid in interpretation of factors influencing long-term treatment outcomes for problem behavior. One impetus for the research in stimulus control that McIlvane describes in Volume 2, Chapter 6, this handbook, comes from the problems encountered in special education classrooms in the course of discrete-trials teaching. In their chapter on acceptance and commitment therapy (ACT; Volume 2, Chapter 18, this handbook), Levin et al. move from the clinic to the human operant laboratory for their evaluations of the impact of various ACT components. This approach allows them to evaluate the components with nonclinical populations and using laboratory measures such as task persistence or recovery from mood inductions. The reader will find other examples in Volume 2, Part I, of the handbook.

Overview of Volume 2, Part I: Translational Research in Behavior Analysis

Arranging Reinforcement Contingencies

In Volume 2, Chapter 3, this handbook, DeLeon, Bullock, and Catania describe findings from basic and translational research that can inform the design of reinforcement systems in applied settings. The chapter is divided into four major sections. In the first section, The Contingencies: Which Schedule Should We Use? DeLeon et al. consider reinforcement schedules and include descriptions of the basic ratio and interval schedules and their characteristics as well as several other types of schedules useful in application. These schedules include differential reinforcement, in which contingencies may specify certain defined response classes or low or high rates of responding; response-independent schedules; and concurrent schedules, which operate simultaneously and usually independently for two or more different response classes. In the second section, The Response: What Response Should We Reinforce? DeLeon et al. focus on the response and include discussions of research using percentile schedules to shape response forms (e.g., increasing the duration of working on a task), lag schedules to increase response diversity and variability, and embedding reinforcement contingencies within prompted response sequences. They also include an interesting treatment of the punished-by-rewards issue, in which some critics of behavior analysis have claimed that the use of extrinsic reinforcers reduces creativity and destroys intrinsic motivation (e.g., Kohn, 1993), and a summary of research findings that refute this notion. In the third section, The Reinforcer: Which Reinforcer Should We Choose? DeLeon et al. consider the clinician’s choice of which reinforcer to use. They include thorough discussions of research on preference assessment procedures and the design and implementation of token reinforcement systems.
Thus far, the translational research on token exchange schedules has indicated that, when reinforcement is delayed, token reinforcers may maintain their effectiveness better than the directly consumable reinforcers for which they are exchanged; this finding is identified as a topic for future research in applied settings. In the fourth section, Reinforcer Effectiveness and the Relativity of Reinforcement, DeLeon et al. consider the role of context in reinforcer effectiveness and include an introduction to behavioral economics, in which reinforcers are treated as commodities and the behavioral requirements of the reinforcement contingency (e.g., response effort) are treated as the price of that commodity. Changes in price may affect demand for a commodity (typically determined by measuring consumption), and DeLeon et al. describe ways in which the relations between effort and reinforcer effectiveness may have implications for reinforcer use and selection in applied settings. DeLeon et al. conclude with a discussion of two tools for the applied researcher and clinician: a decision tree (Figure 3.1) to guide the selection of reinforcers for applied settings and a list of suggestions for troubleshooting in situations in which arranged reinforcement contingencies do not have the desired effect.
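The lag schedules mentioned in the second section can be illustrated with a short sketch. It assumes the common reading of a Lag n contingency, in which a response earns a reinforcer only if it differs from each of the n immediately preceding responses. The function name, the response labels, and the decision to reinforce the very first response are illustrative assumptions and are not taken from DeLeon et al.

```python
from collections import deque


def make_lag_schedule(n):
    """Build a decision function for a hypothetical Lag n contingency."""
    recent = deque(maxlen=n)  # holds only the n most recent responses

    def respond(response):
        # Reinforce only if this response differs from every recent response.
        reinforced = response not in recent
        recent.append(response)
        return reinforced

    return respond


# Hypothetical response sequence under a Lag 2 contingency.
lag2 = make_lag_schedule(2)
outcomes = [lag2(r) for r in ["A", "A", "B", "C", "B", "A"]]
print(outcomes)  # [True, False, True, True, False, True]
```

Repeating a recent response goes unreinforced, so the contingency differentially reinforces variation rather than any particular response form.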

Operant Extinction: Elimination and Generation of Behavior

Procedurally, extinction refers to the discontinuation of reinforcement for a response. In Volume 2, Chapter 4, this handbook, Lattal, St. Peter, and Escobar review research on extinction and its effects in both the elimination and the generation of behavior. Those readers unfamiliar with behavior analysis may at first be surprised that extinction procedures and response generation are related, and Lattal et al. provide an interesting introduction to this relation. All of the chapters in Volume 2, Part I, of this handbook describe both basic research in nonhuman animal laboratories and applied research in clinical settings. Volume 2, Chapter 4, this handbook features a very tight integration of the two, often in the same paragraph. The chapter begins with a brief review of the history of the study of extinction, from the early 20th century through Skinner’s early work. This review is followed by clear procedural definitions of relevant terms and an introduction to the types of functional outcomes that may result from procedural extinction, including both the reduction in the response that no longer generates reinforcement and its response-generating or response-inducing effects on other classes of behavior. The remainder of the chapter is divided into two sections, one on the eliminative effects of extinction and one on the generative effects. In the former section, Lattal et al. review the interactions of extinction with schedules of reinforcement and other

parameters of reinforcement, the effects of repeated exposures to extinction, and the effects of sudden versus gradual introduction; the latter is relevant to the gradual reduction in reinforcement frequency (“schedule thinning”) that is an important component of many clinical treatment interventions. In this section, Lattal et al. also review research on several response-elimination procedures with high relevance to applied behavior analysis. These procedures include differential reinforcement of other behavior, response-produced time outs (i.e., a signaled period in which a reinforcer is not available), and procedures that remove the response–reinforcer dependency, often termed noncontingent reinforcement in applied work. In this section in particular, Lattal et al.’s close integration of basic and applied research provides a valuable resource for the clinician. The section on generative effects of extinction begins by reviewing research on the extinction burst, a period of increased response rate that sometimes, but not always, follows the onset of extinction. Most research on generative effects is reviewed in terms of increased variability in the topography of the target response, increased rate of responses that were previously unreinforced but topographically related to the reinforced response (e.g., behavior that did not meet a defined response criterion), and schedule-induced behavior, which includes behavior topographically unrelated to the target response that occurs during the periods of extinction in intermittent reinforcement schedules. Research with humans has documented schedule-induced behavior, including drinking and stereotypy, and schedule-induced responding has been suggested as a mechanism related to drug taking and smoking. In this section, Lattal et al. also review research on resurgence, which is the recurrence of previously reinforced responding when a more recently reinforced response is extinguished, and on the recovery of behavior after periods of extinction by the presentation of discriminative, contextual, or reinforcing stimuli.

Simple and Complex Discrimination Learning

In Volume 2, Chapter 6, this handbook, McIlvane systematically lays out the complexities involved in

analyzing stimulus control in discrimination learning. His approach makes the topic accessible in part because it incorporates principles of programmed instruction, including a careful analysis and presentation of prerequisites for the reader and a systematic progression from simple to complex. This chapter will be of interest to students and more experienced readers alike, including many who do not consider themselves to be behavior analysts. In fact, one of McIlvane’s explicit goals is to provide a reference for students of cognitive neuroscience who use discrimination procedures as behavioral preparations to study correlations between behavior and underlying neurobiological processes. His message is that even seemingly straightforward procedures may bring unsuspected analytical complexities, and failure to address these complexities will introduce unmeasured sources of variability in the data. McIlvane adopts Sidman’s (1986) analytical units of behavior analysis as a consistent framework for the discussion. The initial section describes simple discrimination procedures in terms of three-term analytical units that correspond to the three terms of the operant contingency: antecedent stimulus, behavior, and consequence. When conditional discrimination, which requires relational stimulus control by two (or more) stimuli (e.g., sample and comparison stimuli in a matching-to-sample procedure) is considered, the analytic unit is expanded to four terms that include two antecedent stimuli. At each level of analysis, McIlvane carefully explains and illustrates analyses in terms of select versus reject stimulus control (e.g., as exerted by the correct and incorrect choices in a discrimination task) and configural versus relational stimulus control (e.g., as exerted by two or more stimuli as a unitary compound in the former and by independent stimulus elements in the latter). 
Notably, McIlvane always makes a careful distinction between the terms of the procedure, as defined by the experimenter, and the actual analytical units of behavior that those procedures engender in the organism. One of the most important points of the chapter is that these two sets of specifications need not always correspond. I use an oversimplified example here for brevity: A special education teacher may assume that accurate performance on a

matching-to-sample task in which printed-word samples BALL and BAT are matched to corresponding pictures indicates that two word–picture relations have been learned. As McIlvane points out, however, the student may have learned to select the ball picture when the sample was BALL and reject the ball picture when the sample was BAT, a performance that does not include any stimulus control at all by the specific features of the bat picture. If performance that depends on control by those features of the bat picture is poor in subsequent testing for more advanced relational learning (e.g., stimulus equivalence), the teacher may erroneously conclude that the student is not capable of the more advanced performance, when in fact the problem was the teacher’s failure to fully analyze the stimulus-control basis for the initial training. McIlvane discusses situations such as this one in terms of a lack of coherence between the teacher’s (or experimenter’s) assumptions about stimulus control and the actual controlling stimuli and relations. The final sections of McIlvane’s chapter include a brief description of the current state of behavior-analytic theory on the acquisition of stimulus control via differential reinforcement and some ideas about how theory might be advanced. This discussion touches on the issue of improving the coherence between the desired stimulus control (by the experimenter, teacher, etc.) and the stimulus control that is actually captured by the contingencies. McIlvane concludes the chapter with a consideration of two of the current areas of translational research in stimulus control: stimulus control shaping (e.g., by gradual stimulus change procedures) and the analysis of relational learning processes as seen, for example, in stimulus equivalence.

Response Strength and Behavioral Persistence

Behavioral momentum theory makes an analogy between behavior and physical momentum: Response rate is analogous to the velocity of a moving body, and an independent aspect of behavior analogous to inertial mass is inferred from the persistence of response rate to disruption by some challenge analogous to an external force applied to a moving body. In Volume 2, Chapter 5, this handbook,

Nevin and Wacker open with a discussion of the concept of response strength and basic research, showing that response rate and persistence (resistance to change) are independent aspects of behavior. Response rate is determined by response–reinforcer contingencies (schedules of reinforcement), and resistance to change is determined by the stimulus–reinforcer relation (that is, the characteristic reinforcer rate within a given stimulus situation), independent of response contingencies. This latter point, that persistence is determined by the stimulus–reinforcer (Pavlovian) relation and independent of response–reinforcer (operant) contingencies, has important implications for applied behavior analysis, and Nevin and Wacker include a clear explanation of the basic research supporting it (e.g., Nevin, Tota, Torquato, & Shull, 1990). Why is the distinction between rate-governing and persistence-governing environmental relations important? One answer is that applied behavior analysis has developed several successful approaches for the treatment of problem behavior in populations with developmental limitations. In such populations, verbal behavior is often unreliable, and so the interventions depend on direct manipulation of reinforcement contingencies. Procedures such as differential reinforcement of other behavior, differential reinforcement of alternative behavior, and response-independent reinforcer deliveries (often termed noncontingent reinforcement) are all designed to reduce the frequency of a problem behavior, and they all accomplish it by increasing the overall rate of reinforcement in treatment settings. A wealth of applied research has shown that these procedures have been broadly effective in reducing the rates of problem behavior.
Evidence, however, including some presented in Nevin and Wacker’s chapter, has shown that they may do so at the cost of increasing the longer term persistence of the problem behavior when the inevitable challenges to treatment (e.g., brief periods of extinction) are encountered over time (e.g., Mace et al., 2010). That longer term persistence may be manifested in a posttreatment resurgence of problem behavior. Long-term maintenance of treatment gains is a relatively understudied area in applied behavior analysis, and the latter portion of Nevin and Wacker’s

chapter outlines a new approach to the issue, inspired by behavioral momentum theory. In current practice, maintenance is often defined in terms of continuing the desired behavior change over time and under the prevailing conditions of treatment. Nevin and Wacker propose that this step toward maintenance is necessary but not sufficient. They redefine maintenance as the persistence of treatment gains when confronted by changes in antecedent stimuli (people, tasks, prompts, etc.) and the consequences of behavior: “Rather than focusing almost exclusively on behavior occurring under stable treatment conditions, researchers should also consider how various treatment conditions produce or inhibit persistence during challenges to the treatment” (p. 124). Nevin and Wacker present a longitudinal analysis of the effects of extinction challenges to treatment over an average of 35 weeks for eight children. For example, after differential reinforcement of alternative behavior had replaced one child’s destructive behavior with appropriate requesting, the requests were ignored during occasional brief extinction periods. The data from these extinction challenges show gradually increasing persistence of adaptive behavior and decreasing destructive behavior over a 7-month intervention period. Nevin and Wacker conclude the chapter with some suggestions for further research, which include some interesting reverse translation possibilities for the basic research laboratory to examine strengthening and weakening of behavioral persistence over long time courses and the effects of variation in the stimulus situation. Another goal for further research is to determine the extent to which the underlying behavioral processes for “high-p” procedures (Mace et al., 1988), used in applied research to increase compliance, can be related to those of behavioral momentum theory.
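The claim that persistence depends on the overall reinforcer rate in a stimulus situation can be made concrete with a small numerical sketch. It assumes one commonly cited form of the behavioral momentum model, log10(Bx/Bo) = -x / r^b, where Bx/Bo is responding during disruption as a proportion of baseline, x is the magnitude of the disruptor, r is the reinforcer rate in the stimulus situation, and b is a sensitivity parameter. That equation, the value b = 0.5, and all of the numbers below are assumptions chosen for illustration; none are taken from Nevin and Wacker's chapter.

```python
def proportion_of_baseline(disruptor, reinforcer_rate, sensitivity=0.5):
    """Responding under disruption relative to baseline (assumed model form)."""
    return 10 ** (-disruptor / reinforcer_rate ** sensitivity)


# Hypothetical context: problem behavior earns 30 reinforcers per hour.
before_treatment = proportion_of_baseline(disruptor=5, reinforcer_rate=30)

# Hypothetical treatment adds 60 reinforcers per hour for alternative behavior
# in the same stimulus situation, raising the overall rate to 90 per hour.
during_treatment = proportion_of_baseline(disruptor=5, reinforcer_rate=90)

print(round(before_treatment, 3), round(during_treatment, 3))
```

Under these assumptions the same disruptor leaves a larger proportion of baseline responding in the richer context, which is the pattern behind the concern that reinforcement-based treatments may increase the persistence of problem behavior.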

Translational Applications of Quantitative Choice Models

Jacobs, Borrero, and Vollmer’s goal in Volume 2, Chapter 7, this handbook is “to foster appreciation of quantitative analyses of behavior for readers whose primary interests are in applied research and practice” (p. 165). Quantitative analysis may at first seem a bit opaque to some whose primary interests

are in applied research and practice. As Commons (2001) has noted, however, quantitative analysis “is not primarily a matter of fitting arbitrary functions to data points. Rather, each parameter and variable in a model represents part of a process that has theoretical, empirical, and applied interpretations” (p. 275). Throughout the chapter, Jacobs et al. discuss many translational research studies that illustrate the relationships between the analyses’ parameters and variables and events in socially relevant human behavior, both in and outside of the laboratory. The first of two major sections focuses on the matching law, which relates the relative allocation of behavior between and among concurrently available options to the relative value of the obtained reinforcement for those behavioral options, where value is the product of rate, quality, amount (magnitude), and immediacy (delay). One of the major contributions of the matching law is that it puts behavior in context—the behaving organism always has a choice between engaging in some behavior of interest to the behavior analyst or doing something else. Seen from the perspective of the behavior analyst, therefore, there is also a choice: To increase the behavior of interest, one may increase relative reinforcement contingent on that behavior or decrease reinforcement available for alternative behavior, or both. The matching law section opens with a brief account of the development of the mathematical model, accompanied by text relating the mathematical terms to aspects of behavior. Helpful figures illustrate the effects of changing the values of the terms in the corresponding equations. 
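A minimal sketch of the matching relation just described may help readers who prefer to see the arithmetic. It assumes strict matching (no bias or sensitivity parameters) and treats immediacy as the reciprocal of delay; the function names and all numerical values are hypothetical and are not drawn from Jacobs et al.

```python
def reinforcer_value(rate, quality=1.0, amount=1.0, delay=1.0):
    """Value as the product of rate, quality, amount, and immediacy (1/delay)."""
    return rate * quality * amount * (1.0 / delay)


def predicted_allocation(value_target, value_alternative):
    """Strict matching: predicted proportion of behavior on the target option."""
    return value_target / (value_target + value_alternative)


# Hypothetical baseline: the alternative earns twice the reinforcer rate.
baseline = predicted_allocation(reinforcer_value(rate=20), reinforcer_value(rate=40))

# Two routes to the same change: enrich the target or thin the alternative.
more_for_target = predicted_allocation(reinforcer_value(rate=40), reinforcer_value(rate=40))
less_for_alternative = predicted_allocation(reinforcer_value(rate=20), reinforcer_value(rate=20))

print(round(baseline, 3), more_for_target, less_for_alternative)
```

Both manipulations move the predicted allocation from one third to one half, which illustrates why the behavior analyst has a choice between increasing reinforcement for the behavior of interest and decreasing reinforcement for its alternatives.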
This introduction is followed by a review of research showing the applicability of the matching law to such diverse human activities as choice between conversational partners among college students, between academic tasks among students with academic delays, between problem and appropriate behavior among children with intellectual and developmental disabilities, and even between 2- and 3-point shots among basketball players. Areas identified for further research on choice include the effects of delay to reinforcement for responses that closely follow a shift from one behavioral option to another (changeover delay), delay to reinforcement for responses

during extended periods of behavior, and analytic tools to better account for ratio- versus interval-like aspects of reinforcement schedules in uncontrolled environments. The second major section of the chapter covers temporal discounting, which describes the relationships between the impact of a reinforcer and the amount of time that will pass before that reinforcer is obtained. This research area is relevant to issues involving human impulsivity and self-control and maladaptive choices to engage in behavior that produces a relatively smaller immediate reinforcer (e.g., second helping of chocolate cake) at the cost of foregoing a relatively larger delayed reinforcer (weight control for better health and appearance). The text and accompanying figure explain how temporal discounting is well described by a hyperbolic function, how this function helps to account for impulsive choice of the smaller, more immediate reinforcer, and how the methods can be used to determine a quantitative value describing the degree to which the individual will discount a reinforcer (i.e., how steeply the value of a reinforcer will decrease as the delay to obtain it increases). Jacobs et al. then go on to review research in assessment of impulsivity in adults and in children, both those who are typically developing and those with developmental disabilities. Research on strategies for decreasing impulsivity is also reviewed. The section concludes with a discussion of the potential for future research to develop discounting assessments that could identify developmental markers and risk for problems associated with impulse control. 
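The quantitative value describing an individual's degree of discounting can be sketched as follows. The sketch assumes the standard hyperbolic form V = A / (1 + kD), where A is the reinforcer amount, D is the delay, and k is the discounting parameter, and it recovers k from a single indifference point by rearranging that equation. The function names and the example amounts are hypothetical.

```python
import math


def discounted_value(amount, delay, k):
    """Hyperbolic discounting: present value of a delayed reinforcer."""
    return amount / (1 + k * delay)


def k_from_indifference(amount, delay, indifference_value):
    """Solve V = A / (1 + kD) for k, given one indifference point."""
    return (amount / indifference_value - 1) / delay


# Hypothetical participant: 50 units now is judged equal to 100 units in 30 days.
k = k_from_indifference(amount=100, delay=30, indifference_value=50)
print(round(k, 4))  # 0.0333 per day

# Sanity check: the estimated k reproduces the indifference point.
print(math.isclose(discounted_value(100, 30, k), 50))  # True
```

A larger k means the value of a reinforcer falls off more steeply with delay. In practice k is usually estimated from several indifference points rather than one; the single-point rearrangement is used here only to keep the arithmetic visible.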
The remainder of the chapter provides brief overviews of several other quantitative approaches and models relevant to translational research in behavior analysis: behavioral economics (addressed in Volume 2, Chapter 8, this handbook), behavioral ecology, behavioral momentum (addressed in Volume 2, Chapter 5, this handbook), and behavioral detection models based on signal detection theory. Each of these sections provides helpful introductory material.

Behavioral Economics

The first half of Volume 2, Chapter 8, this handbook includes an introduction and overview of several important concepts. In the introduction, Hursh et al.

describe how a common interest in value and choice in the fields of economics and behavioral psychology provides a context for (a) the extension of microeconomic theory to the consumption of reinforcers by organisms and (b) the application of operant conditioning principles to the economics of demand for commodities. I highly recommend the overview of behavioral economic concepts, regardless of the reader’s level of familiarity with the topics. There is something here for everyone. The material is presented in a clear and balanced discussion that covers areas of consistency and agreement as well as apparent exceptions and thus possible issues for further research. The discussion is illustrated with examples from both basic and translational research. Among the most important concepts reviewed are demand, value, and discounting. As noted earlier, behavioral economics treats reinforcers as commodities and the behavior required to obtain a commodity as its price. Of primary interest is how changes in price affect demand. Demand is measured in terms of consumption, and Hursh et al. discuss the distinction between total consumption as a fundamental dependent variable and other common behavior-analytic dependent variables such as response rate. As price increases, consumption decreases. Hursh et al. describe the essential value of a commodity (reinforcer) in terms of the rate of change in consumption as price increases, that is, “an organism’s defense of consumption in the face of constraint. Commodities that are most vociferously defended are the most essential” (pp. 196–197). In the Quantifying Value From Demand Curve Data section, Hursh et al. discuss methods for obtaining quantitative estimates of value based on rate of change in consumption or, alternatively, the price point that supports peak responding. Also covered are ways in which the availability and price of alternate commodities may affect essential value.
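The relations among price, consumption, and peak responding can be sketched with a simple demand function. The sketch assumes a linear-elasticity demand curve, Q = L * P^b * exp(-aP), under which point elasticity is b - aP and responding (price times consumption) peaks where elasticity equals -1, at P = (1 + b) / a. This model is only one of several used in behavioral economics and is not necessarily the one Hursh et al. emphasize; the parameter values and the two reinforcers are hypothetical.

```python
import math


def consumption(price, level, b, a):
    """Assumed linear-elasticity demand curve: Q = L * P**b * exp(-a * P)."""
    return level * price ** b * math.exp(-a * price)


def elasticity(price, b, a):
    """Point elasticity of demand under the assumed model."""
    return b - a * price


def p_max(b, a):
    """Price at which responding (price * consumption) peaks."""
    return (1 + b) / a


# Hypothetical reinforcers: A is consumed more at low prices, but its
# consumption is defended less strongly as price rises (larger a).
reinforcer_a = dict(level=100, b=-0.1, a=0.05)
reinforcer_b = dict(level=60, b=-0.1, a=0.01)

for price in (1, 50):
    qa = consumption(price, **reinforcer_a)
    qb = consumption(price, **reinforcer_b)
    print(price, round(qa, 1), round(qb, 1))

print(round(p_max(-0.1, 0.05), 1), round(p_max(-0.1, 0.01), 1))  # 18.0 90.0
```

With these assumed parameters, Reinforcer A is consumed more than Reinforcer B at a price of 1 but less at a price of 50, and B supports peak responding at a much higher price. This mirrors the earlier point that a reinforcer preferred at low response requirements may not remain the most effective one as the work requirement grows.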
When the delivery of a reinforcer is delayed, its value decreases in relation to the duration of the delay; that is, the value is discounted. Quantitative analyses of discounting have revealed a very interesting and significant difference between the discounting function predicted by normative economic theory and that actually obtained by the experimental
analysis of behavior. The former is exponential, in which a reinforcer is devalued at a constant rate over time, and the latter is hyperbolic, in which the devaluation rate is more extreme in the immediate future than in the more distant future. This exponential versus hyperbolic difference is illustrated in the top portion of Hursh et al.’s Figure 8.9. Because of the acceleration in value as a reinforcer becomes more immediate, choice may suddenly shift from a larger–later reinforcer (e.g., losing weight for better health and appearance) to a smaller–sooner one (e.g., an imminent piece of apple pie). Included in the chapter is a very accessible discussion of how the discovery of hyperbolic temporal discounting provides a scientifically grounded explanation for seemingly irrational choices and preference reversals. The second half of the chapter reviews translational research in behavioral economics with attention to areas with potential for further research and development. Much of the translational activity thus far has been related to analyses and treatment for addictions. One set of questions asks whether analyses of demand characteristics and essential value can be used to predict response to treatment interventions by behavioral therapy, response to treatment by medication, and the transition from inpatient to outpatient treatment. In the Translating Delay Discounting section, Hursh et al. look at some potentially promising intervention approaches that address how people with addiction discount the long-term benefits that would result from recovery. One approach is to teach tolerance for delay; Hursh et al. judge this approach to be at the proof-of-concept stage, with much work remaining to be done. Another approach has been termed reward bundling (e.g., Ainslie, 2001): Rather than a choice between two isolated reinforcers (one smaller–sooner and one larger–later), the choice is between two series (bundles) of reinforcers distributed in time.
The theory is that the bundles will affect choice in relation to the sums of the discounted values of all reinforcers in the series. That is, the difference in value of the bundles will be greater than the difference in value of two single reinforcers, and this greater disparity will more than offset the immediacy of the first reward in the smaller–sooner series (see Hursh et al.’s Figure 8.14).

(Although not discussed in the chapter, this approach seems at least conceptually similar to the bundling of distributed social reinforcers in 12-step programs.) Research on reward bundling has yet to be accomplished in applied settings. A third approach, training in skills related to executive functioning (problem solving, strategic planning, working memory), is also in the earliest stages of translational research. In the final section of the chapter, Translating Behavioral Economic Principles to Other Applied Settings, Hursh et al. describe translational research with individuals who have autism, intellectual and developmental disabilities, or both. Hursh et al. cover the assessment of value in such populations and also describe some approaches to scheduling reinforcers in ways that may help to maintain demand as price rises. The section concludes with some behavioral economic considerations of the marketplace in which problem behavior occurs and some issues to be considered when introducing therapeutic reinforcers into this marketplace.
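The discounting and reward-bundling arguments described earlier in this section can be checked with a few lines of arithmetic. The sketch below is an illustration rather than anything presented in the chapter. It assumes the standard textbook forms V = A * exp(-kD) for exponential discounting and V = A / (1 + kD) for hyperbolic discounting; the amounts, delays, discount rate, and spacing between bundled reinforcers are hypothetical values chosen to make the effects visible.

```python
import math

def exponential_value(amount, delay, k):
    """Exponential discounting: value falls at a constant proportional rate."""
    return amount * math.exp(-k * delay)

def hyperbolic_value(amount, delay, k):
    """Hyperbolic discounting: steep devaluation at short delays, shallow at long ones."""
    return amount / (1.0 + k * delay)

def bundle_value(value_fn, amount, first_delay, spacing, count, k):
    """Sum of the discounted values of `count` reinforcers spaced `spacing` apart."""
    return sum(value_fn(amount, first_delay + i * spacing, k) for i in range(count))

K = 0.3            # hypothetical discount rate
SS = (50.0, 0.0)   # smaller-sooner option: (amount, delay)
LL = (100.0, 5.0)  # larger-later option: (amount, delay)

def prefers_larger_later(value_fn, lead_time=0.0, count=1, spacing=10.0):
    """True if the larger-later option (or series) has the higher discounted value.
    `lead_time` is added to every delay, i.e., the choice is made further in advance."""
    ss = bundle_value(value_fn, SS[0], SS[1] + lead_time, spacing, count, K)
    ll = bundle_value(value_fn, LL[0], LL[1] + lead_time, spacing, count, K)
    return ll > ss

if __name__ == "__main__":
    for name, fn in (("exponential", exponential_value), ("hyperbolic", hyperbolic_value)):
        print(name,
              "| chosen in advance:", prefers_larger_later(fn, lead_time=10.0),
              "| chosen when SS is imminent:", prefers_larger_later(fn, lead_time=0.0),
              "| bundle of 5 when SS is imminent:", prefers_larger_later(fn, count=5))
```

With these values, the hyperbolic function yields a preference for the larger–later option when the choice is made in advance, a reversal to the smaller–sooner option when it is imminent, and a return to the larger–later series when five reinforcers are bundled. The exponential function yields the same preference at both lead times, so it produces no reversal.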

Applied Behavior Analysis and Neuroscience

In Volume 2, Chapter 2, this handbook, Travis Thompson examines the pathway from basic laboratory research to application “at the interface of behavior analysis and neuroscience” (p. 33). The first section of the chapter, Behavior Analysis in Physiological Psychology, focuses on behavioral pharmacology, with an emphasis on drug addiction. The historical material discussed in this section includes eyewitness accounts of the early work with nonhuman primates from one of the pioneers in the field. There is an emerging understanding that addiction involves not only the biochemical and physiological effects of drugs but also their interactions with reinforcement contingencies. The translational path proceeds to the merger of behavioral pharmacology and applied behavior analysis in the treatment of substance abuse, and the chapter includes a review of some recent research in this area. In the Establishing Operations: Food Motivation section, Thompson examines research on the effects of neuropeptide Y, which increases food intake but apparently via an underlying mechanism distinct from that of food deprivation. In this section, Thompson reviews functional magnetic resonance imaging (MRI) research of brain activity associated with food motivation and evidence that typical patterns of neural activity begin in childhood. Also discussed is research comparing pre- and postmeal activation while viewing food pictures in typical control participants and individuals with Prader-Willi syndrome, the most common known genetic cause of life-threatening obesity in children. The research points to distinct neural mechanisms associated with food intake in Prader-Willi syndrome. In three of the remaining sections, Thompson reviews translational research in three areas: (a) treatments for some forms of self-injurious behavior in developmental disabilities that are maintained by the release of endogenous opioids; (b) changes in the motor regions of the brain associated with constraint-induced rehabilitation, in which operant contingencies are used to encourage people with stroke to use affected limbs; and (c) functional MRI studies of the effects of exposure to operant contingencies of reinforcement on subsequent viewing of discriminative stimuli and members of stimulus equivalence classes. Thompson also includes an intriguing section on a possible relation between synaptogenesis and success in early intensive behavioral intervention (EIBI) in autism. Although EIBI has been very successful in some cases, the gains are rather modest for approximately half of treated children. Thompson reviews evidence that synapse formation can be activity dependent and raises the question of whether individual differences in synaptogenesis in response to the operant reinforcement contingencies of EIBI (differences that may have a genetic basis) could be related to the ultimate degree of success. This area seems to be a very fertile one for future research.

Environmental Health and Behavior Analysis

Newland begins his chapter (Volume 2, Chapter 9, this handbook) with a few sobering facts. For example, of the tens of thousands of chemicals in production, only a small fraction have been subjected to rigorous tests for neurotoxicity. Even among high-priority chemicals whose structure suggests significant potential for toxicity and for which commercial use exceeds 500 tons per year, fewer than one third have undergone any neurotoxicity testing. Newland goes on to outline behavioral neurotoxicological testing methods and analysis criteria using both operant and Pavlovian contingencies. These methods are illustrated with examples that include high levels of exposure to manganese (e.g., in unsafe mining conditions), solvents, ozone, electrical fields, and others. The value of individual subject data is illustrated with Cory-Slechta’s (1986) results on the effects of developmental exposure to lead on fixed-interval responding (see Newland’s Figure 9.3 and accompanying text). Given the nature of the subject matter, toxicity, testing in nonhuman animals is a necessity. One of the important contributions of this chapter is Newland’s Human Testing section, in which he explores problems with and solutions for comparing the results of studies with nonhuman laboratory animals with those conducted with human participants. One solution to this problem is to develop tests that can be used among both populations. This approach is particularly useful when the test results in humans correlate well with more general measures of functioning; for example, the correlation of performance on an incremental repeated acquisition test with that on a standardized intelligence (IQ) test, described by Newland. Another approach involves the elimination or minimization of verbal instruction with humans, which also facilitates comparisons among a diverse array of human populations and cultural groups (e.g., migrant laborers with occupational exposure to pesticides). In the Mechanisms and Interventions section, Newland considers deficits in motor function and the role of reinforcement contingencies in recovery of function (a topic Thompson also addresses in his chapter).
Research on the development of tolerance to neurotoxicants and adjustment to impairment is also described as well as disruption of behavioral allocation in behavioral choice procedures (which are described in more detail in Volume 2, Chapter 7, this handbook) and behavioral flexibility as measured by discrimination reversal procedures after gestational exposure to lead or methylmercury.

Newland goes on to describe how these effects may be understood in terms of distortion in the impact of reinforcers and disrupted dopamine function. In the remainder of the chapter, Newland provides an education in the scientific and organizational problems that must be solved to conduct an evidence-based risk assessment. He describes a process for deriving estimates of tolerable human exposures from controlled laboratory studies of animals and from epidemiological studies of exposed humans, in an open and transparent manner. To help meet the need to advance the pace of testing, a current emphasis in the area of in vitro testing (formation of proteins, activity of cells, etc.) is on “high-throughput” techniques for rapidly identifying and characterizing neurotoxicity in large-scale efforts. An important challenge identified for the next generation of behavioral toxicologists is the development of meaningful high-throughput behavioral tests; as Newland notes, “One cannot reproduce the [fixed-interval] schedule in a dish” (p. 245).

From Behavioral Research to Clinical Therapy

Clinical behavior analysis (CBA) refers to that branch of applied behavior analysis that addresses what are commonly known as mental health issues in verbally competent adult outpatients: anxiety disorders, depression, adjustment disorders, and so forth. In Volume 2, Chapter 1, this handbook, Guinther and Dougher provide an overview of the historical development of CBA and describe translational research in relation to specific CBA therapies. The historical overview begins with the behavior therapy movement of the 1950s and 1960s and the rise to prominence of cognitive–behavioral therapy in the 1970s, including a discussion of why the mentalistic aspects of cognitive–behavioral therapy removed it from behavior analysis. Goldiamond’s (1974/2002) constructional approach is credited as the first fully articulated behavior-analytic outpatient therapy, and the development of the more modern CBA therapies over the next 30 years is briefly noted. The Translational Research Relevant to CBA section is preceded by a thoughtful introduction to the conceptual basis of CBA in the Skinnerian analyses
of private events (such as thoughts and feelings) as behavior itself rather than as causes of behavior and the functional (as opposed to formal) analysis of verbal behavior as operant behavior controlled by its audience-mediated consequences. This part of the chapter is divided into three sections: Rule Governance, Equivalence Relations and the Transfer of Stimulus Functions, and Other Stimulus Relations and the Transformation of Functions. Rule governance is the study of interactions in behavioral control between antecedent stimuli in the form of verbal rules or instructions that describe contingencies (including self-generated) and the actual consequences for behavior; the rules may or may not accurately reflect the contingencies. The implications for CBA of self-generated rules at odds with the actual contingencies of daily life seem evident. Stimulus equivalence refers to stimulus–stimulus relations of mutual substitutability (for a basic research review, see Chapter 16, this volume). For example, the spoken word dog and the printed word DOG may be related by equivalence if both control pointing to a picture of a dog (but see Volume 2, Chapter 6, this handbook, for some of the complexities of a thorough analysis). In the Equivalence Relations and the Transfer of Stimulus Functions section, Guinther and Dougher point out that the most clinically interesting feature of stimulus equivalence is transfer of function: Stimulus functions acquired by direct conditioning with one stimulus will also be manifested by other equivalent stimuli in the absence of direct experience. Equivalence relations and transfer of function in verbal stimuli help to explain why words can provoke emotional responses, and the research reviewed in this section documents how stimuli can come to elicit emotions or evoke avoidance behavior via equivalence relations and in the absence of any direct conditioning. 
The subsequent section, Other Stimulus Relations and the Transformation of Functions, on relational frame theory (RFT) provides an exceptionally clear description of the relation between stimulus equivalence and RFT. RFT expands the study of stimulus–stimulus relations beyond those of equivalence to include opposition, more than–less than, before–after, and many others. The research reviewed has indicated that the transfer of functions seen in stimulus equivalence may be modified accordingly by other types of stimulus–stimulus relations and is thus termed transformation of function. For example, when initial training established more-than–less-than relations among stimuli such that A < B < C, and B subsequently predicted electric shock in a Pavlovian conditioning procedure, experimental participants showed smaller skin conductance changes to A and larger skin conductance changes to C than to B, even though there was no direct conditioning with the A and C stimuli (Dougher, Hamilton, Fink, & Harrington, 2007). The section that follows, Verbal Behavior and the Clinical Relevance of RFT, considers the wide-ranging implications for clinical relevance of RFT applied to verbal behavior. In the remainder of the chapter, Guinther and Dougher review the major CBA therapies in current practice. In the CBA Therapies section, they begin by discussing Goldiamond’s (2002) constructional approach, which is foundational in both historical and conceptual senses. As an editorial note, I think that the importance of Goldiamond’s contributions to applied behavior analysis in general cannot be overemphasized. To list a few examples, the diagnostic technique of descriptive analysis (e.g., McComas, Vollmer, & Kennedy, 2009), treatment interventions such as functional communication training that focus on building appropriate repertoires that produce the same class of reinforcers as problem behavior (e.g., E. G. Carr & Durand, 1985), and the focus on repertoire building rather than symptom reduction in ACT (e.g., Hayes, Strosahl, & Wilson, 1999) owe conceptual debts to Goldiamond. The subsections describing the therapies are very clearly labeled, and the reader is referred to Guinther and Dougher’s chapter itself for clear introductions to each one and related efficacy research.
These therapies include functional analytic psychotherapy, which is helpful for improving the relationship between therapist and patient; integrative behavioral couples therapy for couples involved in romantic relationships; dialectical behavior therapy for the treatment of borderline personality disorder and other severe problems; behavioral activation therapy, with a primary emphasis on treatment of depression; and a brief introduction to ACT for the treatment of a wide variety of quality-of-life issues (brief because ACT is the subject of Volume 2, Chapter 18, this handbook).

Acceptance and Commitment Therapy

In Volume 2, Chapter 18, this handbook, Levin, Hayes, and Vilardaga present an in-depth look at ACT, which is arguably on the leading edge of clinical behavior analysis. ACT developed in concert with RFT, and in the introductory sections of the chapter, Levin et al. present a detailed account of its development. They describe functional contextualism, the philosophical foundation of RFT and ACT, along with a related development strategy termed contextual behavioral science. A key aspect of this approach that seems distinct from mainstream behavior analysis is described as “user-friendly analyses . . . that practitioners can access without fully grasping the details of the basic account, while a smaller number of scientists work out technical accounts” (p. 462). The unifying conceptual system for ACT at an applied level is the psychological flexibility model, which is based on six key processes, each of which is defined and explained: defusion interventions, acceptance interventions, contact with the present moment, self-as-context (perspective taking), values (motivational context), and committed action. A case example is presented that illustrates how these processes are applied to case conceptualization and treatment in ACT, with the goal of significantly improved psychological flexibility. In the next section of the chapter, Expanded Use of Methodologies to Test the Theoretical Model and Treatment Technology, Levin et al. describe a body of evidence supporting the efficacy of ACT. As guided by the contextual behavioral science approach, the treatment technologies and underlying theoretical models have been evaluated by a variety of methodologies, including group designs. Levin et al. describe microcomponent studies that are typically small scale and laboratory based and often use nonclinical populations and focus on “broadly relevant features of behavior such as task persistence and recovery from mood inductions” (p. 470).
The results of more than 40 such microcomponent studies “have suggested that many of these [ACT] components are psychologically active” (p. 470). Also described is research examining the processes of change, and results in this area show that ACT affects theoretically specified processes and that changes in these processes predict changes in treatment outcomes. One of the most salient characteristics of ACT is its broad applicability. This section on evaluative research concludes with a summary of research showing ACT’s impact on an impressive array of problem areas including depression, anxiety, psychosis, chronic pain, substance use, coping with chronic illness, weight loss and maintenance, burnout, sports performance, and others. In the final section of the chapter, Effectiveness, Dissemination, and Training Research, Levin et al. describe the active role of contextual behavioral science researchers in studying issues related to the effective implementation of ACT. The results of this research have shown that training in ACT improves patient outcomes, and these results are also shaping the development of methods for training clinicians in the intervention approach.

Prosocial Behavior and Environment in a Public Health Framework

In Volume 2, Chapter 10, this handbook, Biglan and Glenn propose a framework to relate known behavioral principles as they affect individuals to the cultural practices of society as a whole, for the purpose of making general improvements in human well-being. In the opening section of the chapter, Behavior Analysis and Cultural Change, they define macrobehavior as the similar but independent behavior of many individuals that has a cumulative effect on the environment. A fundamental difference between behavior and macrobehavior is that the former describes a lineage of recurring responses of an individual that is shaped and maintained by its consequences over time. In contrast, macrobehavior is an aggregate of the operant behavior of many individuals and not controlled by consequences at the macrobehavioral level. The behavioral components of a social system are described as interlocking behavioral contingencies that produce a measurable outcome; for example,
within the social system of a school, one product of educational interlocking behavioral contingencies is students’ academic repertoires. The term metacontingencies describes the contingent relations between interlocking behavioral contingencies and their products and the consequent actions of the external environment on the social system. Biglan and Glenn argue that when the cumulative effects of macrobehavior are detrimental to society and human well-being, the macrobehavior should be treated as a behavioral public health problem. Examples of such detrimental macrobehaviors include substance abuse, academic failure, and crime. Because research from several disciplines has related a small set of non-nurturing conditions to problems of public health and behavior, Biglan and Glenn propose an approach that focuses on increasing the prevalence of nurturing environments. They describe ways to promote such environments that include (among others) minimizing toxic and aversive conditions, reinforcing prosocial behavior, and setting limits on opportunities for problem behavior. Biglan and Glenn then present two extended examples of planned interventions that operate at the macrobehavioral level to make changes in social systems by promoting nurturing environments. The first is the schoolwide Positive Behavior Supports (PBS) program, which as of 2010 had been adopted by more than 9,500 U.S. schools. They delineate the foundational constructs of PBS and emphasize the importance of multilevel organizational support at state, district, and school levels. From a macrobehavioral perspective, they point out the need for research on the specific factors that influence the spread of PBS. The second example of an organized effort to change macrobehavior is the tobacco control movement. 
This section includes an absorbing look at the interlocking behavioral contingencies of the tobacco industry as a social system and how tobacco marketing resulted in smoking by approximately half of men and one third of women by the middle of the 20th century. Biglan and Glenn also describe the activities of the tobacco control movement, resulting in a macrobehavioral change that reduced smoking behavior by approximately half during the second half of the century.

Biglan and Glenn conclude with a review of evidence-based programs and practices that may be marshaled to produce system-level changes in public health and related policies. In their words,

The next challenge is to develop an empirical science of intentional cultural evolution. . . . We argue that this effort will strengthen with a focus on (a) how to influence the spread of a behavior by increasing the incidence of reinforcement for that behavior and (b) how to alter the metacontingencies for organizations that select practices contributing to a more nurturing society. (p. 270)

Summary

During his term as editor of the Journal of Applied Behavior Analysis, Wacker (1996) pointed out the “need for studies that bridge basic and applied research” (p. 11). The research reviewed in the chapters I have discussed is certainly responsive to that need. It encompasses the T1, T2, and T3 range of translations described in biomedical science, and the bridge has firm abutments in both the basic and the applied research literatures.

References

Ainslie, G. (2001). Breakdown of will. Cambridge, England: Cambridge University Press.
Carr, D., Wilkinson, K. M., Blackman, D., & McIlvane, W. J. (2000). Equivalence classes in individuals with minimal verbal repertoires. Journal of the Experimental Analysis of Behavior, 74, 101–114. doi:10.1901/jeab.2000.74-101
Carr, E. G., & Durand, V. M. (1985). Reducing behavior problems through functional communication training. Journal of Applied Behavior Analysis, 18, 111–126. doi:10.1901/jaba.1985.18-111
Commons, M. L. (2001). A short history of the society for quantitative analyses of behavior. Behavior Analyst Today, 2, 275–279.
Cory-Slechta, D. A. (1986). Vulnerability to lead at later developmental stages. In N. Krasnegor, D. B. Gray, & T. Thompson (Eds.), Advances in behavioral pharmacology: Developmental behavioral pharmacology (pp. 151–168). Hillsdale, NJ: Erlbaum.
Devany, J. M., Hayes, S. C., & Nelson, R. O. (1986). Equivalence class formation in language-able and language-disabled children. Journal of the Experimental Analysis of Behavior, 46, 243–257. doi:10.1901/jeab.1986.46-243
Dougher, M. J., Hamilton, D. A., Fink, B. C., & Harrington, J. (2007). Transformation of the discriminative and eliciting functions of generalized relational stimuli. Journal of the Experimental Analysis of Behavior, 88, 179–197. doi:10.1901/jeab.2007.45-05
Goldiamond, I. (2002). Toward a constructional approach to social problems: Ethical and constitutional issues raised by applied behavior analysis. Behavior and Social Issues, 11, 108–197. (Original work published 1974)
Hayes, S. C., Strosahl, K., & Wilson, K. G. (1999). Acceptance and commitment therapy: An experiential approach to behavior change. New York, NY: Guilford Press.
Iwata, B. A., Dorsey, M. F., Slifer, K. J., Bauman, K. E., & Richman, G. S. (1994). Toward a functional analysis of self-injury. Journal of Applied Behavior Analysis, 27, 197–209. (Original work published 1982)
Kohn, A. (1993). Punished by rewards. Boston, MA: Houghton Mifflin.
Lionello-DeNolf, K. M., McIlvane, W. J., Canovas, S. D. G., & Barros, R. S. (2008). Reversal learning set and functional equivalence in children with and without autism. Psychological Record, 58, 15–36.
Mace, F. C., & Critchfield, T. S. (2010). Translational research in behavior analysis: Historical traditions and imperative for the future. Journal of the Experimental Analysis of Behavior, 93, 293–312. doi:10.1901/jeab.2010.93-293
Mace, F. C., Hock, M. L., Lalli, J. S., West, P. J., Belfiore, P., Pinter, E., & Brown, D. K. (1988). Behavioral momentum in the treatment of noncompliance. Journal of Applied Behavior Analysis, 21, 123–141. doi:10.1901/jaba.1988.21-123
Mace, F. C., McComas, J. J., Mauro, B. C., Progar, P. R., Taylor, B., Ervin, R., & Zangrillo, A. N. (2010). Differential reinforcement of alternative behavior increases resistance to extinction: Clinical demonstration, animal modeling, and clinical test of one solution. Journal of the Experimental Analysis of Behavior, 93, 349–367. doi:10.1901/jeab.2010.93-349
McComas, J. J., Vollmer, T., & Kennedy, C. (2009). Descriptive analysis: Quantification and examination of behavior–environment interactions. Journal of Applied Behavior Analysis, 42, 411–412. doi:10.1901/jaba.2009.42-411
McIlvane, W. J., Dube, W. V., Lionello-DeNolf, K. M., Serna, R. W., Barros, R. S., & Galvão, O. F. (2011). Some current dimensions of translational behavior analysis: From laboratory research to intervention for persons with autism spectrum disorders. In E. A. Mayville & J. A. Mulick (Eds.), Behavioral foundations of effective autism treatment (pp. 155–181). Cornwall-on-Hudson, NY: Sloan.
Mold, J. W., & Peterson, K. A. (2005). Primary care practice-based research networks: Working at the interface between research and quality improvement. Annals of Family Medicine, 3(Suppl. 1), S12–S20. doi:10.1370/afm.303
Nevin, J. A., Tota, M. E., Torquato, R. D., & Shull, R. L. (1990). Alternative reinforcement increases resistance to change: Pavlovian or operant contingencies? Journal of the Experimental Analysis of Behavior, 53, 359–379. doi:10.1901/jeab.1990.53-359
Overmier, J. B., & Burke, P. D. (1992). Animal models of human pathology: A quarter century of behavioral research. Washington, DC: American Psychological Association.
Sidman, M. (1986). Functional analysis of emergent verbal classes. In T. Thompson & M. D. Zeiler (Eds.), Analysis and integration of behavioral units (pp. 213–245). Hillsdale, NJ: Erlbaum.
Skinner, B. F. (1953). Science and human behavior. New York, NY: Macmillan.
upFRONT. (2006). Translational science. Philadelphia: Penn Nursing, University of Pennsylvania. Retrieved from http://www.nursing.upenn.edu/about/Documents/UpFront_8.30.pdf
Wacker, D. P. (1996). Behavior analysis research in JABA: A need for studies that bridge basic and applied research. Experimental Analysis of Human Behavior Bulletin, 14, 11–14.
Westfall, J. M., Mold, J., & Fagnan, L. (2007). Practice-based research—“Blue highways” on the NIH roadmap. JAMA, 297, 403–406. doi:10.1001/jama.297.4.403
Woolf, S. H. (2008). The meaning of translational research and why it matters. JAMA, 299, 211–213. doi:10.1001/jama.2007.26
World Health Organization. (2004). World report on knowledge for better health. Geneva, Switzerland: Author.
Zerhouni, E. (2003). The NIH roadmap. Science, 302, 63–72. doi:10.1126/science.1091867

Chapter 4

Applied Behavior Analysis

Dorothea C. Lerman, Brian A. Iwata, and Gregory P. Hanley

Applied behavior analysis (ABA) differs from other areas of applied psychology in many respects, but two are especially prominent. First, ABA is not an eclectic enterprise, borrowing theory and method from varied persuasions; it is grounded in the theoretical and experimental orientation of behavior analysis. Second, whereas most applied fields are distinguished by their emphasis on a particular clientele, problem, or setting, ABA is constrained only by its principles and methods. ABA focuses on any aspect of human and sometimes nonhuman behavior, regardless of who emits it or where it occurs, crossing professional boundaries typically used to define clinical, educational, and organizational psychology as well as generational cohorts and diagnostic categories. Thus, the subject matter of ABA is not tied to any specific area of application. Other chapters in this handbook present summaries of applied research organized along more traditional lines. In this chapter, we emphasize ABA’s distinctive features and summarize its major themes on the basis of the behavioral processes of interest.

Origins of Applied Behavior Analysis

The official beginning of the field of ABA is easy to pinpoint because the term was coined in 1968 when the Journal of Applied Behavior Analysis (JABA) was founded. However, that date represents the culmination of events initiated many years prior. Tracing the emergence of ABA before 1968 is an arbitrary
process because the borders separating basic, translational, and applied research are fluid. Nevertheless, Fuller’s (1949) study, the first published report of operant conditioning with a human, serves as a good starting point. The response that Fuller shaped—a simple arm movement in an individual with profound intellectual disability—had little adaptive value, but it was significant in demonstrating the influence of operant contingencies on human behavior. Although many will find the article’s similarities to current ABA work almost nonexistent, it is worth noting that Boyle and Greer (1983) published an extension of Fuller’s work 35 years later in JABA in which they used similar methods to shape similar responses among comatose patients. Soon after Fuller’s (1949) article appeared, other reports of human operant conditioning followed during the 1950s (Azrin & Lindsley, 1956; Bijou, 1955; Gewirtz & Baer, 1958; Lindsley, 1956). At the end of that decade, Ayllon and Michael (1959) published what many consider the first example of ABA because it contained multiple single-case analyses of different interventions (extinction, differential reinforcement, avoidance training, noncontingent reinforcement) with a range of target behaviors (excessive visitation to the nursing station, aggression, refusal to self-feed, and hoarding) exhibited by psychiatric patients. Similar reports were published throughout the 1960s (see Kazdin, 1978, for a more extensive discussion) in various journals, including the Journal of the Experimental Analysis of Behavior. The board of directors of the Journal of the Experimental Analysis of Behavior, recognizing the need for

DOI: 10.1037/13937-004 APA Handbook of Behavior Analysis: Vol. 1. Methods and Principles, G. J. Madden (Editor-in-Chief) Copyright © 2013 by the American Psychological Association. All rights reserved.


Lerman, Iwata, and Hanley

a journal devoted to applications of operant conditioning, initiated the publication of JABA in 1968. The first issue of JABA contained an article that essentially defined the field. D. M. Baer, Wolf, and Risley (1968) proposed seven defining characteristics—dimensions—of ABA:

1. Applied: The focus is on problems of social importance.
2. Behavioral: Dependent variables reflect direct measurement of the behaviors of interest.
3. Analytic: Demonstrations of behavior change include convincing evidence of experimental control, favoring repeated measures of individual behavior and replication of treatment effects.
4. Technological: Procedural descriptions specify operational features of intervention for all relevant responses.
5. Conceptually systematic: Techniques used to change behavior are related to the basic principles from which they are derived.
6. Effectiveness: The extent of behavior change is sufficient to be of practical value.
7. Generality: Effects of intervention strategies can be programmed across time, behavior, and setting.

These dimensions have served as useful guides in planning and evaluating ABA research since the field’s inception. It is a mistake to assume, however, that any particular study or even most of those published in JABA or similar journals illustrates each of the characteristics of ABA described by D. M. Baer et al. (1968). The most obvious example is the dimension of generality: A study with two or three participants cannot establish the external validity of a therapeutic intervention; multiple, systematic replication and extension accomplish that goal (see Chapter 7, this volume).
Similarly, although applied may suggest exclusive emphasis on problematic human behavior (deficit or excess), a case can be made for studying an arbitrary response such as eye movement (Schroeder & Holland, 1968) to identify procedures for improving attention or even nonhuman behavior when it is difficult to conduct certain demonstrations with humans, as in the shaping of self-injury (Schaefer, 1970).

Distinctive Features of Applied Behavior Analysis

Methodological differences between ABA and other therapeutic endeavors are so numerous that discussion is beyond the scope of this chapter; it would also be partially redundant with a description of the experimental analysis of behavior contained in Chapter 2 of this volume. Instead, we describe here some features of applied research that distinguish it from basic laboratory work.

Nature of the Response (Dependent Variable)

Basic research focuses primarily on the fundamental learning process. As a result, structural aspects of a response are usually of secondary concern, and dependent variables often consist of arbitrary, discrete responses such as a rat’s lever presses or a pigeon’s key pecks. The response per se assumes greater importance in applied research because the goal of intervention is usually to change some topographical aspect of behavior. In fact, much of ABA research involves establishing a new response that is not in an individual’s repertoire or modifying a current response topography from one that is deficient, socially unacceptable, or even dangerous to one that is deemed more appropriate by the larger community. For example, eating can be accomplished in any number of ways, but the form that eating takes may be a determinant of social acceptability (O’Brien & Azrin, 1972). Other examples include cadence of speech (R. J. Jones & Azrin, 1969), structural aspects of composition (Brigham, Graubard, & Stans, 1972), and nuances of social interaction (Serna, Schumaker, Sherman, & Sheldon, 1991). Sometimes the goal of applied research is not to establish a specific response but, rather, to increase topographical variability, as in the development of creativity (Glover & Gary, 1976; Goetz & Baer, 1973). Applied researchers also focus more often on the measurement of larger units of behavior because most adaptive performances are made up of response chains, such as self-care sequences, academic work, and vocational behavior. Some chains involve repetition of a similar topography, as in building a construction out of blocks (Bancroft,

Applied Behavior Analysis

Weiss, Libby, & Ahearn, 2011); others, however, can be extremely complex. For example, Allison and Ayllon (1980) evaluated a behavioral approach to athletics coaching and measured varied behaviors such as blocks (football), handsprings (gymnastics), and serves (tennis) as response chains consisting of five to 11 members. A third distinction between basic and applied research is the multidimensional nature of measurement in applied work. Basic researchers typically use response rate or response allocation as a dependent variable because both are convenient, standard, and highly sensitive measures (Skinner, 1966; see Chapter 10, this volume). Although they occasionally study other response dimensions such as intensity or force, they do so to determine how these aspects of behavior are influenced by environmental manipulation (Hunter & Davison, 1982; Sumpter, Temple, & Foster, 1998), not because changing these response dimensions is a goal of the research. Applied researchers, by contrast, will study response allocation because, for example, decreasing responses allocated to drug taking is of societal importance (see Volume 2, Chapter 19, this handbook). Likewise, they will study response intensity because it is either too low, as in inaudible speech (Jackson & Wallace, 1974), or too high, as in noisy behavior on the school bus (Greene, Bailey, & Barber, 1981). Furthermore, many human performances are such that clinical improvement requires change in more than one dimension of a response. For example, effective treatment of a child’s sleep disturbance should decrease not only the frequency but also the duration of nighttime wakening (France & Hudson, 1990; see Volume 2, Chapter 17, this handbook). Quantifying changes in academic performance also requires multiple measures, such as the duration of work, the amount of work completed, and accuracy (J. C. McGinnis, Friman, & Carlyon, 1999). 
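The point about change in more than one response dimension can be illustrated with a small sketch. It assumes a hypothetical record format in which each episode (for example, a night wakening) is logged as a (start_minute, end_minute) pair; that format and the sample numbers are illustrative only and are not taken from the studies cited above. Frequency and duration are then two summaries of the same event log.

```python
from typing import Dict, List, Tuple

# Hypothetical record format: one (start_minute, end_minute) pair per episode.
Episode = Tuple[float, float]


def summarize_episodes(episodes: List[Episode]) -> Dict[str, float]:
    """Summarize one observation period along two response dimensions.

    Returns the number of episodes (frequency), their summed length
    (total_duration), and the average length (mean_duration).
    """
    durations = [end - start for start, end in episodes]
    frequency = len(durations)
    total = sum(durations)
    mean = total / frequency if frequency else 0.0
    return {
        "frequency": frequency,
        "total_duration": total,
        "mean_duration": mean,
    }


if __name__ == "__main__":
    # Made-up numbers for illustration only.
    baseline_night = [(30, 55), (120, 150), (240, 250)]
    treatment_night = [(60, 65)]
    print(summarize_episodes(baseline_night))
    print(summarize_episodes(treatment_night))
```

Reporting both summaries side by side makes it visible when an intervention reduces how often a response occurs without shortening each episode, or the reverse.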
In addition to focusing on multiple dimensions of a response (or a response chain), applied researchers often measure several different responses that, in combination, reflect improvement in the problem of interest. The delivery of courteous service (Johnson & Fawcett, 1994), for example, requires the correct performance of a number of responses and response chains that differ depending on the context of the

employee–customer interaction. In a similar way, contemporary research on response suppression attempts not only to eliminate problematic behavior but also to establish more socially appropriate alternatives. Finally, some dependent variables in applied research are not based on direct measurement of any response but rather on some observable outcome or product of a response. At the individual level, response products might consist of carbon monoxide level as an index of smoking (Dallery & Glenn, 2005) or pounds lost as a measure of diet or exercise (Mann, 1972). Aggregate products such as amount of litter (Bacon-Prue, Blount, Pickering, & Drabman, 1980) or community water usage (Agras, Jacob, & Lebedeck, 1980) have also been used occasionally in applied research when measurement of individual behavior is difficult or impossible.

Treatment Characteristics (Independent Variable)

Much of ABA research involves the extension of well-established basic principles such as reinforcement, stimulus control, extinction, and punishment, including parametric and comparative analysis. A more distinctive feature of applied work is that operational features of independent variables take on special characteristics determined by the target behavior of interest or practical considerations during implementation. These characteristics include greater emphasis on supplementary cues, procedural variation as a means of enhancing acceptability, and the use of intervention packages.

Supplementary cues. The aim of research on response acquisition is to produce new behavior, and efficiency is often an important consideration. Although many adaptive performances could be shaped eventually (as they are in the nonhuman basic research laboratory) through differential reinforcement of successive approximations, applied researchers typically rely heavily on supplementary methods. Common procedures include simple instructions (Bates, 1980), static cues such as pictures (Wacker & Berg, 1983), in vivo or video modeling (Geiger, LeBlanc, Dillon, & Bates, 2010), and physical prompting (R. H. Thompson, McKerchar, & Dancho, 2004).


Procedural variation. Interventions evaluated in a research context are designed for implementation under more naturalistic conditions. This fact makes procedural aspects of treatment, although perhaps incidental to their effects, potential determinants of adoption by parents, teachers, and other therapeutic agents. As such, the applied researcher will vary different components of the procedure depending on the consumer of the intervention. For example, researchers designing interventions to teach new behaviors will experiment with components of the procedure that are anticipated to produce more efficient response acquisition. Social acceptability is a major consideration for response suppression procedures, and several studies have explored ways to maintain the intervention’s efficacy while improving the procedure’s acceptability. A good example of the latter is time out from positive reinforcement. Aside from variations in duration (White, Nielsen, & Johnson, 1972) and schedule (Clark, Rowbury, Baer, & Baer, 1973), the form of time out has included seclusion in a room (Budd, Green, & Baer, 1976), physical holding (Rolider & Van Houten, 1985), removal of a ribbon worn around the neck combined with loss of access to ongoing activities (Foxx & Shapiro, 1978), and placement on the periphery of activity so that one could watch but not participate (Porterfield, Herbert-Jackson, & Risley, 1976).

Intervention packages. When the goal of intervention is to simply change behavior without identifying the necessary components of treatment, ABA researchers may combine two or more distinct independent variables into a package. Perhaps the simplest example of such a package is a teacher’s use of praise, smiles, physical contact (a pat on the back), and feedback as consequences for appropriate student behavior.
Although a component analysis of treatment effects may be informative because it is unclear which teacher responses actually functioned as positive reinforcement, it may not matter much in a practical sense because none was costly or time consuming. Often, an intervention package, if costly, will be subject to a component analysis after it has proven effective (Yeaton & Bailey, 1983). Similarly, interventions requiring a great deal of effort to implement may benefit from attempts to isolate

necessary versus incidental treatment components. For example, the continence training program developed by Azrin and Foxx (1971), although highly effective and efficient, is extremely labor intensive and includes procedures to increase the frequency of elimination, reinforce the absence of toileting accidents as well as the occurrence of correct toileting, teach dressing and undressing skills, and punish toileting accidents with a series of corrective behaviors and time out.

Instrumentation and Design

Measurement represents a significant challenge to applied researchers for at least three reasons. First, most research is conducted outside of controlled laboratory settings where recording of responses is not easily automated. Second, dependent variables are often multidimensional in nature and may consist of several distinct response topographies. Finally, when interventions consist of a series of procedures, the consistency of implementing independent variables becomes an additional concern (Peterson, Homer, & Wonderlich, 1982). As a result of these factors, human observers collect most data in applied research because of their versatility in detecting a wide range of responses. This practice introduces the potential for measurement variability that is not found in most basic research. When designing observation procedures, the applied researcher must consider several important factors, including how behavior will be categorically coded by the observers (Meany-Daboul, Roscoe, Bourret, & Ahearn, 2007), how the observers may be most efficiently trained (Wildman, Erickson, & Kent, 1975), how to compute interobserver agreement (Mudford, Taylor, & Martin, 2009), and variables that influence reliability and accuracy of measurement (Kazdin, 1977; Lerman et al., 2010).

Experimental control in basic research is typically achieved through a demonstration of reversibility (Cumming & Schoenfeld, 1959); that is, within-subject replication of behavior change resulting from systematic application and removal of the independent variable. Applied researchers often use a similar approach with a variety of reversal-type procedures such as A-B-A, B-A-B, A-B-A-B, and multielement designs (see Chapter 5, this volume).
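As a concrete illustration of the interobserver agreement computation mentioned above, the following is a minimal sketch of one widely used index, interval-by-interval agreement: the percentage of observation intervals in which two observers scored the behavior the same way. The chapter does not say which index should be used, so this choice, the function name, and the sample records are assumptions made for illustration.

```python
from typing import Sequence


def interval_agreement(observer_a: Sequence[bool], observer_b: Sequence[bool]) -> float:
    """Percentage of intervals in which two observers' records match.

    Each sequence holds one entry per observation interval:
    True if the observer scored the behavior as occurring, else False.
    """
    if len(observer_a) != len(observer_b):
        raise ValueError("Both observers must score the same number of intervals")
    if len(observer_a) == 0:
        raise ValueError("At least one interval is required")
    agreements = sum(a == b for a, b in zip(observer_a, observer_b))
    return 100.0 * agreements / len(observer_a)


if __name__ == "__main__":
    # Made-up records for five intervals; the observers differ on the third.
    primary = [True, False, True, True, False]
    secondary = [True, False, False, True, False]
    print(interval_agreement(primary, secondary))
```

Because this index counts agreement on nonoccurrence as well as occurrence, it can look high when the behavior is rare; that is one reason several alternative indices exist.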


Occasionally, however, the effects of an applied intervention are irreversible. One situation arises when the independent variable consists of instruction or training whose effects cannot be removed. A second arises when behavior, once changed, encounters other sources of control. For example, a teacher may find it necessary to use prompting, praise, and tangible reinforcers to increase a socially withdrawn child’s rate of peer interaction. Once interaction increases, however, the child may obtain a new source of reinforcement derived from playing with peers, which maintains the child’s social behavior independent of the teacher’s actions. In both of these situations, a multiple-baseline design, involving staggered introduction of treatment across two or more baselines (representing behaviors, participants, or situations), can be used to achieve replication without the necessity of a reversal. The traditional multiple-baseline design involves concurrent measurement in which data are collected across all baselines at the same points in calendar time. Because the logistics of clinical work or the inability to simultaneously recruit participants having unusual problems limits the versatility of this design, a notable variation—the nonconcurrent multiple baseline—has emerged in ABA research and may often represent the design of choice. First described by Watson and Workman (1981) and subsequently illustrated by Isaacs, Embry, and Baer (1982), the nonconcurrent multiple baseline retains the staggered feature of intervention (treatment is applied after baselines of different length); however, measurement across baselines takes place at different points in time. As noted by Watson and Workman (1981), although this design does not contain an explicit control for the confounding effects of a historical accident, differences in both the baseline length and the temporal context of measurement make it extremely unlikely that the same historical accident would coincide with the implementation of treatment on more than one baseline.
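The staggered structure of a multiple-baseline design can be sketched as a simple phase schedule. The function below is a hypothetical helper, not a procedure from the sources cited: it takes a baseline length for each baseline (a participant, behavior, or setting) and labels each session as baseline or treatment. Sessions are indexed within each baseline rather than by calendar date, so the same layout also describes the nonconcurrent variation.

```python
from typing import Dict, List


def phase_schedule(baseline_lengths: Dict[str, int], total_sessions: int) -> Dict[str, List[str]]:
    """Label each session of each baseline as 'baseline' or 'treatment'.

    baseline_lengths maps a baseline name to the number of sessions
    observed before treatment starts on that baseline. Lengths must
    differ so that treatment onsets are staggered.
    """
    lengths = list(baseline_lengths.values())
    if len(set(lengths)) != len(lengths):
        raise ValueError("Baseline lengths must differ to stagger treatment onset")
    schedule = {}
    for name, n_baseline in baseline_lengths.items():
        if not 0 < n_baseline < total_sessions:
            raise ValueError(f"Baseline length for {name!r} must be between 1 and total_sessions - 1")
        schedule[name] = ["baseline"] * n_baseline + ["treatment"] * (total_sessions - n_baseline)
    return schedule


if __name__ == "__main__":
    # Three hypothetical participants with baselines of 3, 5, and 8 sessions.
    plan = phase_schedule({"P1": 3, "P2": 5, "P3": 8}, total_sessions=12)
    for name, phases in plan.items():
        print(name, phases)
```

Requiring distinct baseline lengths encodes the logic of the design: if behavior changes only when treatment begins on each baseline, a single coincidental event is an unlikely explanation.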

Social Validity

The final characteristic of ABA that distinguishes it from basic research is a by-product of application. As noted by D. M. Baer et al. (1968), “A society willing to consider a technology of its own behavior

apparently is likely to support that application when it deals with socially important behaviors” (p. 91). Stated another way, consumers are the final judges of applied work, and the term social validity refers to those judgments about three aspects of ABA: (a) the significance of goals, (b) the acceptability of procedures, and (c) the importance of effects (Kazdin, 1977; Wolf, 1978). A variety of methods have been used to examine the social validity of applied research, including evaluations of the relevance of measures by professionals (R. T. Jones, Kazdin, & Haney, 1981), ratings of behavior change by independent judges (Bornstein, Bach, McFall, Friman, & Lyons, 1980), measures of consumer satisfaction (Bourgeois, 1990), treatment choices made by clients (Hanley, Piazza, Fisher, Contrucci, & Maglieri, 1997), and cost–benefit analysis of the effects of intervention (Van Houten, Nau, & Marini, 1980). Although important in concept, the assessment of social validity raises many questions related to subjective measurement (Hawkins, 1991), bias (Fuqua & Schwade, 1986), and the selection of appropriate reference groups as the basis for comparison (Van Houten, 1979) that have yet to be resolved. Perhaps for this reason, social validation measures continue to be rarely used (J. E. Carr, Austin, Britton, Kellum, & Bailey, 1999).

Major Themes in Applied Research

Response Acquisition

Relying on the basic principles of learning derived from the experimental analysis of behavior (see Chapters 2 and 3, this volume), applied behavior analysts have developed a technology for teaching new behavior and refining aspects of existing behavior. Much of the teaching technology has been discovered with people diagnosed with developmental disabilities, in part because developing practical, age-appropriate skills is a primary treatment goal for these individuals. Nevertheless, this same technology has been applied in early childhood, regular education, and college classrooms and in workplaces, homes, and playing fields. Independent of the context in which it is applied, ABA approaches to teaching usually progress from (a) establishing


simple and observable responses; to (b) shaping more complex, independent, or fluent responses and response chains; and finally to (c) synthesizing complex or chained responses into socially relevant repertoires such as independently completing self-care skills (T. J. Thompson, Braam, & Fuqua, 1982), reading and comprehending text (Clarfield & Stoner, 2005; Volume 2, Chapter 16, this handbook), solving algebraic and trigonometric problems (Ninness et al., 2006), playing football (J. V. Stokes, Luiselli, Reed, & Fleming, 2010), or socializing with friends (Haring & Breen, 1992). Operant contingency. The most fundamental tools of the applied behavior analyst are operant contingencies. It follows that an essential aspect of ABA teaching procedures is maximizing the effects of reinforcing consequences on target behaviors. Thus, efforts are taken to ensure that the reinforcers are valuable to the learner at the time of teaching; these antecedent events that establish the value of the reinforcing consequences are referred to as establishing operations (Laraway, Snycerski, Michael, & Poling, 2003). Efforts are also made to use salient events that signal the availability of the reinforcers to the learner; when these events occasion the target response, they are referred to as discriminative stimuli. Often, experience with a properly designed contingency is not sufficient to generate target responses, especially those that the learner has never emitted previously, so careful consideration is often given to prompts that will greatly increase the probability of the target response. So that the adaptive behavior will be emitted under more natural conditions, efforts are taken to transfer control from the prompt to more naturally occurring establishing operations and discriminative stimuli. These prompt-fading procedures will yield independent and generalized performances. 
We discuss each of these universal elements of ABA approaches to response acquisition in greater detail and in the context of published empirical examples.

Selecting a target response. ABA procedures for developing new behavior share several common characteristics independent of the deficit being addressed or the population with whom it is being addressed. Selection of an objective target behavior

is the universal first step. Specific target behaviors are usually selected because they will improve a person’s quality of life in both the short and the long term by allowing the person access to new reinforcers and additional reinforcing contexts. Responses targeted with ABA procedures often allow people to live more independently or behave more effectively and efficiently or in more socially acceptable ways. Rosales-Ruiz and Baer (1997) emphasized the importance of selecting behavior cusps, which they defined as changes in behavior that have important consequences for the person beyond the change itself. For example, learning to walk or talk provides the learner with unprecedented access to important materials, social interactions, and sensory experiences that then occasion other, more complex behavior, such as dancing or reading, allowing access to even richer events and interactions (Bosch & Fuqua, 2001).

Task analysis. Once selected, target behaviors are then examined to identify their components and any relevant sequences to the components. This process is often referred to as task analysis (e.g., Cronin & Cuvo, 1979; Resnick, Wang, & Kaplan, 1973; Williams & Cuvo, 1986). The process of task analysis was nicely illustrated in a study by Cuvo, Leaf, and Borakove (1978), who taught janitorial skills to 11 young adults with intellectual disabilities. Six general bathroom-cleaning steps were identified (mirror, sink, urinal, toilet, floor, miscellaneous); each was then subdivided into 13 to 56 component steps before training, which consisted of instructions, modeling, prompting, and differential reinforcement. Task analysis is also an important initial step when teaching almost any athletic skill. For instance, J. V. Stokes, Luiselli, Reed, and Fleming (2010) task analyzed offensive line-blocking skills of starting high school varsity football players before using video and auditory feedback to teach these same skills to other linemen on the team.
Much early ABA research identified aspects of task analyses that resulted in better learning. A general finding by Williams and Cuvo (1986) is that greater specificity of a task analysis results in better generalized performance.

Performance assessment. A direct assessment of the learner’s baseline performance is conducted to


determine which components of a target skill need to be taught (e.g., Lerman, Vorndran, Addison, & Kuhn, 2004; T. J. Thompson et al., 1982). Teaching is then implemented for those components that the learner has not yet mastered. While assessing tackling performance by high school football players, for example, J. V. Stokes, Luiselli, and Reed (2010) noticed that most players did not place their face mask on the opponent’s chest or wrap their arms around the opponent, so these specific skills were targeted for teaching. T. J. Thompson et al. (1982) sought to teach laundry skills (sorting, washing, and drying clothes) to three young adults with intellectual disabilities. The skill set included 74 distinct behaviors. To enhance the efficiency of the teaching and the evaluation of the teaching procedures, they assessed all 74 skills in a baseline probe and then focused their teaching only on those skills that were not evident during the probe.

Selecting reinforcers. The success of behavior change programs is heavily dependent on consequences that are delivered as reinforcers, which are typically selected on the basis of an assessment of the learner’s preference for a range of events or activities. These assessments can be indirect, involving an interview of teachers or parents who attempt to describe the learner’s preferences (Cautela & Kastenbaum, 1967; Cote, Thompson, Hanley, & McKerchar, 2007; Fisher, Piazza, Bowman, & Amari, 1996), but in most cases, direct assessments are conducted (e.g., DeLeon & Iwata, 1996; Fisher et al., 1992; Pace, Ivancic, Edwards, Iwata, & Page, 1985; Sundby, Dickinson, & Michael, 1996). Direct assessments of preference usually involve the actual presentation of potentially reinforcing items in one of several formats, measurement of a person’s selections, and brief access to an item after its selection.
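The tallying step of a direct preference assessment can be sketched in a few lines. The example assumes a hypothetical trial record of (items presented, item selected) and ranks items by the percentage of presentations on which each was selected; the record format, function names, and sample items are illustrative assumptions, and the cited studies differ in presentation format and scoring details.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical record: (items presented on the trial, item selected or None).
Trial = Tuple[Sequence[str], Optional[str]]


def selection_percentages(trials: List[Trial]) -> Dict[str, float]:
    """Percentage of presentations on which each item was selected."""
    presented: Counter = Counter()
    selected: Counter = Counter()
    for options, choice in trials:
        presented.update(options)
        if choice is not None:
            if choice not in options:
                raise ValueError(f"{choice!r} was not among the presented items")
            selected[choice] += 1
    return {item: 100.0 * selected[item] / presented[item] for item in presented}


def rank_items(trials: List[Trial]) -> List[str]:
    """Items ordered from most to least often selected (ties broken by name)."""
    pct = selection_percentages(trials)
    return sorted(pct, key=lambda item: (-pct[item], item))


if __name__ == "__main__":
    # Made-up paired presentations of three items.
    trials = [
        (("music", "snack"), "music"),
        (("music", "toy"), "music"),
        (("snack", "toy"), "snack"),
    ]
    print(selection_percentages(trials))
    print(rank_items(trials))
```

A ranking of this kind identifies only preference; whether a highly ranked item actually functions as a reinforcer still has to be tested by delivering it contingent on a response.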
Items that are selected or interacted with most often are then delivered as consequences for some behavior of interest (see Volume 2, Chapter 12, this handbook, for more detail on preference assessment and reinforcer evaluation as they are often applied among children diagnosed with autism). In fact, a common example of the use of arbitrary responses in ABA is as tests for the efficacy of the preferred items before their selection as reinforcers in teaching

programs for socially important behavior (J. E. Carr, Nicolson, & Higbee, 2000; Fisher et al., 1992; Pace et al., 1985). Establishing reinforcers. Once reinforcers are identified, tactics to establish and maintain their value throughout the teaching period are considered. One common procedure is to reserve the use of items delivered as consequences for correct responding to formal teaching periods. Vollmer and Iwata (1991) showed that scheduling brief periods without access to events such as food, music, or social interaction increased their effectiveness as reinforcers during later teaching sessions. Roane, Call, and Falcomata (2005) found similar positive effects of limiting access to leisure items outside of the teaching context, and Hanley, Iwata, and Roscoe (2006) showed that access to preferred items for periods as long as 24 to 72 hours before their use could decrease their value. Because mere access to preferred items can diminish their motivational efficacy in teaching programs, other tactics—such as varying the amount and type of reinforcers (Egel, 1981), allowing the learner opportunities to choose among different reinforcers (Fisher, Thompson, Piazza, Crosland, & Gotjen, 1997; Tiger, Hanley, & Hernandez, 2006), and using token reinforcers that can be traded in for a variety of back-up reinforcers— are often arranged in ABA-based teaching programs (Kazdin & Bootzin, 1972; Moher, Gould, Hegg, & Mahoney, 2008). For a discussion of these issues in translational research, see Volume 2, Chapter 8, this handbook. Contingency, contiguity, and timing. Ensuring that reinforcers are delivered only for particular target responses during the teaching session is another hallmark of ABA programs. When the target response is reinforced and all other responses are extinguished, it is referred to as differential reinforcement (see Vladescu & Kodak, 2010, for a recent review). 
Differential reinforcement, and the immediacy with which the reinforcing event is delivered after target responses (Gleeson & Lattal, 1987; Vollmer & Hackenberg, 2001), form strong contingencies that result in rapid learning. Some evidence has suggested that learning via reinforcement contingencies will proceed more


efficiently when the rate of learning opportunities is kept high. For instance, Carnine (1976) showed that both correct responding and task participation of typically developing children were higher when instructions were presented with very brief rather than longer intertrial intervals. Similar results have been reported with a wide variety of skills training among participants with intellectual disabilities (e.g., Koegel, Dunlap, & Dyer, 1980). The mechanism of this effect is, at present, unknown but may be the result of the inability of a low reinforcement rate to maintain adequate levels of participant attention. Gaining stimulus control. Strong contingencies of reinforcement not only aid in the teaching of new responses but also make it easier to ensure that responding occurs in the presence of a specific stimulus condition; this correlation between antecedent stimuli and reinforced responding is derived from basic research on stimulus control (Pilgrim, Jackson, & Galizio, 2000; Sidman & Stoddard, 1967). Stimulus control is developed by differentially reinforcing a response in the presence of certain stimulus properties but not others (e.g., saying “dog” in the presence of the letters d-o-g and not in the presence of the letters g-o-d). The formation of stimulus control is an important part of the development of effective behavioral repertoires. Successful reading, for example, is entirely dependent on specific responses (usually vocalizations) coming under control of specific letter combinations (Mueller, Olmi, & Saunders, 2000; Sidman & Willson-Morris, 1974). Accurate responses to instructions occur when specific vocal or motor responses come under control of particular combinations of written or spoken words (Cuvo, Davis, O’Reilly, Mooney, & Crowley, 1992). 
Developing stimulus control is also important for social behaviors like saying “please” (Halle & Holt, 1991), making a request of a person who appears busy (Kuhn, Chirighin, & Zelenka, 2010), or engaging in important independent living skills, such as an efficient exiting response to a fire alarm (Bannerman, Sheldon, & Sherman, 1991).

Prompting and prompt fading. Because the arrangement of contingencies per se may not be sufficient to produce new behavior, a great deal of ABA research has been devoted to the use of prompting

and prompt-fading procedures (Demchak, 1990; Gast, VanBiervliet, & Spradlin, 1979; Godby, Gast, & Wolery, 1987; Odom, Chandler, Ostrosky, McConnell, & Reaney, 1992; Schreibman, 1975; Wolery & Gast, 1984; Wolery et al., 1992). Two general types of prompts have been the focus of many applied studies. One is a response prompt, which involves the use of a supplementary cue to occasion a correct response, for example, providing vocal instructions, models, or physical guidance to perform the target behavior. These prompts are often eliminated by delaying the prompt across successive trials until the correct response occurs before (i.e., without) the prompt (Schuster, Gast, Wolery, & Guiltinan, 1988). R. H. Thompson, McKerchar, and Dancho (2004) illustrated an example of response prompts by using physical prompts to teach three infants to emit the manual signs “please” and “more” in the presence of food. The prompts were gradually delayed after the visible presentation of the food until the infants were independently signing for the food. The second type of prompt is a stimulus prompt, in which some aspect of the discriminative stimulus is modified to more effectively occasion a correct response. For instance, Duffy and Wishart (1987) taught children with Down syndrome and typically developing toddlers to point to particular shapes when the corresponding shape was named. Prompting consisted of initially representing the named shape with an object much larger than those representing the incorrect comparison shapes. Fading was accomplished by gradually increasing the sizes of the incorrect shapes until all shapes were the same size.

Shaping. Certain responses such as early speech sounds are difficult to prompt, in which case shaping may be required to initiate behavior. Shaping involves slight changes in a differential reinforcement contingency such that closer approximations to the target behavior are reinforced over time, and previous approximations are extinguished.
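The shifting contingency that defines shaping can be sketched as a loop over responses. The sketch assumes each response can be scored on a numeric scale of closeness to the target, and it advances the criterion after a fixed number of consecutive reinforced responses; both the scoring scale and the advancement rule (the advance_after parameter) are hypothetical simplifications, not a rule stated in the text.

```python
from typing import List, Sequence, Tuple


def shape(
    responses: Sequence[float],
    criteria: Sequence[float],
    advance_after: int = 3,
) -> List[Tuple[float, float, bool]]:
    """Apply a shifting differential-reinforcement criterion to responses.

    responses: closeness-to-target score of each successive response.
    criteria: increasing thresholds, one per approximation step.
    advance_after: hypothetical rule; consecutive reinforced responses
        needed before the next, stricter criterion takes effect.

    Returns one (response, criterion_in_effect, reinforced) tuple per response.
    """
    if not criteria:
        raise ValueError("At least one criterion is required")
    level = 0
    streak = 0
    log = []
    for score in responses:
        criterion = criteria[level]
        # Responses below the current criterion, including earlier
        # approximations that used to be reinforced, go unreinforced.
        reinforced = score >= criterion
        log.append((score, criterion, reinforced))
        streak = streak + 1 if reinforced else 0
        if streak >= advance_after and level < len(criteria) - 1:
            level += 1
            streak = 0
    return log


if __name__ == "__main__":
    # Made-up scores; the fourth response is an earlier approximation
    # that no longer meets the criterion in effect.
    for row in shape([1, 1, 2, 1, 2, 2, 3], criteria=[1, 2, 3], advance_after=2):
        print(row)
```

In the sample run, the fourth response matches an approximation that was reinforced earlier but falls below the criterion then in effect, which is the extinction of previous approximations that the definition describes.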
Bourret, Vollmer, and Rapp (2004) initially used vocal and model prompts to teach vocalizations to two children diagnosed with autism. The experimenters instructed the participants to emit target utterances


(e.g., say “tune”) and reinforced successful utterances with access to music. When children did not emit the target utterance, imitation of simpler models was reinforced (e.g., changing say “tune” to say “tuh”). When the children began imitating the shorter phonemes, the criterion for reinforcement was reapplied to the complete spoken word. Error reduction. Differential reinforcement of correct responses is a universal aspect of acquisition programs and may account for changes in the frequency of incorrect as well as correct responses. That is, errors may decrease simply as a result of extinction (Charlop & Walsh, 1986). More often, however, instructors deliver explicit consequences after errors, which can include a simple statement such as “no” (Bennett & Ling, 1972), a prompt to engage in the correct response (Clarke, Remington, & Light, 1986), or a remedial trial consisting of repetition of the trial on which an error was made (Nutter & Reid, 1978), an easier trial (Kirby & Holborn, 1986), or a more difficult trial (Repp & Karsh, 1992). Numerous variations on the remedial strategy have been reported in the literature, and Worsdell et al. (2005) conducted a series of comparative studies on quantitative and qualitative characteristics of remediation. Their results showed that (a) multiple repetitions of the correct response were more effective than a single response repetition, (b) correction for every error made was superior to intermittent error correction, and (c) repetition of relevant training words was slightly superior to mere repetition of irrelevant words. An interesting aspect of the data was that error correction involving presentation of irrelevant material also enhanced learning, implicating negative reinforcement (avoidance of remedial trials) as a source of influence during response acquisition. Further variations. 
Aside from the many ways in which a particular aspect of the teaching situation may be arranged, multiple components may be programmed simultaneously to either increase the efficiency of acquisition or enhance generalization of the effects of the teaching. For instance, Hart and Risley (1968, 1974, 1975, 1980) published a series of studies on the use of incidental teaching—a milieu-based approach to teaching language in

which trials were initiated only when a child showed interest in an object or topic. The key features of their procedures were described as follows:

Whenever a child selected a preschool play material, they were prompted and required to ask for it, first by name (noun), then by the name plus a word that described the material (adjective-noun combination), then by use of a color adjective-noun combination, and finally by requesting the material and describing how they were going to use it (compound sentence). As each requirement was made, the children’s general use of that aspect of language markedly increased. (Hart & Risley, 1974, p. 243)

The changing criterion for reinforcement inherent to shaping procedures was evident in incidental teaching; more significant was the fact that instruction (a) occurred intermittently and (b) capitalized on a child-initiated response that identified both the task (name the object) and the reinforcer (access to the named object). These were novel and important features of the instructional program.

Although many ABA-based procedures are applied to individual learners, many situations arise in which individuals perform or could perform as part of a group, which serve as the occasion for implementing contingencies on group behavior. Group contingencies involve several arrangements in which the performance of one, some, or all members of a group determines the delivery of reinforcement. Perhaps the best early examples of research on group contingencies can be found in studies conducted at Achievement Place, a home-style setting for predelinquent boys (Bailey, Wolf, & Phillips, 1970; Fixsen, Phillips, & Wolf, 1973; Phillips, 1968; Phillips, Phillips, Fixsen, & Wolf, 1971; Phillips, Phillips, Wolf, & Fixsen, 1973). For instance, Phillips et al. (1971) arranged group contingencies to increase promptness, room cleaning, money saving, and watching the news. In a follow-up study, Phillips et al.
(1973) showed that a contingency in which a democratically elected peer manager had the authority both to give and to take away points for peers’ performances was more effective than, and preferred by the adolescents over, an individual contingency. One of the unique features of group contingencies is that they can create a context for unprogrammed peer-mediated contingencies ranging from praise and offers of assistance to criticism and sabotage. Although some of these side effects have been reported (Frankosky & Sulzer-Azaroff, 1978; Hughes, 1979; Speltz, Shimamura, & McReynolds, 1982), a thorough analysis of the types of social interaction generated by different group contingencies and their role in changing behavior has not been conducted.

Maintenance and Generalization

A behavioral technology would have limited clinical value if it failed to produce durable changes in responding. Furthermore, behavior analysts would like performance to transfer across relevant (nontraining) environments and perhaps even to other (untrained) ways of behaving. The term maintenance refers to the persistence of behavior change across time, and the term generalization refers to the persistence of behavior change across settings, people, responses, and other nontraining stimuli. Maintenance and generalization are said to occur if performance persists despite the absence of ancillary antecedents (e.g., prompts) or consequences (e.g., delivery of a token after each response) that originally produced learning. For example, suppose an instructor uses model prompts and reinforcement (e.g., praise plus a piece of candy) to teach a child to say “thank you” when the instructor hands the child a snack in the kitchen. Generalization is said to occur if the child (unprompted) says “thank you” when handed (a) other preferred items (e.g., a toy), (b) a snack in locations other than the kitchen area, or (c) a snack by people other than the instructor, and the response is followed by praise only, candy intermittently, or no consequence at all (T. F. Stokes & Baer, 1977).¹ The changed conditions under which “thank you” was said are typically considered examples of stimulus generalization. By contrast, response generalization usually refers to changes in responses that were not directly taught, such as the child saying, “thanks a lot” instead of “thank you” when handed a snack. Continuing with this example, maintenance is said to occur if the child persists in saying “thank you” without the use of prompts and continuous reinforcement. The persistence and transfer of behavior change is also desirable for behaviors that have been targeted for reduction (Shore, Iwata, Lerman, & Shirley, 1994).

Maintenance and generalization are commonly treated as separate areas of concern, but they are necessarily intertwined. A single occurrence of a behavior in a nontraining context might constitute generalized responding. However, behavior change must persist long enough to be detected and to satisfy clinical goals. Koegel and Rincover (1977) clearly illustrated this distinction between maintenance and generalization. They taught children with autism to follow simple instructions in a therapy setting while simultaneously assessing responding in a different setting, in which a novel therapist who did not reinforce correct responses presented the instructions. An analysis of performance across consecutive instructional trials revealed the emergence of generalized responding for two of three participants. However, performance rapidly extinguished in the generalization setting while being maintained in the training setting in which correct responses continued to produce reinforcement. It is possible that stimuli associated with reinforcement in the training setting acquired exclusive control over responding, that stimuli in the generalization setting became discriminative for extinction, or both. The processes of stimulus control and reinforcement are likely determinants of both maintenance and generalization (Kirby & Bickel, 1988).

Maintenance. Two primary approaches have been used in applied research to evaluate the persistence of behavior change over time.
In the most common approach, experimenters terminate the training condition and then briefly measure the response after an intervening period of time. Successful maintenance has often been reported as an outcome of training when this approach has been used to assess maintenance (e.g., Cummings & Carr, 2009; Pierce & Schreibman, 1994). However, this finding is somewhat surprising because few experimenters have explicitly arranged conditions to promote durable behavior change. Furthermore, although the authors typically delineated the conditions in effect during the maintenance check (e.g., the response did not produce reinforcement), they rarely provided information about potential determinants of maintenance during the intervening period (e.g., number of opportunities to engage in the response; contingencies for responding). In a second, less common approach to assessing maintenance, experimenters repeatedly measure levels of responding after removing all sources of reinforcement for the response or after replacing the programmed reinforcer (e.g., candy) with a more naturalistic reinforcer (e.g., praise). Performance persisted in these studies only when special arrangements were made to promote maintenance. These arrangements typically took the form of reinforcement schedule thinning (e.g., R. A. Baer, Blount, Detrich, & Stokes, 1987; Ducharme & Holborn, 1997; Hopkins, 1968; Kale, Kaye, Whelan, & Hopkins, 1968; Kazdin & Polster, 1973) or teaching participants to recruit reinforcement from others in the natural environment (e.g., Seymour & Stokes, 1976).

¹ This treatment of generalization deviates from that in the laboratory, where generalization is tested in extinction. The basic conceptualization of generalization requires that the response occur in the absence of reinforcement. The more pragmatic approach to generalization frequently taken in ABA requires only that the response occur in the absence of the same consequence that produced the original learning.

Generalization. In a seminal article, T. F. Stokes and Baer (1977) summarized various strategies to promote generalization that had appeared in the applied literature at that time. The technology of generalization described in their article has changed very little since its publication.
The most commonly used ways to program stimulus generalization include (a) varying the stimulus dimension or dimensions of interest during training (i.e., varying the setting, trainers, materials), (b) ensuring that the stimuli present during training are similar to those in the generalization context, (c) thinning the reinforcement schedule during training, and (d) arranging for the behavior to contact reinforcement in the generalization context (Ducharme & Holborn, 1997; Durand & Carr, 1991, 1992; Marzullo-Kerth, Reeve, Reeve, & Townsend, 2011; T. F. Stokes, Baer, & Jackson, 1974). As discussed by Kirby and Bickel (1988), these approaches likely
promote generalization by preventing the development of inappropriate or restricted stimulus control. They do so by varying stimuli that are irrelevant to the response (e.g., specific location of the learner) while maintaining relevant features of the training situation (e.g., delivery of a particular instruction), or by making stimuli specific to the training setting indiscriminable. In the latter case, thinning the schedule of reinforcement, delaying reinforcement, and interspersing training and generalization tests may prevent the presence of the reinforcer from acquiring a discriminative function for further responding. Unambiguous examples of response generalization are more difficult to find in the applied literature, and few studies have evaluated factors that might promote this type of generalized responding. Changes in topographical variations of the targeted behavior, similar to the previous example of the student saying “thanks a lot” instead of “thank you,” may be more likely to occur when the targeted response is exposed to extinction (e.g., Duker & van Lent, 1991; Goetz & Baer, 1973). Some authors have reported collateral changes in responses that bear no physical resemblance to the behavior exposed to treatment, but the mechanisms responsible for these changes were unclear. Presumably, the targeted and generalized response forms were members of the same functional response class. In a study by Barton and Ascione (1979), for example, children taught to engage in vocal sharing (e.g., requesting to share others’ materials, inviting others to share their own materials) showed increases in physical sharing (e.g., handing other children toys) even though the experimenters did not directly teach those responses. 
Koegel and Frea (1993) reported corresponding increases in untreated aspects of social communication, such as appropriate topic content and facial expressions, after teaching children with autism to use appropriate eye gaze and gestures during conversations. A similar type of generalized behavior change has also been reported when only some topographies of problem behavior were exposed to treatment (e.g., Lovaas & Simmons, 1969; Singh, Watson, & Winton, 1986). Other commonly studied forms of generalization contain elements of both stimulus and response
generalization because different variations of the trained response occur under different variations of the training stimuli. For example, children who receive reinforcement for imitating specific motor movements will begin to imitate novel motor movements in the absence of reinforcement, an emergent skill called generalized imitation (Garcia, Baer, & Firestone, 1971; Young, Krantz, McClannahan, & Poulson, 1994). Other examples can be found in the research on generative language, including the generalized use of the plural morpheme (e.g., Guess, Sailor, Rutherford, & Baer, 1968), subject–verb agreement (e.g., Lutzker & Sherman, 1974), and sentence structure (e.g., “I want _________”; Hernandez, Hanley, Ingvarsson, & Tiger, 2007) and generalization from expressive to receptive language modalities (e.g., Guess & Baer, 1973).

Response Suppression

Treating maladaptive behavior has been a major concern of applied researchers and clinicians since the inception of the field. Behaviors targeted for reduction have included responses that put the person performing the behavior and others at risk of injury (e.g., aggression, self-injury, cigarette smoking) as well as those that interfere with learning or adaptive behavior (e.g., disruption, noncompliance). From the earliest research, experimenters recognized that many of these behaviors are maintained by reinforcement contingencies and that modifying these contingencies might help alleviate the problem (e.g., Ayllon & Michael, 1959; Lovaas, Freitag, Gold, & Kassorla, 1965; Wolf, Risley, & Mees, 1963). However, the field initially lacked a systematic approach to identifying the variables that maintain problem behavior. Although some early research focused on a range of variables that might be functionally related to serious behavior disorders (e.g., E. G. Carr, Newsom, & Binkoff, 1980; Lovaas & Simmons, 1969; Rincover, Cook, Peoples, & Packard, 1979; Thomas, Becker, & Armstrong, 1968), most were outcome-driven extensions of basic research studies in which the effects of differential reinforcement and punishment were superimposed on unknown reinforcement contingencies for responding (Bostow & Bailey, 1969; Burchard & Barrera, 1972; Skiba, Pettigrew, & Alden, 1971). Although frequently
successful, the latter approach was likely responsible for the inconsistent results reported with most forms of treatment (e.g., Favell et al., 1982) and a greater reliance on punishment in both research and application (Kahng, Iwata, & Lewin, 2002; Pelios, Morren, Tesch, & Axelrod, 1999). Publication of a systematic method for identifying the function or functions of problem behavior (Iwata, Dorsey, Slifer, Bauman, & Richman, 1982) shifted the focus of behavior-analytic approaches to response suppression. Functional analysis methodology involves a direct test of multiple potential reinforcers for problem behavior, including positive reinforcers such as attention or toys and negative reinforcers such as escape from demands. Because of the utility of this assessment approach, treatments that involve terminating the reinforcement contingency for problem behavior (i.e., extinction), delivering the maintaining reinforcer as part of differential reinforcement procedures, and manipulating relevant motivating operations have taken precedence in research and practice. Research has also continued to evaluate the generality of the functional analysis methodology across a variety of behavior problems, populations, and idiosyncratic variables (e.g., Bowman, Fisher, Thompson, & Piazza, 1997; Hagopian, Bruzek, Bowman, & Jennett, 2007). Most recently, knowledge about behavioral function has permitted more detailed analyses of the mechanisms underlying common treatment procedures and factors that influence their effectiveness. Laboratory research on basic processes that reduce responding provided the foundation for behavior analysts’ current technology of treatments for problem behavior, including the commonly used procedural variations of extinction, differential reinforcement, satiation, and punishment. In the following sections, we provide an overview of these response suppression procedures.

Extinction. Terminating the reinforcement contingency that maintains a behavior is the simplest, most direct way to suppress responding. In application, however, extinction requires knowledge of these maintaining contingencies to ensure that the procedural form of the intervention (e.g., withholding attention, preventing escape from instructions)
is matched to behavioral function (e.g., maintenance by positive reinforcement in the form of attention or negative reinforcement in the form of escape from instructions). Early demonstrations of extinction were based on hypotheses about the function of problem behavior, which were then confirmed by withholding the putative maintaining reinforcer. For example, Wolf, Birnbrauer, Williams, and Lawler (1965) speculated that the vomiting of a 9-year-old girl with intellectual disabilities was maintained by escape from the classroom. The experimenters instructed the teacher to refrain from sending the student back to her dormitory contingent on vomiting. The frequency of vomiting decreased to zero levels across 30 days, suggesting that the behavior was, in fact, maintained by negative reinforcement. Lovaas and Simmons (1969) conducted one of the earliest demonstrations of extinction with behavior maintained by positive reinforcement. The participants were two children with intellectual disabilities who engaged in severe self-injury. On the basis of the assumption that both children’s behavior was maintained by attention from others, the children were left on their beds alone while an observer recorded instances of self-injury from an observation room. Despite the success in reducing the self-injury, the experimenters concluded that extinction was an undesirable form of treatment because of the initial high levels of responding before response reduction.
The development of functional analysis methodology, as described in Volume 2, Chapter 14, this handbook, greatly facilitated the study of extinction and its procedural variations, including extinction of behavior maintained by positive reinforcement (e.g., withholding toys; Day, Rea, Schussler, Larsen, & Johnson, 1988), extinction of behavior maintained by negative reinforcement (e.g., physically guiding compliance to prevent escape from academic demands; Iwata, Pace, Kalsher, Cowdery, & Cataldo, 1990), and extinction of behavior maintained by automatic reinforcement (e.g., applying protective equipment to block the putative sensory reinforcer for self-injury; Kuhn, DeLeon, Fisher, & Wilke, 1999). Nonetheless, reports of some undesirable effects of extinction (response bursts, resistance to extinction, extinction-induced aggression) led to the more common practice of combining extinction with other treatment procedures. Research findings have supported this practice by showing that extinction is more effective or associated with fewer side effects when combined with differential or noncontingent reinforcement (E. G. Carr & Durand, 1985; Fisher, DeLeon, Rodriguez-Catter, & Keeney, 2004; Lerman & Iwata, 1995; Piazza, Patel, Gulotta, Sevin, & Layer, 2003; Steege et al., 1990; Vollmer et al., 1998). Moreover, it appears that extinction may often be crucial to the effectiveness of these other treatment procedures (Hagopian, Fisher, Sullivan, Acquisto, & LeBlanc, 1998; Mazaleski, Iwata, Vollmer, Zarcone, & Smith, 1993; Zarcone, Iwata, Mazaleski, & Smith, 1994). A key issue in the use of extinction is the detrimental impact of poor procedural integrity on treatment outcomes as well as strategies to remedy this impact. Caregivers are sometimes unwilling or unable to completely withhold reinforcement for problem behavior. Thus, the practical constraints of using extinction in applied settings have recently occasioned further research on ways to treat problem behavior despite continued reinforcement of the behavior.

Differential reinforcement. Interventions that involve delivering a reinforcer for an alternative behavior (differential reinforcement of alternative behavior [DRA]), for the absence of problem behavior (differential reinforcement of other behavior [DRO]), and for reduced levels of problem behavior (differential reinforcement of low rates [DRL]) remain the most common approaches to treatment. In early treatment studies, differential reinforcement procedures were applied without knowledge of the variables maintaining the targeted behavior.
Hence, problem behavior likely continued to produce its maintaining reinforcer (e.g., escape from demands) even as an irrelevant reinforcer (e.g., candy) was delivered when the individual met the reinforcement contingency. Although less than ideal, these interventions were shown to be effective in several studies (e.g., Allen & Harris, 1966). The use of functional reinforcers not only increased the likelihood of success with differential
reinforcement but resulted in the development of a frequently used variation of DRA called functional communication training. With functional communication training, the reinforcer that has maintained problem behavior is delivered for a communicative response (e.g., saying, “break please” to receive escape) while problem behavior is extinguished. Other variations of DRA involve reinforcing an alternative or incompatible (noncommunicative) behavior (e.g., compliance to demands, toy play). Under DRO and DRL schedules, the person performing the behavior receives a reinforcer if problem behavior does not occur, or if it has occurred less than a specified number of times, during a particular time interval. DRO and DRL have less clinical appeal than DRA because no new behaviors are taught; hence, DRA is more commonly used in research and practice. Recent research on differential reinforcement has focused on determinants of maintenance in applied settings to address problems related to caregiver errors in implementation and transitions from intensive to more practical intervention (e.g., Athens & Vollmer, 2010; Fisher, Thompson, Hagopian, Bowman, & Krug, 2000; Hagopian, Contrucci Kuhn, Long, & Rush, 2005; Hanley, Iwata, & Thompson, 2001; Kuhn et al., 2010; Lalli et al., 1999; St. Peter Pipkin, Vollmer, & Sloman, 2010; Vollmer, Roane, Ringdahl, & Marcus, 1999). This research has shown the following factors to be detrimental to successful treatment outcomes: (a) failing to withhold reinforcement for problem behavior, (b) failing to deliver earned reinforcers, and (c) thinning the schedule of reinforcement for appropriate behavior. These findings have led to additional research on ways to increase the success of differential reinforcement despite these challenges to successful outcomes. 
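The DRO and DRL contingencies defined earlier in this section reduce to a simple per-interval decision rule. The Python sketch below is only an illustration under stated assumptions: the function names, the 10-s interval, the 40-s session, and the event times are hypothetical and are not drawn from any of the studies cited here. It counts instances of problem behavior in consecutive fixed intervals and reports whether a reinforcer would be earned in each interval under each schedule.

```python
from typing import List


def count_per_interval(event_times: List[float], interval_s: float,
                       session_s: float) -> List[int]:
    """Count problem-behavior events in each consecutive fixed interval."""
    n_intervals = int(session_s // interval_s)
    counts = [0] * n_intervals
    for t in event_times:
        idx = int(t // interval_s)
        if 0 <= idx < n_intervals:  # ignore events outside the session
            counts[idx] += 1
    return counts


def dro_earned(count: int) -> bool:
    """DRO: the reinforcer is earned only if problem behavior did not occur."""
    return count == 0


def drl_earned(count: int, limit: int) -> bool:
    """DRL: the reinforcer is earned if behavior occurred fewer than `limit` times."""
    return count < limit


# Hypothetical session: 40 s long, 10-s intervals, events at these times (s).
counts = count_per_interval([5.0, 12.0, 14.0, 31.0], interval_s=10.0, session_s=40.0)
dro = [dro_earned(c) for c in counts]
drl = [drl_earned(c, limit=2) for c in counts]
print(counts, dro, drl)
```

The sketch uses fixed, non-resetting intervals for simplicity; applied arrangements may differ in such details (e.g., whether the interval restarts after an instance of problem behavior).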
Effective strategies have included increasing the quality of reinforcement for appropriate behavior when reinforcement continues to follow problem behavior (Athens & Vollmer, 2010), providing access to alternative stimuli or activities during schedule thinning (Fisher et al., 2000; Hagopian, Contrucci Kuhn, Long, & Rush, 2005), and teaching clients to respond differentially to stimuli associated with periods of reinforcement versus extinction (Kuhn et al., 2010).

Motivating operations. Procedures intended to abolish the reinforcing effects of consequences that maintain problem behavior have most commonly taken the form of response-independent delivery of a reinforcer, also called noncontingent reinforcement (e.g., Vollmer, Iwata, Zarcone, Smith, & Mazaleski, 1993). In most applications, the reinforcer that had maintained problem behavior was delivered on a fixed-time or variable-time schedule while problem behavior was exposed to extinction. Other variations of noncontingent reinforcement, however, have been shown to be effective, including delivery of an irrelevant reinforcer (e.g., food, toys) and delivery of reinforcement in the absence of extinction for problem behavior (e.g., Fisher, DeLeon, Rodriguez-Catter, & Keeney, 2004; Lalli et al., 1999; Lomas, Fisher, & Kelley, 2010). The suppressive effects of noncontingent reinforcement also appeared to endure when reinforcer delivery was discontinued for short periods of time (e.g., 10–15 minutes; M. A. McGinnis, Houchins-Juárez, McDaniel, & Kennedy, 2010; O’Reilly et al., 2009). Most other procedures intended to abolish the reinforcing value of the consequence have focused on modifications to aversive stimuli that set the occasion for problem behavior. These modifications have included reducing the frequency or pace of instructions, changing features of tasks, embedding instructions in preferred activities, and alternating difficult instructions with easier ones (e.g., Dunlap, Kern-Dunlap, Clarke, & Robbins, 1991; Horner, Day, Sprague, O’Brien, & Heathfield, 1991; Kemp & Carr, 1995; Zarcone, Iwata, Smith, Mazaleski, & Lerman, 1994). In nearly all cases, these interventions were combined with extinction for problem behavior.

Punishment. A variety of procedures have been effective in reducing behavior through an apparent punishment process.
Research has shown that a variety of stimuli, including reprimands, physical restraint, water mist, tastes, smells, noise, and shock (Dorsey, Iwata, Ong, & McSween, 1980; Lalli, Livezey, & Kates, 1996; Linscheid, Iwata, Ricketts, Williams, & Griffin, 1990; Maglieri, DeLeon, Rodriguez-Catter, & Sevin, 2000; Sajwaj, Libet, & Agras, 1974; Stricker, Miltenberger, Garlinghouse, & Tulloch, 2003) can decrease problem behavior very
quickly and safely. Punishment based on the contingent removal of events, including time out from positive reinforcement and response cost, has also been evaluated in the treatment of behavior disorders (Kahng, Tarbox, & Wilke, 2001; Toole, Bowman, Thomason, Hagopian, & Rush, 2003). Although research on punishment has declined in recent years, a substantial body of work has accumulated over the past five decades, revealing much about the application of punishment and factors that influence treatment outcomes. Consistent with basic findings, this research has indicated that punishment will suppress behavior most effectively when the consequence (a) is delivered immediately, (b) follows nearly all instances of the problem behavior, and (c) is combined with extinction of the problem behavior (see Lerman & Vorndran, 2002, for a review). In addition, treatment outcomes can be enhanced by combining punishment with DRA (R. H. Thompson, Iwata, Conners, & Roscoe, 1999) and establishing discriminative control over the response (e.g., Maglieri et al., 2000; Piazza, Hanley, & Fisher, 1996). Nonetheless, an insufficient amount of research has been conducted to develop prescriptions for long-term maintenance and generalization of punishment effects. Furthermore, although the literature contains numerous reports of desirable and undesirable side effects of punishment (e.g., increases in toy play [Koegel, Firestone, Kramme, & Dunlap, 1974], increases in aggression and crying [Hagopian & Adelinis, 2001]), no research has identified the determinants of these outcomes. 
Concluding Comments

The ABA technologies derived from the basic principles of behavior have produced socially important outcomes for many different types of people (e.g., people who abuse substances, college students, athletes, older individuals, employees, people with intellectual disabilities), for a variety of target responses (e.g., literacy, smoking, sleep disturbance, aggression, safe driving), and in a diversity of settings (e.g., businesses, schools, hospitals, homes). Research in ABA has gone beyond simple demonstrations of application, generating knowledge about the mechanisms that underlie common social problems and how behavioral processes operate under more naturalistic conditions. However, despite more than 50 years of research and practice, some essential questions remain. For example, how do behavior analysts ensure that treatment effects endure over the long term? What approaches are needed to establish complex social repertoires? And how do behavior analysts promote adoption of their technologies by those who would most benefit from them? Moreover, behavior analysts have barely scratched the surface in studying some critical social problems, such as overeating, criminal behavior, and schoolyard bullying. The documented success of their behavioral technologies for remediating other sorts of problems (e.g., self-injury in individuals with developmental disabilities, safety skills of factory workers, reading skills of school-age children, drug use of people with addiction) suggests that further research and practice into relatively unexplored areas will broaden the impact and reach of ABA.

References Agras, W. S., Jacob, R. G., & Lebedeck, M. (1980). The California drought: A quasi-experimental analysis of social policy. Journal of Applied Behavior Analysis, 13, 561–570. doi:10.1901/jaba.1980.13-561 Allen, K. E., & Harris, F. R. (1966). Elimination of a child’s excessive scratching by training the mother in reinforcement procedures. Behaviour Research and Therapy, 4, 79–84. doi:10.1016/0005-7967(66) 90046-5 Allison, M. G., & Ayllon, R. (1980). Behavioral coaching in the development of skills in football, gymnastics, and tennis. Journal of Applied Behavior Analysis, 13, 297–314. doi:10.1901/jaba.1980.13-297 Athens, E. S., & Vollmer, T. R. (2010). An investigation of differential reinforcement of alternative behavior without extinction. Journal of Applied Behavior Analysis, 43, 569–589. doi:10.1901/jaba.2010.43-569 Ayllon, T., & Michael, J. (1959). The psychiatric nurse as a behavioral engineer. Journal of the Experimental Analysis of Behavior, 2, 323–334. doi:10.1901/ jeab.1959.2-323 Azrin, N. H., & Foxx, R. M. (1971). A rapid method of toilet training the institutionalized retarded. Journal of Applied Behavior Analysis, 4, 89–99. doi:10.1901/ jaba.1971.4-89 Azrin, N. H., & Lindsley, O. R. (1956). The reinforcement of cooperation between children. Journal 95

Lerman, Iwata, and Hanley

of Abnormal and Social Psychology, 52, 100–102. doi:10.1037/h0042490

Bacon-Prue, A., Blount, R., Pickering, D., & Drabman, R. (1980). An evaluation of three litter control procedures: Trash receptacles, paid workers, and the marked item techniques. Journal of Applied Behavior Analysis, 13, 165–170. doi:10.1901/jaba.1980.13-165

Baer, D. M., Wolf, M. M., & Risley, T. R. (1968). Some current dimensions of applied behavior analysis. Journal of Applied Behavior Analysis, 1, 91–97. doi:10.1901/jaba.1968.1-91

Baer, R. A., Blount, R. L., Detrich, R., & Stokes, T. F. (1987). Using intermittent reinforcement to program maintenance of verbal/nonverbal correspondence. Journal of Applied Behavior Analysis, 20, 179–184. doi:10.1901/jaba.1987.20-179

Bailey, J. S., Wolf, M. M., & Phillips, E. L. (1970). Home-based reinforcement and the modification of pre-delinquents’ classroom behavior. Journal of Applied Behavior Analysis, 3, 223–233. doi:10.1901/jaba.1970.3-223

Bancroft, S. L., Weiss, J. S., Libby, M. E., & Ahearn, W. H. (2011). A comparison of procedural variations in teaching behavior chains: Manual guidance, trainer completion, and no completion of untrained steps. Journal of Applied Behavior Analysis, 44, 559–569.

Bannerman, D. J., Sheldon, J. B., & Sherman, J. A. (1991). Teaching adults with severe and profound retardation to exit their homes upon hearing the fire alarm. Journal of Applied Behavior Analysis, 24, 571–577. doi:10.1901/jaba.1991.24-571

Barton, E. J., & Ascione, F. R. (1979). Sharing in preschool children: Facilitation, stimulus generalization, response generalization, and maintenance. Journal of Applied Behavior Analysis, 12, 417–430. doi:10.1901/jaba.1979.12-417

Bates, P. (1980). The effectiveness of interpersonal skills training on the social skill acquisition of moderately and mildly retarded adults. Journal of Applied Behavior Analysis, 13, 237–248. doi:10.1901/jaba.1980.13-237

Bennett, C. W., & Ling, D. (1972). Teaching a complex verbal response to a hearing-impaired girl. Journal of Applied Behavior Analysis, 5, 321–327. doi:10.1901/jaba.1972.5-321

Bijou, S. W. (1955). A systematic approach to an experimental analysis of young children. Child Development, 26, 161–168.

Bornstein, P. H., Bach, P. J., McFall, M. E., Friman, P. C., & Lyons, P. D. (1980). Application of a social skills training program in the modification of interpersonal deficits among retarded adults: A clinical replication. Journal of Applied Behavior Analysis, 13, 171–176. doi:10.1901/jaba.1980.13-171

Bosch, S., & Fuqua, R. W. (2001). Behavioral cusps: A model for selecting target behaviors. Journal of Applied Behavior Analysis, 34, 123–125. doi:10.1901/jaba.2001.34-123

Bostow, D. E., & Bailey, J. (1969). Modification of severe disruptive and aggressive behavior using brief timeout and reinforcement procedures. Journal of Applied Behavior Analysis, 2, 31–37. doi:10.1901/jaba.1969.2-31

Bourgeois, M. S. (1990). Enhancing conversation skills in patients with Alzheimer’s disease using a prosthetic memory aid. Journal of Applied Behavior Analysis, 23, 29–42. doi:10.1901/jaba.1990.23-29

Bourret, J., Vollmer, T. R., & Rapp, J. T. (2004). Evaluation of a vocal mand assessment and vocal mand training procedures. Journal of Applied Behavior Analysis, 37, 129–144. doi:10.1901/jaba.2004.37-129

Bowman, L. G., Fisher, W. W., Thompson, R. H., & Piazza, C. C. (1997). On the relation of mands and the function of destructive behavior. Journal of Applied Behavior Analysis, 30, 251–265. doi:10.1901/jaba.1997.30-251

Boyle, M. E., & Greer, R. D. (1983). Operant procedures and the comatose patient. Journal of Applied Behavior Analysis, 16, 3–12. doi:10.1901/jaba.1983.16-3

Brigham, T. A., Graubard, P. S., & Stans, A. (1972). Analysis of the effects of sequential reinforcement contingencies on aspects of composition. Journal of Applied Behavior Analysis, 5, 421–429. doi:10.1901/jaba.1972.5-421

Budd, K. S., Green, D. R., & Baer, D. M. (1976). An analysis of multiple misplaced parental social contingencies. Journal of Applied Behavior Analysis, 9, 459–470. doi:10.1901/jaba.1976.9-459

Burchard, J. D., & Barrera, F. (1972). An analysis of timeout and response cost in a programmed environment. Journal of Applied Behavior Analysis, 5, 271–282. doi:10.1901/jaba.1972.5-271

Carnine, D. W. (1976). Effects of two teacher-presentation rates on off-task behavior, answering correctly, and participation. Journal of Applied Behavior Analysis, 9, 199–206. doi:10.1901/jaba.1976.9-199

Carr, E. G., & Durand, V. M. (1985). Reducing behavior problems through functional communication training. Journal of Applied Behavior Analysis, 18, 111–126. doi:10.1901/jaba.1985.18-111

Carr, E. G., Newsom, C. D., & Binkoff, J. A. (1980). Escape as a factor in the aggressive behavior of two retarded children. Journal of Applied Behavior Analysis, 13, 101–117. doi:10.1901/jaba.1980.13-101

Carr, J. E., Austin, J. E., Britton, L. N., Kellum, K. K., & Bailey, J. S. (1999). An assessment of social validity trends in applied behavior analysis. Behavioral Interventions, 14, 223–231. doi:10.1002/(SICI)1099078X(199910/12)14:43.0.CO;2-Y

Applied Behavior Analysis

Carr, J. E., Nicolson, A. C., & Higbee, T. S. (2000). Evaluation of a brief multiple-stimulus preference assessment in a naturalistic context. Journal of Applied Behavior Analysis, 33, 353–357. doi:10.1901/jaba.2000.33-353

Cautela, J. R., & Kastenbaum, R. (1967). A reinforcement survey schedule for use in therapy, training, and research. Psychological Reports, 20, 1115–1130. doi:10.2466/pr0.1967.20.3c.1115

Charlop, M. H., & Walsh, M. E. (1986). Increasing autistic children’s spontaneous verbalizations of affection: An assessment of time delay and peer modeling procedures. Journal of Applied Behavior Analysis, 19, 307–314. doi:10.1901/jaba.1986.19-307

Clarfield, J., & Stoner, G. (2005). The effects of computerized reading instruction on the academic performance of students identified with ADHD. School Psychology Review, 34, 246–254.

Clark, H. B., Rowbury, T., Baer, A. M., & Baer, D. M. (1973). Timeout as a punishing stimulus in continuous and intermittent schedules. Journal of Applied Behavior Analysis, 6, 443–455. doi:10.1901/jaba.1973.6-443

Clarke, S., Remington, B., & Light, P. (1986). An evaluation of the relationship between receptive speech skills and expressive signing. Journal of Applied Behavior Analysis, 19, 231–239. doi:10.1901/jaba.1986.19-231

Cote, C. A., Thompson, R. H., Hanley, G. P., & McKerchar, P. M. (2007). Teacher report versus direct assessment of preferences for identifying reinforcers for young children. Journal of Applied Behavior Analysis, 40, 157–166. doi:10.1901/jaba.2007.177-05

Cronin, K. A., & Cuvo, A. J. (1979). Teaching mending skills to mentally retarded adolescents. Journal of Applied Behavior Analysis, 12, 401–406. doi:10.1901/jaba.1979.12-401

Cumming, W. W., & Schoenfeld, W. N. (1959). Some data on behavioral reversibility in a steady state experiment. Journal of the Experimental Analysis of Behavior, 2, 87–90. doi:10.1901/jeab.1959.2-87

Cummings, A. R., & Carr, J. E. (2009). Evaluating progress in behavioral programs for children with autism spectrum disorders via continuous and discontinuous measurement. Journal of Applied Behavior Analysis, 42, 57–71. doi:10.1901/jaba.2009.42-57

Cuvo, A. J., Davis, P. K., O’Reilly, M. F., Mooney, B. M., & Crowley, R. (1992). Promoting stimulus control with textual prompts and performance feedback for persons with mild disabilities. Journal of Applied Behavior Analysis, 25, 477–489. doi:10.1901/jaba.1992.25-477

Cuvo, A. J., Leaf, R. B., & Borakove, L. S. (1978). Teaching janitorial skills to the mentally retarded: Acquisition, generalization, and maintenance. Journal of Applied Behavior Analysis, 11, 345–355. doi:10.1901/jaba.1978.11-345

Dallery, J., & Glenn, I. M. (2005). Effects of an Internet-based voucher reinforcement program for smoking abstinence: A feasibility study. Journal of Applied Behavior Analysis, 38, 349–357. doi:10.1901/jaba.2005.150-04

Day, R. M., Rea, J. A., Schussler, N. G., Larsen, S. E., & Johnson, W. L. (1988). A functionally based approach to the treatment of self-injurious behavior. Behavior Modification, 12, 565–589. doi:10.1177/01454455880124005

DeLeon, I. G., & Iwata, B. A. (1996). Evaluation of a multiple-stimulus presentation format for assessing reinforcer preferences. Journal of Applied Behavior Analysis, 29, 519–533. doi:10.1901/jaba.1996.29-519

Demchak, M. (1990). Response prompting and fading methods: A review. American Journal on Mental Retardation, 94, 603–615.

Dorsey, M. F., Iwata, B. A., Ong, P., & McSween, T. E. (1980). Treatment of self-injurious behavior using a water mist: Initial response suppression and generalization. Journal of Applied Behavior Analysis, 13, 343–353. doi:10.1901/jaba.1980.13-343

Ducharme, D. E., & Holborn, S. W. (1997). Programming generalization of social skills in preschool children with hearing impairments. Journal of Applied Behavior Analysis, 30, 639–651. doi:10.1901/jaba.1997.30-639

Duffy, L., & Wishart, J. G. (1987). A comparison of two procedures for teaching discrimination skills to Down’s syndrome and non-handicapped children. British Journal of Educational Psychology, 57, 265–278. doi:10.1111/j.2044-8279.1987.tb00856.x

Duker, P. C., & van Lent, C. (1991). Inducing variability in communicative gestures used by severely retarded individuals. Journal of Applied Behavior Analysis, 24, 379–386. doi:10.1901/jaba.1991.24-379

Dunlap, G., Kern-Dunlap, L., Clarke, S., & Robbins, F. R. (1991). Functional assessment, curricular revision, and severe behavior problems. Journal of Applied Behavior Analysis, 24, 387–397. doi:10.1901/jaba.1991.24-387

Durand, V. M., & Carr, E. G. (1991). Functional communication training to reduce challenging behavior: Maintenance and application in new settings. Journal of Applied Behavior Analysis, 24, 251–264. doi:10.1901/jaba.1991.24-251

Durand, V. M., & Carr, E. G. (1992). An analysis of maintenance following functional communication training. Journal of Applied Behavior Analysis, 25, 777–794. doi:10.1901/jaba.1992.25-777

Egel, A. L. (1981). Reinforcer variation: Implications for motivating developmentally disabled children. Journal of Applied Behavior Analysis, 14, 345–350. doi:10.1901/jaba.1981.14-345


Favell, J. E., Azrin, N. H., Baumeister, A. A., Carr, E. G., Dorsey, M. F., Forehand, R., & Solnick, J. V. (1982). The treatment of self-injurious behavior. Behavior Therapy, 13, 529–554. doi:10.1016/S0005-7894(82)80015-4

Fisher, W. W., DeLeon, I. G., Rodriguez-Catter, V., & Keeney, K. M. (2004). Enhancing the effects of extinction on attention-maintained behavior through noncontingent delivery of attention or stimuli identified via a competing stimulus assessment. Journal of Applied Behavior Analysis, 37, 171–184. doi:10.1901/jaba.2004.37-171

Fisher, W. W., Piazza, C. C., Bowman, L. G., & Amari, A. (1996). Integrating caregiver report with a direct choice assessment to enhance reinforcer identification. American Journal on Mental Retardation, 101, 15–25.

Fisher, W. W., Piazza, C. C., Bowman, L. G., Hagopian, L. P., Owens, J. C., & Slevin, I. (1992). A comparison of two approaches for identifying reinforcers for persons with severe and profound disabilities. Journal of Applied Behavior Analysis, 25, 491–498. doi:10.1901/jaba.1992.25-491

Fisher, W. W., Thompson, R. H., Hagopian, L. P., Bowman, L. G., & Krug, A. (2000). Facilitating tolerance of delayed reinforcement during functional communication training. Behavior Modification, 24, 3–29. doi:10.1177/0145445500241001

Fisher, W. W., Thompson, R. H., Piazza, C. C., Crosland, K., & Gotjen, D. (1997). On the relative reinforcing effects of choice and differential consequences. Journal of Applied Behavior Analysis, 30, 423–438. doi:10.1901/jaba.1997.30-423

Fixsen, D. L., Phillips, E. L., & Wolf, M. M. (1973). Achievement Place: Experiments in self-government with pre-delinquents. Journal of Applied Behavior Analysis, 6, 31–47. doi:10.1901/jaba.1973.6-31

Foxx, R. M., & Shapiro, S. T. (1978). The timeout ribbon: A nonexclusionary timeout procedure. Journal of Applied Behavior Analysis, 11, 125–136. doi:10.1901/jaba.1978.11-125

France, K. G., & Hudson, S. M. (1990). Behavior management of infant sleep disturbance. Journal of Applied Behavior Analysis, 23, 91–98. doi:10.1901/jaba.1990.23-91

Frankosky, R., & Sulzer-Azaroff, B. (1978). Individual and group contingencies and collateral social behaviors. Behavior Therapy, 9, 313–327. doi:10.1016/S0005-7894(78)80075-6

Fuller, P. R. (1949). Operant conditioning of a vegetative human organism. American Journal of Psychology, 62, 587–590. doi:10.2307/1418565

Fuqua, R. W., & Schwade, J. (1986). Social validation and applied behavioral research. In A. Poling & R. W. Fuqua (Eds.), Research methods in applied behavior analysis (pp. 265–292). New York, NY: Plenum Press.

Garcia, E., Baer, D. M., & Firestone, I. (1971). The development of generalized imitation within topographically determined boundaries. Journal of Applied Behavior Analysis, 4, 101–112. doi:10.1901/jaba.1971.4-101

Gast, D. L., VanBiervliet, A., & Spradlin, J. E. (1979). Teaching number-word equivalences: A study of transfer. American Journal of Mental Deficiency, 83, 524–527.

Geiger, K. B., LeBlanc, L. A., Dillon, C. M., & Bates, S. L. (2010). An evaluation of preference for video and in vivo modeling. Journal of Applied Behavior Analysis, 43, 279–283. doi:10.1901/jaba.2010.43-279

Gewirtz, J. L., & Baer, D. M. (1958). The effect of brief social deprivation on behaviors for a social reinforcer. Journal of Abnormal and Social Psychology, 57, 165–172. doi:10.1037/h0042880

Gleeson, S., & Lattal, K. A. (1987). Response-reinforcer relations and the maintenance of behavior. Journal of the Experimental Analysis of Behavior, 48, 383–393. doi:10.1901/jeab.1987.48-383

Glover, J., & Gary, A. L. (1976). Procedures to increase some aspects of creativity. Journal of Applied Behavior Analysis, 9, 79–84. doi:10.1901/jaba.1976.9-79

Godby, S., Gast, D. L., & Wolery, M. (1987). A comparison of time delay and system of least prompts in teaching object discrimination. Research in Developmental Disabilities, 8, 283–305. doi:10.1016/0891-4222(87)90009-6

Goetz, E. M., & Baer, D. M. (1973). Social control of form diversity and the emergence of new forms in children’s blockbuilding. Journal of Applied Behavior Analysis, 6, 209–217. doi:10.1901/jaba.1973.6-209

Greene, B. F., Bailey, J. S., & Barber, F. (1981). An analysis and reduction of disruptive behavior on school buses. Journal of Applied Behavior Analysis, 14, 177–192. doi:10.1901/jaba.1981.14-177

Guess, D., & Baer, D. M. (1973). An analysis of individual differences in generalization between receptive and productive language in retarded children. Journal of Applied Behavior Analysis, 6, 311–329. doi:10.1901/jaba.1973.6-311

Guess, D., Sailor, W., Rutherford, G., & Baer, D. M. (1968). An experimental analysis of linguistic development: The productive use of the plural morpheme. Journal of Applied Behavior Analysis, 1, 297–306. doi:10.1901/jaba.1968.1-297

Hagopian, L. P., & Adelinis, J. D. (2001). Response blocking with and without redirection for the treatment of pica. Journal of Applied Behavior Analysis, 34, 527–530. doi:10.1901/jaba.2001.34-527


Hagopian, L. P., Bruzek, J. L., Bowman, L. G., & Jennett, H. K. (2007). Assessment and treatment of problem behavior occasioned by interruption of free-operant behavior. Journal of Applied Behavior Analysis, 40, 89–103. doi:10.1901/jaba.2007.63-05

Hagopian, L. P., Contrucci Kuhn, S. A., Long, E. S., & Rush, K. S. (2005). Schedule thinning following communication training: Using competing stimuli to enhance tolerance to decrements in reinforcer density. Journal of Applied Behavior Analysis, 38, 177–193. doi:10.1901/jaba.2005.43-04

Hagopian, L. P., Fisher, W. W., Sullivan, M. T., Acquisto, J., & LeBlanc, L. A. (1998). Effectiveness of functional communication training with and without extinction and punishment: A summary of 21 inpatient cases. Journal of Applied Behavior Analysis, 31, 211–235. doi:10.1901/jaba.1998.31-211

Halle, J. W., & Holt, B. (1991). Assessing stimulus control in natural settings: An analysis of stimuli that acquire control during training. Journal of Applied Behavior Analysis, 24, 579–589. doi:10.1901/jaba.1991.24-579

Hanley, G. P., Iwata, B. A., & Roscoe, E. M. (2006). Factors influencing the stability of preferences. Journal of Applied Behavior Analysis, 39, 189–202. doi:10.1901/jaba.2006.163-04

Hanley, G. P., Iwata, B. A., & Thompson, R. H. (2001). Reinforcement schedule thinning following treatment with functional communication training. Journal of Applied Behavior Analysis, 34, 17–38. doi:10.1901/jaba.2001.34-17

Hanley, G. P., Piazza, C. C., Fisher, W. W., Contrucci, S. A., & Maglieri, K. A. (1997). Evaluation of client preference for function-based treatment packages. Journal of Applied Behavior Analysis, 30, 459–473. doi:10.1901/jaba.1997.30-459

Haring, T. G., & Breen, C. G. (1992). A peer-mediated social network intervention to enhance the social integration of persons with moderate and severe disabilities. Journal of Applied Behavior Analysis, 25, 319–333. doi:10.1901/jaba.1992.25-319

Hart, B. M., & Risley, T. R. (1968). Establishing use of descriptive adjectives in the spontaneous speech of disadvantaged preschool children. Journal of Applied Behavior Analysis, 1, 109–120. doi:10.1901/jaba.1968.1-109

Hart, B., & Risley, T. R. (1974). Using preschool materials to modify the language of disadvantaged children. Journal of Applied Behavior Analysis, 7, 243–256. doi:10.1901/jaba.1974.7-243

Hart, B., & Risley, T. R. (1975). Incidental teaching of language in the preschool. Journal of Applied Behavior Analysis, 8, 411–420. doi:10.1901/jaba.1975.8-411

Hart, B., & Risley, T. R. (1980). In vivo language intervention: Unanticipated general effects. Journal of Applied Behavior Analysis, 13, 407–432. doi:10.1901/jaba.1980.13-407

Hawkins, R. P. (1991). Is social validity what we are interested in? Argument for a functional approach. Journal of Applied Behavior Analysis, 24, 205–213. doi:10.1901/jaba.1991.24-205

Hernandez, E., Hanley, G. P., Ingvarsson, E. T., & Tiger, J. H. (2007). A preliminary evaluation of the emergence of novel mand forms. Journal of Applied Behavior Analysis, 40, 137–156. doi:10.1901/jaba.2007.96-05

Hopkins, B. L. (1968). Effects of candy and social reinforcement, instructions, and reinforcement schedule leaning on the modification and maintenance of smiling. Journal of Applied Behavior Analysis, 1, 121–129. doi:10.1901/jaba.1968.1-121

Horner, R. H., Day, H. M., Sprague, J. R., O’Brien, M., & Heathfield, L. T. (1991). Interspersed requests: A nonaversive procedure for reducing aggression and self-injury during instruction. Journal of Applied Behavior Analysis, 24, 265–278. doi:10.1901/jaba.1991.24-265

Hughes, H. M. (1979). Behavior change in children at a therapeutic summer camp as a function of feedback and individual versus group contingencies. Journal of Abnormal Child Psychology, 7, 211–219. doi:10.1007/BF00918901

Hunter, I., & Davison, M. (1982). Independence of response force and reinforcement rate on concurrent variable-interval schedule performance. Journal of the Experimental Analysis of Behavior, 37, 183–197. doi:10.1901/jeab.1982.37-183

Isaacs, C. D., Embry, L. H., & Baer, D. M. (1982). Training family therapists: An experimental analysis. Journal of Applied Behavior Analysis, 15, 505–520. doi:10.1901/jaba.1982.15-505

Iwata, B. A., Dorsey, M. F., Slifer, K. J., Bauman, K. E., & Richman, G. S. (1982). Toward a functional analysis of self-injury. Analysis and Intervention in Developmental Disabilities, 2, 3–20. doi:10.1016/0270-4684(82)90003-9

Iwata, B. A., Pace, G. M., Kalsher, M. J., Cowdery, G. E., & Cataldo, M. F. (1990). Experimental analysis and extinction of self-injurious escape behavior. Journal of Applied Behavior Analysis, 23, 11–27. doi:10.1901/jaba.1990.23-11

Jackson, D. A., & Wallace, R. F. (1974). The modification and generalization of voice loudness in a fifteen-year-old retarded girl. Journal of Applied Behavior Analysis, 7, 461–471. doi:10.1901/jaba.1974.7-461

Johnson, M. D., & Fawcett, S. B. (1994). Courteous service: Its assessment and modification in a human service organization. Journal of Applied Behavior Analysis, 27, 145–152. doi:10.1901/jaba.1994.27-145


Jones, R. J., & Azrin, N. H. (1969). Behavioral engineering: Stuttering as a function of stimulus duration during speech synchronization. Journal of Applied Behavior Analysis, 2, 223–229. doi:10.1901/jaba.1969.2-223

Jones, R. T., Kazdin, A. E., & Haney, J. I. (1981). Social validation and training of emergency fire safety skills for potential injury prevention and life saving. Journal of Applied Behavior Analysis, 14, 249–260. doi:10.1901/jaba.1981.14-249

Kahng, S., Iwata, B. A., & Lewin, A. (2002). Behavioral treatment of self-injury, 1964–2000. American Journal on Mental Retardation, 107, 212–221. doi:10.1352/0895-8017(2002)107 2.0.CO;2

Kahng, S. W., Tarbox, J., & Wilke, A. E. (2001). Use of a multicomponent treatment for food refusal. Journal of Applied Behavior Analysis, 34, 93–96. doi:10.1901/jaba.2001.34-93

Kale, R. J., Kaye, J. H., Whelan, P. A., & Hopkins, B. L. (1968). The effects of reinforcement on the modification, maintenance, and generalization of social responses of mental patients. Journal of Applied Behavior Analysis, 1, 307–314. doi:10.1901/jaba.1968.1-307

Kazdin, A. E. (1977). Artifact, bias, and complexity of assessment: The ABCs of reliability. Journal of Applied Behavior Analysis, 10, 141–150. doi:10.1901/jaba.1977.10-141

Kazdin, A. E. (1978). History of behavior modification. Baltimore, MD: University Park Press.

Kazdin, A. E., & Bootzin, R. R. (1972). The token economy: An evaluative review. Journal of Applied Behavior Analysis, 5, 343–372. doi:10.1901/jaba.1972.5-343

Kazdin, A. E., & Polster, R. (1973). Intermittent token reinforcement and response maintenance in extinction. Behavior Therapy, 4, 386–391. doi:10.1016/S0005-7894(73)80118-2

Kemp, D. C., & Carr, E. G. (1995). Reduction of severe problem behavior in community employment using an hypothesis-driven multicomponent intervention approach. Journal of the Association for Persons With Severe Handicaps, 20, 229–247.

Kirby, K. C., & Bickel, W. K. (1988). Toward an explicit analysis of generalization: A stimulus control interpretation. Behavior Analyst, 11, 115–129.

Kirby, K. C., & Holborn, S. W. (1986). Trained, generalized, and collateral behavior changes of preschool children receiving gross-motor skills training. Journal of Applied Behavior Analysis, 19, 283–288. doi:10.1901/jaba.1986.19-283

Koegel, R. L., Dunlap, G., & Dyer, K. (1980). Intertrial interval duration and learning in autistic children. Journal of Applied Behavior Analysis, 13, 91–99. doi:10.1901/jaba.1980.13-91

Koegel, R. L., Firestone, P. B., Kramme, K. W., & Dunlap, G. (1974). Increasing spontaneous play by suppressing self-stimulation in autistic children. Journal of Applied Behavior Analysis, 7, 521–528. doi:10.1901/jaba.1974.7-521

Koegel, R. L., & Frea, W. D. (1993). Treatment of social behavior in autism through the modification of pivotal social skills. Journal of Applied Behavior Analysis, 26, 369–377. doi:10.1901/jaba.1993.26-369

Koegel, R. L., & Rincover, A. (1977). Research on the difference between generalization and maintenance in extra-therapy responding. Journal of Applied Behavior Analysis, 10, 1–12. doi:10.1901/jaba.1977.10-1

Kuhn, D. E., Chirighin, A. E., & Zelenka, K. (2010). Discriminated functional communication: A procedural extension of functional communication training. Journal of Applied Behavior Analysis, 43, 249–264. doi:10.1901/jaba.2010.43-249

Kuhn, D. E., DeLeon, I. G., Fisher, W. W., & Wilke, A. E. (1999). Clarifying an ambiguous functional analysis with matched and mismatched extinction procedures. Journal of Applied Behavior Analysis, 32, 99–102. doi:10.1901/jaba.1999.32-99

Lalli, J. S., Livezey, K., & Kates, K. (1996). Functional analysis and treatment of eye poking with response blocking. Journal of Applied Behavior Analysis, 29, 129–132. doi:10.1901/jaba.1996.29-129

Lalli, J. S., Vollmer, T. R., Progar, P. R., Wright, C., Borrero, J., Daniel, D., & May, W. (1999). Competition between positive and negative reinforcement in the treatment of escape behavior. Journal of Applied Behavior Analysis, 32, 285–296. doi:10.1901/jaba.1999.32-285

Laraway, S., Snycerski, S., Michael, J., & Poling, A. (2003). Motivating operations and terms to describe them: Some further refinements. Journal of Applied Behavior Analysis, 36, 407–414. doi:10.1901/jaba.2003.36-407

Lerman, D. C., & Iwata, B. A. (1995). Prevalence of the extinction burst and its attenuation during treatment. Journal of Applied Behavior Analysis, 28, 93–94. doi:10.1901/jaba.1995.28-93

Lerman, D. C., Tetreault, A., Hovanetz, A., Bellaci, E., Miller, J., Karp, H., & Toupard, A. (2010). Applying signal detection theory to the study of observer accuracy and bias in behavioral assessment. Journal of Applied Behavior Analysis, 43, 195–213. doi:10.1901/jaba.2010.43-195

Lerman, D. C., & Vorndran, C. (2002). On the status of knowledge for using punishment: Implications for treating behavior disorders. Journal of Applied Behavior Analysis, 35, 431–464. doi:10.1901/jaba.2002.35-431

Lerman, D. C., Vorndran, C., Addison, L., & Kuhn, S. A. C. (2004). A rapid assessment of skills in young children with autism. Journal of Applied Behavior Analysis, 37, 11–26. doi:10.1901/jaba.2004.37-11

Lindsley, O. R. (1956). Operant conditioning methods applied to research in chronic schizophrenia. Psychiatric Research Reports, 5, 118–139.

Linscheid, T. R., Iwata, B. A., Ricketts, R. W., Williams, D. E., & Griffin, J. C. (1990). Clinical evaluation of the self-injurious behavior inhibiting device (SIBIS). Journal of Applied Behavior Analysis, 23, 53–78. doi:10.1901/jaba.1990.23-53

Lomas, J. E., Fisher, W. W., & Kelley, M. E. (2010). The effects of variable-time delivery of food items and praise on problem behavior reinforced by escape. Journal of Applied Behavior Analysis, 43, 425–435. doi:10.1901/jaba.2010.43-425

Lovaas, O. I., Freitag, G., Gold, V. J., & Kassorla, I. C. (1965). Experimental studies in childhood schizophrenia: Analysis of self-destructive behavior. Journal of Experimental Child Psychology, 2, 67–84. doi:10.1016/0022-0965(65)90016-0

Lovaas, O. I., & Simmons, J. Q. (1969). Manipulation of self-destruction in three retarded children. Journal of Applied Behavior Analysis, 2, 143–157. doi:10.1901/jaba.1969.2-143

Lutzker, J. R., & Sherman, J. A. (1974). Producing generative sentence usage by imitation and reinforcement procedures. Journal of Applied Behavior Analysis, 7, 447–460. doi:10.1901/jaba.1974.7-447

Maglieri, K. A., DeLeon, I. G., Rodriguez-Catter, V., & Sevin, B. M. (2000). Treatment of covert food stealing in an individual with Prader-Willi syndrome. Journal of Applied Behavior Analysis, 33, 615–618. doi:10.1901/jaba.2000.33-615

Mann, R. A. (1972). The behavior-therapeutic use of contingency contracting to control an adult behavior problem: Weight control. Journal of Applied Behavior Analysis, 5, 99–109. doi:10.1901/jaba.1972.5-99

Marzullo-Kerth, D., Reeve, S. A., Reeve, K. F., & Townsend, D. B. (2011). Using multiple-exemplar training to teach a generalized repertoire of sharing to children with autism. Journal of Applied Behavior Analysis, 44, 279–294.

Mazaleski, J. L., Iwata, B. A., Vollmer, T. R., Zarcone, J. R., & Smith, R. G. (1993). Analysis of the reinforcement and extinction components in DRO contingencies with self-injury. Journal of Applied Behavior Analysis, 26, 143–156. doi:10.1901/jaba.1993.26-143

McGinnis, J. C., Friman, P. C., & Carlyon, W. D. (1999). The effect of token rewards on “intrinsic” motivation for doing math. Journal of Applied Behavior Analysis, 32, 375–379. doi:10.1901/jaba.1999.32-375

McGinnis, M. A., Houchins-Juarez, N., McDaniel, J. L., & Kennedy, C. H. (2010). Abolishing and establishing operation analyses of social attention as positive reinforcement for problem behavior. Journal of Applied Behavior Analysis, 43, 119–123. doi:10.1901/jaba.2010.43-119

Meany-Daboul, M. G., Roscoe, E. M., Bourret, J. C., & Ahearn, W. H. (2007). A comparison of momentary time sampling and partial-interval recording for evaluating functional relations. Journal of Applied Behavior Analysis, 40, 501–514. doi:10.1901/jaba.2007.40-501

Moher, C. A., Gould, D. D., Hegg, E., & Mahoney, A. M. (2008). Non-generalized and generalized conditioned reinforcers: Establishment and validation. Behavioral Interventions, 23, 13–38. doi:10.1002/bin.253

Mudford, O. C., Taylor, S. A., & Martin, N. T. (2009). Continuous recording and interobserver agreement algorithms reported in the Journal of Applied Behavior Analysis (1995–2005). Journal of Applied Behavior Analysis, 42, 165–169. doi:10.1901/jaba.2009.42-165

Mueller, M. M., Olmi, D. J., & Saunders, K. J. (2000). Recombinative generalization of within-syllable units in prereading children. Journal of Applied Behavior Analysis, 33, 515–531. doi:10.1901/jaba.2000.33-515

Ninness, C., Barnes-Holmes, D., Rumph, R., McCuller, G., Ford, A. M., Payne, R., & Elliott, M. P. (2006). Transformations of mathematical and stimulus functions. Journal of Applied Behavior Analysis, 39, 299–321. doi:10.1901/jaba.2006.139-05

Nutter, D., & Reid, D. H. (1978). Teaching retarded women a clothing selection skill using community norms. Journal of Applied Behavior Analysis, 11, 475–487. doi:10.1901/jaba.1978.11-475

O’Brien, F., & Azrin, N. H. (1972). Developing proper mealtime behaviors of the institutionalized retarded. Journal of Applied Behavior Analysis, 5, 389–399. doi:10.1901/jaba.1972.5-389

Odom, S. L., Chandler, L. K., Ostrosky, M., McConnell, S. R., & Reaney, S. (1992). Fading teacher prompts from peer-initiation interventions for young children with disabilities. Journal of Applied Behavior Analysis, 25, 307–317. doi:10.1901/jaba.1992.25-307

O’Reilly, M., Lang, R., Davis, T., Rispoli, M., Machalicek, W., Sigafoos, J., & Didden, R. (2009). A systematic examination of different parameters of presession exposure to tangible stimuli that maintain problem behavior. Journal of Applied Behavior Analysis, 42, 773–783. doi:10.1901/jaba.2009.42-773

Pace, G. M., Ivancic, M. T., Edwards, G. L., Iwata, B. A., & Page, T. J. (1985). Assessment of stimulus preference and reinforcer value with profoundly retarded individuals. Journal of Applied Behavior Analysis, 18, 249–255. doi:10.1901/jaba.1985.18-249

Pelios, L., Morren, J., Tesch, D., & Axelrod, S. (1999). The impact of functional analysis methodology on treatment choice for self-injurious and aggressive behavior. Journal of Applied Behavior Analysis, 32, 185–195. doi:10.1901/jaba.1999.32-185

Peterson, L., Homer, A. L., & Wonderlich, S. A. (1982). The integrity of independent variables in behavior analysis. Journal of Applied Behavior Analysis, 15, 477–492. doi:10.1901/jaba.1982.15-477

Phillips, E. L. (1968). Achievement Place: Token reinforcement procedures in a home-style rehabilitation setting for “pre-delinquent” boys. Journal of Applied Behavior Analysis, 1, 213–223. doi:10.1901/jaba.1968.1-213

Phillips, E. L., Phillips, E. A., Fixsen, D. L., & Wolf, M. M. (1971). Achievement Place: Modification of the behaviors of pre-delinquent boys within a token economy. Journal of Applied Behavior Analysis, 4, 45–59. doi:10.1901/jaba.1971.4-45

Phillips, E. L., Phillips, E. A., Wolf, M. M., & Fixsen, D. L. (1973). Achievement Place: Development of the elected manager system. Journal of Applied Behavior Analysis, 6, 541–561. doi:10.1901/jaba.1973.6-541

Piazza, C. C., Hanley, G. P., & Fisher, W. W. (1996). Functional analysis and treatment of cigarette pica. Journal of Applied Behavior Analysis, 29, 437–450. doi:10.1901/jaba.1996.29-437

Piazza, C. C., Patel, M. R., Gulotta, C. S., Sevin, B. M., & Layer, S. A. (2003). On the relative contributions of positive reinforcement and escape extinction in the treatment of food refusal. Journal of Applied Behavior Analysis, 36, 309–324. doi:10.1901/jaba.2003.36-309

Pierce, K. L., & Schreibman, L. (1994). Teaching daily living skills to children with autism in unsupervised settings through pictorial self-management. Journal of Applied Behavior Analysis, 27, 471–481. doi:10.1901/jaba.1994.27-471

Pilgrim, C., Jackson, J., & Galizio, M. (2000). Acquisition of arbitrary conditional discriminations by young normally developing children. Journal of the Experimental Analysis of Behavior, 73, 177–193. doi:10.1901/jeab.2000.73-177

Porterfield, J. K., Herbert-Jackson, E., & Risley, T. R. (1976). Contingent observation: An effective and acceptable procedure for reducing disruptive behavior of young children in a group setting. Journal of Applied Behavior Analysis, 9, 55–64. doi:10.1901/jaba.1976.9-55

Repp, A. C., & Karsh, K. G. (1992). An analysis of a group teaching procedure for persons with developmental disabilities. Journal of Applied Behavior Analysis, 25, 701–712. doi:10.1901/jaba.1992.25-701

Resnick, L. B., Wang, M. C., & Kaplan, J. (1973). Task analysis in curriculum design: A hierarchically sequenced introductory mathematics curriculum. Journal of Applied Behavior Analysis, 6, 679–709. doi:10.1901/jaba.1973.6-679

Rincover, A., Cook, R., Peoples, A., & Packard, D. (1979). Sensory extinction and sensory reinforcement principles for programming multiple adaptive behavior change. Journal of Applied Behavior Analysis, 12, 221–233. doi:10.1901/jaba.1979.12-221

Roane, H. S., Call, N. A., & Falcomata, T. S. (2005). A preliminary analysis of adaptive responding under open and closed economies. Journal of Applied Behavior Analysis, 38, 335–348. doi:10.1901/jaba.2005.85-04

Rolider, A., & Van Houten, R. (1985). Movement suppression time-out for undesirable behavior in psychotic and severely developmentally delayed children. Journal of Applied Behavior Analysis, 18, 275–288. doi:10.1901/jaba.1985.18-275

Rosales-Ruiz, J., & Baer, D. M. (1997). Behavioral cusps: A developmental and pragmatic concept for behavior analysis. Journal of Applied Behavior Analysis, 30, 533–544. doi:10.1901/jaba.1997.30-533

Sajwaj, T., Libet, J., & Agras, S. (1974). Lemon-juice therapy: The control of life-threatening rumination in a six-month-old infant. Journal of Applied Behavior Analysis, 7, 557–563. doi:10.1901/jaba.1974.7-557

Schaefer, H. H. (1970). Self-injurious behavior: Shaping “head-banging” in monkeys. Journal of Applied Behavior Analysis, 3, 111–116. doi:10.1901/jaba.1970.3-111

Schreibman, L. (1975). Effects of within-stimulus and extra-stimulus prompting on discrimination learning in autistic children. Journal of Applied Behavior Analysis, 8, 91–112. doi:10.1901/jaba.1975.8-91

Schroeder, S. R., & Holland, J. G. (1968). Operant control of eye movements. Journal of Applied Behavior Analysis, 1, 161–166. doi:10.1901/jaba.1968.1-161

Schuster, J. W., Gast, D. L., Wolery, M., & Guiltinan, S. (1988). The effectiveness of a constant time-delay procedure to teach chained responses to adolescents with mental retardation. Journal of Applied Behavior Analysis, 21, 169–178. doi:10.1901/jaba.1988.21-169

Serna, L. A., Schumaker, J. B., Sherman, J. A., & Sheldon, J. B. (1991). In-home generalization of social interactions in families of adolescents with behavior problems. Journal of Applied Behavior Analysis, 24, 733–746. doi:10.1901/jaba.1991.24-733

Seymour, F. W., & Stokes, T. F. (1976). Self-recording in training girls to increase work and evoke staff praise in an institution for offenders. Journal of Applied Behavior Analysis, 9, 41–54. doi:10.1901/jaba.1976.9-41

Shore, B. A., Iwata, B. A., Lerman, D. C., & Shirley, M. J. (1994). Assessing and programming generalized behavioral reduction across multiple stimulus parameters. Journal of Applied Behavior Analysis, 27, 371–384. doi:10.1901/jaba.1994.27-371

Sidman, M., & Stoddard, L. T. (1967). The effectiveness of fading in programming a simultaneous form discrimination for retarded children. Journal of the Experimental Analysis of Behavior, 10, 3–15. doi:10.1901/jeab.1967.10-3

Sidman, M., & Willson-Morris, M. (1974). Testing for reading comprehension: A brief report on stimulus control. Journal of Applied Behavior Analysis, 7, 327–332. doi:10.1901/jaba.1974.7-327

Singh, N. N., Watson, J. E., & Winton, A. S. (1986). Treating self-injury: Water mist spray versus facial screening or forced arm exercise. Journal of Applied Behavior Analysis, 19, 403–410. doi:10.1901/jaba.1986.19-403

Skiba, E. A., Pettigrew, L. E., & Alden, S. E. (1971). A behavioral approach to the control of thumbsucking in the classroom. Journal of Applied Behavior Analysis, 4, 121–125. doi:10.1901/jaba.1971.4-121

Skinner, B. F. (1966). What is the experimental analysis of behavior? Journal of the Experimental Analysis of Behavior, 9, 213–218. doi:10.1901/jeab.1966.9-213

Speltz, M. L., Shimamura, J. W., & McReynolds, W. T. (1982). Procedural variations in group contingencies: Effects on children’s academic and social behaviors. Journal of Applied Behavior Analysis, 15, 533–544. doi:10.1901/jaba.1982.15-533

Steege, M. W., Wacker, D. P., Cigrand, K. C., Berg, W. K., Novak, C. G., Reimers, T. M., & DeRaad, A. (1990). Use of negative reinforcement in the treatment of self-injurious behavior. Journal of Applied Behavior Analysis, 23, 459–467. doi:10.1901/jaba.1990.23-459

Stokes, J. V., Luiselli, J. K., & Reed, D. D. (2010). A behavioral intervention for teaching tackling skills to high school football athletes. Journal of Applied Behavior Analysis, 43, 509–512. doi:10.1901/jaba.2010.43-509

Stokes, J. V., Luiselli, J. K., Reed, D. D., & Fleming, R. K. (2010). Behavioral coaching to improve offensive line pass-blocking skills of high school football athletes. Journal of Applied Behavior Analysis, 43, 463–472. doi:10.1901/jaba.2010.43-463

Stokes, T. F., & Baer, D. M. (1977). An implicit technology of generalization.
Journal of Applied Behavior Analysis, 10, 349–367. doi:10.1901/jaba.1977.10-349 Stokes, T. F., Baer, D. M., & Jackson, R. L. (1974). Programming the generalization of a greeting response in four retarded children. Journal of Applied Behavior Analysis, 7, 599–610. doi:10.1901/ jaba.1974.7-599 St. Peter Pipkin, C., Vollmer, T. R., & Sloman, K. N. (2010). Effects of treatment integrity failures during differential reinforcement of alternative behavior: A translational model. Journal of Applied Behavior Analysis, 43, 47–70. doi:10.1901/jaba.2010.43-47 Stricker, J. M., Miltenberger, R. G., Garlinghouse, M., & Tulloch, H. E. (2003). Augmenting stimulus

intensity with an awareness enhancement device in the treatment of finger sucking. Education and Treatment of Children, 26, 22–29. Sumpter, C., Temple, W., & Foster, T. M. (1998). Response form, force, and number: Effects on concurrent-schedule performance. Journal of the Experimental Analysis of Behavior, 70, 45–68. doi:10.1901/jeab.1998.70-45 Sundby, S. M., Dickinson, A., & Michael, J. (1996). Evaluation of a computer simulation to assess subject preference for different types of incentive pay. Journal of Organizational Behavior Management, 16, 45–67. doi:10.1300/J075v16n01_04 Thomas, D. R., Becker, W. C., & Armstrong, M. (1968). Production and elimination of disruptive classroom behavior by systematically varying teacher’s behavior. Journal of Applied Behavior Analysis, 1, 35–45. doi:10.1901/jaba.1968.1-35 Thompson, R. H., Iwata, B. A., Conners, J., & Roscoe, E. M. (1999). Effects of reinforcement for alternative behavior during punishment of self-injury. Journal of Applied Behavior Analysis, 32, 317–328. doi:10.1901/ jaba.1999.32-317 Thompson, R. H., McKerchar, P. M., & Dancho, K. A. (2004). The effects of delayed physical prompts and reinforcement on infant sign language acquisition. Journal of Applied Behavior Analysis, 37, 379–383. doi:10.1901/jaba.2004.37-379 Thompson, T. J., Braam, S. J., & Fuqua, R. W. (1982). Training and generalization of laundry skills: A multiple probe evaluation with handicapped persons. Journal of Applied Behavior Analysis, 15, 177–182. doi:10.1901/jaba.1982.15-177 Tiger, J. H., Hanley, G. P., & Hernandez, E. (2006). A further evaluation of the reinforcing value of choice. Journal of Applied Behavior Analysis, 39, 1–16. doi:10.1901/jaba.2006.158-04 Toole, L. M., Bowman, L. G., Thomason, J. L., Hagopian, L. P., & Rush, K. S. (2003). Observed increases in positive affect during behavioral treatment. Behavioral Interventions, 18, 35–42. doi:10.1002/bin.124 Van Houten, R. (1979). 
Social validation: The evolution of standards of competency for target behaviors. Journal of Applied Behavior Analysis, 12, 581–591. doi:10.1901/jaba.1979.12-581 Van Houten, R., Nau, P., & Marini, Z. (1980). An analysis of public posting in reducing speeding behavior on an urban highway. Journal of Applied Behavior Analysis, 13, 383–395. doi:10.1901/jaba.1980.13-383 Vladescu, J. C., & Kodak, T. (2010). A review of recent studies on differential reinforcement during skill acquisition in early intervention. Journal of Applied Behavior Analysis, 43, 351–355. doi:10.1901/ jaba.2010.43-351 103

Lerman, Iwata, and Hanley

Vollmer, T. R., & Hackenberg, T. D. (2001). Reinforce ment contingencies and social reinforcement: Some reciprocal relations between basic and applied research. Journal of Applied Behavior Analysis, 34, 241–253. doi:10.1901/jaba.2001.34-241 Vollmer, T. R., & Iwata, B. A. (1991). Establishing operations and reinforcement effects. Journal of Applied Behavior Analysis, 24, 279–291. doi:10.1901/ jaba.1991.24-279 Vollmer, T. R., Iwata, B. A., Zarcone, J. R., Smith, R. G., & Mazaleski, J. L. (1993). The role of attention in the treatment of attention-maintained self-injurious behavior: Noncontingent reinforcement and differential reinforcement of other behavior. Journal of Applied Behavior Analysis, 26, 9–21. doi:10.1901/ jaba.1993.26-9 Vollmer, T. R., Progar, P. R., Lalli, J. S., Van Camp, C. M., Sierp, B. J., Wright, C. S., & Eisenschink, K. J. (1998). Fixed-time schedules attenuate extinctioninduced phenomena in the treatment of severe aberrant behavior. Journal of Applied Behavior Analysis, 31, 529–542. doi:10.1901/jaba.1998.31-529 Vollmer, T. R., Roane, H. S., Ringdahl, J. E., & Marcus, B. A. (1999). Evaluating treatment challenges with differential reinforcement of alternative behavior. Journal of Applied Behavior Analysis, 32, 9–23. doi:10.1901/jaba.1999.32-9 Wacker, D. P., & Berg, W. K. (1983). Effects of picture prompts on the acquisition of complex vocational tasks by mentally retarded individuals. Journal of Applied Behavior Analysis, 16, 417–433. doi:10.1901/ jaba.1983.16-417 Watson, P. J., & Workman, E. A. (1981). The nonconcurrent multiple baseline across individuals design: An extension of the traditional multiple baseline design. Journal of Behavior Therapy and Experimental Psychiatry, 12, 257–259. White, G. D., Nielsen, G., & Johnson, S. M. (1972). Timeout duration and the suppression of deviant behavior in children. Journal of Applied Behavior Analysis, 5, 111–120. doi:10.1901/jaba.1972.5-111

Wolery, M., & Gast, D. L. (1984). Effective and efficient procedures for the transfer of stimulus control. Topics in Early Childhood Special Education, 4, 52–77. doi:10.1177/027112148400400305 Wolery, M., Holcombe, A., Cybriwsky, C., Doyle, P. M., Schuster, J. W., Ault, M. J., & Gast, D. L. (1992). Constant time delay with discrete responses: A review of effectiveness and demographic, procedural, and methodological parameters. Research in Developmental Disabilities, 13, 239–266. doi:10.1016/ 0891-4222(92)90028-5 Wolf, M. M. (1978). Social validity: The case for subjective measurement or how applied behavior analysis is finding its heart. Journal of Applied Behavior Analysis, 11, 203–214. doi:10.1901/jaba.1978.11-203 Wolf, M. M., Birnbrauer, J. S., Williams, T., & Lawler, J. (1965). A note on apparent extinction of the vomiting behavior of a retarded child. In L. P. Ullmann & L. Krasner (Eds.), Case studies in behavior modification (pp. 364–366). New York, NY: Holt, Rinehart, & Winston. Wolf, M. M., Risley, T. R., & Mees, H. (1963). Application of operant conditioning procedures to the behavior problems of an autistic child. Behaviour Research and Therapy, 1, 305–312. doi:10.1016/00057967(63)90045-7 Worsdell, A. S., Iwata, B. A., Dozier, C. L., Johnson, A. D., Neidert, P. L., & Thomason, J. L. (2005). Analysis of response repetition as an error-correction strategy during sight-word reading. Journal of Applied Behavior Analysis, 38, 511–527. doi:10.1901/jaba.2005.115-04 Yeaton, W. H., & Bailey, J. S. (1983). Utilization analysis of a pedestrian safety training program. Journal of Applied Behavior Analysis, 16, 203–216. doi:10.1901/ jaba.1983.16-203 Young, J. M., Krantz, P. J., McClannahan, L. E., & Poulson, C. L. (1994). Generalized imitation and response-class formation in children with autism. Journal of Applied Behavior Analysis, 27, 685–697. doi:10.1901/jaba.1994.27-685

Wildman, B. G., Erickson, M. T., & Kent, R. N. (1975). The effect of two training procedures on observer agreement and variability of behavior ratings. Child Development, 46, 520–524.

Zarcone, J. R., Iwata, B. A., Mazaleski, J. L., & Smith, R. G. (1994). Momentum and extinction effects on self-injurious escape behavior and noncompliance. Journal of Applied Behavior Analysis, 27, 649–658. doi:10.1901/jaba.1994.27-649

Williams, G. E., & Cuvo, A. J. (1986). Training apartment upkeep skills to rehabilitation clients: A comparison of task analytic strategies. Journal of Applied Behavior Analysis, 19, 39–51. doi:10.1901/ jaba.1986.19-39

Zarcone, J. R., Iwata, B. A., Smith, R. G., Mazaleski, J. L., & Lerman, D. C. (1994). Reemergence and extinction of self-injurious escape behavior during stimulus (instructional) fading. Journal of Applied Behavior Analysis, 27, 307–316. doi:10.1901/jaba.1994.27-307

104

Chapter 5

Single-Case Experimental Designs
Michael Perone and Daniel E. Hursh

DOI: 10.1037/13937-005 APA Handbook of Behavior Analysis: Vol. 1. Methods and Principles, G. J. Madden (Editor-in-Chief) Copyright © 2013 by the American Psychological Association. All rights reserved.

Single-case experimental designs are characterized by repeated measurements of an individual’s behavior, comparisons across experimental conditions imposed on that individual, and assessment of the measurements’ reliability within and across the conditions. Such designs were integral to the development of behavioral science. Early work in the field of psychology depended on the analysis of the experiences of one or a few individuals (Ebbinghaus, 1885/1913; Thorndike, 1911; Wertheimer, 1912). The investigator identified a phenomenon (e.g., learning and memory, the law of effect, the phi phenomenon) and pursued experimental arrangements that assessed its reliability and the functional relations among the pertinent variables (e.g., the relation between the length of a series of nonsense syllables and learning curves, recall, and retention; the relation between the consequences of behavior and the rate of the behavior; the relation between an observer’s distance from blinking lights and appearance of movement). Because the research was conducted on the investigators themselves (e.g., the memory work of Ebbinghaus) or on just a few participants (e.g., Thorndike’s cats and Wertheimer’s human observers), the experimental arrangements often involved intensive study, with numerous measurements of behavior recorded while each individual was studied under a variety of conditions.

Only after the development of statistical methods for analyzing aggregate data did the focus shift to comparisons across groups of participants, with each group exposed to a single condition (see also Chapter 8, this volume). In the original case, the “participants” were plants in fields split into plots. The statistical methods were developed to assess the significance of differences in yields of plots of plants treated differently. R. A. Fisher’s (1925) Statistical Methods for Research Workers set the course for the field. Fisher began development of his methods while employed as the statistician at an agricultural experiment station early in his career. The fact that data on large numbers of participants tend to be normally distributed (regardless of whether the participants are plants, people, or other animals) led to the easy adaptation of group statistical methods to research with humans. The standard practice came to emphasize the importance of group means, differences in these means, and the use of statistical tests to draw inferences about the likelihood that the group differences were representative of differences in the populations of interest (e.g., Kazdin, 1999; Perone, 1999).

Despite the rise of group statistical methods, single-case designs continued to be used in some important work because they allowed the investigator to study the details of relations among variables as expressed in the behavior of individuals (e.g., Bijou, 1955; Skinner, 1938; Watson, 1913), which resulted in reasonably clear demonstrations of functional relations among the variables being studied (e.g., conditioned startle responses, reinforcement, and schedules of reinforcement). Articulation of the necessary elements of single-case designs, notably in Sidman’s (1960) seminal Tactics of Scientific Research, helped make the designs practically de rigueur in basic research on free-operant behavior (Baron & Perone, 1998; Johnston & Pennypacker, 2009; Perone, 1991). Translation of basic laboratory research for application in everyday situations resulted in the further development of how single-case research designs were to serve applied researchers (Baer, Wolf, & Risley, 1968, 1987; Bailey & Burch, 2002; Barlow, Nock, & Hersen, 2009; Morgan & Morgan, 2009; see Chapter 8, this volume).

In this chapter, we describe and provide examples of the various design elements that constitute single-case methods. We begin by considering the fundamental requirement of any experiment, internal validity, and the kinds of obstacles to internal validity that are most likely to be encountered in single-case experiments. Next, we describe a variety of designs, ranging in complexity, that are commonly associated with the single-case approach. Included are designs to study irreversible or reversible changes in behavior, experimental conditions arranged successively or simultaneously, and the effects of one or more independent variables. In each instance, we evaluate the degree to which the design can overcome obstacles to internal validity. Some designs, for practical or ethical reasons, exclude important controls and thus compromise internal validity, but most single-case designs are robust in promoting internal validity. A great strength of the single-case approach is its flexibility, and we describe how single-case designs can be adjusted dynamically, over the course of an experiment, in response to the ongoing pattern of results. We go on to review the commitment of single-case investigators to identifying and taking command of the variables that control behavior. This commitment is expressed in the steady-state strategy that underlies most contemporary single-case research.
Finally, we describe how interparticipant replication, a seeming departure from a single-case approach, is needed to assess the degree to which an investigator has succeeded in identifying and controlling relevant variables (see also Chapter 7, this volume).

Internal Validity of Single-Case Experiments

The essential goal of an experiment is to make valid decisions about causal relations between the variables of interest. When the results of an experiment provide clear evidence that manipulation of the independent variable caused the changes measured in the dependent variable, the experiment is said to have internal validity.

Investigators are also concerned with other kinds of validity. Kazdin (1999) and Shadish, Cook, and Campbell (2002) listed construct validity, statistical conclusion validity, and external validity. Of these, external validity, which is concerned with the generality of experimental outcomes across populations, settings, times, or variables, seems to draw the lion’s share of attention from methodologists. This critically important issue is addressed by Branch and Pennypacker in this volume’s Chapter 7. Here we need only say that from the standpoint of experimental design, internal validity takes precedence because it is prerequisite to external validity. Unless an investigator can describe the functional relation between the independent and dependent variables with confidence, worrying about the generality of the relation would be premature. As Campbell and Stanley (1963) put it, “Internal validity is the basic minimum without which an experiment is uninterpretable” (p. 5). (For a thoughtful discussion of the interplay between internal and external validity, see Kazdin, 1999, pp. 35–38, and for a more general discussion considering all four types of validity, see Shadish et al., 2002, pp. 93–102.)

Experimental designs are judged largely in terms of how well they promote internal validity. It may be helpful to think of a completed experiment as a kind of argument in which the design and results lead to a conclusion about causality. Internal validity has to do with the persuasiveness of the argument.
Consumers of the research (journal reviewers and editors, initially) will differ in their susceptibility to the argument, which is why editors weigh the judgments of several reviewers to render a verdict on the validity of an experiment and whether a report of it merits publication.

It may also be helpful to remember, as you read this or any other chapter about experimental design, that good design can only foster internal validity; it cannot guarantee it. Internal validity is determined not only by the experimental design but also by the experimental outcomes. Consider, for example, a simple experiment to evaluate a treatment to reduce smoking. The investigator begins by taking a few weeks to measure the baseline rate of smoking (e.g., in cigarettes per day). Suppose the treatment is applied, and after a few weeks smoking ceases altogether. Finally, the treatment is withdrawn, that is, the investigator reinstates the baseline conditions. What happens next is critical to an evaluation of the experiment’s internal validity. If smoking recovers, returning to levels near those observed during the initial baseline, the investigator can make a strong inference about the reductive effect of the treatment on smoking. If smoking fails to recover, however, the causal status of the treatment is ambiguous. It might have been the cause of a permanent reduction in smoking, but the evidence is open to alternative accounts. It is possible that some other variable, operating over the course of time, is responsible for the absence of smoking. Fortunately, there are ways to resolve the ambiguity; they are discussed later in the Designs for Irreversible Effects section. The general point remains: A final decision about internal validity must wait until the data have been collected and analyzed and conclusions about the effect of the experimental treatment have been made.

Internal validity is fostered by designs that eliminate or reduce the influence of extraneous variables that could compete with the independent variable for control of the dependent variable. The investigator’s design objective is to eliminate such variables (famously labeled by Campbell and Stanley, 1963, as threats to internal validity) or, if that is not possible, to equalize their effects across experimental conditions so that they are not confounded with the independent variable. Because single-case experiments compare conditions imposed on an individual, investigators must guard against threats that operate as a function of time or repeated exposure to experimental treatments: history, maturation, testing, and instrumentation.
History, in this context, generally refers to the influence of factors outside the laboratory. For example, an increase in the tobacco tax during a smoking cessation study could contribute to a smoker’s success in giving up the habit and inflate the apparent effect of the experimental treatment.

Maturation refers to processes occurring within the research participant. As the name implies, they may be developmental in character; for example, with age, changes in cognitive and social development could affect the efficacy of cartoons as reinforcers. Maturational variables may also involve shorter term processes such as fatigue, boredom, and hunger, and investigators should be aware of these processes even in highly controlled laboratory experiments. Working in the animal laboratory, McSweeney and her colleagues (e.g., McSweeney & Roll, 1993) showed that even when the procedure is held constant, response rates may change systematically over the course of a session. There has been some disagreement about the responsible process (the primary contenders are satiation and habituation; see McSweeney & Murphy, 2000), but from the standpoint of experimental design this disagreement does not matter. What does matter is that any design that compares treatment conditions arranged early and late in a session may confound the conditions with a maturational process.

Testing is a concern when repeated exposure to a measurement procedure may, in itself, affect behavior. Investigators who rely on verbal measures may be especially concerned. It is obvious that asking a participant the same questions over and over could lead to stereotyped answers, thus blocking the test’s sensitivity to changes in experimental treatments. It may be less obvious that purely operant procedures are also susceptible to the testing threat. For example, as rats gain experience with fixed-ratio schedules, they tend to acquire increasingly efficient response topographies. Over a series of sessions with a fixed-ratio schedule, these changes in responding will be confounded with the effects of the experimental conditions.

Instrumentation is a threat when systematic changes or drift in a measuring device may contaminate the data collected over the course of a study.
An investigator may neglect to periodically recalibrate the force required to activate an operandum, for example, or the sensitivity of a computer touch screen may be reduced by repeated use. The instrumentation threat is most likely an issue in research that relies on human observers to collect or code data (Chapter 6, this volume). Prudent investigators will carefully consider both the methods used to train their human observers and those aspects of their experimental protocol that may influence the consistency of the observers’ work.

These four time- and experience-related threats to internal validity can be addressed successfully in single-case designs by way of replication. Throughout an experiment, behavior is measured repeatedly so that the effect of the experimental manipulation can be assessed on a nearly continuous basis. Kazdin (1982) emphasized the importance of repeated measurement by calling it the fundamental requirement of single-case designs (p. 104). If the behavioral measures show (a) minimal variation in value across time within each experimental condition, (b) systematic differences across conditions, and (c) comparable values when conditions are replicated, then the experimental manipulation is the most plausible causal factor. With such a pattern of results, the influence of extraneous factors categorized as history, maturation, testing, or instrumentation would appear to be either eliminated or held constant.

Designs

Next, we turn to some illustrative designs and consider the degree to which they are likely to be successful in addressing threats to internal validity.
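The three criteria (a) through (c) stated above for judging replicated single-case data lend themselves to a rough numerical screen. The following is only a minimal sketch under stated assumptions: the function name check_replication_pattern, the thresholds max_cv, min_separation, and max_gap, and the example numbers are all hypothetical and are not taken from the chapter, which leaves these judgments to visual inspection of the data.

```python
from statistics import mean, pstdev

def check_replication_pattern(phases, max_cv=0.2, min_separation=0.3, max_gap=0.15):
    """Screen phase data against criteria (a)-(c).

    phases: ordered list of (condition_label, measurements).
    All three thresholds are illustrative assumptions, not published standards.
    """
    stats = [(label, mean(vals), pstdev(vals)) for label, vals in phases]
    # Largest phase mean, used to express gaps as proportions of the data range.
    scale = max(abs(m) for _, m, _ in stats) or 1.0

    # (a) Minimal variation within each condition (coefficient of variation).
    stable_within = all(sd <= max_cv * abs(m) for _, m, sd in stats)

    # Group the phase means by condition label.
    by_condition = {}
    for label, m, _ in stats:
        by_condition.setdefault(label, []).append(m)

    # (b) Systematic differences across conditions.
    cond_means = sorted(mean(ms) for ms in by_condition.values())
    differs_across = len(cond_means) > 1 and all(
        hi - lo >= min_separation * scale
        for lo, hi in zip(cond_means, cond_means[1:])
    )

    # (c) Comparable values when a condition is replicated
    # (requires that at least one condition was actually repeated).
    replicates = any(len(ms) > 1 for ms in by_condition.values()) and all(
        max(ms) - min(ms) <= max_gap * scale for ms in by_condition.values()
    )
    return {
        "stable_within": stable_within,
        "differs_across": differs_across,
        "replicates": replicates,
    }

# Hypothetical data: two baseline (A) and two treatment (B) phases.
abab = [
    ("A", [20, 21, 19, 20]),
    ("B", [5, 6, 5, 4]),
    ("A", [19, 20, 21, 20]),
    ("B", [5, 5, 6, 5]),
]
result = check_replication_pattern(abab)
```

A screen of this kind can flag a data set for closer inspection, but it does not replace the judgment about plausibility of extraneous factors that the text describes.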

Designs Without Replicated Conditions

Two simple designs that omit replication of experimental conditions have appeared in the literature. These designs do, however, involve repeated measurement within a condition, allowing investigators to rely on patterns in the results over time to assess the possible impact of an intervention. The intervention-only design (Moxley, 1998) is most useful in situations in which it is unethical to take the time to collect baseline data (as with dangerous or illegal behavior) or it is not feasible (as in instructional situations in which the yet-to-be-taught behavior is absent from the participant’s repertoire). The data collected early in the process of intervening serve as a kind of baseline for changes that occur as the intervention proceeds. Changes that are systematic, such as accelerations, decelerations, or changes in variability, are taken as evidence of the intervention’s effectiveness.
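One way to make the idea of a systematic change within an intervention-only series concrete is to summarize the series by its trend and by a comparison of early and late observations. The sketch below is illustrative only: the helper names ols_slope and early_late_change, the window size k, and the data are assumptions introduced here, and the chapter itself does not prescribe any particular statistic.

```python
from statistics import mean

def ols_slope(values):
    """Least-squares slope of values against session index 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    x_bar = (n - 1) / 2
    y_bar = mean(values)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in enumerate(values))
    sxx = sum((x - x_bar) ** 2 for x in range(n))
    return sxy / sxx

def early_late_change(values, k=3):
    """Mean of the last k sessions minus the mean of the first k sessions.

    The first k sessions stand in for the missing baseline, as the
    intervention-only design treats early data as "a kind of baseline".
    """
    return mean(values[-k:]) - mean(values[:k])

# Hypothetical counts of a target behavior across eight intervention sessions.
sessions = [2, 3, 2, 5, 7, 9, 10, 11]
trend = ols_slope(sessions)            # positive slope indicates acceleration
change = early_late_change(sessions)   # late-minus-early difference in level
```

A positive slope or a large early-to-late change describes the pattern only; whether the intervention produced it is a separate question of internal validity.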

Considered in the abstract, the intervention-only design would appear to be unacceptably weak in its defense against threats to internal validity. Consider the idealized pattern of results in Figure 5.1. The increase in behavior could be the result of the intervention, but it is also easy to imagine how it might result from, say, historical or maturational factors. Details about the procedure and the independent and dependent variables might lead to a more positive evaluation of the study’s validity. Suppose, for example, the behavior represented in Figure 5.1 is correct operations of a factory machine and the intervention is some procedure for training the correct operation. Suppose also that the machine is unique in both form and operation, with nothing similar available outside the factory training environment. Under these restricted circumstances, attributing the improved performance to the training is plausible. Still, one must admit that the conclusion is limited, and the restricted circumstances needed to support it might be rare.

Figure 5.1. Idealized results in an intervention-only design. The increase in behavior over the initial values, consistent with the goal or expected effect of the intervention, is taken as evidence that the intervention caused the increase.

The baseline–intervention or A-B design improves on the intervention-only design by adding a true baseline phase. In the idealized results shown in Figure 5.2, a stable behavioral baseline is followed by a conspicuous change that coincides with the intervention. The time course of behavioral change in the intervention phase is similar to that shown in Figure 5.1 for the intervention-only design. The evidence of an intervention effect is strengthened in the A-B design because the intervention results are preceded by a lengthy series of measurements in which change is absent. The causal inference, that the intervention is responsible for the behavioral change, is supported by the fact that the behavior changed only when the intervention was implemented. More generally, an immediate change in level, trend, or variability coincident with the beginning of the intervention is taken as evidence of a possible functional relation between the intervention and the dependent variable.

Figure 5.2. Idealized results in a baseline–intervention or A-B design. A stable behavioral baseline is followed by a conspicuous change coincident with the intervention, suggesting that the intervention caused the change.

Although the A-B design is an improvement over the intervention-only design, it remains susceptible to history, maturation, and testing effects (and perhaps also to instrumentation effects). The plausibility of these threats to internal validity is exacerbated when the experimental outcomes fall short of the ideal, as is often the case, especially in applied research in which field settings may compromise experimental control of extraneous variables and ethical or clinical concerns may prevent the collection of extended baseline data. The shorter the baseline is, the more the A-B design comes to resemble the intervention-only design. If the baseline measurements are characterized by significant variability, it may be difficult to claim that any change in behavior is clearly coincident with the treatment, which is especially the case if the baseline variability is systematic. For example, if an upward trend is apparent in the baseline, continuation of the trend in the intervention phase cannot with confidence be attributed to the treatment. Even a long and stable baseline is no guarantee of internal validity: To the extent that behavioral change is delayed from the onset of the intervention, alternative explanations may become increasingly plausible. In recognition of these sometimes insurmountable limitations of designs without replications, a considerable range of designs has evolved that includes replications.

Designs With Successive Conditions

A straightforward extension of the A-B design yields a major improvement in promoting internal validity: Simply reinstate the baseline condition after the intervention, yielding an A-B-A design. A common variation is the B-A-B design, in which the intervention is imposed in the first phase, withdrawn in the second, and reinstated in the third. In either case, the underlying logic is the same, and in both the ideal outcome is for behavior to change in some systematic way from the first phase to the second and then return to initial values when the original condition is reinstated.

In the A-B-A-B design, the replication of the baseline is followed by a replication of the intervention. If a change occurs in the data patterns that replicates or approximates those of the first intervention phase, the plausibility of history, maturation, testing, or instrumentation effects is reduced even further, and a compelling case can be made for the intervention’s effectiveness. Put simply, the likelihood of other events being responsible for behavioral changes is greatly reduced if the changes occur when and only when the conditions are changed. The A-B-A-B design contains an initial demonstration of an effect (the first A to B change), shows that the effect is likely the result of the intervention (the B to A change), and convinces one of that by replicating the effect (the second A to B change).

Figure 5.3 illustrates a possible outcome. The hypothetical data in this particular example fall short of the ideal: The initial baseline is brief, and behavior is still changing when each of the subsequent three conditions is terminated. With such an outcome, the experiment leaves unanswered the ultimate effect of the experimental treatment. Nevertheless, the systematic changes in trend that coincide repeatedly with the initiation of the intervention (B) and baseline (A) phases leave no doubt about the causal role of the experimental treatment. It would be highly implausible to claim that something other than the treatment was responsible for reducing the behavior.

Figure 5.3. Hypothetical results in an A-B-A-B design. The experimental treatments in the two B phases consistently reduce behavior, and reinstatement of the baseline procedure in the second A phase increases behavior. The reversibility of the behavior change in this pattern of results supports causal inferences about the experimental treatment.

Many research questions call for a comparison across two or more interventions. Several design options are available. One may use an A-B-A (or A-B-A-B) design in which both the A and B phases involve an experimental treatment. If a conventional baseline is desired, one may use an A-B-A-C-A design or perhaps an A-B-C-B design (in which A designates the conventional baseline and B and C designate distinct interventions). In all of these designs, each condition is imposed for a series of observations so that the effect of each treatment is given sufficient time to become evident (as in Figure 5.3). In basic laboratory research with rats or pigeons, it is not unusual for a condition to be imposed for weeks of daily sessions until behavior stabilizes and the behavioral effect is replicated from one observation to the next (this topic is discussed in the Steady-State Strategy section).
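The logic of the A-B-A-B design described above, that behavior should change when and only when conditions change, can be sketched as a check on the direction of change at each phase transition. This is a simplified illustration under stated assumptions: the function reversal_pattern, the labels "A" and "B", and the data (loosely modeled on the kind of outcome described for Figure 5.3) are hypothetical, and phase means are used as a crude summary even though the text notes that behavior may still be changing when a phase ends.

```python
from statistics import mean

def same_sign(values):
    """True if every value is positive or every value is negative."""
    return all(v > 0 for v in values) or all(v < 0 for v in values)

def reversal_pattern(phases, baseline="A", treatment="B"):
    """Check for a reversal pattern across ordered (label, measurements) phases.

    Returns True when every baseline-to-treatment transition shifts the phase
    mean in one direction and every treatment-to-baseline transition shifts
    it in the opposite direction.
    """
    means = [(label, mean(vals)) for label, vals in phases]
    to_treatment, to_baseline = [], []
    for (prev_label, prev_mean), (label, cur_mean) in zip(means, means[1:]):
        if (prev_label, label) == (baseline, treatment):
            to_treatment.append(cur_mean - prev_mean)
        elif (prev_label, label) == (treatment, baseline):
            to_baseline.append(cur_mean - prev_mean)
    if not to_treatment or not to_baseline:
        return False  # no withdrawal or reinstatement to evaluate
    return (
        same_sign(to_treatment)
        and same_sign(to_baseline)
        and (to_treatment[0] > 0) != (to_baseline[0] > 0)
    )

# Hypothetical A-B-A-B data in which the treatment reduces behavior.
abab = [
    ("A", [10, 11, 10]),
    ("B", [8, 6, 4, 3]),
    ("A", [5, 7, 9, 10]),
    ("B", [7, 5, 3, 2]),
]
reversible = reversal_pattern(abab)
```

With data in which behavior fails to recover during the second A phase, the same function returns False, which corresponds to the ambiguous outcome discussed earlier for the smoking example.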

Designs With Simultaneous Conditions

Another tactic for comparing interventions involves changing the conditions frequently to assess their relative impacts quickly. For example, a therapist

may want to identify the most effective way to get a client to talk more rationally about his or her fears. One approach may be to debunk any irrational talk; another may be to suggest alternative rational ways to talk about fears. The therapist can simply alternate these approaches within or across sessions and observe which approach produces more rational talk. A teacher who wants to know whether the latest approach to spelling is effective may use that approach on some days and the old approach on other days while assessing the students’ spelling performance throughout to decide whether the latest approach is better. This design tactic requires that the outcomes being assessed are likely to be sensitive to such frequent changes and that the experience of one intervention has only minimal impact on the effectiveness of the alternatives. Such designs are called multielement designs (Sidman, 1960; Ulman & Sulzer-Azaroff, 1975). In one variation on this tactic, the alternating-treatments design, two or more treatments are alternated rapidly (Barlow et al., 2009). The operational definition of rapid depends on the experimental context and could involve individual conditions lasting from minutes to days. For example, a therapist may, within a single session, switch back and forth from debunking irrational talk to suggesting alternative rational ways to talk about a client’s fears, or a teacher may spend a week on the old approach to spelling before switching to the latest approach. Figure 5.4 shows a common way to present the results from experiments with an alternating-treatments design. Results from the experimental treatments are represented by different symbols; the separation of the two functions documents the difference in the treatments’ effectiveness, and, more important, the reproducibility of the difference across time attests to the reliability of the effect.
When inspecting a graph such as that in Figure 5.4, it is important to remember that the design involves a special kind of reversal, in that the behavior is rising and falling across successive presentations of the two treatments. The highly reliable character of the effects of the two treatments is obscured by the graphing convention: Results from like conditions are connected, even though the data points do not represent successive observations.
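The graphing convention can be illustrated with a short Python sketch. The data are invented, and the record layout `(session, condition, value)` and the function name `series_by_condition` are assumptions made for the example; the point is only that grouping by condition yields the two plotted functions, whereas reading the same values in temporal order shows the repeated rise and fall.

```python
from collections import defaultdict

def series_by_condition(observations):
    """Group chronologically ordered (session, condition, value) records by condition.

    Each series keeps its original session numbers, so the gaps within one
    series mark the sessions in which the other treatment was in effect.
    """
    series = defaultdict(list)
    for session, condition, value in observations:
        series[condition].append((session, value))
    return dict(series)

# Hypothetical data: two treatments alternated irregularly across ten sessions.
observations = [
    (1, "T1", 12), (2, "T2", 5), (3, "T1", 14), (4, "T2", 6), (5, "T2", 4),
    (6, "T1", 15), (7, "T1", 13), (8, "T2", 5), (9, "T1", 16), (10, "T2", 3),
]

grouped = series_by_condition(observations)
for condition, points in grouped.items():
    # These are the points a conventional graph would connect with one line.
    print(condition, points)

# The same values in temporal order: behavior rises and falls with each switch.
print([value for _, _, value in observations])
```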

Single-Case Experimental Designs

Figure 5.4. Conventional presentation of results in an alternating-treatments design. The lines do not connect the data points in temporal sequence; rather, they connect data points collected under like treatment conditions.

In another multielement design, the experimental treatments are arranged concurrently, with the participant choosing which to access (sometimes called simultaneous-availability-of-all-conditions design [Browning, 1967] or, more commonly in the basic literature, simply a concurrent schedule). In many cases, the goal is to assess preferences. For example, a therapist may ask the client which tactic he or she wants the therapist to use during the session or the teacher may ask students which approach to spelling they want to use that day. A problem arises, however, if the participant’s choices are unconstrained: One treatment may be chosen to the exclusion of the other. Such an outcome may represent a strong preference, but it could also represent happenstance, as when a participant selects a particular option at the outset of an experiment and simply sticks with it. Without adequate exposure to all of the options, it would be inappropriate to draw conclusions about preference or, indeed, even to consider the procedure as arranging a meaningful choice. Procedures have been developed to address this problem and ensure that the participant is regularly exposed to the available treatment conditions. Some investigators devote portions of the experiment to forced-choice procedures that momentarily constrain the participant’s options to a single treatment (e.g., Mazur, 1985). When the concurrent assessment involves schedules of reinforcement, the schedules can be arranged so that reinforcement

rates can be maximized only if the participant occasionally samples all of the schedules (Stubbs & Pliskoff, 1969). We have discussed multielement designs in the context of comparisons across experimental treatments, a design tactic that Sidman (1960) called multielement manipulations. Multielement designs can also be used to measure an experimental treatment’s effect on two or more different response classes or operants, a design tactic that Sidman called multielement baselines. The idea is to arrange the experimental circumstances to generate two or more behavioral baselines more or less simultaneously, which can be accomplished by arranging a multiple schedule or concurrent schedules. Once stable baselines have been established, an experimental treatment is applied to both. For example, a multiple schedule might be arranged with contingencies to engender high rates of behavior in one component and low rates in the other. In one or more experimental conditions, a drug may be administered to discover whether the effect of the drug depends on the baseline rate (e.g., Lucki & DeLong, 1983). Multielement designs have a major strength as well as a significant limitation. Their strength is in promoting internal validity. Because multielement designs allow experimental treatments to be compared almost simultaneously (i.e., within a single session or pair of sessions), the influence of the time-related threats of history, maturation, testing, and instrumentation is equalized across the conditions. Their limitation is that the temporal juxtaposition of the two conditions may generate different effects than the conditions might generate if arranged in isolation from one another—or, put another way, the treatments may interact. The use of signals to demarcate the treatments and foster discrimination between them, as in the concurrent schedule variant, is sometimes intended to reduce the interaction. 
Another step is to separate the treatments in time; if the treatments are not temporally contiguous, the effect of one treatment is less likely to carry over to the next. In basic laboratory experiments, this separation is effected by interposing timeouts between the components of a multiple schedule. In field experiments, the separation may arise in


the customary scheme of things—for example, when treatments are alternated across school days or across weekly therapy sessions. There is no guarantee, however, that these steps actually do prevent interacting treatments. The only sure way to allay this concern is to conduct additional research in which each treatment is studied in isolation.

Designs for Irreversible Effects

So far, we have considered two general classes of experimental designs. The first consists of the intervention-only and baseline–intervention (A-B) designs. Although these designs may be justifiable under special circumstances, they are undesirable because, in general, they provide little protection against time- and experience-related threats to internal validity. The second class of experimental designs promotes internal validity through replication of experimental conditions. The difference between the two classes can be summarized this way: In an experiment with an A-B design, any change observed from A to B might be related to the experimental treatment, but, depending on the particulars of the experiment, the change might instead reflect the operation of maturation, history, testing, or instrumentation. Adding replications (e.g., in A-B-A, A-B-A-B, or multielement designs) tests these alternative explanations. Each replication, if accompanied by appropriate changes in behavior, makes it less plausible that something other than the experimental treatment could have caused the changes. To promote internal validity, the designs in the second class require that the participant experience another treatment or a return to baseline. Replicating conditions is not always possible or desirable, however, for several reasons. First, some treatment effects are not likely to disappear simply because the treatment has been discontinued (e.g., the learning of a math fact, reading skill, or social skill that allows the learner access to desirable items or activities). The use of an A-B-A design to assess such an irreversible outcome will yield ambiguous results: When the baseline condition is replicated, the behavior remains unchanged. It is not possible to say whether the outcome is the persistent effect of the treatment or the effect of some other factor.

Another problem arises in cases in which a participant’s experience with one treatment has an impact on the effects produced by another treatment (e.g., being taught decoding skills can result in more rapid sight word learning). If the two treatments were compared in an alternating-treatments design, their effects would be obscured. The last problem is ethical rather than logistical: If the treatment effect is beneficial (e.g., reduction in self-injurious behavior), it would be undesirable to withdraw it and return behavior to pretreatment values even if the withdrawal might decisively demonstrate the treatment’s efficacy.

Multiple-baseline designs. One way to avoid the practical, ethical, and confounding problems of withdrawing, reversing, or alternating treatments is to arrange for the replication of a treatment’s impact to occur across participants, behaviors, or settings. These multiple-baseline designs (Baer et al., 1968) were developed just for such situations. Data are collected under two or more independent baseline conditions. The baselines often come from more than one participant, but they may also come from the same participant engaged in different behaviors or from the same participant behaving in different settings. Once the baseline behavior is shown to be stable, the experimental treatment is implemented in time-staggered fashion to one baseline (i.e., one participant, behavior, or setting) at a time. The treatment is added to a second baseline only once its impact on the first baseline has become obvious. Thus, the baseline data for untreated participants, responses, or settings serve as a control for confounding variables. That is, if changes are observed when and only when the treatment is applied to each of the participants, responses, or settings, it is unlikely that other variables can account for the changes. An idealized pattern of results is shown in Figure 5.5.
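The time-staggered arrangement can be sketched as a simple schedule. In the Python below, the baseline names, the start sessions, and the helper names `staggered_schedule` and `is_staggered` are all hypothetical. Fixing the start sessions in advance is also a simplifying assumption: as the text notes, in practice the treatment is moved to the next baseline only after its effect on the previous one has become obvious.

```python
def staggered_schedule(start_sessions, n_sessions):
    """Label every session of every baseline as 'A' (baseline) or 'B' (treatment).

    start_sessions maps a baseline name to the first session (1-indexed) in
    which the treatment is in effect for that baseline.
    """
    return {
        name: ["A" if session < start else "B" for session in range(1, n_sessions + 1)]
        for name, start in start_sessions.items()
    }

def is_staggered(start_sessions):
    """True if no two baselines begin treatment in the same session."""
    starts = list(start_sessions.values())
    return len(set(starts)) == len(starts)

# Hypothetical start sessions for three independent baselines.
starts = {"Participant 1": 4, "Participant 2": 7, "Participant 3": 10}
schedule = staggered_schedule(starts, n_sessions=12)

for name, labels in schedule.items():
    print(f"{name}: {''.join(labels)}")
print(is_staggered(starts))
```

Reading down any column of the printed schedule shows which baselines are still untreated at a given session and can therefore serve as controls for the baseline that has just received the treatment.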
Some examples may help illustrate the three common variants of the multiple-baseline design. If a teacher has experience suggesting that peer tutoring may help some of her students who struggle with solving equations, that teacher may assign a peer tutor to one struggling student at a time to observe whether each of the struggling students’ equation-solving performance improves when they begin to work with their peer tutor and not before (multiple-baseline design across participants). If a parent has heard from other parents that developing a behavior contract can be a successful means of getting their child to do their chores, that parent may create an initial contract that includes only one chore and then add chores to the contract one at a time as he or she observes that the child’s completion of each chore becomes reliable only after it is added to the contract (multiple-baseline design across behaviors). If a mental health worker serves a client who has difficulty purchasing items, that mental health worker may provide modeling, guidance, and reinforcement for the client’s purchasing skills at a neighborhood convenience store, then provide the same treatment at the supermarket, and if successful there provide the same treatment at the department store across town (multiple-baseline design across settings). All of these multiple-baseline designs require the feasibility of taking frequent measures more or less concurrently across more than one participant, class of behavior, or setting. When such frequent measurement is not feasible, multiple-probe designs (Horner & Baer, 1978) are available. These designs differ from multiple-baseline designs in that instead of frequent measurements, only occasional probe measurements are taken. That is, the teacher, parent, or mental health worker mentioned in the examples arranges to measure the outcomes less often. He or she may assess the outcomes only weekly rather than daily, even though the experimental conditions (baseline or treatment) would be implemented continuously.

Figure 5.5. A multiple-baseline design with an experimental treatment imposed in staggered temporal fashion across three independent baselines. The baselines could represent the behavior of different participants, the behavior of one participant in different settings, or the different behaviors of one participant. The strict coincidence between the imposition of the treatment and the appearance of behavior change allows the change to be attributed to the treatment.

Changing-criterion designs. What if the research problem is restricted to just a single baseline (only one participant, one class of behavior, or one setting) and it is not practical or ethical to withdraw or reverse treatment? We have already described two ways to deal with such a situation: the intervention-only design and the A-B design. We have also noted the weaknesses of these designs as regards internal validity. A third option, the changing-criterion design (Hartmann & Hall, 1976), offers better protection against threats to internal validity. This design is well suited to the study of variables that can be implemented progressively. For example, a teacher may use token reinforcers to help a student develop fluency in solving math problems. After measuring the student’s baseline rate of problem solving, the teacher may offer a token if the student’s rate is increased by, say, 10%. Each time the student’s rate of problem solving stabilizes at the new criterion for reinforcement, the criterion is raised. If, as illustrated in Figure 5.6, the student’s performance repeatedly conforms to the succession of increasingly stringent criteria, it is possible to attribute the changes in performance to the changing experimental treatment. As this example implies, changing-criterion designs are especially useful when the goal is to assess treatments designed to shape skilled performances or engender novel behavior (Morgan & Morgan, 2009).
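The token example can be traced in a short Python sketch. Only the 10% step comes from the text; the stability rule (the last three sessions all meet the criterion), the session rates, and the function names are assumptions introduced for illustration, since the passage does not say how stability at a criterion is judged.

```python
def next_criterion(current, step=0.10):
    """Raise the reinforcement criterion by a fixed proportion (10% by default).

    The result is rounded to two decimals to avoid floating-point noise.
    """
    return round(current * (1 + step), 2)

def is_stable_at(rates, criterion, window=3):
    """Assumed stability rule: the last `window` rates all meet the criterion."""
    recent = rates[-window:]
    return len(recent) == window and all(rate >= criterion for rate in recent)

def run_changing_criterion(baseline_rate, session_rates, step=0.10, window=3):
    """Return the criterion in effect during each session.

    The criterion is raised whenever behavior has stabilized at the current one.
    """
    criterion = next_criterion(baseline_rate, step)
    in_effect, since_change = [], []
    for rate in session_rates:
        in_effect.append(criterion)
        since_change.append(rate)
        if is_stable_at(since_change, criterion, window):
            criterion = next_criterion(criterion, step)
            since_change = []
    return in_effect

# Hypothetical problems solved per minute across ten sessions.
rates = [10, 11, 11, 12, 11, 12, 13, 13, 13, 14]
criteria = run_changing_criterion(baseline_rate=10, session_rates=rates)
for session, (rate, criterion) in enumerate(zip(rates, criteria), start=1):
    print(session, rate, criterion)
```

If performance tracks the printed criteria step by step, as in Figure 5.6, the correspondence supports attributing the change to the treatment; a rate that kept climbing regardless of the criterion in effect would not.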

Additional Design Options

Two additional classes of single-case designs are commonly used, especially in the basic experimental analysis of behavior.


Figure 5.6. A changing-criterion design. Reinforcement is contingent on particular rates of behavior; each time behavior adjusts to a rate criterion, a new criterion is imposed.

Parametric designs. Experiments that compare several levels of a quantitative treatment are said to use parametric designs. The literature on choice (e.g., Chapter 14, this volume) abounds with such designs; for example, in studies of matching, a pigeon may be exposed to a series of conditions that differ in terms of the distribution of food reinforcers across a pair of concurrently available response keys. Across successive conditions, the relative rate of reinforcement might be progressively increased (an ascending order) or decreased (a descending order), or the rates may be imposed in some irregular order. From a design standpoint, the issue is how to dissociate the effects of the experimental variable from those of maturation, history, testing, or instrumentation. If an experiment arranges five relative rates in an ascending sequence, the design might be designated an A-B-C-D-E design. It is easy to see that the fundamental logic parallels that of the A-B design, and as such, the design is vulnerable to the same threats to internal validity. If, for example, relative response rates rise across the successive conditions, the outcome may be attributed to the experimental manipulation (response allocations match reinforcer allocations), but alternative explanations in terms of maturation, history, testing, or instrumentation may also be plausible. As actually implemented, however, parametric designs rarely suffer from this problem. Three

strategies are commonly used. First, one or more conditions are replicated to separate the effects of the treatment from the effects associated with timing. For example, one could replace a deficient A-B-C-D-E design with an A-B-C-D-E-A design or perhaps an A-B-C-D-E-A-C design. If the rising relative response rates result from some time-related or experiential factor, the rates should continue to rise in the replicated conditions. If, however, the rates revert to the values observed in the initial A (and C) conditions, one can safely attribute the behavioral effects to the manipulation of relative reinforcement rate. The second strategy is to implement the conditions not in an ascending or descending sequence but rather in an irregular sequence. If response rates rise or fall simply in relation to the temporal position of the condition, the results may be attributed to time-related or experiential factors. If, instead, the rates are systematically related to the levels of the experimental variable (e.g., if response allocations match reinforcer allocations), the most plausible explanation would identify the experimental variable as the causal factor. The last strategy departs from a purely single-case analysis: Different participants are exposed to the conditions in different orders. For example, one participant may experience an ascending sequence while another experiences a descending sequence and yet a third experiences an irregular sequence, or each participant may receive a different irregular order. If the behavior of all the participants shows the same relation to the experimental variable, despite the variation in the temporal order of the conditions, then it would again appear that the experimental manipulation is responsible. It is beneficial to combine these strategies. For example, one might arrange one or more replicated conditions as part of each participant’s experience, while arranging different sequences of conditions across participants.
If a systematic relation between the experimental manipulation and behavior is observed under such circumstances, the case for attributing causality to the experimental manipulation becomes compelling. Yet another approach is to combine the parametric strategy with the A-B-A reversal strategy. An


investigator might begin the experiment with a baseline Condition A (or treat the first level of the quantitative independent variable as a baseline) and, after stabilizing behavior at each successive level of the quantitative variable (Conditions B, C, etc.), return to the baseline condition. Thus, an A-B-C-D-E design could be replaced with an A-B-A-C-A-D-A-E-A design. The obvious disadvantage is the large investment of time in repeating the baseline condition. The advantage is that the effect of each treatment can be evaluated relative to a fixed baseline.

Factorial designs. Behavior is controlled by multiple variables at any given moment, and experiments may be designed to analyze such control by including all possible combinations of the levels of two or more independent variables. These factorial designs are ubiquitous in the behavioral and biomedical sciences. They tend to be associated with group statistical traditions; indeed, a staple of graduate training in psychology is to teach the statistical methods of analysis of variance in the context of factorial research designs (e.g., Keppel & Wickens, 2004). Nevertheless, the factorial strategy is by no means restricted to group statistical approaches (Smith, Best, Cylke, & Stubbs, 2000) and is readily used in single-case experiments. As an example, consider an unpublished experiment (Wade-Galuska, Galuska, & Perone, 2004) concerned with variables that affect pausing on fixed-ratio schedules. A pigeon was trained on a multiple schedule in which 100 pecks on a response key produced either 2-second or 6-second access to mixed grain. Different key colors signaled the two schedule components, designated here as lean (ending in 2-second access to grain) and rich (ending in 6-second access).
This arrangement (details are available in Perone & Courtney, 1992) made it possible to study, on a within-session basis, the effects of two factors on the pausing that took place between components: the magnitude of the reinforcer delivered before the pause (the past reinforcer, lean or rich) and the signaled magnitude of the reinforcer to be delivered on completing the next ratio (the upcoming reinforcer, lean or rich). Another factor was manipulated across successive phases of the experiment: The pigeon’s body weight

was 70%, 80%, or 90% of its free-feeding weight. Thus, the experiment had a 2 × 2 × 3 factorial design (two levels of past reinforcer × two levels of upcoming reinforcer × three levels of body weight) and, therefore, 12 combinations of the levels of the three factors. The results are shown in Figure 5.7. Each panel represents one of the body weight conditions. Note that this factor was manipulated quantitatively in an ascending series (70%, 80%, 90%), with a final phase devoted to a replication of the 70% condition. In this way, following the recommendations offered earlier in the Parametric Designs section, the experiment disentangled any confound between time- or experience-related processes and the experimental variable of body weight. Within each panel are the median pauses, calculated over the last 10 sessions of each body weight condition, in each of the four possible combinations of the other two experimental variables, the past and upcoming reinforcer magnitudes. The past reinforcer is shown on the x-axis and the upcoming reinforcer is shown with filled (lean) and unfilled (rich) data points.
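The 12 cells of this 2 × 2 × 3 design can be enumerated directly. The Python below is a minimal sketch: the factor levels are taken from the description above, while the variable names and the tuple layout are arbitrary choices made for the example.

```python
from itertools import product

# Factor levels as described in the text.
past_reinforcer = ["lean", "rich"]          # magnitude delivered before the pause
upcoming_reinforcer = ["lean", "rich"]      # magnitude signaled for the next ratio
body_weight = [70, 80, 90]                  # percentage of free-feeding weight

# Every combination of levels is one cell of the factorial design.
cells = list(product(past_reinforcer, upcoming_reinforcer, body_weight))
print(len(cells))
for past, upcoming, weight in cells:
    print(f"past={past:<4}  upcoming={upcoming:<4}  weight={weight}%")

# Within one body weight phase, the four past-by-upcoming combinations
# are the ones that occur on a within-session basis.
within_session = [(p, u) for p, u, w in cells if w == 70]
print(within_session)
```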

Figure 5.7. A factorial design to study three factors that could affect pausing on a fixed-ratio schedule: Past schedule condition (lean [L] or rich [R], represented on the x-axis), upcoming (Upc.) schedule condition (L or R, represented by filled and unfilled circles, respectively), and body weight (expressed as a percentage of free-feeding weight; each weight condition is represented in a different panel). Note the replication of the 70% body weight condition (rightmost panel). The results are from a single pigeon; shown are medians and interquartile ranges of the last 10 sessions of each condition. Data from Wade-Galuska, Galuska, and Perone (2004).


Within each condition, pausing was a joint function of the past and upcoming schedules of reinforcement. When the key color signaled that the upcoming schedule would be rich (unfilled circles), the past reinforcer had no apparent effect: Pausing was brief after both lean and rich schedules. When the key color signaled that the upcoming schedule would be lean (filled circles), however, the past reinforcer had a major impact: Pausing was extended after a rich schedule. In other words, the effect of the past reinforcer was bounded by, or depended on, the signaled magnitude of the next reinforcer. When the effect of one factor depends on the level of another factor, the factors are said to interact. In the conventional terminology of factorial research design, the interaction between the past and upcoming magnitudes of reinforcement would be called a two-way interaction. The interaction itself depended on the level of the body weight factor: As body weight was increased, the interaction between the past and upcoming magnitudes of reinforcement was enhanced. This kind of finding constitutes a three-way interaction. Note also that in the final phase of the experiment, replicating the 70% body weight condition reduced the interaction between the magnitudes to the values observed in the first phase. In applied research, the use of single-case factorial designs can also prove beneficial. An example is assessment of the interaction between the type of directive and reinforcement contingencies as they affect participants’ compliance with the directives and disruptive behavior (Richman et al., 2001). This three-experiment sequence first established the effectiveness of various forms of directives, then assessed their effectiveness across situations, and finally assessed the interaction between the forms of the directives and targets of differential reinforcement contingencies.
All of the experiments used multielement designs to determine the impact of the independent variables on the outcomes for each of the participants. Factorial designs are prevalent in the behavioral sciences specifically because they provide a framework for describing how multiple variables interact to control behavior. The presence of an interaction sheds light on the boundaries of a variable’s effect

and thereby allows for more complete and general descriptions of functional relations between environment and behavior.

Flexibility of Implementation

A strength of single-case research designs lies in the dynamics of their implementation. The examples we have offered of various single-case designs are merely the usual ways in which the single-case research design strategy is used. It is important to recognize that, in practice, the designs may be modified in response to the pattern of results that emerges as the data are collected. Indeed, this feature of the approach is what led Skinner (1956) to favor single-case designs. It is also possible to combine aspects of the basic single-case designs and even include aspects of group comparisons. This kind of flexibility can be an asset to any program of experimental research. It takes on special significance when the research topic is novel, when the investigator’s ability to exert experimental control is limited by ethical or logistical considerations, and when the goal is to produce an empirically validated therapeutic result for an individual. It is possible that once a behavior is changed, withdrawal of the treatment or reversal of the contingencies in an A-B-A design may not return the behavior to baseline values. From a therapeutic or educational standpoint, this is not a bad thing: In the long run, the therapist or teacher usually wants the participant’s behavior to come under the control of, and be maintained by, the consequences it automatically produces, so that the participant no longer depends on an intervention or treatment (see Chapter 7, this volume). However, from an experimental standpoint, it is a serious problem because it leaves unanswered the question of what caused the behavior to change in the first place: Was it the experimental treatment or some process of maturation, history, testing, or instrumentation?
When behavior fails to revert to baseline values in an A-B-A design, the investigator may switch to a multiple-baseline design (if data have been collected for more than one participant, behavior, or setting). Thus, it is advisable for any investigator to consider the feasibility of establishing multiple baselines from the


beginning, in case the behavior of interest does not return to the baseline value. Multiple-baseline designs have their own set of challenges requiring dynamic decision making by the investigator. Sometimes imposing the experimental treatment on one baseline will be followed by behavioral change not only in the treated baseline but also in the as-yet-untreated baselines. This might reflect the operation of maturation, history, testing, or instrumentation; in other words, it might mean that the treatment is ineffective. Another possibility is that the treatment really is responsible for change, and the effect has spread across the baselines because they are not independent of one another. This threat to internal validity, which Cook and Campbell (1979) called diffusion of treatments, can jeopardize multiple-baseline experiments under several circumstances: (a) In a multiple-baseline across-participants design, all of the participants are in the same environment and may learn by observing the treatment being applied; (b) in a multiple-baseline across-behaviors design, all of the responses are coming from the same participant, and learning one response may facilitate the learning of other responses; or (c) in a multiple-baseline across-settings design, the same participant is responding in all of the settings, and when the response is treated and changed in one setting, it may change in the untreated settings. The antidote, of course, is for the investigator to select participants, behaviors, or settings that experience and logic suggest will be independent of one another. Because experience and logic do not guarantee that an investigator will choose independent participants, behaviors, or settings, it is advisable to select as many baselines as is feasible so that the probability of at least some of them being independent is increased.
Interdependence of a few baselines (changes occurring concurrently across those baselines) with independence of other baselines (changes occurring only when treatment is applied) in a multiple-baseline design can be informative. The investigator has the opportunity to inspect the similarities across the baselines that change concurrently and the differences between those baselines and the baselines that change only when the treatment is applied.

These comparisons and contrasts can help to isolate the participant, behavior, and setting variables that interact with the treatment to produce the changes. For example, a teacher modeling tactics for solving various types of math problems may see students solving problems for which the solutions have yet to be modeled. If the teacher is also collecting data on the students’ solving of social studies problems and does not observe those problems being solved until the solutions are modeled, one can make the case for the general effects of modeling problem solutions. This then sets the occasion for designing another investigation to systematically study the features of the modeling of the math problem solutions to determine which features are essential for which types of problems. If having many baselines is not feasible and the investigator faces interdependence of all of the baselines, the possible design tactics include (a) withdrawing the treatment or reversing the contingencies or (b) arranging for a changing criterion within the treatment. The first choice depends on the probability of the behavior’s return to baseline values and the ethical appropriateness of such a tactic. The second choice depends on the feasibility of incorporating the changing criterion into the treatment and the sensitivity of the behavior being measured to such changes. Either tactic, when successful, demonstrates the functional relation between the treatment and the outcomes. They both also set up the rationale for studying the interdependence of the baselines in a subsequent investigation. As with all efforts to investigate natural phenomena, unexpected results help to hone understanding of the phenomena and guide further investigations. Other design combinations may be considered. 
Withdrawing the treatment or reversing the contingencies in a successful multiple-baseline experiment can probe for the durability of the treatment effects and add another degree of replication should the changes not be durable. Gradually removing components of interventions to assess the importance of each or the intervention’s durability is another variation (a partial withdrawal design; Rusch & Kazdin, 1981). Withdrawing treatment from some participants, responses, or settings (a sequential withdrawal design; Rusch & Kazdin, 1981) to assess the

Perone and Hursh

urability of treatment effects is another variation to d be considered depending on the focus of the investigation. The point of all of these additional design options is that although the research question drives the initial selection of the elements of single-case design, once the data collection begins decisions about the next condition are driven by the patterns emerging in the data being collected. Unexpected patterns can and should lead the investigator to ask how best to arrange the next phase of the investigation to ensure that the original or revised research question can be answered unambiguously. Steady-State Strategy Behavioral experiments assess the effect of a treatment by comparing behavior measured during exposure to the treatment with behavior measured without the treatment or, if the experimental logic dictates, with behavior measured during exposure to some other treatment. In a single-case experiment, the conditions are imposed on an individual over some period of time, and behavior is measured repeatedly within each condition. Inferences about the experimental treatment’s effectiveness are usually supported by demonstrations that the difference in behavior observed across the conditions clearly exceeds any variability observed within the conditions. The basic strategy is not unlike the one that underlies conventional tests of statistical inference: The F ratio associated with the analysis of variance is formed by dividing an estimate of variance between experimental groups by an estimate of variance within the groups, and only if the betweengroups variance is large relative to the within-group variance does the investigator conclude that the experimental treatment made a statistically significant difference. The prevailing approach in single-case experiments—the steady-state strategy—is to impose a condition until behavior is more or less stable from one measurement (session, lesson, etc.) to the next. 
The idea is to fix the environmental variables controlling behavior until the environment–behavior relation reaches equilibrium or, as Sidman (1960) put it, a steady state. At this point, the experimental environment is rearranged to impose the next condition, again until behavior stabilizes.
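The analogy to the F ratio can be made concrete with a short sketch. It is offered only as an illustration of the between-versus-within comparison described above, not as an analysis the steady-state strategy requires; the function name `f_ratio` and the session values are hypothetical.

```python
from statistics import mean


def f_ratio(groups):
    """One-way ANOVA F ratio: between-groups mean square divided by
    within-groups mean square. `groups` is a list of lists of scores."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variability of the condition means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variability of individual scores around their own condition mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


# Hypothetical response rates (responses per minute) from two conditions
baseline = [10.0, 12.0, 11.0]
treatment = [20.0, 22.0, 21.0]
print(f_ratio([baseline, treatment]))  # large ratio: the difference across conditions dwarfs the variability within them
```

In a single-case experiment the same comparison is usually made by inspection of stable data rather than by computing such a ratio, but the logic is the same: the change across conditions must clearly exceed the variability within each condition.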

Strategic Requirements

The steady-state strategy has three requirements (Perone, 1994):

1. The investigator must have sufficient control over extraneous variables to allow behavior to stabilize.
2. The investigator must be able to maintain each condition long enough to allow behavior to stabilize; even under ideal laboratory controls, it will take time for behavior to reach a new equilibrium when conditions are changed.
3. The investigator must be able to recognize the steady state when it is achieved.

Meeting the first two requirements is not a matter of experimental design; rather, the key issues are scientific understanding and resources, including time and access to participants’ behavior. The investigator must have a reasonable idea of the extraneous variables to be eliminated or held constant to allow the potential effect of the experimental variable to become manifest. The investigator must have the wherewithal to control the extraneous variables, and he or she must have relatively unimpeded access to the behavior of interest: An A-B-A-B design, for example, may require scores of sessions distributed over several months or more if behavior is to be given time to stabilize in each phase.

In any given area of study, initial investigations will suffer from gaps in the understanding of the behavioral processes at work and, consequently, of the variables in need of control. Persistent efforts at experimental analysis will pay dividends in identifying the relevant variables and developing the means to control them. Persistence alone, however, cannot provide an investigator the access to behavior that may be needed to execute the steady-state strategy. Much depends on the nature of the topic at hand and the available resources. Investigators of topics in basic research may be in the most advantageous position, especially if they study animals. Not only are they able to control almost every facet of the animal’s living arrangements (e.g., diet, housing, light–dark cycles, opportunities to engage conspecifics), they also have unfettered access to the animal’s behavior. Sessions may be conducted daily for months without interruption. Such circumstances are ideal for steady-state research.

Special problems arise when human participants replace rats and pigeons (Baron & Perone, 1998). The typical human participant lives, works, plays, eats, drinks, and sleeps outside of the investigator’s influence and is thus exposed to numerous factors that may play a role in the participant’s experimental behavior (only in rare cases do human participants live in the laboratory; for an interesting example, see Bernstein & Ebbesen, 1978). These limitations indicate a need for strong countermeasures, such as experimental manipulations that are “especially forcing” (Morse & Kelleher, 1977; see also Baron & Perone, 1998, pp. 68–69) and increased exposure to the laboratory environment over an extended series of sessions. Unfortunately, human research, in which extended access to the participant’s behavior may be needed most, is where such access is most difficult to attain. Monetary incentives can help bring participants to the laboratory for repeated study, of course, but even well-funded investigators will find that the number of sessions that, say, a college student will tolerate is lower than that commonly conducted in research with rats. To address this practical constraint, some investigators arrange brief sessions, sometimes lasting as little as 10 minutes (e.g., Okouchi, 2009), and schedule a series of such sessions each time the participant visits the laboratory. Of course, the duration of the sessions is not the critical issue; rather, the question is whether one can complete an experiment in a few hours in the human laboratory and compare the results to experiments that take months or years in the animal laboratory.
The answer will depend on the goals of the research as well as the investigator’s judgment about the size of the anticipated effects and the speed of their onset. Relatively brief experiments can be defended when they are successful in producing stable and reproducible behavioral outcomes within and across participants. Caution is warranted in planning and interpreting such experiments, however, because the behavioral effects of the experimental manipulations may not always develop according to the investigator’s timetable. Sometimes there is no substitute for prolonged exposure to the contingencies, and what happens in the short term may not predict what happens in the long term (for an illustration, see Baron & Perone, 1998, pp. 50–52).

In applied research, logistical and ethical issues magnify the problem of behavioral access. Participants with clinically relevant repertoires may not be available in large numbers, and the nature of their problem behavior may sharply limit the duration of sessions. If the research is conducted in a therapeutic context, addressing the participant’s problem will take priority over purely scientific considerations, and ethical concerns about leaving problem behavior untreated may restrict the nature of the experimental designs as well as the durations of both baseline and treatment conditions.

The steady-state strategy works best when behavior is measured repeatedly under controlled experimental conditions imposed long enough for the behavior to reach demonstrable states of equilibrium. The pages of the Journal of the Experimental Analysis of Behavior and the Journal of Applied Behavior Analysis attest that these challenges can be met. It is inevitable, however, that some experiments will fall short. In some cases, conducting single-case experiments in the absence of steady states will still be possible, as suggested by the hypothetical outcomes depicted in Figures 5.2, 5.3, and 5.6. Even in these examples, however, the number of behavioral observations is large. We suggest, therefore, that although single-case experiments may be viable in some cases without steady states, they are not likely to succeed without significant access to the behavior in the form of extensive repeated measurement (for a comprehensive discussion of this issue in the context of applied research, see Barlow et al., 2009, pp. 62–65 and 88–94, and Johnston & Pennypacker, 2009, pp. 191–218).
When the efforts to achieve steady states fall short, an investigator may consider the use of statistical tests to discriminate treatment effects from a noisy background of behavioral variability. Many arguments, both pro and con, have been made in this connection (e.g., Ator, 1999; Baron, 1999; Branch, 1999; Crosbie, 1999; Davison, 1999; Kratochwill & Levin, 2010; Perone, 1999; Shull, 1999; Smith et al., 2000; Todman & Dugard, 2001; see Chapters 7 and 11, this volume). We are concerned that reliance on inferential statistics may retard the search for effective forms of control. By comparison, the major advantage of the steady-state strategy is that it fosters the development of strong control. Unsystematic variability (noise or bounce in the data) is addressed by reducing the influence of extraneous factors and increasing the influence of the independent variable. Systematic variability (the trend that occurs in the transition between steady states) is addressed by holding the experimental environment constant until behavior stabilizes. Put simply, the steady-state strategy demands that treatment effects be clarified by improving direct control of behavior.
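The distinction between systematic variability (trend) and unsystematic variability (bounce) can be described numerically. The sketch below is one possible descriptive summary, assuming that trend is taken as the least-squares slope across sessions and bounce as the spread of the data around that fitted line; neither definition comes from the chapter, and the function name and session values are hypothetical. It describes the data and is not an inferential test.

```python
def trend_and_bounce(values):
    """Return (slope, bounce) for a series of session measures.

    slope  -- least-squares change per session (systematic variability)
    bounce -- standard deviation of residuals around the fitted line
              (unsystematic variability)
    Requires at least two sessions.
    """
    n = len(values)
    xs = range(n)
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, values))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, values)]
    bounce = (sum(r ** 2 for r in residuals) / n) ** 0.5
    return slope, bounce


# Hypothetical response rates: a transition followed by a steady state
transition = [10.0, 14.0, 19.0, 23.0, 28.0, 31.0]
steady = [30.0, 31.0, 29.0, 30.0, 31.0, 30.0]
print(trend_and_bounce(transition))  # steep positive slope
print(trend_and_bounce(steady))      # slope near zero, small bounce
```

Under the steady-state strategy, a large slope calls for more sessions under the same condition, and a large bounce calls for tighter control of extraneous variables.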

Stability Criteria

The final requirement of the steady-state strategy is that of recognizing the production of a steady state. Various decision rules have been devised for this purpose. These stability criteria are often expressed in mathematical terms and indicate, in one way or another, the kind and amount of variation in behavior that will be acceptable over a series of observations. Commonly used criteria specify (a) the number of sessions or observations to be considered in assessing the evidence of a steady state, (b) that an increasing or decreasing trend must be absent, and (c) how much bounce can be tolerated in the behavior across sessions. If the most recent behavior within a condition (e.g., responding in the last six sessions) shows no trend and is reasonably free from bounce, behavior is said to be stable. Sidman (1960) provided the seminal discussion of stability criteria. Detailed descriptions of stability criteria, with examples from the human and animal literature, can be found in Baron and Perone (1998) and Perone (1991).

Perhaps the most important difference among stability criteria is in how they specify the tolerable limits on bounce. Some criteria use relative measures; for example, when considering the most recent six sessions, the mean response rate in the first three sessions and the mean in the last three sessions may differ by no more than 10% of the overall six-session mean. Other criteria may use absolute measures; for example, the mean rates in the first three sessions and last three sessions may differ by no more than five responses per minute.

Not all stability criteria are expressed in quantitative terms. In some experiments, steady states are identified by visual inspection of graphed results. In other experiments, each condition is imposed for a fixed number of sessions (e.g., 30), and behavior in the last several sessions (e.g., five) is considered representative of the steady state.

As Sidman (1960) noted, the selection of a stability criterion depends on the nature of the experimental question and the investigator’s judgment and experience. The visual stability criterion may be justified, for example, when the investigator’s experience leads to the expectation of large or dramatic changes across conditions. The fixed-time stability criterion works well when a program of research has progressed to the point at which the investigator can confidently predict how many sessions will be needed to achieve a steady state. Even the quantitative criteria (the relative stability criterion and the absolute stability criterion) are specified in light of the experimental question and the investigator’s judgment and experience. In the abstract, divorced from such considerations, it is impossible to say, for example, whether a 10% relative criterion is more or less stringent than a five-responses-per-minute absolute criterion (for a detailed discussion of the relationship between relative and absolute stability criteria, see Perone, 1991, pp. 141–144).

The adequacy of a stability criterion is assessed over the course of an experiment. A criterion is adequate, according to Sidman (1960, p. 259), if it yields orderly and replicable functional relations between the independent and dependent variables.
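The relative and absolute criteria described above can be sketched in a few lines. This is a minimal illustration that uses the example values from the text (a six-session window, a 10% relative limit, and a limit of five responses per minute); the function names and the session data are hypothetical, and an actual experiment would set the window and limits according to the experimental question.

```python
def half_means(recent):
    """Mean of the first half and mean of the last half of a block of sessions."""
    half = len(recent) // 2
    return sum(recent[:half]) / half, sum(recent[-half:]) / half


def meets_relative_criterion(rates, window=6, max_fraction=0.10):
    """True if the two half-block means differ by no more than
    `max_fraction` of the overall mean of the last `window` sessions."""
    recent = rates[-window:]
    if len(recent) < window:
        return False  # not enough sessions yet to assess stability
    first_mean, last_mean = half_means(recent)
    overall_mean = sum(recent) / window
    return abs(first_mean - last_mean) <= max_fraction * overall_mean


def meets_absolute_criterion(rates, window=6, max_difference=5.0):
    """True if the two half-block means differ by no more than
    `max_difference` (e.g., responses per minute)."""
    recent = rates[-window:]
    if len(recent) < window:
        return False
    first_mean, last_mean = half_means(recent)
    return abs(first_mean - last_mean) <= max_difference


# Hypothetical response rates (responses per minute) in the last six sessions
low_rate = [4, 5, 6, 8, 9, 10]
high_rate = [100, 102, 104, 110, 112, 114]

print(meets_relative_criterion(low_rate), meets_absolute_criterion(low_rate))    # False True
print(meets_relative_criterion(high_rate), meets_absolute_criterion(high_rate))  # True False
```

The two hypothetical data sets show why neither criterion is more stringent in the abstract: at low response rates the 10% relative limit is the harder one to meet, whereas at high rates the five-responses-per-minute absolute limit is.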
In this connection, it is important to recognize that any stability criterion, no matter how stringent, may be met by chance, that is, in the absence of an actual steady state. However, a criterion is highly unlikely to be repeatedly met by chance across the various experimental conditions.

Once Is Not Enough

Single-case designs are single because the primary unit of analysis is the behavior of the individual organism. Treatment effects are assessed by comparing the individual’s response to different levels of the independent variable, and control is demonstrated by two kinds of replication: (a) the stability of the individual’s behavior from one observation to the next under constant circumstances within a condition and (b) the stability of the change in the individual’s behavior from one experimental condition to another.

The descriptor single is misleading in that single-case research rarely involves just one individual. In addition to the within-participant forms of replication that we have emphasized throughout this chapter, procedures are also replicated across participants. Single-case investigators approach interparticipant replication in two general ways, described by Sidman (1960) as direct replication and systematic replication (see also Chapter 7, this volume).

In the context of interparticipant replication, direct replication consists of repeating the experimental procedures with additional participants. A review of any representative sample of the literature of basic or applied behavior analysis will document that direct interparticipant replication is, for all intents and purposes, required to establish the credibility of single-case experimentation, even in basic laboratory research with animals, where control is at its utmost. Why, in a science devoted to the analysis of behavior in the individual organism, should this be so? Interparticipant replication is needed to show that the investigator has been successful in identifying the relevant variables and bringing them under satisfactory degrees of control. Whenever manipulation of an independent variable produces the same kind of behavioral change in a new participant, one grows increasingly confident that the investigator is both manipulating the causal factor and eliminating (or otherwise controlling) the influence of extraneous factors that could obscure the causal relation.
What if the attempt at interparticipant replication fails? Suppose, for example, that an A-B-A-B design produces a clear, reliable effect in one participant but not in another? One might be inclined to question the reality of the original result, to declare it a fluke. However, this would be a strategic error of elephantine proportions. A result that can be replicated on an intraparticipant basis (i.e., from A to B, back to A, and back again to B) cannot be dismissed so easily. The failure to replicate the effect in another participant does not negate the original finding; rather, it unveils the incompleteness of one’s understanding of the original finding. The investigator may have erred in his or her operational definition of the independent variable, his or her control of the independent variable may be defective, or the investigator may have failed to recognize other relevant variables and isolate the experiment from their influence. “If this proves to be the case,” said Sidman (1960, p. 74), “failure of [interparticipant] replication will serve as a spur to further research rather than lead to a simple rejection of the original data.”

Systematic replication is an attempt to replicate a functional relation under circumstances that differ from those of the original experiment. The experimental conditions might be imposed in a different order. The range of a parametric variable might be extended. The personal characteristics of a therapeutic agent or teacher might be changed (e.g., from female to male). The classification of the participants might differ (e.g., pigeons might be studied instead of rats or typically developing children instead of those with developmental delays). New behavioral repertoires might be observed (e.g., swimming instead of studying), new stimulus modalities might be activated (e.g., with auditory rather than visual stimuli), or new behavioral consequences might be arranged (e.g., attention instead of edibles or the postponement of a shock instead of the presentation of a food pellet). In this way, by replicating functional relations across a range of individuals, behaviors, and operations, investigators can discover the boundaries of a phenomenon and thereby reach conclusions about its generality.
This issue is discussed in more detail in this volume’s Chapter 7.

Direct replications are often considered an integral part of a given experiment: The report of a typical experiment incorporates single-case results from several participants, and the similarity of results across the participants is a key feature in assessing the adequacy of control over the variables under study. By comparison, systematic replications are conducted across experiments and, over the course of time, across investigators, laboratories, clinics, and so forth. When investigators refer to research topics or areas of investigation, they are commonly referring to collections of systematic replications that have been designed specifically to address the limits of some behavioral phenomenon (e.g., consider the literatures on choice, conditioned reinforcement, resurgence, and response-independent presentations of previously established reinforcers, so-called “noncontingent reinforcement”).

Sometimes systematic replications are a byproduct, rather than the focus, of the research, as when an investigator makes use of a previous finding to pursue a new line of inquiry. For example, the investigator might use fixed-interval schedules of reinforcement as a baseline procedure to evaluate the effects of a drug on temporally controlled behavior. Whatever may be learned about the drug’s effects, the results are likely to extend understanding of fixed-interval behavior. The ubiquity of such research (Sidman [1960] called it the baseline method of systematic replication) is responsible for an array of empirical generalizations (such as schedule effects) that have come to be regarded as foundations of basic and applied behavior analysis.

Final Remarks

Single-case methods are well suited to the intensive study of behavior at the level of the individual organism. When properly designed and executed, single-case experiments offer robust protection against threats to internal validity. Their ability to support causal inferences certainly matches that of the group comparison methods that dominate the behavioral sciences, despite the small number of participants and the absence of sophisticated statistical tests.
Strong methods of direct control over behavior, exemplified by the steady-state strategy, obviate the need for statistical inference, and replication within and across participants attests to the adequacy of the control.

The prevalence of single-case designs in basic and applied behavior analysis is not simply a matter of their effectiveness in assessing the effects of experimental treatments or describing functional relations, although their effectiveness is obviously important. There is another less practical, more theoretical reason. Fundamentally, the prevalence of the approach derives from a conception of behavior as an intrinsically intraindividual phenomenon, a continuous reciprocal interaction between an organism and its environment (e.g., Skinner, 1938, 1966). By this reckoning, only methods that respect the individual character of behavior (single-case methods) are valid.

The point was made forcefully by Sidman (1960), who essentially ruled that methods based on comparisons across individuals, as in group statistical methods, fall outside the boundaries of behavioral science. To illustrate, Sidman (1960) considered the plight of an investigator interested in the effects of the number of reinforcements during acquisition of behavior on the subsequent extinction of the behavior. It is easy to see the severe limitation of a single-case experiment that would expose an individual to a series of reinforcement conditions, each followed by an extinction test. Clearly, the individual’s cumulating experience with successive extinctions would be confounded with the effects of the reinforcement variable; one could expect extinction to proceed more rapidly with experience regardless of the number of reinforcements in the acquisition phase before each test. An obvious solution might be to expose separate groups of individuals to the values of the reinforcement variable and combine the results of each group’s single extinction test to construct a function relating the number of reinforcements to the rate of extinction. “But,” said Sidman,

the function so obtained does not represent a behavioral process. The use of separate groups destroys the continuity of cause and effect that characterizes an irreversible behavioral process. . . . If it proves impossible to obtain an uncontaminated relation between number of reinforcements and resistance to extinction in a single subject, because of the fact that successive extinctions interact with each other, then the “pure” relation simply does not exist. The solution to our problem is to cease trying to discover such a pure relation, and to direct our research toward the study of behavior as it actually exists. . . . The [investigator] should not be deceived into concluding that the group type of experiment in any way provides a more adequately controlled or more generalizable substitute for individual data. (p. 53)

For investigators who endorse Sidman’s (1960) position, the adoption of single-case methods is not a pragmatic decision. To the contrary, it is a theoretical commitment: Single-case methods are seen as a defining feature of behavioral science (see also Johnston & Pennypacker, 2009; Sidman, 1990). For these investigators, to depart from single-case methods is to change fields. By investigating functional relations via the single-case approach described in this chapter, investigators can achieve a thorough understanding of how and why behavior occurs.

References

Ator, N. A. (1999). Statistical inference in behavior analysis: Environmental determinants? Behavior Analyst, 22, 93–97.

Baer, D. M., Wolf, M. M., & Risley, T. R. (1968). Some current dimensions of applied behavior analysis. Journal of Applied Behavior Analysis, 1, 91–97. doi:10.1901/jaba.1968.1-91

Baer, D. M., Wolf, M. M., & Risley, T. R. (1987). Some still-current dimensions of applied behavior analysis. Journal of Applied Behavior Analysis, 20, 313–327. doi:10.1901/jaba.1987.20-313

Bailey, J. S., & Burch, M. R. (2002). Research methods in applied behavior analysis. Thousand Oaks, CA: Sage.

Barlow, D. H., Nock, M. K., & Hersen, M. (2009). Single case experimental designs: Strategies for studying behavior change (3rd ed.). Boston, MA: Pearson.

Baron, A. (1999). Statistical inference in behavior analysis: Friend or foe? Behavior Analyst, 22, 83–85.

Baron, A., & Perone, M. (1998). Experimental design and analysis in the laboratory study of human operant behavior. In K. A. Lattal & M. Perone (Eds.), Handbook of research methods in human operant behavior (pp. 45–91). New York, NY: Plenum Press.

Bernstein, D. J., & Ebbesen, E. B. (1978). Reinforcement and substitution in humans: A multiple-response analysis. Journal of the Experimental Analysis of Behavior, 30, 243–253. doi:10.1901/jeab.1978.30-243

Bijou, S. W. (1955). A systematic approach to an experimental analysis of young children. Child Development, 26, 161–168.

Branch, M. N. (1999). Statistical inference in behavior analysis: Some things significance testing does and does not do. Behavior Analyst, 22, 87–92.

Browning, R. M. (1967). A same-subject design for simultaneous comparison of three reinforcement contingencies. Behaviour Research and Therapy, 5, 237–243. doi:10.1016/0005-7967(67)90038-1

Campbell, D. T., & Stanley, J. C. (1963). Experimental and quasi-experimental designs for research. Chicago, IL: Rand McNally.

Cook, T. D., & Campbell, D. T. (1979). Quasi-experimentation: Design and analysis issues for field settings. Chicago, IL: Rand McNally.

Crosbie, J. (1999). Statistical inference in behavior analysis: Useful friend. Behavior Analyst, 22, 105–108.

Davison, M. (1999). Statistical inference in behavior analysis: Having my cake and eating it? Behavior Analyst, 22, 99–103.

Ebbinghaus, H. (1913). Memory: A contribution to experimental psychology (H. A. Ruger & C. E. Bussenius, Trans.). New York, NY: Columbia Teachers College. (Original work published 1885)

Fisher, R. A. (1925). Statistical methods for research workers. Edinburgh, Scotland: Oliver & Boyd.

Hartmann, D. P., & Hall, R. V. (1976). The changing criterion design. Journal of Applied Behavior Analysis, 9, 527–532. doi:10.1901/jaba.1976.9-527

Horner, R. D., & Baer, D. M. (1978). Multiple-probe technique: A variation on the multiple baseline. Journal of Applied Behavior Analysis, 11, 189–196. doi:10.1901/jaba.1978.11-189

Johnston, J. M., & Pennypacker, H. S. (2009). Strategies and tactics of behavioral research (3rd ed.). New York, NY: Routledge.

Kazdin, A. E. (1982). Single-case research designs: Methods for clinical and applied settings. New York, NY: Oxford University Press.

Kazdin, A. E. (1999). Research designs in clinical psychology. New York, NY: Allyn & Bacon.

Keppel, G., & Wickens, T. D. (2004). Design and analysis: A researcher’s handbook (4th ed.). Upper Saddle River, NJ: Prentice-Hall.

Kratochwill, T. R., & Levin, J. R. (2010). Enhancing the scientific credibility of single-case intervention research: Randomization to the rescue. Psychological Methods, 15, 124–144. doi:10.1037/a0017736

Lucki, I., & DeLong, R. E. (1983). Control rate of response or reinforcement and amphetamine’s effect on behavior. Journal of the Experimental Analysis of Behavior, 40, 123–132. doi:10.1901/jeab.1983.40-123

Mazur, J. E. (1985). Probability and delay of reinforcement as factors in discrete-trial choice. Journal of the Experimental Analysis of Behavior, 43, 341–351. doi:10.1901/jeab.1985.43-341

McSweeney, F. K., & Murphy, E. S. (2000). Criticisms of the satiety hypothesis as an explanation for within-session decreases in responding. Journal of the Experimental Analysis of Behavior, 74, 347–361. doi:10.1901/jeab.2000.74-347

McSweeney, F. K., & Roll, J. M. (1993). Responding changes systematically within sessions during conditioning procedures. Journal of the Experimental Analysis of Behavior, 60, 621–640. doi:10.1901/jeab.1993.60-621

Morgan, D. L., & Morgan, R. K. (2009). Single-case research methods for the behavioral and health sciences. Thousand Oaks, CA: Sage.

Morse, W. H., & Kelleher, R. T. (1977). Determinants of reinforcement and punishment. In W. K. Honig & J. E. R. Staddon (Eds.), Handbook of operant behavior (pp. 174–200). Englewood Cliffs, NJ: Prentice-Hall.

Moxley, R. A. (1998). Treatment-only designs and student self-recording as strategies for public school teachers. Education and Treatment of Children, 21, 37–61. [Erratum. (1998). Education and Treatment of Children, 21, 229.]

Okouchi, H. (2009). Response acquisition by humans with delayed reinforcement. Journal of the Experimental Analysis of Behavior, 91, 377–390. doi:10.1901/jeab.2009.91-377

Perone, M. (1991). Experimental design in the analysis of free-operant behavior. In I. H. Iversen & K. A. Lattal (Eds.), Techniques in the behavioral and neural sciences: Vol. 6. Experimental analysis of behavior, Part 1 (pp. 135–171). Amsterdam, the Netherlands: Elsevier.

Perone, M. (1994). Single-subject designs and developmental psychology. In S. H. Cohen & H. W. Reese (Eds.), Life-span developmental psychology: Methodological contributions (pp. 95–118). Hillsdale, NJ: Erlbaum.

Perone, M. (1999). Statistical control in behavior analysis: Experimental control is better. Behavior Analyst, 22, 109–116.

Perone, M., & Courtney, K. (1992). Fixed-ratio pausing: Joint effects of past reinforcer magnitude and stimuli correlated with upcoming magnitude. Journal of the Experimental Analysis of Behavior, 57, 33–46. doi:10.1901/jeab.1992.57-33

Richman, D. M., Wacker, D. P., Cooper-Brown, L. J., Kayser, K., Crosland, K., Stephens, T. J., & Asmus, J. (2001). Stimulus characteristics within directives: Effects on accuracy of task completion. Journal of Applied Behavior Analysis, 34, 289–312. doi:10.1901/jaba.2001.34-289

Rusch, F. R., & Kazdin, A. E. (1981). Toward a methodology of withdrawal designs for the assessment of response maintenance. Journal of Applied Behavior Analysis, 14, 131–140. doi:10.1901/jaba.1981.14-131

Shadish, W. R., Cook, T. D., & Campbell, D. T. (2002). Experimental and quasi-experimental designs for generalized causal inference. Boston, MA: Houghton Mifflin.

Shull, R. L. (1999). Statistical inference in behavior analysis: Discussant’s remarks. Behavior Analyst, 22, 117–121.

Sidman, M. (1960). Tactics of scientific research: Evaluating experimental data in psychology. New York, NY: Basic Books.

Sidman, M. (1990). Tactics: In reply. Behavior Analyst, 13, 187–197.

Skinner, B. F. (1938). The behavior of organisms. New York, NY: Appleton-Century-Crofts.

Skinner, B. F. (1956). A case history in scientific method. American Psychologist, 11, 221–233. doi:10.1037/h0047662

Skinner, B. F. (1966). What is the experimental analysis of behavior? Journal of the Experimental Analysis of Behavior, 9, 213–218. doi:10.1901/jeab.1966.9-213

Smith, L. D., Best, L. A., Cylke, V. A., & Stubbs, D. A. (2000). Psychology without p values: Data analysis at the turn of the century. American Psychologist, 55, 260–263. doi:10.1037/0003-066X.55.2.260

Stubbs, D. A., & Pliskoff, S. S. (1969). Concurrent responding with fixed relative rate of reinforcement. Journal of the Experimental Analysis of Behavior, 12, 887–895. doi:10.1901/jeab.1969.12-887

Thorndike, E. L. (1911). Animal intelligence: Experimental studies. New York, NY: Macmillan. doi:10.5962/bhl.title.1201

Todman, J. B., & Dugard, P. (2001). Single-case and small-n experimental designs: A practical guide to randomization tests. Mahwah, NJ: Erlbaum.

Ulman, J. D., & Sulzer-Azaroff, B. (1975). Multielement baseline design in educational research. In E. Ramp & G. Semb (Eds.), Behavior analysis: Areas of research and application (pp. 371–391). Englewood Cliffs, NJ: Prentice-Hall.

Wade-Galuska, T., Galuska, C. M., & Perone, M. (2004). [Pausing during signaled rich-to-lean shifts in reinforcer context: Effects of cue accuracy and food deprivation]. Unpublished raw data.

Watson, J. (1913). Psychology as the behaviorist views it. Psychological Review, 20, 158–177. doi:10.1037/h0074428

Wertheimer, M. (1912). Experimentelle Studien über das Sehen von Bewegung [Experimental studies of the perception of motion]. Zeitschrift für Psychologie, 61, 61–265.

Chapter 6

Observation and Measurement in Behavior Analysis

Raymond G. Miltenberger and Timothy M. Weil

You can observe a lot by watching.
Yogi Berra, circa 1964

Throughout its history, behavior analysis has focused on building an inductive science that uses behavioral observation techniques to identify functional behavior–environment relations such that alteration of these relations may result in behavior change that is scientifically, and often socially, meaningful. Measuring behavior–environment relations in accurate and reliable ways is thus an integral tool in the process of analyzing and changing behavior.

Observable behavior has formal properties or dimensions that can be measured. These behavioral dimensions include frequency, intensity, duration, and latency (e.g., Bailey & Burch, 2002; Miltenberger, 2012). Each of these dimensions affords the observer the opportunity to measure changes in level, trend, and variability when alterations of environmental variables occur naturally or are manipulated under various programmed conditions. Observation and measurement of behavior may take many forms and involve a variety of techniques across practically any setting.

In this chapter, we discuss observation and measurement in the field of behavior analysis with a focus on identifying and measuring the target behavior, logistics of observation, recording procedures and devices, reactivity, interobserver agreement (IOA), and ethical considerations. The information discussed in this chapter is relevant to both research and practice in behavior analysis because observation and measurement of behavior are central to both endeavors.

Behavior

Regardless of whether the purpose of investigation is research or practice, it is first necessary to define behavior. Many authors define behavior in slightly different terms; however, each stresses an individual’s action or movement. According to Miltenberger (2012), behavior involves the actions of individuals, what people say and do. Malott and Trojan-Suarez (2004) suggested that behavior is anything a dead man cannot do, again suggesting that behavior consists of action or movement. Cooper, Heron, and Heward (2007) said that “behavior is the activity of living organisms. Human behavior is everything people do including how they move and what they say, think, and feel” (p. 25). Finally, Johnston and Pennypacker (1993) stated that “behavior is that portion of an organism’s interaction with its environment that is characterized by detectable displacement in space through time of some part of the organism and that results in a measurable change in at least one aspect of the environment” (p. 23).

These definitions of behavior are rooted in the traditional characterization of an operant as observable action or movement that has some subsequent effect on (operates on) the environment (Johnston & Pennypacker, 1993, p. 25). Although the Cooper et al. definition of behavior includes thinking and feeling, these actions are nonetheless those of an individual that can be observed and recorded. Therefore, in this chapter we focus on observation and measurement of

DOI: 10.1037/13937-006 APA Handbook of Behavior Analysis: Vol. 1. Methods and Principles, G. J. Madden (Editor-in-Chief) Copyright © 2013 by the American Psychological Association. All rights reserved.


behavior that can be detected, and thus recorded, by an observer. In some cases, the observer may be the individual engaging in the behavior.

Selecting and Defining Target Behavior

The first step in developing a plan for observing and recording behavior is to select and then define appropriate target behavior.

Selecting Target Behavior

Target behavior can be selected for a variety of overlapping reasons (see Kazdin, 2010). It may be useful but arbitrary; representative of a broader class of operants; the focus of intervention or educational efforts that occur in a particular setting (e.g., academic performance in a school setting); chosen because it causes impairment in some area of functioning for the individual; of particular concern to the individual or significant others who seek to change the behavior; or chosen because it will prevent the development of future problems (e.g., the promotion of safety skills to prevent injury).

When selecting target behavior, three general categories are considered: behavioral deficits, behavioral excesses, and problems of stimulus control. Behavioral deficits are behaviors that need to increase, such as desirable communicative responses for a child with autism who has limited language (e.g., Sundberg & Michael, 2001). Behavioral excesses are behaviors that need to decrease, such as self-injurious or aggressive behavior emitted by an individual with intellectual disability (e.g., Lerman, Iwata, Smith, & Vollmer, 1994). Problems of stimulus control are present when behaviors occur, but not at the appropriate time or place or in the appropriate context. For example, a child may learn to engage in a safety skill during training but fail to use it when the opportunity arises in the natural environment (e.g., Gatheridge et al., 2004; Himle, Miltenberger, Flessner, & Gatheridge, 2004). Likewise, a child with autism may learn to label an object but be unable to ask for the same object (e.g., failure of tact-to-mand transfer; Wallace, Iwata, & Hanley, 2006).

Identifying developmentally appropriate topographies and levels of behavior is also important when selecting

the target behavior. For example, in research on stuttering, Wagaman, Miltenberger, and Arndorfer (1993) chose a criterion of 3% or fewer stuttered words as an indication of treatment success on the basis of research showing that as many as 3% of the words spoken by typical speakers were dysfluent. A guiding factor in the selection of a target behavior in applied work is its social significance (Baer, Wolf, & Risley, 1968). Behavior is targeted that will increase the client’s meaningful and effective interactions with the environment. One index of social significance is the assessment of the social validity of the targeted behavior (Wolf, 1978). According to Wolf (1978), one of the three levels of social validity is the degree to which society validates the social significance of the goals of a behavior change procedure. In this regard, the important question posed by Wolf is, “Are the specific behavioral goals really what society wants?” (p. 207). In practice, assessment of the social validity of a target behavior or goal involves asking consumers for feedback on what behavior should be addressed, in what order, and to what extent. Of course, the target behavior selected in this way may possibly have some secondary gain or benefit for the person providing the report and thus may or may not be in the client’s best interest. The behavior analyst must be aware of this possibility and decide with the client, client surrogates, treatment team members, or some or all of these on the target behavior that best serves the client’s interests. Although behavior analysts are interested in the behavior of clients or research participants, they are also interested in the behavior of the implementers carrying out behavior-analytic procedures. 
The degree to which individuals implement assessment and treatment procedures as planned is referred to as implementation fidelity, procedural fidelity, or treatment integrity (e.g., Gresham, Gansle, & Noell, 1993; Peterson, Homer, & Wonderlich, 1982). Implementation fidelity is important because higher fidelity is associated with better treatment outcomes (e.g., DiGennaro, Martens, & Kleinmann, 2007; DiGennaro, Martens, & McIntyre, 2005; DiGennaro Reed, Codding, Catania, & Maguire, 2010; Plavnick, Ferreri, & Maupin, 2010). Implementation fidelity is assessed by observing and recording the behavior


of the implementers as they observe and record the behavior of the clients or research participants and as the implementers carry out intervention procedures. Everything discussed in this chapter applies not only to observing and recording the behavior of clients or research participants, but also to measuring the behavior of the implementers.

Defining the Target Behavior

The target behavior should be defined in terms that are objective, clear, and complete (Kazdin, 2010). A behavioral definition must include active verbs describing the individual’s actions, that is, the topography or form of the action being observed. Some behavioral definitions may also include the environmental events that precede (antecedents) or follow (consequences) the behavior. Behavioral definitions cannot include category labels (e.g., aggression) or appeal to internal states or characteristics (e.g., strong willed) but rather must identify the topography of the behavior. A behavioral definition should be easy to read and should suffice as a starting point for the observer to engage in data collection. Once a

behavior analyst begins to observe instances of the behavior, the behavioral definition may be modified on the basis of those observations. Some examples of target behavior definitions used in behavior-analytic research are shown in Table 6.1. Note the precise descriptions of behavior in these examples and the inclusion of the appropriate context for the behavior (e.g., unscripted verbalizations are “verbalizations that were not modeled in the video script but were appropriate to the context of the toy”) or the necessary timing of the behavior in relation to other events (e.g., an acceptance occurs when “the child’s mouth opened . . . within 3 seconds after the food item was held within 1 inch of the mouth”).

Logistics of Observation

Once the target behavior is identified and defined, the time and place of observation must be determined and the observers identified. These logistics of observation are not insignificant, because the choices of observation periods and observers will determine the quality of the data derived from the observations.

Table 6.1
Examples of Target Behavior Definitions From Published Articles Involving Behavior Analytic Assessment and Treatment

Label | Definition | Citation
Empathy | “A contextually appropriate response to a display of affect by a doll, puppet, or person that contained motor and vocal components (in any order) and began within 3 s of the end of the display.” (p. 20) | Schrandt, Townsend, and Poulson (2009)
Acceptance | “The child’s mouth opened so that the spoon or piece of food could be delivered within 3 s after the food item was held within 1 in. of the mouth.” (p. 329) | Riordan, Iwata, Finney, Wohl, and Stanley (1984)
Expulsion | “Any amount of food (that had been in the mouth) was visible outside the mouth (Joan only) or outside the lip and chin area (Nancy, Jerry, Holly) prior to presentation of the next bite.” (p. 329) | Riordan et al. (1984)
Activity engagement | “Facial orientation toward activity materials, appropriate use of activity materials, or comments related to the activity.” (p. 178) | Mace et al. (2009)
Compliance | “The child independently completing or initiating the activity described in the instruction within 10 s.” (p. 535) | Wilder, Zonneveld, Harris, Marcus, and Reagan (2007)
Unscripted verbalizations | “Verbalizations that were not modeled in the video script but were appropriate to the context of the toy [that was present].” (p. 47) | MacDonald, Sacramone, Mansfield, Wiltz, and Ahern (2009)


Time and Place of Observations

Observation periods (the time and place chosen for observation) should be scheduled when the target behavior is most likely to occur (or, in the case of behavioral deficits, when the target behavior should be occurring but is not). In some cases, the target behavior occurs mostly or exclusively in the context of specific events (e.g., behavioral acquisition training in academic sessions or athletic performance), and therefore, observation sessions have to occur at those times. A behavior analyst, however, may be interested in measuring the target behavior in naturalistic settings in which the behavior is, for the most part, free of temporal constraints. In these instances, it is possible to interview the client and significant people to narrow the observation window. In addition, it may be valuable to validate the reports by collecting initial data on the occurrence of the target behavior. For instance, scatterplot assessments, which identify at half-hour intervals throughout the day whether the behavior did not occur, occurred once, or occurred multiple times, may help identify the best time to schedule the observation period (Touchette, Macdonald, & Langer, 1985). In instances in which reports on the occurrence of the target behavior are not available or when the reports are not descriptive enough, the behavior analyst should err on the side of caution and conduct a scatterplot assessment or other initial observations to narrow the observation window.

Identifying the times at which the target behavior is most likely to occur is desirable to capture the greatest number of instances. The rationale for observing and recording as many instances of the behavior as possible rests with evaluation of function during assessment and analysis of the effects of the independent variable during intervention.
When it is not possible to observe enough instances of the behavior across a number of observation periods to establish clear relations between the behavior and specific antecedents and consequences, treatment implementation may be delayed. With behavior that occurs at a lower rate, a longer time frame of observation may be necessary to establish functional relations. Although a delay to intervention after a baseline may be acceptable in some situations, in others it

could be undesirable or unacceptable for the client or significant others, such as teachers or parents. Such delays can sometimes be circumvented by structuring observations in an analog setting to evaluate the effects of likely antecedent and consequent stimuli with the objective of evoking the target behavior. Alternatively, samples of the behavior might be collected in the natural environment at various times to provide a sufficient baseline that would allow making an accurate assessment of function, deciding on an appropriate intervention, or both. Circumstances such as availability of observers and the client’s availability must also be considered. A final consideration in preparing to make observations is to select a placement within the observation environment that permits a full view of the person and the behavior of interest while at the same time minimizing disruptions to the client or others in the environment. In addition, when collecting IOA data, it is important for both observers to see the same events from similar angles and distances but simultaneously maintain their status as independent observers. Depending on the characteristics of the setting, issues may arise, such as walls or columns that impede seeing the behavior and interruptions from staff or other individuals. Disruptions in the environment are also a concern. For example, children in elementary school classrooms are notorious for approaching and interacting with an adult while he or she is observing and recording the target behavior. In addition, if the target child is identified, the other children may cause disruptions by talking to the target child or behaving in otherwise disruptive ways. Any disruption should be recorded so that it can be considered in accounting for variability in the data.
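
The scatterplot assessment described earlier in this section can be illustrated with a short sketch. It assumes that a staff member has logged the clock time of each occurrence as an "HH:MM" string; the function name, the 08:00 to 16:00 observation day, and the sample data are hypothetical and are not taken from Touchette et al. (1985).

```python
from collections import Counter
from datetime import datetime


def scatterplot_codes(timestamps, start_hour=8, end_hour=16):
    """Code each half-hour interval as 'none', 'once', or 'multiple'.

    timestamps: clock times ("HH:MM") at which the target behavior occurred.
    start_hour and end_hour bound the hypothetical observation day;
    occurrences outside that window are not reported.
    """
    counts = Counter()
    for ts in timestamps:
        t = datetime.strptime(ts, "%H:%M")
        # Two half-hour slots per hour: slot 18 is 09:00, slot 19 is 09:30.
        slot = t.hour * 2 + (1 if t.minute >= 30 else 0)
        counts[slot] += 1

    codes = {}
    for slot in range(start_hour * 2, end_hour * 2):
        minutes = "30" if slot % 2 else "00"
        label = f"{slot // 2:02d}:{minutes}"
        n = counts.get(slot, 0)
        codes[label] = "none" if n == 0 else "once" if n == 1 else "multiple"
    return codes


# Hypothetical log of occurrences across one school day.
observed = ["09:05", "09:20", "10:40", "13:10", "13:15", "13:28"]
codes = scatterplot_codes(observed)
for interval, code in codes.items():
    print(interval, code)
```

In this made-up log, the 09:00 and 13:00 intervals are coded "multiple", which would suggest scheduling observation periods around those times.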

Selecting Observers

Most behavioral observations in research and practice are conducted by trained observers. Trained observers may not be needed in some laboratory settings in which permanent products are produced or equipment records the behavior automatically. However, in applied settings, individuals conducting behavioral observations must see the behavior as it occurs so data can be collected onsite or recorded for review at a later time. Individuals who could


conduct behavioral observations in the same setting as the client include participant observers (individuals who are typically present in the environment), nonparticipant observers (data collectors who are external to the typical workings of the environment), or the client whose behavior is being observed (self-monitoring).

Participant observers. According to Hayes, Barlow, and Nelson-Gray (1999), participant observers may be used in situations in which a significant other or other responsible party in the setting (e.g., parent, teacher) is available (and trained) to collect data. The primary advantage of including these individuals as observers is that they are already in the environment, which eliminates the potential logistical problems of scheduling nonparticipant observers. In addition, the likelihood of the child or student showing reactivity to observation is lessened because the person being observed is likely to have habituated to the participant observer’s presence over time. A limitation when using participant observers is that the observers may not have time to conduct observations because of their other responsibilities in the setting.

One factor to consider when arranging participant observation (and nonparticipant observation) is the possibility of surreptitious observation. With surreptitious observation, the participant observer would not announce or otherwise cue the participant to the fact that a particular observation session is taking place (e.g., Mowery, Miltenberger, & Weil, 2010). For example, in Mowery et al. (2010), graduate students were present in a group home setting to record staff behavior. However, the staff members were told that the students were there to observe and record client behavior as part of a class project (this deception was approved by the institutional review board, and participants were later debriefed). Surreptitious observation leads to less reactivity in the person being observed.
For surreptitious observation to occur ethically, the client or participant must consent to observation with the knowledge that he or she will not be told, and may not be aware of, the exact time and place of observation (e.g., Wright & Miltenberger, 1987). The exception would be when a parent or guardian gives consent

for surreptitious observation of a child or a researcher gets institutional review board approval for deception and later debriefs the participants.

Nonparticipant observers. When it is either impossible or undesirable to involve a participant observer, nonparticipant observers who are not part of the typical environment are used. For instance, observations in school settings may require that an observer sit in an unobtrusive area of the classroom and conduct observations of the child at various times of the day. Three challenges of having nonparticipant observers involved in data collection are access, scheduling, and cost. Because observations tend to occur while clients are involved in social settings, it may not be permissible to observe because of the potential for disruption or for reasons of confidentiality. In the latter case, when conducting observations of the behavior of a single individual in a group setting such as a classroom, it is typical to require consent of all students in the group because all are present during the observation. This is especially true when observations of minors occur. Because observation periods may be relatively short (especially in the context of research), it may also be difficult to schedule an observer several times a day or week to collect data for only 15 to 60 minutes. In addition, the client’s schedule may restrict when observation may occur. Finally, a significant cost may be associated with the inclusion of skilled data collectors who may need to be hired to fulfill this role. Circumventing excessive costs is possible, however, if student interns or other staff already at the site are available. In addition to the monetary cost of the observers, there is cost in terms of time and effort to train the observers and conduct checks for IOA to ensure consistency in the data collected.

Self-monitoring.
When the target behavior occurs in the absence of others, it may be useful to have clients observe and record their own behavior. When asking clients to record their own behavior, it is necessary to train them as you would train any observer. Although there are examples of research using data gathered through self-monitoring (e.g., marijuana use [Twohig, Shoenberger, & Hayes, 2007]; disruptive outbursts during athletic performances [Allen,


1998]; physical activity levels [Van Wormer, 2004]; binge eating [Stickney & Miltenberger, 1999]), self-monitoring is less desirable than observation by another individual because it may be unreliable. If the target behavior occurs in the absence of others, then IOA cannot be assessed. For example, Bosch, Miltenberger, Gross, Knudson, and Brower-Breitweiser (2008) used self-monitoring to collect information on instances of binge eating by young women but could not collect IOA data because binge eating occurred only when the individual was alone.

Self-monitoring is best used when it can be combined with periodic independent observations to assess IOA. Independent observations occur when a second observer records the same behavior at the same time but has no knowledge of the other observer’s recording. Thus, the recording of both observers is under the stimulus control of the behavior being observed and is not influenced by the recording behavior of the other observer. When IOA is high, it might indicate that self-monitoring is being conducted with fidelity. It is possible, however, that self-monitoring is conducted with fidelity only under the conditions of another observer being present, but not when the client is alone or away from the other observer. In some instances, it is possible to collect secondary data or product measures that can be used to verify self-monitoring. For instance, researchers measured expired carbon monoxide samples in smoking cessation research (Brown et al., 2008; Raiff, Faix, Turturici, & Dallery, 2010) and tested urine samples in research on substance abuse (Hayes et al., 2004; Wong et al., 2003).

Given the potential unreliability of self-monitoring, taking steps to produce the most accurate data possible through self-monitoring is important.
Such steps might include making a data sheet or data collection device as easy to use as possible, tying data collection to specific times or specific activities to cue the client to record his or her behavior, having other people in the client’s environment cue the client to conduct self-monitoring, checking with the client frequently by phone or e-mail to see whether self-monitoring is occurring, having the client submit data daily via e-mail or text message, and praising the client for reporting data rather than for the level of the behavior to avoid influencing the data.

Even with these procedures in place, clients may still engage in data collection with poor fidelity or make up data in an attempt to please the therapist or researcher. Therefore, self-monitoring that lacks verification should be avoided as a form of data collection whenever possible.

Training Observers

Adequate observer training is necessary to have confidence in the data. Observing and recording behavior can be a complex endeavor in which the observer must record, simultaneously or in rapid order, a number of response classes following a specific protocol, often while attending to timing cues (see Sampling Procedures section). Finally, following this routine session after session may lead to boredom and set the occasion for observer drift. Observer drift is the loosening of the observer’s adherence to the behavioral definitions that are used to identify the behavioral topographies to be recorded, a decrease in attending to specific features of the data collection system, or both. When observer drift occurs, the accuracy and reliability of the data suffer, and faulty decisions or conclusions may result (see Kazdin, 1977).

One way to train observers is to use behavior skills training (Miltenberger, 2012), which involves providing instructions and modeling, having the observer rehearse the observation and recording procedures, and providing feedback immediately after the performance. Such training occurs first with simulated occurrences of the target behavior in the training setting and then with actual occurrences of the target behavior in the natural environment. Subsequent booster sessions can be conducted in which the necessary training components are used to correct problems.

To maintain adequate data collection, it is necessary to reinforce accurate data collection and detect and correct errors that occur. Several factors can influence the fidelity of data collection (Kazdin, 1977). These include the quality of initial training, consequences delivered for the target behavior (Harris & Ciminero, 1978), feedback from a supervisor for accurate data collection (Mozingo, Smith, Riordan, Reiss, & Bailey, 2006), complexity and predictability of the behavior being observed (Mash & McElwee, 1974), and the mere presence of a


supervisor (Mozingo et al., 2006). With these factors in mind, strong initial training and periodic assessment and retraining of observers are recommended for participant observers, nonparticipant observers, and individuals engaging in self-monitoring.

Recording Procedures

The procedures available for collecting data on targeted behavior are categorized as continuous recording procedures, sampling procedures, and product recording.

Continuous Recording

Continuous recording (also called event recording) procedures involve observation and recording of each behavioral event as it occurs during the observation period. Continuous recording will produce the most precise measure of the behavior because every occurrence is recorded. However, continuous recording is also the most laborious method because the observer must have constant contact with the participant’s behavior throughout the observation period.

As with all forms of data collection, continuous recording requires the behavior analyst to first identify the dimensions on which to focus. We recommend that observers initially collect data on multiple dimensions of the behavior (e.g., frequency and duration) and then narrow the measures over the course of observations as the analysis identifies the most relevant dimensions. For example, in a classroom situation involving academic performance, it may be useful to count the number of math problems completed correctly, latency to initiate the task (and each problem), and the time spent on each problem. If after several observations the observer finds that it takes a while for the child to initiate the task, resulting in a low number of problems completed, focusing on measuring latency to initiate the task and frequency of correct responses may be useful.

Next, we describe data collection procedures related to different dimensions of behavior. Although we discuss the procedures separately, various combinations of these procedures may produce important data for analysis that would not be apparent with a focus on a single procedure.
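
The classroom example above, in which several dimensions are recorded at once, might be summarized as in the following sketch. It assumes the observer has timestamped the start and end of each math problem in seconds from the moment the instruction was given; the record layout, field names, and values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ProblemRecord:
    start_s: float  # seconds from the instruction to starting the problem
    end_s: float    # seconds from the instruction to finishing the problem
    correct: bool   # whether the problem was completed correctly


def summarize(records):
    """Summarize frequency, latency, and duration for one observation."""
    durations = [r.end_s - r.start_s for r in records]
    return {
        # Frequency: number of problems completed correctly.
        "correct_count": sum(r.correct for r in records),
        # Latency: time from the instruction to starting the first problem.
        "latency_to_initiate_s": records[0].start_s if records else None,
        # Duration: mean time spent on each problem.
        "mean_duration_s": sum(durations) / len(durations) if durations else None,
    }


# Hypothetical session: the child takes 95 s to begin the task.
session = [
    ProblemRecord(start_s=95.0, end_s=140.0, correct=True),
    ProblemRecord(start_s=150.0, end_s=210.0, correct=False),
    ProblemRecord(start_s=220.0, end_s=250.0, correct=True),
]
summary = summarize(session)
print(summary)
```

A long latency alongside a low correct count, as in this made-up session, is the pattern that would lead the observer to narrow recording to latency and frequency of correct responses.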

Frequency. Perhaps the most common form of continuous recording is frequency recording: counting the number of occurrences of the target behavior in the observation period (Mozingo et al., 2006). Frequency recording is most appropriate when the behavior occurs in discrete units with fairly consistent durations. In frequency recording, each occurrence of the target behavior (defined by the onset and offset of the behavior) is recorded in the observation period. Frequency data may be reported as total frequency—number of responses per observation session—or converted to rate—number of responses per unit of time (e.g., responses per minute). Total frequency would only be reported if the observation periods were of the same duration over time. The advantage of reporting rate is that the measure is equivalent across observation periods of different durations. Frequency recording requires the identification of a clear onset and offset of the target behavior so each instance can be counted. It has been used with a wide range of target behavior when the number of responses is the most important characteristic of the behavior. Examples include recording the frequency of tics (Miltenberger, Woods, & Himle, 2007), greetings (Therrien, Wilder, Rodriguez, & Wine, 2005), requests (Marckel, Neef, & Ferreri, 2006), and mathematics problems completed (Mayfield & Vollmer, 2007). When it is difficult to discriminate the onset or offset of the behavior or the behavior occurs at high rates such that instances of the behavior cannot be counted accurately (e.g., high-frequency tics or stereotypic behavior), a behavior sampling procedure (i.e., interval or time-sample recording; see below) is a more appropriate recording procedure. As we elaborate on later, in sampling procedures the behavior is recorded as occurring or not occurring within consecutive or nonconsecutive intervals of time, but individual responses are not counted. 
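
The conversion from total frequency to rate can be shown with a minimal sketch. The session counts and durations below are invented for illustration; the point is only that dividing by session length puts sessions of different durations on a common scale.

```python
def response_rate(count, session_minutes):
    """Return responses per minute for one observation session."""
    if session_minutes <= 0:
        raise ValueError("session_minutes must be positive")
    return count / session_minutes


# Hypothetical sessions: (total frequency, session length in minutes).
sessions = [(12, 10), (18, 15), (6, 5)]
rates = [response_rate(count, minutes) for count, minutes in sessions]

for (count, minutes), rate in zip(sessions, rates):
    print(f"{count} responses in {minutes} min = {rate:.2f} per min")
```

The three total frequencies differ (12, 18, and 6), yet each session yields the same rate of 1.2 responses per minute, which is why total frequency is reported only when observation periods are of equal duration.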
Four additional methods of recording frequency are frequency-within-interval recording, real-time recording, cumulative frequency, and percentage of opportunities. Each method has advantages over a straight frequency count.

Frequency-within-interval recording. One limitation of frequency recording is that it does not


provide information on the timing of the responses in the observation period. With frequency-within-interval recording, the frequency of the behavior is recorded within consecutive intervals of time to indicate when the behavior occurred within the observation period. To conduct frequency-within-interval recording, the data sheet is divided into consecutive intervals, a timing device cues the observer to the appropriate interval, and the observer records each occurrence of the behavior in the appropriate interval. Because this method provides information on both the number and the timing of responses, more precise measures of IOA can be calculated.

Real-time recording. Combining features of frequency and duration procedures, real-time recording also allows the researcher to collect information on the temporal distribution of the target behavior over the course of an observation period (Kahng & Iwata, 1998; Miltenberger, Rapp, & Long, 1999). Through use of either video playback or computers in real time, it is possible to record the exact time of onset and offset of each occurrence of the behavior. For discrete momentary responses that occur for 1 second or less, the onset and offset are recorded in the same second. Real-time recording is especially valuable when conducting within-session analysis of behavioral sequences or antecedent–behavior–consequence relations. Borrero and Borrero (2008) conducted real-time observations that included the recording of both target behavior and precursor behavior or events related to the target behavior. These data were then used to construct a moment-to-moment analysis (lag-sequential analysis) that provided probability values for the occurrence of the precursor given the target behavior and of the target behavior given the precursor. The probability of a precursor reliably increased approximately 1 second before the emission of the target behavior.
In addition, the probability of the target behavior was greatest within 1 second after the precursor behavior or event. The real-time analysis suggested that the precursor behavior or event was a reliable predictor of the target behavior. Additional analysis showed that both the precursor behavior and the target behavior served the same function (e.g., both led to escape from demands).
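
The kind of conditional probability that a lag-sequential analysis reports can be approximated from real-time onset records. The sketch below is a simplified illustration, not the procedure used by Borrero and Borrero (2008): it assumes onsets are recorded in whole seconds, uses a fixed 1-second window, and works on invented data.

```python
def proportion_followed(first_onsets, second_onsets, window_s=1.0):
    """Proportion of first events followed by a second event within window_s."""
    if not first_onsets:
        return None
    hits = sum(
        any(0 <= s - f <= window_s for s in second_onsets)
        for f in first_onsets
    )
    return hits / len(first_onsets)


def proportion_preceded(target_onsets, precursor_onsets, window_s=1.0):
    """Proportion of target events preceded by a precursor within window_s."""
    if not target_onsets:
        return None
    hits = sum(
        any(0 <= t - p <= window_s for p in precursor_onsets)
        for t in target_onsets
    )
    return hits / len(target_onsets)


# Hypothetical onset times, in seconds from the start of the session.
precursor = [10, 42, 75, 120]
target = [11, 43, 90, 121, 200]

p_target_given_precursor = proportion_followed(precursor, target)
p_precursor_given_target = proportion_preceded(target, precursor)
print(p_target_given_precursor, p_precursor_given_target)
```

With these made-up records, three of the four precursors are followed by the target within 1 second, and three of the five targets are preceded by a precursor. A full lag-sequential analysis would repeat the calculation across a range of lags rather than a single window.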

Cumulative frequency. The measurement of operant behavior originated with an electromechanical recording procedure that was designed to record instances of behavior cumulatively across time (Skinner, 1956). Each response produced an uptick in the data path as the pen moved across the paper and the paper revolved around a drum. The original paper records were only about 6 inches wide, and thus the pen used to record responses would, on reaching the top of the paper, reset to the bottom of the paper and continue recording the responses. Increasing slopes indicated higher response rates; horizontal lines indicated an absence of the response. This apparatus for the automatic recording of cumulative frequencies is no longer used, but the usefulness of cumulative response measures persists. In cumulative frequency graphs, data are displayed as a function of time across the x-axis and cumulative frequency along the y-axis. The frequency of responses that occur in a given time period is added to the frequency in the previous time period. Thus, data presented in a cumulative record must either remain at a particular level (no new responses) or increase (new responses) across time but never decrease. The use of cumulative frequency plots allows one to assess frequency and temporal patterns of responding.

Percentage of opportunities. In some cases, recording the occurrence of a response in relation to a specific event or response opportunity is useful. In such cases, the percentage of opportunities with correct responses is more important than the number of responses that occurred. For example, in recording compliance with adult requests, the percentage of requests to which the child responds correctly is more important than the number of correct responses. Ten instances of compliance are desirable if 10 opportunities occur. However, 10 instances of compliance are much less desirable in relation to 30 opportunities. 
Other examples include the percentage of math problems completed correctly, percentage of free throws made in a basketball game, percentage of signals detected on a radar screen during a training exercise, and percentage of trials in which an item is labeled correctly during language

Observation and Measurement in Behavior Analysis

training. Considering that the number of opportunities might vary in each of these cases, the percentage of opportunities is a more sensitive measure of the behavior than a simple frequency count or a rate measure. When a percentage-of-opportunities measure is used, reporting the number of opportunities as well as the percentage of correct responses is important. If the number of opportunities is substantively different across observations, it may affect the variability of the data and the interpretation of the results. For instance, if on one occasion a child is provided with 10 spelling words and spells eight correctly, the result is 80% correct. The next day, if the two words spelled incorrectly are retested and the child spells one of the words correctly, the second performance result is 50% correct. These data are not comparable because the number of opportunities varied greatly, and inappropriate conclusions could be drawn from the results if only percentages were reported. In these instances, providing an indicator of the number of opportunities to respond in the graphical representation of the data will assist the reader in interpreting the results.

Duration. When each response lasts for an extended period of time or does not lend itself to a frequency count (e.g., as in behavior such as reading or play), it may be useful to record the duration of the behavior, that is, the time from its onset to its offset. Duration recording is desirable when the most important aspect of the behavior is the length of time for which it occurs. For example, if the interest is in sustained performance or time on task, duration recording is appropriate. If classroom teachers are concerned with sustained engagement in academic activities, the observer would identify the length of time that engagement is desired (such as in reading) and collect data on the duration of engagement to identify any discrepancy between the target duration and actual performance. 
Once a discrepancy is determined to exist, programming for successively longer durations could be initiated. Other situations involve a combination of duration and frequency recording, as when the goal is to decrease a young child’s tantrum behavior. If tantrums occur multiple times per day and each tantrum

continues for a number of minutes, recording both frequency and duration will reveal whether tantrums are occurring less often and occurring for shorter periods of time after intervention.

Finally, many types of behavior targeted in applied work do not lend themselves readily to frequency counts because they consist of (a) responses that occur rapidly and repetitively over extended periods of time (such as stereotypic behavior), (b) complexes of discrete responses integrated into chains or other higher order units, or (c) both. For rapid, repetitive responses, for which each onset and offset is not easily discriminated, a duration measure can be used. In such cases, a time period in which the behavior is absent can help the observer discriminate the end of one episode and the start of the next. For target behavior consisting of multiple component behaviors, the target behavior might be defined as the entire chain, and a duration measure would then consist of recording the time from the onset of the first response in the chain to the offset of the last response in the chain. In some instances, duration is also used to measure a behavior with multiple component responses when it does not make sense to reduce the behavior to a frequency count of its component responses. For example, duration of play would be of greater interest to a parent trying to increase a child’s play time than would a frequency count of the number of steps the child traveled across the playground, went up and down a slide, or moved back and forth on a swing.

Latency. Latency is the length of time from the presentation of a discriminative stimulus to the initiation of the behavior. Latency is of interest when the speed of initiation of the behavior is an important feature. 
For example, latency is the time from the sound of the starter’s pistol to the sprinter’s movement off the starting blocks, the time it takes for a child to initiate a task after the teacher’s request, or the time it takes the wait staff at a restaurant to respond once a customer is seated. When working with a child who does not complete math problems in the allotted time, for example, latency indicates how long it takes the child to initiate the task. By contrast, duration assesses how long it takes the child to complete each problem

Miltenberger and Weil

once he or she starts working on it. Depending on the child and the circumstance, one or both dimensions may be an important focus of assessment and intervention.

Magnitude. On occasion, evaluating the magnitude or intensity of behavior is useful. One example of response magnitude is the force exerted (e.g., muscle flexion), and another is the loudness of a verbal response (as measured in decibels). Although decreases in frequency, and perhaps duration, of undesirable behaviors such as tantrums or self-injury are important, a reduction in magnitude may be an important initial goal. In some cases, reductions in magnitude may be observed before substantial decreases occur on other dimensions, such as frequency and duration. Alternatively, magnitude may increase temporarily during an extinction burst before the behavior decreases in frequency, duration, or intensity. Recording magnitude may be valuable when considering recovery from an accident or injury such as a knee injury for a football player. Measurement would pertain to the ability of the affected muscles to exert force after rehabilitation. In these situations, recording magnitude tends to require equipment such as that used by physical therapists to evaluate force. Direct observation of response magnitude may not always measure force, however. Observers can use intensity rating scales to evaluate the magnitude of a response. For instance, given a scale ranging from 1 to 10, a teacher may rate the intensity of each occurrence of an undesirable behavior. In using rating scales, it is important to anchor the points within the scale such that two observers can agree on the level of intensity given a variety of occurrences of the behavior (e.g., 1 = mild whining, 10 = loud screaming, throwing items, and head banging). 
Although anchoring categories on a scale is considered valuable to decrease the variability in responding across observers, the literature is not clear as to how many individual categories need be defined (Pedhazur & Schmelkin, 1991). Another example in which magnitude can be measured with a rating scale is the intensity of a fear response (Miltenberger, Wright, & Fuqua, 1986; Twohig, Masuda, Varra, & Hayes, 2005) or other emotional responses (Stickney & Miltenberger, 1999). In general, intensity rating scales present issues of both reliability and validity because the ratings that might be assigned to specific instances of behavior may be ambiguous; this is especially true when rating fear or emotional responses because the magnitude of these behaviors can be rated only by the individual engaging in the behavior.
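Several of the continuous-recording dimensions discussed in this section reduce to simple arithmetic once the raw observations are available. The following is a minimal sketch, assuming times are recorded in seconds and counts are tallied per observation period; all function names and the sample values other than the spelling example are hypothetical.

```python
from itertools import accumulate


def cumulative_frequency(counts_per_period):
    """Running total of responses; the result can stay level or rise but never fall."""
    return list(accumulate(counts_per_period))


def percentage_of_opportunities(correct, opportunities):
    """Percentage of response opportunities that ended in a correct response."""
    return 100.0 * correct / opportunities


def total_duration(episodes):
    """Sum of (offset - onset) over all episodes, each given as (onset_s, offset_s)."""
    return sum(offset - onset for onset, offset in episodes)


def latency(stimulus_time_s, response_onset_s):
    """Time from presentation of the discriminative stimulus to response initiation."""
    return response_onset_s - stimulus_time_s


# Spelling example from the text: 8 of 10 words, then 1 of the 2 retested words.
day_1 = percentage_of_opportunities(8, 10)
day_2 = percentage_of_opportunities(1, 2)
print(f"Day 1: {day_1:.0f}% of 10 opportunities")
print(f"Day 2: {day_2:.0f}% of 2 opportunities")

# Hypothetical session data, for illustration only.
print(cumulative_frequency([2, 0, 3, 1]))      # responses per 1-minute period
print(total_duration([(10, 70), (120, 150)]))  # two tantrum episodes, in seconds
print(latency(5, 12))                          # request at 5 s, task started at 12 s
```

Printing the number of opportunities next to each percentage follows the recommendation above: 80% of 10 words and 50% of 2 words are not comparable without that context.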

Sampling Procedures

It may not always be possible to collect adequate information on the target behavior using continuous recording procedures. When the onset and offset of each instance of the behavior cannot be identified, continuous recording is not possible. Likewise, the constraints imposed by some environments, some target behaviors, or some observers may make continuous recording impossible. For example, the person exhibiting the target behavior might not be continuously in sight of the observer, the target behavior might occur almost exclusively when the individual is alone, or the observer might have other responsibilities that compete with observation. In these instances, it may be desirable, and necessary, to collect samples of the behavior that provide an estimate of the behavior’s true level. Behavior-sampling procedures include interval recording and time-sample recording. In both procedures, the observation period is divided into smaller units of time, and the observer records whether the behavior occurred in each interval.

Interval recording. Interval recording involves dividing the observation period into equal consecutive intervals and recording whether the behavior occurred in each. Interval recording is different from frequency recording or frequency-within-interval recording in that an interval is scored once regardless of whether a single instance or multiple instances of a behavior occurred during the interval. In behavior analysis research, intervals are usually short, typically 10 to 15 seconds (DiGennaro et al., 2007; Mace et al., 2009). Short intervals (usually less than 20 seconds) are valuable when behavior occurs at moderate to high frequencies or when multiple topographies of behavior are recorded.


Shorter intervals are also valuable when temporal correlations may yield information on antecedent events and potential maintaining consequences. When interested in the relation between the target behavior and antecedents and consequences, the observer records whether any of the three events occurred in each interval to examine the temporal patterns of the behavior and its potential controlling variables (e.g., Repp & Karsh, 1994). An additional condition under which shorter intervals are valuable is when an understanding of within-session temporal distribution of the behavior is necessary. For example, to determine whether self-injurious behavior is high in a functional analysis condition because of an extinction burst, the behavior analyst identifies whether more intervals were scored early in the session than later in the session (Vollmer, Marcus, Ringdahl, & Roane, 1995). Similar patterns could be discerned with cumulative frequency recording or real-time recording as well.

In some applied settings, however, intervals might be much longer, perhaps 15 or 30 minutes (Aikman & Garbutt, 2003), when behavior occurs less frequently. Under these conditions, it may be difficult to draw useful correlations between antecedent and consequent events and the behavior as well as behavior–behavior relations. Such limitations notwithstanding, longer intervals are typically used for the convenience of data collectors (often participant observers) who can engage in other responsibilities and still collect data.

Typically, the observer has a data sheet with consecutive intervals designated for recording, and during the observation period, the observer is prompted with auditory (through headphones so as not to disrupt the ongoing behavior of the observee) or tactile (vibration) cues delivered via a timing device to move from interval to interval while observing and recording the target behavior. 
As time passes, the observer records the occurrence of the target behavior in the appropriate interval; a blank interval indicates the nonoccurrence of the behavior in that interval. In some cases, a computer is used for data collection, and as the observer records the behavior, the software puts the data into the proper interval. At the end of the observation period, the number of intervals in which the behavior is observed is

divided by the number of observation intervals, and the result is reported as the percentage of intervals in which the behavior occurred. A similar process is used for time-sample recording (described in the next section).

The two types of interval recording procedures are partial-interval recording and whole-interval recording. In partial-interval recording, the observer records the occurrence of the target behavior if it occurred at any time within the interval. That is, the interval is scored if the target behavior occurred briefly in the interval or throughout the entire interval. Furthermore, if the onset of the behavior occurs in one interval and its offset occurs in the next, both intervals are scored (e.g., Meidinger et al., 2005). In whole-interval recording, the interval is scored only if the target behavior occurred throughout the entire interval. Whole-interval recording is more useful with continuous behavior (e.g., play) than with discrete or quickly occurring behavior (e.g., a face slap). Typically, whole-interval recording is used when a behavior occurs over longer periods of time, as might be seen with noncompliant behavior or on-task behavior. For example, Athens, Vollmer, and St. Peter Pipkin (2007) recorded duration of on-task behavior in 3-second intervals only if the behavior was present for the entire interval.

Time-sample recording. In time-sample recording, the observation period is divided into intervals of time, but observation intervals are separated by periods without observation. Time-sample recording permits the observer to focus on other tasks when not observing the target behavior. For example, the observation period might be divided into 15-second intervals, but observation occurs only at the end of the interval. Likewise, an observation period might be divided into 30-minute intervals, but observation and recording occur only in the last 5 minutes of every 30 minutes. 
These intervals can be equally divided, as when an observation occurs every 15 minutes, or variably divided to provide some flexibility for the observer (such as a teacher who cannot observe exactly on the quarter hour). The data are displayed as a percentage (the number of intervals with target behavior divided by the number of intervals of observation).


For instance, if evaluating the extent of social interactions between adolescents on a wing of an inpatient psychiatric facility were desirable, conducting observations every 15 minutes might be possible. In this example, the observer would be engaged in a job-related activity and, when prompted by a timer, look up from his or her work, note whether the target behavior was occurring, and record the result. Data of this sort could identify which of the adolescents tend to engage in social interactions and the typical times at which social interactions are likely to occur. From this sampling approach, it is possible to refine the data collection process toward a more precise measure of behavior. Interval and time-sample recording have benefits and limitations. The benefit of interval recording is that with consecutive observation intervals, no instance of the target behavior is missed during the observation period. The limitation, especially with shorter intervals, is that it requires the continuous attention of, and frequent recording by, the observer, making it difficult for the observer to engage in other activities during the observation period. A limitation of time-sample recording is that because observation intervals are separated by periods without observation, some instances of the target behavior may be missed during the observation period. However, a benefit is that the observer can engage in other activities during the periods between observation intervals, making the procedure more user friendly for participant observers such as teachers or parents. Although interval and time-sample recording procedures are used widely in behavior-analytic research, some authors have cautioned that the results of these sampling procedures might not always correspond highly with data collected through continuous recording procedures in which every behavioral event is recorded (e.g., Rapp et al., 2007; Rapp, Colby-Dirksen, Michalski, Carroll, & Lindenberg, 2008). 
In summarizing the numerous studies that have compared interval and time-sample recording with continuous recording procedures, Rapp et al. (2008) concluded that interval recording tends to overestimate the duration of the behavior, time-sample procedures with small intervals tend to produce accurate estimates of duration, and interval recording with small intervals tends to produce fairly accurate estimates of frequency. Although Rapp et al. provided several suggestions to guide decision making regarding the use of interval and time-sample procedures, they concluded by suggesting that small interval sizes in interval and time-sample procedures are likely to produce the best results.
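The scoring rules for the sampling procedures described in this section can be compared against a continuous record with a short sketch. It assumes a real-time record of (onset, offset) pairs in whole seconds, a session length that is an exact multiple of the interval size, and a time-sample variant in which only the final second of each interval is observed; the function names and the sample session are hypothetical.

```python
def seconds_with_behavior(episodes, session_s):
    """Per-second record: True for each second from onset up to (not including) offset."""
    occupied = [False] * session_s
    for onset, offset in episodes:
        for second in range(onset, min(offset, session_s)):
            occupied[second] = True
    return occupied


def split_intervals(occupied, interval_s):
    """Cut the per-second record into consecutive intervals of interval_s seconds."""
    return [occupied[i:i + interval_s] for i in range(0, len(occupied), interval_s)]


def partial_interval_pct(occupied, interval_s):
    """Interval scored if the behavior occurred at any time within it."""
    chunks = split_intervals(occupied, interval_s)
    return 100.0 * sum(any(chunk) for chunk in chunks) / len(chunks)


def whole_interval_pct(occupied, interval_s):
    """Interval scored only if the behavior occurred throughout it."""
    chunks = split_intervals(occupied, interval_s)
    return 100.0 * sum(all(chunk) for chunk in chunks) / len(chunks)


def time_sample_pct(occupied, interval_s):
    """Interval scored if the behavior is occurring in the last second of the interval."""
    samples = occupied[interval_s - 1::interval_s]
    return 100.0 * sum(samples) / len(samples)


def continuous_duration_pct(occupied):
    """Percentage of the session actually occupied by the behavior."""
    return 100.0 * sum(occupied) / len(occupied)


# Hypothetical 60-second session scored in 10-second intervals.
record = seconds_with_behavior([(5, 12), (20, 35), (41, 44)], session_s=60)
print(f"continuous duration: {continuous_duration_pct(record):.1f}%")
print(f"partial interval:    {partial_interval_pct(record, 10):.1f}%")
print(f"whole interval:      {whole_interval_pct(record, 10):.1f}%")
print(f"time sample:         {time_sample_pct(record, 10):.1f}%")
```

Rerunning the same record with a smaller `interval_s` shows how the estimates change with interval size. A single made-up session such as this one illustrates the scoring rules only and says nothing about how the procedures perform in general.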

Product Recording

In some cases, the outcome or the product of the behavior may be of interest, either as a primary dependent variable or as a complement to direct observation of the behavior itself. When the behavior changes the physical environment, this product can be recorded as an index of the occurrence of the behavior. In some instances, collecting data on products is valuable because measuring the behavior directly may not be possible. For example, weight is measured in weight-loss programs because measuring the behavior that produces weight loss (i.e., diet and exercise) is usually not feasible. Examples of product recording may include number of academic problems completed or number of units assembled in a factory. In each case, the occurrence of the behavior is not observed directly; rather, the products of the behavior are recorded as an indication of its occurrence. In such cases, a focus on the products of behavior is easier and more efficient than recording the behavioral events as they occur. An important note in recording permanent products is that although the focus is on results, if the results fall short of expected quantity or quality, the focus can then turn to evaluation of the behavior involved in producing the products being measured (Daniels & Daniels, 2004).

Beyond measuring the production of tangible items, product recording can be used to measure the physical damage caused by a problem behavior. For example, self-injurious behavior can produce tissue damage such as bruises, lacerations, or other bodily injuries, and product recording could be used to assess the severity of these injuries. Iwata, Pace, Kissel, Nau, and Farber (1990) developed the Self-Injury Trauma Scale to quantify surface injury resulting from self-injurious behavior. Other examples of this type of product recording include the


assessment of the size of a bald area related to chronic hair pulling (Rapp, Miltenberger, & Long, 1998; Rapp, Miltenberger, Long, Elliott, & Lumley, 1998) or the length of fingernails as a measure of nail biting (Flessner et al., 2005; Long, Miltenberger, Ellingson, & Ott, 1999). Still other examples of product recording include a measure of weight or body mass index as an indication of changes in eating (Donaldson & Normand, 2009; see also Young et al., 2006), measuring chemicals in urine samples as a measure of drug ingestion (Silverman et al., 2007), or weighing food before and after a feeding session to assess the amount of food consumed (Kerwin, Ahearn, Eicher, & Swearingin, 1998; Maglieri, DeLeon, Rodriguez-Catter, & Sevin, 2000; Patel, Piazza, Layer, Coleman, & Swartzwelder, 2005).

An advantage of product recording is that the observer does not have to be present to record the occurrence of the behavior (Miltenberger, 2012) because the product can be recorded at a more convenient time after the behavior has occurred (e.g., at the end of the class period or after the shift in a factory). A drawback of product recording, especially when used with a group of individuals, is that it might not be possible to determine which person engaged in the behavior that resulted in the product. Perhaps another student completed the academic problems or another worker helped produce the units in the factory (Jessup & Stahelski, 1999). Although product recording is valuable when the interest is in the tangible outcome of the behavior, there must be some way to determine which individual was responsible for the products being measured (e.g., did the urine sample come from the client or someone else?). Another potential problem with some uses of product recording is that it may not identify the behavior that resulted in the product. 
For example, correct answers to math problems may have been produced by cheating, and weight loss may have been produced through self-induced vomiting rather than an increase in exercise or a reduction in calorie consumption.

Recording Devices

Once the appropriate recording procedure has been chosen, the next step is to choose a recording device. Because the observer must record instances

of the behavior as they occur, the observer’s behavior must result in a product that can be used later for purposes of analysis. A recording device allows the observer to produce a permanent product from the observation session. The most commonly used recording device is a data sheet structured for the type of recording procedure being conducted. Figures 6.1, 6.2, and 6.3 show sample data sheets structured for frequency recording, duration recording, and interval recording, respectively.

Although data sheets are used most often for data collection, other recording devices, both low tech and high tech, can be used to record instances of the behavior. Several types of low-tech recording devices have been used, such as wrist counters for frequency recording (Lindsley, 1968) or stopwatches for duration recording. Still other possibilities include activities as simple as moving beads from one side of a string to the other, moving a coin from one pocket to another, making small tears in a piece of paper, or making a hash mark on a piece of masking tape affixed to the recorder’s sleeve to record frequency (Miltenberger, 2012). In fact, it is feasible to record frequency with whatever may be available in the environment as long as the observer can produce a product that represents the occurrence of the behavior.

Although recording on a data sheet is the most frequently used data collection process, with rapidly changing technologies there is a move to identify high-tech methods to streamline and automate data collection (Connell & Witt, 2004; Jackson & Dixon, 2007; Kahng & Iwata, 1998). In applied behavior analysis research, electronic devices such as a personal digital assistant (Fogel, Miltenberger, Graves, & Koehler, 2010) or hand-held or laptop computers (Gravlee, Zenk, Woods, Rowe, & Schulz, 2006; Kahng & Iwata, 1998; Repp, Karsh, Felce, & Ludewig, 1989) are frequently used for data collection. 
In addition, the use of bar codes and scanners (Saunders, Saunders, & Saunders, 1993) for data collection has been reported. With bar code scanners, an observer holds a small battery-powered scanning device and a sheet of paper with the bar codes ordered according to behavioral topography. When the target behavior is observed, the data collector scans the relevant bar code to record the


Frequency Data Form

Child: James M.
Start Date: 9/15/2010
Observer: R.M. (Primary/Reliability)
Setting: Mrs. Johnson’s Class

Instructions: First, indicate date of observation in the far left column. Second, place a tick mark for each occurrence of behavior during the specified academic activity for that day.

Definition of behavior: ____________________________________________________

Columns: Date | Circle Time | Individual Reading | Social Studies | Mathematics | Science | Writing | Daily Total

Figure 6.1. Example of a daily frequency data sheet that involves a breakdown of the frequency of the behavior by curricular areas in a general education classroom setting.

Duration Data Form

Child: James M.
Start Date: 9/15/2010
Observer: R.M. (Primary/Reliability)
Setting: Mathematics

Instructions: First, indicate date of observation in the far left column. Second, identify the start time (onset) and the stop time (offset) for each occurrence of the behavior. Use more than one line if necessary.

Definition of behavior: ____________________________________________________

Columns: Date | Onset | Offset | Onset | Offset | Onset | Offset | Onset | Offset | Daily Duration

Figure 6.2. An example of a duration data sheet that provides information on the onset and offset of each occurrence of the behavior and the frequency of the behavior each day.

occurrence of the behavior and the time within the observation period. The use of bar codes is, however, only one of several ways to conduct electronic recording of behavior. In one investigation evaluating a shaping procedure to increase the reach of a pole vaulter, a photoelectric beam was used to determine the height of the vaulter’s reach just after planting the pole for the vault (Scott, Scott, & Goldwater, 1997). Another high-tech method of data collection involves software for cell phones. These software applications, colloquially referred to as apps, allow behavior analysts to use the computing power of their phones for data collection. The advantages of this technology are numerous; however, the most obvious benefits are the use of a small, portable device that can


Interval Data Form

Child: James M.
Date: 9/15/2010
Setting: Mathematics
Observer: R.M. (Primary/Reliability)

Instructions: Place a check mark in the appropriate column to reflect the events that occurred in each 10-s interval.

Definition of behavior: ____________________________________________________

Columns: 10-s intervals numbered 1 through 30
Rows: Demand Placed | Aggression | Attention | Ignore | Escape

Figure 6.3. An example of a 10-second interval data sheet (partial or whole) that provides information on the occurrence of the target behavior and probable antecedents and consequences. In this example, the hypothesis is that the aggressive behavior occurs after the delivery of a demand by the teacher. In addition, potential responses by the teacher to the problem behavior are included. When completed, this data sheet will provide information on the temporal relationship between teacher behavior and student problem behavior.

facilitate any form of data collection mentioned thus far and the ability to graph the data. Finally, these graphs can be sent via text message to parents, teachers, or colleagues (Maher, 2009). Undoubtedly, as technology advances, even more high-tech data collection methods will emerge.

Reactivity of Observation

A long-standing concern for behavioral researchers is how observation affects performance (e.g., Parsons, 1974). Reactivity is the term used to describe changes in behavior resulting from the act of observing and recording the behavior. Typically, when reactivity occurs, the behavior changes in the

desired direction (e.g., Brackett, Reid, & Green, 2007; Mowery et al., 2010). Several researchers have evaluated the effects of staff reactivity to observations (Boyce & Geller, 2001; Brackett et al., 2007; Codding, Livanis, Pace, & Vaca, 2008; Mowery et al., 2010). Mowery et al. (2010) evaluated staff adherence to a protocol designed to increase the frequency of staff’s positive social initiations during leisure activities with adults with developmental disabilities. They evaluated the effects on staff behavior of having a supervisor absent or a supervisor present in the environment to determine whether reactivity to the supervisor’s presence would occur. Positive social interactions only


occurred at acceptable levels when the supervisor was present, suggesting that reactivity is an important issue to consider in the valid assessment of staff performance. Considering reactivity to observation in research or clinical settings is important because the target behavior may be influenced not only by the intervention but also by the act of observing. When responding comes under the stimulus control of an observer, the level of behavior in the presence of the observer is likely to be different than in the absence of the observer. Considering that most staff behavior must be performed in the absence of supervision, conducting observations without reactivity to obtain an accurate characterization of the behavior is important. There are a variety of ways in which to minimize reactivity. For instance, making observers a part of the regular environment for several sessions or weeks before actual data collection occurs may result in habituation to the presence of the observers. It is important to keep in mind that habituation to the observer is only likely to occur as long as the observer does not interact with the person being observed and no consequences are delivered by the observer or others who may be associated with the observer. If the setting permits, reactivity may be avoided in other ways. Video monitoring devices mounted unobtrusively in the setting may be used. In instances in which the cameras can be seen by the person being observed, habituation to the presence of the camera is likely to occur in the absence of feedback or consequences contingent on performance (e.g., Rapp, Miltenberger, Long, et al., 1998). In addition, using an observation room equipped with a one-way observation window provides the observer an opportunity to conduct unannounced observations. If the use of an observation window is not possible, the use of confederates may be considered. 
Although confederates are present in the setting to collect data (unobtrusively), a benign purpose for their presence other than data collection is provided to those being observed. That is, deception is used to conceal the true purpose of the observers’ presence. As mentioned in the Mowery et al. (2010) study, confederates may be used to increase the chances that the data collected are

representative of typical levels (the level expected in the absence of observation). Confederates can be any variety of individuals such as a coworker, classmate, spouse, or person external to the setting as seen in Mowery et al. (2010), in which observers were introduced as student social workers who were in the setting to observe individuals with intellectual disabilities. In recent research on abduction prevention, children were observed without their knowledge to assess their safety skills as a confederate approached and attempted to lure them in a store setting (Beck & Miltenberger, 2009). Research on child safety skills training has demonstrated that children are more likely to respond correctly when they are aware of observation than when they are not aware of observation (Gatheridge et al., 2004; Himle, Miltenberger, Gatheridge, & Flessner, 2004). An important caveat is that the use of confederates may raise ethical concerns and should be approached cautiously. The use of video or other inconspicuous monitoring systems may present the same ethical concerns and thus should also be approached with caution. Prior approval is needed when using deceptive covert observation. Additionally, for research purposes such covert observation must be approved by an institutional review board with appropriate debriefing afterward. Interobserver Agreement Within research and practice in applied behavior analysis, accurate data collection is important (see Chapter 7, this volume). Accuracy refers to the extent to which the recorded level of the behavior matches the true level of the behavior (Cooper et al., 2007; Johnston & Pennypacker, 1993; Kazdin, 1977). To evaluate accuracy, a researcher must be able to obtain a measure of the true level of the behavior to compare with the measurement of the behavior produced by the observer. 
The difficulty arises in obtaining the true level of the behavior, as most recording is done by humans who must discriminate the occurrence of the behavior from nonoccurrences (a stimulus–control problem). A “truer” level of the behavior may be obtained through mechanical means, but equipment also may fail
on occasion and produce errors. Alternatively, automated recording devices may fail to register responses that vary slightly in topography or location. Thus, knowing the true level of the behavior with certainty is impossible, and therefore accuracy is not measured in behavioral research. Instead, behavior analysts train observers so that the data they collect are in agreement with those collected by another observer who has received training in recording the target behavior. Although measuring agreement between observers provides no information about the accuracy of either set of observations, it does improve the believability of the data. That is, when two independent observers agree on every occurrence and nonoccurrence of behavior, one has more confidence that they are using the same definition of the target behavior, observing and recording the same responses, and marking the form correctly (Miltenberger, 2012). If a valid definition of the behavior of interest is being used, then high agreement scores increase the belief that the behavior has been recorded accurately; although, again, accuracy has not been measured. A frequently used measure of agreement between observers is simply the percentage of observations that agree, a measure commonly referred to as IOA. IOA is calculated by dividing the number of agreements (both observers recorded the occurrence or nonoccurrence of the behavior) by the number of agreements plus disagreements (one observer recorded the occurrence of the behavior and the other recorded the nonoccurrence of the behavior) and multiplying the quotient by 100. For an adequate assessment of IOA, two independent data collectors are recommended to be present during at least one third of all observation sessions across all participants and phases of a clinical intervention or research study (Cooper et al., 2007). 
This level of IOA assessment (one third of sessions) is an arbitrary number, and efforts to maximize the number of assessments that produce strong percentages of agreement should result in greater confidence in the data. Cooper et al. (2007) suggested that research studies maintain 90% or higher IOA but agreed that 80% or higher may be acceptable under some circumstances. Kazdin (2010) offered a different
perspective on the acceptable level of IOA and suggested that the level of agreement that is acceptable is one that indicates to the researcher that the observers are sufficiently consistent in their recording of the behavior, that the behaviors are adequately defined, and that the measures will be sensitive to change in the client’s performance over time. (p. 118) Kazdin suggested that the number and complexity of behaviors being recorded, possible sources of bias, expected level of change in the behavior, and method of computing IOA are all considerations in deciding on an acceptable level of IOA. For example, if small changes in behavior are likely with the intervention, then higher IOA would be demanded. However, if larger changes are expected, then lower levels of IOA might be tolerated. The bottom line is that behavior analysts should strive for levels of IOA as high as possible (e.g., 90% or more) but consider the factors that might contribute to lower levels and make adjustments as warranted by these factors. IOA can be calculated in a variety of ways. How it is computed depends on the dimension of the behavior being evaluated and how it is measured. Next we describe common methods for calculating IOA.

Frequency Recording

To calculate IOA on frequency recording, the smaller frequency is divided by the larger frequency. For example, if one observer records 40 occurrences of a target behavior and a second independent observer records 35, the percentage of IOA during that observation session is 35/40, or 87.5%. The limitation of IOA in frequency recording is that there is no evidence that the two observers recorded the same behavioral event even when IOA is high. For example, if one observer recorded nine instances of the behavior and the other observer recorded 10 instances, the two observers ostensibly agreed on nine of the 10 instances for an IOA of 90%. It is possible, however, that the observers were actually recording different instances of the behavior. One way to increase confidence that the two observers were agreeing on specific responses in frequency
IOA is to collect frequency data in intervals and then compare the frequency in each interval (see Frequency Within Interval section later in this chapter). Dividing the observation period into shorter, equal intervals permits a closer look at the recording of frequencies in shorter time blocks. In this way, there can be more confidence that the observers recorded the same instances of the behavior when agreement is high. To further enhance confidence that observers are recording the same behavioral events, it is possible to collect data on behavior as it occurs in real time. With real-time recording, it is possible to determine whether there is exact agreement on each instance of the behavior.
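The whole-session calculation for frequency recording, and its limitation, can be sketched in a few lines of Python. This is a minimal illustration rather than a standard implementation: the function name `total_count_ioa`, the event times, and the decision to treat two totals of zero as full agreement are assumptions made for the example.

```python
def total_count_ioa(total_a: float, total_b: float) -> float:
    """Smaller session total divided by larger session total, as a percentage."""
    if total_a < 0 or total_b < 0:
        raise ValueError("session totals must be non-negative")
    larger = max(total_a, total_b)
    if larger == 0:
        # Assumption: two observers who both record nothing are in full agreement.
        return 100.0
    return 100 * min(total_a, total_b) / larger


# The chapter's example: 40 versus 35 recorded occurrences.
print(total_count_ioa(40, 35))  # 87.5

# Hypothetical event times (seconds into the session) for two observers.
# The totals match, so IOA is 100%, yet no two events are close in time.
times_a = [3, 15, 42]
times_b = [70, 88, 95]
print(total_count_ioa(len(times_a), len(times_b)))  # 100.0
```

The second call shows why a high whole-session percentage does not establish that the observers recorded the same responses. The same smaller-over-larger ratio applies to session totals of other dimensions, such as total duration.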

Real-Time Recording

When using real-time recording, the onset and offset of the behavior are recorded on a second-by-second basis. Therefore, IOA can be calculated by dividing the number of seconds in which the two observers agreed that the behavior was or was not occurring by the number of seconds in the observation session. Typically, an agreement on the onset and offset of the behavior can be defined as occurring when both observers recorded the onset or offset at exactly the same second. This form of IOA is the most stringent because agreement is calculated for every second of the observation period (e.g., Rapp, Miltenberger, & Long, 1998; Rapp, Miltenberger, Long, et al., 1998). Alternatively, IOA could be conducted on the frequency of the behavior, but an agreement would only be scored when both observers recorded the onset of the behavior at the same instant or within a small window of time (e.g., within 1 or 2 seconds of each other).
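A second-by-second comparison of two real-time records might be computed as follows. The sketch assumes each record is a list of (onset, offset) pairs in whole seconds, with the behavior occupying seconds onset through offset minus 1; that convention, the function names, and the sample records are illustrative assumptions, not part of the chapter.

```python
def seconds_on(episodes, session_seconds):
    """Expand (onset, offset) pairs into one True/False value per second."""
    on = [False] * session_seconds
    for onset, offset in episodes:
        # Assumption: the behavior occupies seconds onset .. offset - 1.
        for second in range(max(onset, 0), min(offset, session_seconds)):
            on[second] = True
    return on


def second_by_second_ioa(episodes_a, episodes_b, session_seconds):
    """Percentage of seconds on which both records agree (occurring or not)."""
    if session_seconds <= 0:
        raise ValueError("session length must be positive")
    a = seconds_on(episodes_a, session_seconds)
    b = seconds_on(episodes_b, session_seconds)
    agreements = sum(x == y for x, y in zip(a, b))
    return 100 * agreements / session_seconds


# Hypothetical 60-second session with two episodes per observer.
observer_a = [(5, 15), (30, 40)]
observer_b = [(6, 15), (30, 42)]
print(second_by_second_ioa(observer_a, observer_b, 60))  # 95.0
```

In this example the records differ on three seconds (second 5, and seconds 40 and 41), so 57 of 60 seconds agree.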

Duration Recording

IOA for duration recording is calculated by dividing the smaller duration by the larger duration. For example, if one observer records 90 minutes of break time taken in an 8-hour shift and the reliability observer records 85 minutes, the agreement between observers is 85/90, or 94.4%. The same limitation described earlier for IOA on frequency recording pertains to IOA on duration recording. Although the duration recorded by the two observers may be similar, unless the data are time stamped, there is no evidence that the two observers were recording the same instances of the behavior. Real-time recording is a way to overcome this problem.

Interval and Time-Sample Recording

Computing IOA with interval data requires an interval-by-interval check for agreement on the occurrence and nonoccurrence of the behavior throughout the observation period. The number of intervals of agreement is then divided by the number of intervals in the observation period to produce a percentage of agreement. An agreement is defined as an interval in which both observers had a marked interval (indicating that the behavior occurred) or an unmarked interval (indicating that the behavior did not occur). Using only one target behavior for this example, consider a 10-minute observation session with data recorded at 10-second intervals (60 intervals total). If the number of intervals of observation with agreements is 56 of 60, the percentage of IOA is 56/60, or 93.3%. Two variations of IOA calculations for interval recording, which correct for chance agreement with low-rate and high-rate behavior, are occurrence-only and nonoccurrence-only calculations. The occurrence-only calculation is used with low-rate behavior (from which chance agreement on nonoccurrence is high) and involves calculating IOA using only agreements on occurrence and removing agreements on nonoccurrence from consideration (agreements on occurrence divided by agreements plus disagreements on occurrence). The nonoccurrence-only calculation is used with high-rate behavior (from which chance agreement on occurrence is high) and involves calculation of IOA using only agreements on nonoccurrence and removing agreements on occurrence from consideration (agreements on nonoccurrence divided by agreements plus disagreements on nonoccurrence).
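The three interval-based calculations described above can be sketched together. The records below are hypothetical: they were chosen so that overall agreement matches the 56-of-60 example, and the function names and the choice to return `None` when no interval qualifies are assumptions made for illustration.

```python
def _check(marks_a, marks_b):
    if len(marks_a) != len(marks_b) or not marks_a:
        raise ValueError("records must be non-empty and of equal length")


def interval_ioa(marks_a, marks_b):
    """Agreements (both marked or both unmarked) over all intervals."""
    _check(marks_a, marks_b)
    agreements = sum(a == b for a, b in zip(marks_a, marks_b))
    return 100 * agreements / len(marks_a)


def occurrence_only_ioa(marks_a, marks_b):
    """Intervals both observers marked over intervals either observer marked."""
    _check(marks_a, marks_b)
    both = sum(a and b for a, b in zip(marks_a, marks_b))
    either = sum(a or b for a, b in zip(marks_a, marks_b))
    # Assumption: undefined (None) when neither observer marked any interval.
    return 100 * both / either if either else None


def nonoccurrence_only_ioa(marks_a, marks_b):
    """Same calculation applied to unmarked intervals."""
    return occurrence_only_ioa([not a for a in marks_a],
                               [not b for b in marks_b])


# Hypothetical low-rate behavior: 60 intervals, each observer marks 4.
marked_a = {3, 17, 40, 41}
marked_b = {3, 17, 42, 55}
marks_a = [i in marked_a for i in range(1, 61)]
marks_b = [i in marked_b for i in range(1, 61)]

print(round(interval_ioa(marks_a, marks_b), 1))            # 93.3
print(round(occurrence_only_ioa(marks_a, marks_b), 1))     # 33.3
print(round(nonoccurrence_only_ioa(marks_a, marks_b), 1))  # 93.1
```

With this low-rate record, overall agreement is 93.3% even though the observers agreed on only two of the six intervals in which either of them scored the behavior, which is the situation the occurrence-only calculation is meant to expose.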

Cohen’s Kappa

Kappa is another method of calculating observer agreement, but it corrects for the probability that two observers will agree as a result of chance alone. Kappa is computed using the following formula:

$$\kappa = \frac{P_o - P_c}{1 - P_c},$$

where $P_o$ is the proportion of agreement between observers (sum of agreements on occurrences and nonoccurrences divided by the total number of intervals) and $P_c$ is the proportion of agreement expected by chance. The latter may be obtained using the following formula:

$$P_c = \frac{(O_{1o})(O_{2o}) + (O_{1n})(O_{2n})}{I^2},$$

where $O_{1o}$ is the number of occurrences recorded by Observer 1, $O_{2o}$ is the number of occurrences recorded by Observer 2, $O_{1n}$ and $O_{2n}$ are nonoccurrence counts, and $I$ is the number of observations made by each observer. For example, if Observer 1 scored nine intervals out of 10 and Observer 2 scored eight intervals out of 10 (see Exhibit 6.1), kappa would be calculated as follows: $P_o = .90$; $P_c = (72 + 2)/10^2 = .74$; $\kappa = .62$. Kappa values can range from −1 to 1, with 0 reflecting a chance level of agreement. No single rule for interpreting an obtained kappa value may be given because the number of different categories into which behavior may be classified will affect kappa. If only two categories are used (e.g., occurrence vs. nonoccurrence), then the probability of a chance agreement is higher than if more categories had been used. Higher probabilities of chance agreement are reflected in lower kappa values. Thus, if the preceding example had used three categories (e.g., slow-, medium-, or high-rate responding) and IOA had been the same (90%), then kappa would have been more than .62.

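Exhibit 6.1
Recordings for Observer 1 and Observer 2 in 10 Observation Intervals

Observer      Intervals marked X (of 10)
Observer 1    9
Observer 2    8

Note. X marks an interval in which the observer recorded the behavior; the interval-by-interval placement of the marks is not reproduced here.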

Kappa can be affected by other factors, including the distribution of the ratings of the observers (Sim & Wright, 2005). Because these latter factors have little to do with the degree to which two observers agree, there is little agreement on what constitutes an acceptable kappa value. Within the social sciences, Kazdin (2010) suggested that a kappa value of 0.7 or higher reflects an acceptable level of agreement. Lower criteria for acceptable kappa values may be found (e.g., Fleiss, 1981; Landis & Koch, 1977), but these are as arbitrary as the cutoff suggested by Kazdin (von Eye & von Eye, 2008). Perhaps for this reason, kappa is less often used by applied behavior analysts than is IOA.
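The two-category kappa calculation given earlier in this section can be checked numerically. The sketch below follows the chapter's formulas for the proportion of observed agreement and the proportion expected by chance; the function name and the placement of marks within the 10 intervals are assumptions, chosen only so that Observer 1 scores nine intervals, Observer 2 scores eight, and observed agreement is .90.

```python
def kappa_two_category(marks_1, marks_2):
    """Return (P_o, P_c, kappa) for two occurrence/nonoccurrence records."""
    n = len(marks_1)
    if n == 0 or n != len(marks_2):
        raise ValueError("records must be non-empty and of equal length")
    p_o = sum(a == b for a, b in zip(marks_1, marks_2)) / n
    occ_1, occ_2 = sum(marks_1), sum(marks_2)
    # Chance agreement: product of occurrence counts plus product of
    # nonoccurrence counts, divided by the squared number of observations.
    p_c = (occ_1 * occ_2 + (n - occ_1) * (n - occ_2)) / n ** 2
    if p_c == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return p_o, p_c, (p_o - p_c) / (1 - p_c)


# Hypothetical placement: Observer 1 marks intervals 1-9, Observer 2 marks 1-8.
observer_1 = [True] * 9 + [False]
observer_2 = [True] * 8 + [False] * 2

p_o, p_c, kappa = kappa_two_category(observer_1, observer_2)
print(round(p_o, 2), round(p_c, 2), round(kappa, 2))  # 0.9 0.74 0.62
```

The output reproduces the worked values in the text (.90, .74, and .62), which illustrates how far kappa can fall below simple percentage agreement when both observers score the behavior in most intervals.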

Frequency Within Interval

Calculating IOA for frequency within interval minimizes the limitation identified for IOA on frequency recording (high agreement even though the two observers might be recording different instances of behavior). For example, if agreement is calculated within each of a series of 20-second intervals, then there is no chance that a response recorded by Observer A in Interval 1 will be counted as an agreement with a different response recorded by Observer B in Interval 12. To calculate frequency-within-interval agreement, calculate a percentage of agreement between observers for each interval (smaller number divided by larger number), sum the percentages for all the intervals, and divide by the number of intervals in the observation period. Exhibit 6.2 illustrates an IOA calculation for frequency-within-interval data for 10 intervals for two observers. Each X corresponds to an occurrence of the behavior in an interval.
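The frequency-within-interval procedure can be sketched as follows. The per-interval counts are assumptions: they are illustrative values chosen to reproduce the per-interval percentages reported in the note to Exhibit 6.2 (67, 100, 100, 100, 100, 100, 50, 100, 50, 100). Scoring an interval in which both observers recorded nothing as 100% is likewise an assumption, though it is consistent with those percentages.

```python
def frequency_within_interval_ioa(counts_a, counts_b):
    """Return (mean percentage agreement, list of per-interval percentages)."""
    if len(counts_a) != len(counts_b) or not counts_a:
        raise ValueError("records must be non-empty and of equal length")
    per_interval = []
    for a, b in zip(counts_a, counts_b):
        larger = max(a, b)
        # Assumption: an interval with no responses from either observer
        # counts as 100% agreement.
        per_interval.append(100.0 if larger == 0 else 100 * min(a, b) / larger)
    return sum(per_interval) / len(per_interval), per_interval


# Hypothetical counts per interval for two observers (10 intervals).
counts_1 = [3, 1, 0, 2, 2, 0, 1, 3, 2, 0]
counts_2 = [2, 1, 0, 2, 2, 0, 2, 3, 1, 0]

mean_ioa, per_interval = frequency_within_interval_ioa(counts_1, counts_2)
print([round(p) for p in per_interval])
# [67, 100, 100, 100, 100, 100, 50, 100, 50, 100]
print(round(mean_ioa, 1))  # 86.7

# Whole-session comparison of the same records: smaller total / larger total.
whole_session = 100 * min(sum(counts_1), sum(counts_2)) / max(sum(counts_1), sum(counts_2))
print(round(whole_session, 1))  # 92.9
```

For these counts the whole-session ratio (13/14, or 92.9%) is higher than the interval-based figure (86.7%), because the interval-based calculation does not let an extra response in one interval offset a missing response in another.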

Exhibit 6.2
Frequency-Within-Interval Recordings for Two Observers and Interobserver Agreement Calculation

Interval     1     2     3     4     5     6     7     8     9     10
Observer 1   XXX   X     -     XX    XX    -     X     XXX   XX    -
Observer 2   XX    X     -     XX    XX    -     XX    XXX   X     -
Agreement    67%   100%  100%  100%  100%  100%  50%   100%  50%   100%

Note. A dash marks an interval with no recorded occurrence. Interobserver agreement = (67 + 100 + 100 + 100 + 100 + 100 + 50 + 100 + 50 + 100)/10 = 86.7%.

Ethical Considerations

Several ethical issues must be considered when conducting observations and measurement as part of research or clinical practice in applied behavior analysis (e.g., Bailey & Burch, 2005). First, a behavior analyst should observe and record the person’s behavior only after receiving written consent from the individual or the individual’s parent or guardian. As part of the consent process, the individual must be apprised of and agree to the ways in which the data will be used (research presentation or publication, clinical decision making). If the behavior analyst identifies new uses for the data after the original consent has been obtained, new consent must be obtained from the individual for the new ways in which the data will be used. Second, the individual must know when and where observation will take place unless the individual provides written consent for surreptitious or unannounced observation. Third, observation and recording must take place in such a way that confidentiality is maintained for the individual receiving services or participating in research. To maintain confidentiality, the observer must not draw attention to the person being observed and must not inform any other people about the observations unless the individual being observed has provided written permission to do so. In addition, behavior analysts must use pseudonyms and disguise other identifying information in presentations and publications. Fourth, observers must treat the individual being observed and others in the setting with dignity and respect at all times during the course of their participation in research or as they are receiving clinical services.

Summary

Observation and measurement are at the heart of applied behavior analysis because behavior (and its controlling variables) is the subject matter of both research and practice. As discussed in this chapter,
adequate measurement of behavior requires clear definitions of the target behavior, precise specifications of recording logistics and procedures, appropriate choice of recording devices, and consideration of reactivity and IOA. The validity of conclusions that can be drawn from experimental manipulations of controlling variables or evaluations of treatment effectiveness depends on the adequacy of the observation and measurement of the behaviors targeted in these endeavors.

References

Aikman, G., & Garbutt, V. (2003). Brief probes: A method for analyzing the function of disruptive behaviour in the natural environment. Behavioural and Cognitive Psychotherapy, 31, 215–220. doi:10.1017/S1352465803002108
Allen, K. D. (1998). The use of an enhanced simplified habit reversal procedure to reduce disruptive outbursts during athletic performance. Journal of Applied Behavior Analysis, 31, 489–492. doi:10.1901/jaba.1998.31-489
Athens, E. S., Vollmer, T. R., & St. Peter Pipkin, C. C. (2007). Shaping academic task engagement with percentile schedules. Journal of Applied Behavior Analysis, 40, 475–488. doi:10.1901/jaba.2007.40-475
Baer, D. M., Wolf, M. M., & Risley, T. R. (1968). Some current dimensions of applied behavior analysis. Journal of Applied Behavior Analysis, 1, 91–97. doi:10.1901/jaba.1968.1-91
Bailey, J. S., & Burch, M. R. (2002). Research methods in applied behavior analysis. Thousand Oaks, CA: Sage.
Bailey, J. S., & Burch, M. R. (2005). Ethics for behavior analysts. Mahwah, NJ: Erlbaum.
Beck, K. V., & Miltenberger, R. G. (2009). Evaluation of a commercially available program and in situ training by parents to teach abduction-prevention skills to children. Journal of Applied Behavior Analysis, 42, 761–772. doi:10.1901/jaba.2009.42-761
Borrero, C. S. W., & Borrero, J. C. (2008). Descriptive and experimental analyses of potential precursors to problem behavior. Journal of Applied Behavior Analysis, 41, 83–96. doi:10.1901/jaba.2008.41-83
Bosch, A., Miltenberger, R. G., Gross, A., Knudson, P., & Brower-Breitweiser, C. (2008). Evaluation of extinction as a functional treatment for binge eating. Behavior Modification, 32, 556–576. doi:10.1177/0145445507313271
Boyce, T. E., & Geller, E. S. (2001). A technology to measure multiple driving behaviors without self-report or participant reactivity. Journal of Applied Behavior Analysis, 34, 39–55. doi:10.1901/jaba.2001.34-39
Brackett, L., Reid, D. H., & Green, C. W. (2007). Effects of reactivity to observations on staff performance. Journal of Applied Behavior Analysis, 40, 191–195. doi:10.1901/jaba.2007.112-05
Brown, R. A., Palm, K. M., Strong, D., Lejuez, C., Kahler, C., Zvolensky, M., . . . Gifford, E. (2008). Distress tolerance treatment for early-lapse smokers: Rationale, program description, and preliminary findings. Behavior Modification, 32, 302–332. doi:10.1177/0145445507309024
Codding, R. S., Livanis, A., Pace, G. M., & Vaca, L. (2008). Using performance feedback to improve treatment integrity of classwide behavior plans: An investigation of observer reactivity. Journal of Applied Behavior Analysis, 41, 417–422. doi:10.1901/jaba.2008.41-417
Connell, J. E., & Witt, J. C. (2004). Applications of computer-based instruction: Using specialized software to aid letter-name and letter-sound recognition. Journal of Applied Behavior Analysis, 37, 67–71. doi:10.1901/jaba.2004.37-67
Cooper, J. O., Heron, T. E., & Heward, W. L. (2007). Applied behavior analysis (2nd ed.). Upper Saddle River, NJ: Pearson Education.
Daniels, A. C., & Daniels, J. E. (2004). Performance management: Changing behavior that drives organizational effectiveness. Atlanta, GA: Performance Management.
DiGennaro, F. D., Martens, B. K., & Kleinmann, A. E. (2007). A comparison of performance feedback procedures on teachers’ treatment implementation integrity and students’ inappropriate behavior in special education classrooms. Journal of Applied Behavior Analysis, 40, 447–461. doi:10.1901/jaba.2007.40-447
DiGennaro, F. D., Martens, B. K., & McIntyre, L. L. (2005). Increasing treatment integrity through negative reinforcement: Effects on teacher and student behavior. School Psychology Review, 34, 220–231.
DiGennaro-Reed, F. D., Codding, R., Catania, C. N., & Maguire, H. (2010). Effects of video modeling on treatment integrity of behavioral interventions. Journal of Applied Behavior Analysis, 43, 291–295. doi:10.1901/jaba.2010.43-291
Donaldson, J. M., & Normand, M. P. (2009). Using goal setting, self-monitoring, and feedback to increase calorie expenditure in obese adults. Behavioral Interventions, 24, 73–83. doi:10.1002/bin.277
Fleiss, J. L. (1981). Statistical methods for rates and proportions (2nd ed.). New York, NY: Wiley.
Flessner, C. A., Miltenberger, R. G., Egemo, K., Jostad, C., Gatheridge, B. J., Neighbors, C., . . . Kelso, P. (2005). An evaluation of the social support component of simplified habit reversal. Behavior Therapy, 36, 35–42. doi:10.1016/S0005-7894(05)80052-8
Fogel, V. A., Miltenberger, R. G., Graves, R., & Koehler, S. (2010). Evaluating the effects of exergaming on physical activity among inactive children in a physical education classroom. Journal of Applied Behavior Analysis, 43, 591–600. doi:10.1901/jaba.2010.43-591
Gatheridge, B. J., Miltenberger, R., Huneke, D. F., Satterlund, M. J., Mattern, A. R., Johnson, B. M., & Flessner, C. A. (2004). A comparison of two programs to teach firearm injury prevention skills to 6- and 7-year-old children. Pediatrics, 114, e294–e299. doi:10.1542/peds.2003-0635-L
Gravlee, C. C., Zenk, S. N., Woods, S., Rowe, Z., & Schulz, A. J. (2006). Handheld computers for direct observation of the social and physical environment. Field Methods, 18, 382–397. doi:10.1177/1525822X06293067
Gresham, F. M., Gansle, K. A., & Noell, G. H. (1993). Treatment integrity in applied behavior analysis with children. Journal of Applied Behavior Analysis, 26, 257–263. doi:10.1901/jaba.1993.26-257
Harris, F. C., & Ciminero, A. R. (1978). The effects of witnessing consequences on the behavioral recording of experimental observers. Journal of Applied Behavior Analysis, 11, 513–521. doi:10.1901/jaba.1978.11-513
Hayes, S. C., Barlow, D. H., & Nelson-Gray, R. O. (1999). The scientist practitioner: Research and accountability in the age of managed care. Boston, MA: Allyn & Bacon.
Hayes, S. C., Wilson, K. G., Gifford, E., Bissett, R., Piasecki, M., Batten, S., . . . Gregg, J. (2004). A preliminary trial of twelve-step facilitation and acceptance and commitment therapy with polysubstance-abusing methadone-maintained opiate addicts. Behavior Therapy, 35, 667–688. doi:10.1016/S0005-7894(04)80014-5
Himle, M. B., Miltenberger, R. G., Flessner, C., & Gatheridge, B. (2004). Teaching safety skills to children to prevent
gun play. Journal of Applied Behavior Analysis, 37, 1–9. doi:10.1901/jaba.2004.37-1
Himle, M. B., Miltenberger, R. G., Gatheridge, B., & Flessner, C. (2004). An evaluation of two procedures for training skills to prevent gun play in children. Pediatrics, 113, 70–77. doi:10.1542/peds.113.1.70
Iwata, B. A., Pace, G. M., Kissel, R. C., Nau, P. A., & Farber, J. M. (1990). The Self-Injury Trauma (SIT) scale: A method for quantifying surface tissue damage caused by self-injurious behavior. Journal of Applied Behavior Analysis, 23, 99–110. doi:10.1901/jaba.1990.23-99
Jackson, J., & Dixon, M. R. (2007). A mobile computing solution for collecting functional analysis data on a pocket PC. Journal of Applied Behavior Analysis, 40, 359–384. doi:10.1901/jaba.2007.46-06
Jessup, P. A., & Stahelski, A. J. (1999). The effects of a combined goal setting, feedback and incentive intervention on job performance in a manufacturing environment. Journal of Organizational Behavior Management, 19, 5–26. doi:10.1300/J075v19n03_02
Johnston, J. M., & Pennypacker, H. S. (1993). Readings for strategies and tactics of behavioral research (2nd ed.). Hillsdale, NJ: Erlbaum.
Kahng, S. W., & Iwata, B. A. (1998). Computerized systems for collecting real-time observational data. Journal of Applied Behavior Analysis, 31, 253–261. doi:10.1901/jaba.1998.31-253
Kazdin, A. E. (1977). Artifact, bias, and complexity of assessment: The ABCs of reliability. Journal of Applied Behavior Analysis, 10, 141–150. doi:10.1901/jaba.1977.10-141
Kazdin, A. E. (2010). Single case research designs: Methods for clinical and applied settings (2nd ed.). New York, NY: Oxford University Press.
Kerwin, M. L., Ahearn, W. H., Eicher, P. S., & Swearingin, W. (1998). The relationship between food refusal and self-injurious behavior: A case study. Journal of Behavior Therapy and Experimental Psychiatry, 29, 67–77. doi:10.1016/S0005-7916(97)00040-2
Landis, J. R., & Koch, G. G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33, 159–174. doi:10.2307/2529310
Lerman, D. C., Iwata, B. A., Smith, R. G., & Vollmer, T. R. (1994). Restraint fading and the development of alternative behaviour in the treatment of self-restraint and self-injury. Journal of Intellectual Disability Research, 38, 135–148. doi:10.1111/j.1365-2788.1994.tb00369.x
Lindsley, O. R. (1968). Technical note: A reliable wrist counter for recording behavior rates. Journal of Applied Behavior Analysis, 1, 77–78. doi:10.1901/jaba.1968.1-77
Long, E. S., Miltenberger, R. G., Ellingson, S. A., & Ott, S. M. (1999). Augmenting simplified habit reversal in the treatment of oral-digit habits exhibited by individuals with mental retardation. Journal of Applied Behavior Analysis, 32, 353–365. doi:10.1901/jaba.1999.32-353
MacDonald, R., Sacramone, S., Mansfield, R., Wiltz, K., & Ahern, W. (2009). Using video modeling to teach reciprocal pretend play to children with autism. Journal of Applied Behavior Analysis, 42, 43–55. doi:10.1901/jaba.2009.42-43
Mace, F. C., Prager, K. L., Thomas, K., Kochy, J., Dyer, T. J., Perry, L., & Pritchard, D. (2009). Effects of stimulant medication under varied motivational operations. Journal of Applied Behavior Analysis, 42, 177–183. doi:10.1901/jaba.2009.42-177
Maglieri, K. A., DeLeon, I. G., Rodriguez-Catter, V. R., & Sevin, B. M. (2000). Treatment of covert food stealing in an individual with Prader-Willi syndrome. Journal of Applied Behavior Analysis, 33, 615–618. doi:10.1901/jaba.2000.33-615
Maher, E. (2009). Behavior Tracker Pro. Retrieved from http://www.behaviortrackerpro.com/btp/Welcome.html
Malott, R., & Trojan-Suarez, E. A. (2004). Elementary principles of behavior (5th ed.). Upper Saddle River, NJ: Prentice Hall.
Marckel, J. M., Neef, N. A., & Ferreri, S. J. (2006). A preliminary analysis of teaching improvisation with the picture exchange communication system to children with autism. Journal of Applied Behavior Analysis, 39, 109–115. doi:10.1901/jaba.2006.131-04
Mash, E. J., & McElwee, J. (1974). Situational effects on observer accuracy: Behavioral predictability, prior experience, and complexity of coding categories. Child Development, 45, 367–377. doi:10.2307/1127957
Mayfield, K. H., & Vollmer, T. R. (2007). Teaching math skills to at-risk students using home-based peer tutoring. Journal of Applied Behavior Analysis, 40, 223–237. doi:10.1901/jaba.2007.108-05
Meidinger, A. L., Miltenberger, R. G., Himle, M., Omvig, M., Trainor, C., & Crosby, R. (2005). An investigation of tic suppression and the rebound effect in Tourette’s disorder. Behavior Modification, 29, 716–745. doi:10.1177/0145445505279262
Miltenberger, R. G. (2012). Behavior modification: Principles and procedures (5th ed.). Belmont, CA: Wadsworth.
Miltenberger, R., Rapp, J., & Long, E. (1999). A low tech method for conducting real time recording. Journal of Applied Behavior Analysis, 32, 119–120. doi:10.1901/jaba.1999.32-119
Miltenberger, R. G., Woods, D. W., & Himle, M. (2007). Tic disorders and trichotillomania. In P. Sturmey (Ed.), Handbook of functional analysis and clinical psychology (pp. 151–170). Burlington, MA: Elsevier.
Miltenberger, R. G., Wright, K. M., & Fuqua, R. W. (1986). Graduated in vivo exposure with a severe spider phobic. Scandinavian Journal of Behaviour Therapy, 15, 71–76. doi:10.1080/16506078609455763
Mowery, J., Miltenberger, R., & Weil, T. (2010). Evaluating the effects of reactivity to supervisor presence on staff response to tactile prompts and self-monitoring in a group home setting. Behavioral Interventions, 25, 21–35.
Mozingo, D. B., Smith, T., Riordan, M. R., Reiss, M. L., & Bailey, J. S. (2006). Enhancing frequency recording by developmental disabilities treatment staff. Journal of Applied Behavior Analysis, 39, 253–256. doi:10.1901/jaba.2006.55-05
Parsons, H. M. (1974). What happened at Hawthorne? Science, 183, 922–932. doi:10.1126/science.183.4128.922
Patel, M. R., Piazza, C. C., Layer, S. A., Coleman, R., & Swartzwelder, D. M. (2005). A systematic evaluation of food textures to decrease packing and increase oral intake in children with pediatric feeding disorders. Journal of Applied Behavior Analysis, 38, 89–100. doi:10.1901/jaba.2005.161-02
Pedhazur, E., & Schmelkin, L. (1991). Measurement, design, and analysis: An integrated approach. Hillsdale, NJ: Erlbaum.
Peterson, L., Homer, A. L., & Wonderlich, S. A. (1982). The integrity of independent variables in behavior analysis. Journal of Applied Behavior Analysis, 15, 477–492. doi:10.1901/jaba.1982.15-477
Plavnick, J. B., Ferreri, S. J., & Maupin, A. N. (2010). The effects of self-monitoring on the procedural integrity of behavioral intervention for young children with developmental disabilities. Journal of Applied Behavior Analysis, 43, 315–320. doi:10.1901/jaba.2010.43-315
Raiff, B. R., Faix, C., Turturici, M., & Dallery, J. (2010). Breath carbon monoxide output is affected by speed of emptying the lungs: Implications for laboratory and smoking cessation research. Nicotine and Tobacco Research, 12, 834–838. doi:10.1093/ntr/ntq090
Rapp, J. T., Colby, A. M., Vollmer, T. R., Roane, H. S., Lomas, J., & Britton, L. N. (2007). Interval recording for duration events: A re-evaluation. Behavioral Interventions, 22, 319–345. doi:10.1002/bin.239
Rapp, J. T., Colby-Dirksen, A. M., Michalski, D. N., Carroll, R. A., & Lindenberg, A. M. (2008). Detecting changes in simulated events using partial-interval recording and momentary time sampling. Behavioral Interventions, 23, 237–269. doi:10.1002/bin.269
Rapp, J. T., Miltenberger, R. G., & Long, E. S. (1998). Augmenting simplified habit reversal with an awareness enhancement device: Preliminary findings. Journal of Applied Behavior Analysis, 31, 665–668. doi:10.1901/jaba.1998.31-665
Rapp, J. T., Miltenberger, R. G., Long, E. S., Elliott, A. J., & Lumley, V. A. (1998). Simplified habit reversal for chronic hair pulling in three adolescents: A clinical replication with direct observation. Journal of Applied Behavior Analysis, 31, 299–302. doi:10.1901/jaba.1998.31-299
Repp, A. C., & Karsh, K. G. (1994). Hypothesis-based interventions for tantrum behaviors of persons with developmental disabilities in school settings. Journal of Applied Behavior Analysis, 27, 21–31. doi:10.1901/jaba.1994.27-21
Repp, A. C., Karsh, K. G., Felce, D., & Ludewig, D. (1989). Further comments on using hand-held computers for data collection. Journal of Applied Behavior Analysis, 22, 336–337. doi:10.1901/jaba.1989.22-336
Riordan, M. M., Iwata, B. A., Finney, J. W., Wohl, M. K., & Stanley, A. E. (1984). Behavioral assessment and treatment of chronic food refusal in handicapped children. Journal of Applied Behavior Analysis, 17, 327–341. doi:10.1901/jaba.1984.17-327
Saunders, M. D., Saunders, J. L., & Saunders, R. R. (1993). A program evaluation of classroom data collection with bar codes. Research in Developmental Disabilities, 14, 1–18. doi:10.1016/0891-4222(93)90002-2
Schrandt, J. A., Townsend, D. B., & Poulson, C. L. (2009). Teaching empathy skills to children with autism. Journal of Applied Behavior Analysis, 42, 17–32. doi:10.1901/jaba.2009.42-17
Scott, D., Scott, L. M., & Goldwater, B. (1997). A performance improvement program for an international-level track and field athlete. Journal of Applied Behavior Analysis, 30, 573–575. doi:10.1901/jaba.1997.30-573
Silverman, K., Wong, C. J., Needham, M., Diemer, K. N., Knealing, T., Crone-Todd, D., . . . Kolodner, K. (2007). A randomized trial of employment-based reinforcement of cocaine abstinence in injection drug users. Journal of Applied Behavior Analysis, 40, 387–410. doi:10.1901/jaba.2007.40-387
Sim, J., & Wright, C. C. (2005). The kappa statistic in reliability studies: Use, interpretation, and sample size requirements. Physical Therapy, 85, 257–268.
Skinner, B. F. (1956). A case history in scientific method. American Psychologist, 11, 221–233. doi:10.1037/h0047662
Stickney, M. I., & Miltenberger, R. G. (1999). Evaluation of direct and indirect measures for the functional assessment of binge eating. International Journal of Eating Disorders, 26, 195–204. doi:10.1002/(SICI)1098108X(199909)26:23.0.CO;2-2
Sundberg, M. L., & Michael, J. (2001). The benefits of Skinner’s analysis of verbal behavior for children with autism. Behavior Modification, 25, 698–724. doi:10.1177/0145445501255003
Therrien, K., Wilder, D. A., Rodriguez, M., & Wine, B. (2005). Preintervention analysis and improvement of customer greeting in a restaurant. Journal of Applied Behavior Analysis, 38, 411–415. doi:10.1901/jaba.2005.89-04
Touchette, P. E., Macdonald, R. F., & Langer, S. N. (1985). A scatterplot for identifying stimulus control of problem behavior. Journal of Applied Behavior Analysis, 18, 343–351. doi:10.1901/jaba.1985.18-343
Twohig, M. P., Masuda, A., Varra, A. A., & Hayes, S. C. (2005). Acceptance and commitment therapy as a treatment for anxiety disorders. In S. M. Orsillo & L. Roemer (Eds.), Acceptance and mindfulness-based approaches to anxiety: Conceptualization and treatment (pp. 101–129). New York, NY: Kluwer. doi:10.1007/0-387-25989-9_4
Twohig, M. P., Shoenberger, D., & Hayes, S. C. (2007). A preliminary investigation of acceptance and commitment therapy as a treatment for marijuana dependence in adults. Journal of Applied Behavior Analysis, 40, 619–632. doi:10.1901/jaba.2007.619-632
in children. Journal of Applied Behavior Analysis, 26, 53–61. doi:10.1901/jaba.1993.26-53 Wallace, M. D., Iwata, B. A., & Hanley, G. P. (2006). Establishment of mands following tact training as a function of reinforcer strength. Journal of Applied Behavior Analysis, 39, 17–24. doi:10.1901/ jaba.2006.119-04 Wilder, D. A., Zonneveld, K., Harris, C., Marcus, A., & Reagan, R. (2007). Further analysis of antecedent interventions on preschoolers’ compliance. Journal of Applied Behavior Analysis, 40, 535–539. doi:10.1901/ jaba.2007.40-535 Wolf, M. M. (1978). Social validity: The case for subjective measurement or how applied behavior analysis is finding its heart. Journal of Applied Behavior Analysis, 11, 203–214. doi:10.1901/jaba.1978.11-203

VanWormer, J. J. (2004). Pedometers and brief e-counseling: Increasing physical activity for overweight adults. Journal of Applied Behavior Analysis, 37, 421–425. doi:10.1901/jaba.2004.37-421

Wong, C. J., Sheppard, J.-M., Dallery, J., Bedient, G., Robles, E., Svikis, D., & Silverman, K. (2003). Effects of reinforcer magnitude on data-entry productivity in chronically unemployed drug abusers participating in a therapeutic workplace. Experimental and Clinical Psychopharmacology, 11, 46–55. doi:10.1037/10641297.11.1.46

Vollmer, T. R., Marcus, B. A., Ringdahl, J. E., & Roane, H. S. (1995). Progressing from brief assessments to extended experimental analyses in the evaluation of aberrant behavior. Journal of Applied Behavior Analysis, 28, 561–576.

Wright, K. M., & Miltenberger, R. G. (1987). Awareness training in the treatment of head and facial tics. Journal of Behavior Therapy and Experimental Psychiatry, 18, 269–274. doi:10.1016/0005-7916(87) 90010-3

von Eye, A., & von Eye, M. (2008). On the marginal dependency of Cohen’s κ. European Psychologist, 13, 305–315. doi:10.1027/1016-9040.13.4.305

Young, J., Zarcone, J., Holsen, L., Anderson, M. C., Hall, S., Richman, D., . . . Thompson, T. (2006). A measure of food seeking in individuals with Prader-Willi syndrome. Journal of Intellectual Disability Research, 50, 18–24. doi:10.1111/j.1365-2788.2005.00724.x

Wagaman, J. R., Miltenberger, R. G., & Arndorfer, R. E. (1993). Analysis of a simplified treatment for stuttering

150

Chapter 7

Generality and Generalization of Research Findings

Marc N. Branch and Henry S. Pennypacker

Preparation of this chapter was supported by National Institute on Drug Abuse Grant DA004074. DOI: 10.1037/13937-007. APA Handbook of Behavior Analysis: Vol. 1. Methods and Principles, G. J. Madden (Editor-in-Chief). Copyright © 2013 by the American Psychological Association. All rights reserved.

For generalization, psychologists must finally rely, as has been done in all the older sciences, on replication. (Cohen, 1994, p. 997)

Confirmation comes from repetition. . . . Repetition is the basis for judging . . . significance and confidence. (Tukey, 1969, pp. 84–85)

As the general psychology research community becomes increasingly aware (e.g., Cohen, 1994; Loftus, 1991, 1996; Wilkinson & Task Force on Statistical Inference, 1999) of the limitations of traditional group designs and statistical inference methods with regard to assessing reliability and generality of research findings, we present an alternative approach that has been substantially developed in the branch of psychology now known as behavior analysis. In this chapter, we outline how individual subject methods, that is, so-called single-case designs, provide straightforward and, in principle, simple methods to assess the reliability and generality of research findings.

Overview

The chapter consists of three major sections. In the first, we summarize the limitations of traditional methods, especially as they relate to assessing reliability and generality of research findings concerning behavior. We make the case that traditional methods have obscured an important distinction that has led to psychology’s consisting of two related, but separable, subject matters: behavioral science and actuarial science. We also focus on the issue of generality across individuals and how traditional methods can give the illusion of such generality. In the second major section, we discuss dimensions of generality in addition to generality across individuals. Here we define scientific generality and several other forms of generality as well. In so doing, we introduce the roles of replication, both direct and systematic, in assessing generality of research results. We argue that replication, rather than statistical inference, is a primary method not only for determining the reliability of results but also for assessing and characterizing the generality of scientific findings. In the third major section, we discuss generalization of treatment effects, the fundamentals of technology transfer, and the practices that characterize translational research. There, we write of programming for and assessment of generalizability of scientific findings to applied settings. We then expand our view to the engineering issues of technology development (or technology transfer and translational research) as a capstone demonstration of generalization based on an understanding of generality of research findings.

Limitations of Traditional Methods

The traditional group-mean, statistical-inference approach to analyzing research results has faced consistent criticism for more than 4 decades (e.g., Bakan, 1966; Carver, 1978; Cohen, 1994; Gigerenzer, Krauss, & Vitouch, 2004; Loftus, 1991, 1996; Meehl, 1967, 1978; Nickerson, 2000; Rozeboom, 1960). Most of that criticism has focused on what those methods have to say about the reliability of research findings, which is appropriate because if findings are not reliable, there is no need to assess their generality. These methods, however, have also been criticized with respect to theory testing and development, issues that directly relate to generality. We treat these two categories of criticism separately.

Significance Testing and Reliability

After all of the carefully reasoned criticism of significance testing that has been published, one would hope that a clear understanding of its limits would exist among professional psychologists. That, however, appears not to be true, as noted by Cohen (1994), who lamented that

after 4 decades of severe criticism, the ritual of null hypothesis significance testing . . . still persists. [As does] near universal misinterpretation of p as the probability that H0 is false, [and] the misinterpretation that its complement is the probability of successful replication. (p. 997)

Cohen’s assertion is supported by survey evidence revealing that a substantial majority of academic research psychologists incorrectly interpret p values and statistical significance (Haller & Krauss, 2002; Kalinowski, Fidler, & Cumming, 2008; Oakes, 1986). That a significant proportion of professional psychologists do not appreciate what statistical significance and, especially, p values represent is apparent testimony to a weakness in the training of research psychologists, a failing that lies at the feet of those of us who are engaged in teaching them. In fact, Haller and Krauss (2002) included a sample of statistical methodology instructors in their study and found that 80% of them were mistaken in their understanding of p values, so it comes as less of a surprise that the misconceptions are widespread. The following discussion, therefore, is another attempt to make clear what a p value is and what it means.

A p value, which results from a significance test, is a conditional probability. Specifically, it is the probability, if the null hypothesis is true, of obtaining data of a particular sort. That is, in algebraic symbols, it is p = P(Data|H0). The important point is that p ≠ P(H0|Data), which is what a researcher would presumably really like to know. In other words, a p value does not provide quantitative information about whether the null hypothesis is true, a fact that is apparently widely misunderstood. Because it does not provide the oft-assumed information about the likelihood of the null hypothesis being true, a p value of .01 does not mean that the probability of the null hypothesis being true is 1 in 100. In fact, it conveys nothing quantitative about the truth of the null hypothesis.

To see why, note that changing the order of conditionality in a conditional probability is crucially important. Consider such examples as P(Dead|Electrocuted) versus P(Electrocuted|Dead) or P(Cloudy|Raining) versus P(Raining|Cloudy). The first probability in each pair tells nothing about the second, just as P(Data|H0) reveals nothing about P(H0|Data). A p value, therefore, has quantitative meaning only if the null hypothesis is true, but when performing statistical tests not only does one not know whether the null hypothesis is true, one probably assumes it is not.

The important fact is that a finding of statistical significance, via a small p value, does not imply that the null hypothesis is unlikely to be true. The incorrect logic underlying the mistaken conclusion (cf. Falk & Greenbaum, 1995) apparently goes as follows: If the null hypothesis is true, data of a certain sort are unlikely. I obtained data of that sort, so therefore the null hypothesis is unlikely to be true. That so-called logic is precisely the same as the following: If the next person I meet is an American, he or she is unlikely to be the President. I just met the President. Therefore, he or she is unlikely to be an American.

The fundamental misunderstanding of what a p value is leads directly to the more serious problem of assuming that it indicates something quantitative about the reliability, that is, the likelihood of replication, of the finding. A common misunderstanding (see Haller & Krauss, 2002, and Oakes, 1986, for evidence) is that a p value, for example of .01, is the complement of the probability of replication should the experiment be repeated. That is, the mistaken assumption is that if one conducted the experiment 100 times, one should replicate the result on 99 of those occasions (at least on average). If one knew that the null hypothesis were true, then that would be a correct interpretation of the p value. Of course, though, one does not know whether H0 is true (again, one usually hopes it is not). In fact, one conducts the statistical test so that one can make what one (mistakenly) hopes is an educated guess about whether it is true. Thus, to say on the basis of a small p value that a result is statistically reliable is to strain the meaning of reliable beyond reasonable limits.

This limitation of statistical significance is not based on technical details of the null hypothesis. That is, the problem does not lie with whether the underlying distribution is formally normal or near normal or whether the statistical test involved is demonstrably robust with respect to violations of assumptions about the underlying distribution. The limitation is based in the logic of the approach. All the assumptions about the distributional characteristics of the null hypothesis might in fact be true, but that is not relevant when one is speaking of what a p value indicates.

A major limitation of statistical significance, therefore, is that it does not provide direct information about the reliability of research findings. Without knowledge about reliability, no examination of generality can occur because repeatability is the most basic test of generality. Notwithstanding that limitation, however, significance testing based on group means may be seen, incorrectly, to have implications for generality of findings across subjects. Adherence to this view unfortunately gains strength as sample size increases. In fact, however, regardless of sample size, no information about intersubject generality can be extracted from a significance statement because no knowledge is afforded concerning the number of subjects for whom the effect actually occurred. We examine the implications of this fact in more detail below.

Aside from the limits surrounding reliability just described, other characteristics of group-mean data warrant examination as we move into a discussion of generality. It is here that we show that psychology, presumably because of the widespread use of significance testing, has developed two distinguishable subject matters.
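The distinction drawn in this section between P(Data|H0) and P(H0|Data) can be made concrete with Bayes' rule. The following is a minimal sketch, not part of the original chapter, for a simplified case in which the "data" are reduced to a statistically significant result. The prior probability of the null hypothesis and the power of the test are hypothetical values chosen only for illustration, and the function name is our own. Neither quantity can be read off a p value, which is the point of the exercise.

```python
def prob_null_given_significant(prior_h0, alpha, power):
    """Return P(H0 | significant result) by Bayes' rule.

    prior_h0 : assumed prior probability that H0 is true (hypothetical)
    alpha    : P(significant | H0 true), the significance criterion
    power    : P(significant | H0 false), assumed power (hypothetical)
    """
    p_sig_and_h0 = alpha * prior_h0
    p_sig_and_h1 = power * (1.0 - prior_h0)
    return p_sig_and_h0 / (p_sig_and_h0 + p_sig_and_h1)


# Same significance criterion, different assumed priors and power.
# All three scenarios are illustrative assumptions, not empirical values.
scenarios = [
    (0.5, 0.05, 0.8),
    (0.9, 0.05, 0.8),
    (0.9, 0.05, 0.2),
]
posteriors = [prob_null_given_significant(*s) for s in scenarios]

for (prior, alpha, power), post in zip(scenarios, posteriors):
    print(f"prior P(H0)={prior:.2f}  alpha={alpha:.2f}  power={power:.2f}"
          f"  ->  P(H0 | significant)={post:.3f}")
```

Under these assumed values, the same criterion of .05 is compatible with a probability for the null hypothesis that ranges from roughly .06 to roughly .69. The conditional probability in one direction therefore fixes nothing about the conditional probability in the other unless further information is supplied.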

Significance Testing and Generality

Traditional significance testing approaches in psychology are generally based on data averaged across individuals. As is well known, the mean from a group of individuals (a sample) provides an estimate of the mean of the entire population from which the sample is drawn, and that estimate can be bounded by confidence intervals that provide information (not the probability, however, that the population mean falls within the interval; see Smithson, 2003) about how confident one can be that the population mean lies within such intervals. Thus, the sample mean provides information about a parameter that applies to the entire population. That fact appears to imply substantial generality; it applies to the entire population (however delimited), so generality appears maximized. This raises two important issues.

First is the question of representativeness of the means, both sample and population. That is, identical or similar means can result from substantially different distributions of scores. Two examples that illustrate this fact are given in Figures 7.1 and 7.2. In Figure 7.1, four distributions of 20 scores are arrayed horizontally in the upper panel. In the top row, the values are arithmetically separated, whereas in the other three, they are clustered in various ways. Note that none of the four is particularly “normal” in appearance, that is, clustered in the middle. The four plots in the lower panel show the means (solid points) and standard deviations (bars) of the four distributions, with the top plot corresponding to the top distribution in the upper panel, and so on. They are, as planned, identical. These data show that identical means and standard deviations, the stock in trade of inferential statistics, can be obtained from very different distributions of values. That is, in these cases the means and standard deviations do not provide a particularly informative or representative indication of what the individual values are, which implies that when dealing with averages of measures, or averages across individuals, attention must be paid to the representativeness of the mean, not just its value, or even its standard deviation.

Figure 7.1. Upper panel: Four distributions of values, with each symbol representing one value on the x-axis. Lower panel: The means and standard deviations of the four corresponding distributions from the upper panel. From The Elements of Graphing Data (rev. ed., p. 215), by W. S. Cleveland, 1994, Summit, NJ: Hobart Press. Copyright 1994 by AT&T Bell Laboratories. Reprinted with permission.

Figure 7.2. Anscombe’s quartet. Each of the four graphs shows 11 x–y pairs and the best-fitting (least-squares estimate) straight line through the points. The slopes and intercepts of the lines are identical. From “Graphs in Statistical Analysis,” by F. J. Anscombe, 1973, American Statistician, 27, pp. 19–20. Copyright 1973 by the American Statistical Association. Adapted with permission. All rights reserved.

Figure 7.2, which contains what is known as Anscombe’s quartet (Anscombe, 1973), provides an even more dramatic illustration of how focusing only on the average of a set of numbers can lead one to miss important features of that set. The four graphs in Figure 7.2 plot 11 values in x–y coordinates and show the best-fitting (via the least-squares method) straight line to the data. Obviously, the distributions of points are quite different in the four sets. Nevertheless, the means for the x values are all the same, as are their standard deviations. The same is true for the y values (yielding eight instances of the sort shown in Figure 7.1). In addition, the slopes and intercepts of the straight lines are identical for all four sets, as are the sums of squared errors and sums of squared residuals. Thus, all four would yield the same correlation coefficient describing the relation between x and y.

The point of these illustrations is to indicate that a sample mean, even though a predictor of a population mean, is not necessarily a good description of individual values, so it is not necessarily a good indicator of the generality across individual measures. When the measures come from individual people (or nonhuman animals), it follows that the average of the group may not reveal, and may well conceal, much about individuals. It is important to remember, therefore, that sample means from a group of individuals permit inferences about the population average, but these means do not permit inferences to individuals unless it is demonstrated that the mean is, in fact, representative of individuals. Surprisingly, it is rare in psychology to see the issue of representativeness of an average even mentioned, although recently, in the domain of randomized clinical trials, the limitations attendant to group averages have been gaining increased mention (e.g., Penston, 2005; Williams, 2010).

Many experimental designs, nevertheless, involve comparison across groups with large numbers of subjects, which raises the question of the practicality of presenting the data of every individual. The concern is legitimate, but the problem is not solved by resorting to the study of group averages only. Excellent techniques for comparing distributions, like stem-and-leaf plots, box plots, and quantile–quantile plots, are available (Cleveland, 1994; Tukey, 1977). They provide a more complete description of measures from individuals, or a useful subset (as can be the case with quantile–quantile plots), than do simple means and standard errors or means and confidence intervals. We presume that as null-hypothesis significance-testing approaches become less prevalent, more effort will be directed toward developing new and better techniques for comparing distributions, methods that will include and make evident the measures from individuals.
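The claim made about Anscombe's quartet in this section can be checked directly. The sketch below is not part of the original chapter. It uses the data values as they are commonly reproduced from Anscombe (1973), so readers may wish to verify them against the original article, and it computes the summary statistics with the Python standard library. The helper name least_squares is our own.

```python
from statistics import mean, stdev

# Anscombe's quartet as commonly reproduced from Anscombe (1973).
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I":   (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96,
                   7.24, 4.26, 10.84, 4.82, 5.68]),
    "II":  (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10,
                   6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84,
                   6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV":  ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
            [6.58, 5.76, 7.71, 8.84, 8.47, 7.04,
             5.25, 12.50, 5.56, 7.91, 6.89]),
}


def least_squares(xs, ys):
    """Slope and intercept of the ordinary least-squares line of y on x."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


summary = {}
for name, (xs, ys) in quartet.items():
    slope, intercept = least_squares(xs, ys)
    summary[name] = (mean(xs), stdev(xs), mean(ys), stdev(ys), slope, intercept)
    print(f"{name:>3}: mean x={mean(xs):.2f} sd x={stdev(xs):.2f} "
          f"mean y={mean(ys):.2f} sd y={stdev(ys):.2f} "
          f"line: y = {intercept:.2f} + {slope:.2f}x")
```

Each of the four rows should report essentially the same means, standard deviations, and fitted line (a slope near 0.5 and an intercept near 3.0), even though a scatterplot of each set looks entirely different. The summary numbers alone would give no hint of that difference.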

Two Separable Subject Matters for Psychology?

In some instances, the difference between a population parameter, such as the population average, and the activity of an individual is obvious. For example, consider the average rate of pregnancy in women between 20 and 30 years old. Suppose that rate is 7%. That, of course, is a useful statistic and can be used to predict how many women in that age category will be pregnant. More important for the present purposes, however, is that the value, 7%, applies to no individual woman. That is, no woman is 7% pregnant. A woman is either pregnant or she is not.

What of situations, however, in which an average is representative of the behavior of individuals? For example, suppose that a particular teaching technique is discovered to result in a 10% increase in performance on some examination and that the improvement is at or near 10% for every individual. Is that not a case in which a group average would permit estimation of a population mean that is, in fact, a good descriptor of the effect of the training for individuals and, because it applies to the population, has wide generality? The answer is yes and no. The point to be made here is somewhat subtle, and so we elaborate on it with an example.

Consider a situation in which a scientist is trying to determine the relation between amount of practice at solving five-letter anagrams and subsequent speed at solving six-letter anagrams. Suppose, specifically, that no practice and 10, 50, 100, and 200 anagrams of practice are to be compared. After the practice, subjects who have never previously solved anagrams, except for those seen in the practice phase, are given 50 new anagrams to solve, and the time to complete is recorded. Because total practice might be a determinant of speed, the scientist opts to use a between-groups design, with each group being exposed to one of the practice regimens. That is, the hope is to extract the seemingly pure relation between practice and later speed, uncontaminated by prior relevant practice. The scientist then averages the data from each group and uses those means to describe the function relating amount of practice to speed of solving the new, more difficult anagrams.

In an actual case, variability would likely be found among individuals within each group, so one issue would be how representative the average is of each member of each group. For our example, however, assume that the average is representative, even perfectly so (i.e., every subject in a group gives exactly the same value). The scientist has generated a function, probably one that describes an increase in speed of solving anagrams as a function of amount of prior practice. In our example, that function allows us to predict exactly what an individual would do if exposed to a certain amount of practice. Even though the means for each group are representative and therefore permit prediction about individual behavior, the important point is that the function has no meaning for an individual. That is, the function does not describe something that would occur for an individual because no individual can be exposed to different amounts of practice for the first time. The function is an actuarial account, not a description of a behavioral process. It is, of course, to the extent that the means are representative, a useful finding. It is just not descriptive of a behavioral process in an individual.

To examine the same issue at the level of an individual would require investigation of sequences of amounts of practice, and that examination would have to include experiments that factor in the role of repeated practice. Obviously, such an endeavor is considerably more complicated than the study that generated the actuarial curve, but it is the only way to develop a science of individual behavior. The ontogenetic roots of behavior cumulate over lifetimes. In later portions of this chapter, we discuss how the complications may be confronted.

The point is not to diminish the value of actuarial data, nor to suggest that psychologists abandon the collection and analysis of such data. If means are highly representative, such data can offer predictions at the individual subject level. Even if the means are not highly representative, organizations such as insurance companies and governments can and do make important use of such information in determining appropriate shared risk or regulatory policy, respectively. The point is, using insurance rates as an example, that being in a particular group, for example, that of drivers between the ages of 16 and 25, for which the mean rate of accidents is higher than for another group, does not indicate that you personally are more likely to have an automobile accident. It does mean, however, that for the insurance company to remain profitable, insurance rates need to be higher for all members of the group. Similarly, with respect to health policy, even though most people who smoke cigarettes do not get lung cancer, the incidence of lung cancer, on a relative basis, is substantially greater, on average, in that group. Because the group is large, even a low incidence rate yields a substantial number of actual lung cancer cases, so it is in the government’s, and the population’s, interest to reduce the number of people who smoke cigarettes.

The crux of the matter is that actuarial and behavioral data, although related in that the former depend on the latter, are distinguishable and, therefore, should be distinguished. Psychology, to the extent that it relies solely on the methods of inferential statistics that use averages across individuals, becomes an actuarial science, not a science of behavioral processes. The methods described in this chapter are aimed at including in psychology its oft-stated goal of being a science of behavior (or of the mind).

Behavioral and inferred mental processes really make sense only at the level of the individual. (The same is true of physiology, which has become a rather exact science in part because of the influence of Claude Bernard, 1865/1957.) A person’s behavior, including thinking, imagining, and so forth, is particular to that person. That is, people do not share their minds or their behavior with others, just as they do not share their physiology. A counterargument is that behavior and mental activity are too variable from individual to individual to permit a scientific analysis. We based this chapter on the more optimistic view that such activity is amenable to study at the level of the individual. Because a good deal of application of psychological knowledge involves dealing with individuals, for example, as in psychotherapy, understanding at the level of the individual remains a worthy goal. Support for the viewpoint that a science of individual behavior is possible, however, requires an elaboration of how an individual subject–based analysis can yield information that applies to more than one or a few individuals.

Let us compare how a more traditional betweengroups approach might fare in dealing with the issue. We apply music to one group and not to another. What will result will depend on the distribution of baseline accuracy across individuals. Figure 7.3 shows three possible population distributions. In B, most people have low accuracy, in C most have high accuracy, and in A people fall into two groups with respect to baseline accuracy. If one performed the experiment on groups and took the group mean to be the indicator of the effect of the independent variable, the conclusion would depend on the underlying distribution. In A, the conclusion

Why Single-Case Designs Do Not Mean That N = 1 Traditional approaches, with the attendant limitations described thus far, likely arose, at least in part, because of a legitimate concern about focusing research on individual subjects who are studied repeatedly through time (more on this later). Such research is usually performed with relatively few subjects, leaving open the possibility that effects seen might be limited with respect to generality across other individuals. An example, modeled after one offered by Sidman (1960), provides a response to such misgivings. Suppose we were interested in whether listening to classical music while solving arithmetic problems improves accuracy. Using a single-case approach, the study is started with a single subject. We might first establish a baseline of accuracy (more on this later) by measuring it over several successive exposures. Next, we would test the subject with the music present and then with it absent. Suppose we find that accuracy is increased when music is present and reverts to normal when it is not. Suppose also that unbeknownst to us, the effect music will have depends on the baseline level of accuracy; if accuracy is initially low, it is enhanced by the presence of music, whereas if it is initially high, it is reduced when the music is on. We might mistakenly conclude, on the basis of the results from the one subject, that music increases accuracy of solving the kinds of arithmetic problems used.

Figure 7.3. Three hypothetical frequency distributions characterizing the number of people displaying different baseline rates. From Tactics of Scientific Research: Evaluating Experimental Data in Psychology (p. 149), by M. Sidman, 1960, New York, NY: Basic Books. Copyright 1988 by Murray Sidman. Reprinted with permission.

157

Branch and Pennypacker

might well be that music has no effect, with the lowered accuracy in people with high baseline accuracy canceling out the increases that result among those with low baseline accuracy. If the population is distributed as in B, the conclusion would be that music increases accuracy because the mean would move in the direction of improved accuracy. The important point is that simply considering the group average makes it less likely that the baseline dependency that underlies the effect will be seen. Let us now compare what might transpire with the single-case approach, an approach based on replication. Having seen the effect in the first subject, we recruit a second and do the experiment again. Suppose that the population distribution is as depicted in Figure 7.3B. The most likely scenario is that the second subject will also have low baseline accuracy because someone sampled from the population is most likely to manifest modal characteristics. We get the same result and could, mistakenly, conclude that music enhances arithmetic accuracy. That is, we make the same mistake as with the group-average approach. The difference between the two approaches, however, is that the group mean approach makes it more difficult to discover the underlying, real effect. The single-case approach, however, if enough replications are done, will eventually and inevitably reveal the problem because sooner or later someone with high baseline accuracy will be examined and show a decrease. A key phrase in the previous sentence is “if enough replications are done.” Whether that happens is likely to depend on the perceived importance of the effect. If it is deemed important, it is likely to be subjected to additional research, which will, in turn, lead to additional replications. Thus, the single-case approach is not some sort of panacea with respect to identifying such relations, but it offers a direct path to corrective action. 
Of course, it is possible to ferret out the baseline dependency using a group-mean approach, but that will happen only if attention is paid to the data of individual subjects in a group. In the single-case approach, those data are automatically scrutinized. A major point is that single case does not necessarily imply that only one or even only a few subjects be examined. Some research questions might involve examination of many subjects. (We discuss later how to decide how many subjects to test.) What the approach involves is studying each subject essentially as an independent experiment. Generality across subjects is therefore examined directly by seeing how often the experiment’s effects are replicated. A second major point is that the apparent virtues of studying many subjects, a standard aspect of traditional research designs in psychology, are realized only if the data from each subject are individually analyzed.
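The contrast between group averages and subject-by-subject inspection can be made concrete with a small numerical sketch. The Python code below is our own illustration, not an analysis from any study: the function `music_effect`, the size of the effect (10 percentage points), the cutoff of 50% baseline accuracy, and both lists of baseline values are assumptions, chosen only to mimic one population in which low and high baselines are equally common and one in which low baselines predominate.

```python
from statistics import mean

def music_effect(baseline):
    """Hypothetical baseline-dependent effect: accuracy rises 10 points for
    subjects with a baseline below 50 and falls 10 points otherwise."""
    return 10 if baseline < 50 else -10

def summarize(baselines):
    """Group mean change plus a subject-by-subject tally of directions."""
    changes = [music_effect(b) for b in baselines]
    return {
        "group_mean_change": mean(changes),
        "n_increase": sum(c > 0 for c in changes),
        "n_decrease": sum(c < 0 for c in changes),
    }

# Assumed population in which low and high baselines are equally common.
population_a = [30, 35, 40, 45, 55, 60, 65, 70]
# Assumed population in which low baselines predominate.
population_b = [30, 32, 35, 38, 40, 42, 45, 70]

summary_a = summarize(population_a)
summary_b = summarize(population_b)
print("Population A:", summary_a)
print("Population B:", summary_b)
```

With these assumed values the group mean change is zero in the first population and positive in the second, yet in both cases the individual tallies show that some subjects improved and others worsened. The averages alone would suggest "no effect" or "improvement"; only the individual data expose the dependency on baseline.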

Null-Hypothesis Significance Testing and Theory Development

A major goal in any science is the development of theory, and there is a sense in which theory has clear relevance to generality. Effective theories are those that account for a wide array of research results. That is, they apply generally. The way in which significance testing is most commonly used in psychology, however, militates against the orderly development and testing of theories and against the analysis of competing theories. The problem was first identified as a paradox by Meehl (1967; see also Meehl, 1978). The problem is a logical one based largely on the choice of the null hypothesis as “no effect.”

The logic of the common approach is as follows. An investigator has a hypothesis that imposition of a variable, X, will change another measure, Y. This hypothesis is sometimes called the alternative hypothesis. The null hypothesis is then chosen to be that X will not change Y, that is, that it will be without effect. Next, the X condition is imposed, and Y is measured. A comparison is then made of Y without X and Y with X. A statistic is then calculated that is generally a ratio of changes in Y as a result of X over changes in Y as a result of anything else. In more technical terms, the statistic is effect variance over error variance. The larger the statistic, the smaller the p value, and the more likely it is that statistical significance is achieved and the null hypothesis rejected. Standard teaching holds that even though one can decide to reject the null hypothesis, logic prevents one from accepting the alternative hypothesis. Instead, one would say that if the null hypothesis is rejected, the alternative hypothesis gains support. The paradox noted by Meehl (1967) arises from the nature of the statistic itself. The size of the
statistic is controlled by two values, the effect size and the error variance, so it can be increased in two ways. The way of interest for this discussion is via a decrease in error variance, the denominator. A major way of decreasing error variance is through increased experimental rigor (one avenue of which is to increase the number of subjects). To the degree that extraneous variables (the “anything else” mentioned earlier) can be eliminated or held constant, error variance should decrease, making it more likely that that statistic will be large enough to warrant a decision as to statistical significance. The paradox, therefore, is that as experimental rigor is increased—that is, as experimental techniques are refined and improved—statistical significance becomes more likely, with the consequence that the alternative hypothesis gains support, no matter what the alternative hypothesis is. That does not seem like a recipe for cumulative progress in science. Simple null-hypothesis significance testing with the null hypothesis set at no effect cannot, by itself, help to develop theory. Meehl (1967) described one approach that can obviate this paradox, which is to use significance testing with a null hypothesis that is not “no effect.” Instead, the null hypothesis is what the theory (or alternative hypothesis) predicts. Consider how the logic operates when this tactic is used. As experimental rigor increases, error variance is decreased, making it more likely that the resulting statistic will reach a critical value. When that value is achieved, the null hypothesis is rejected, but in this case it is the investigator’s theory that is rejected. Rather than increased experimental rigor resulting in its being easier for one’s theory to gain support, it results in its being easier to reject one’s theory. Increasing experimental control puts the theory to a more rigorous test, not an easier one as is the case when using the no-effect, or no-difference, null hypothesis. 
The harder one works to reject a theory and fails to succeed, the more confidence one has in the theory. Training in statistical inference, at least for psychologists, does not usually emphasize that the null hypothesis need not be no effect. It can, nevertheless, as just noted, be some particular effect. Note that it has to be some specific value other than zero.
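The two ways of setting the null hypothesis can be compared with a short arithmetic sketch. The code below is our own illustration under stated assumptions: a simple ratio of (observed mean minus null value) to the standard error stands in for the "effect over error" statistic, the function name `ratio_statistic` is hypothetical, and the observed mean (2.0), the theory's predicted value (2.5), and the three levels of rigor (pairs of standard deviation and sample size) are invented numbers.

```python
import math

def ratio_statistic(observed_mean, null_value, sd, n):
    """Ratio of (observed - null) to the standard error of the mean."""
    standard_error = sd / math.sqrt(n)
    return (observed_mean - null_value) / standard_error

# Assumed values: the data show a mean change of 2.0 units, whereas the
# investigator's theory predicts a change of exactly 2.5 units.
observed, predicted = 2.0, 2.5

rows = []
# Each (sd, n) pair represents a more rigorous experiment than the last.
for sd, n in [(10.0, 16), (5.0, 64), (2.0, 256)]:
    versus_no_effect = ratio_statistic(observed, 0.0, sd, n)
    versus_theory = ratio_statistic(observed, predicted, sd, n)
    rows.append((sd, n, versus_no_effect, versus_theory))
    print(f"sd={sd:5.1f}  n={n:3d}  null=no effect: {versus_no_effect:6.2f}"
          f"  null=theory: {versus_theory:6.2f}")
```

As rigor increases in this sketch, the ratio computed against the no-effect null grows from 0.8 to 16, so rejection becomes nearly certain and any theory that predicts a change "gains support." Computed against the theory's own prediction, the ratio moves from -0.2 to -4, so the same increase in rigor makes it easier to reject a theory whose prediction is slightly off.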

The use of a particular value as the null hypothesis therefore requires that one’s theory be quantitative enough to generate a specific value. This approach is what characterizes tests of goodness of fit (those that use significance tests) of quantitatively specified functions. This approach of setting the null hypothesis at a value predicted by theory is nevertheless not immune to the previously described weaknesses of significance testing in general. If, however, significance testing is used to make decisions, at least this latter approach does not suffer from the weakness of making it easier to support a researcher’s theory, regardless of what it is, as methods improve.

In this section of the chapter, we have made the case, we hope, that commonly used psychology research methods have limitations in assessing reliability and generality of research findings. In addition, the methods have resulted in many areas of psychology being largely actuarial, group-average-focused science rather than aimed at the behavior of individuals. In the next section, we describe the basics of an alternative approach that is based on replication rather than significance testing and group averages. It is useful to remember that important science was conducted before the invention of significance testing, and what follows is a description of the application of methods used to establish most of modern physics and chemistry (and physiology) to the study of behavior. The approach focuses on understanding behavioral processes, rather than actuarial ones, and has already yielded a good deal of success, as other chapters in Volume 2 of this handbook illustrate. We should note, nevertheless, that even if the goal is actuarial prediction and influence, the methods of statistical inference are limited in what they can achieve with respect to reliability of research findings. 
As we argue, the only sure way to examine reliability of results is to repeat them, so replication is the key strategy for both subject matters of psychology.

Assessing Reliability and Generality via Replication

The two distinguishable categories of replication are direct replication and systematic replication, although, as we show, the distinction is not a sharp one. Most researchers are familiar with the concept of direct replication, which refers to repeating an experiment as exactly as possible. If the results are the same or similar enough, the initial effect is said to be replicated. Direct replication, therefore, is mainly used to assess the reliability of a research finding, but as we show, there is a sense in which it also provides information about generality. Systematic replication is the designation for a repetition of the experiment with something altered to see whether the effect can be observed in changed circumstances. If the results are replicated, then the generality of the finding is extended to the new circumstances. Many varieties of systematic replication exist, and it is the strategy most relevant to examining the generality of research findings.

Direct Replication: Within-Subject Reliability and Baselines

In the first part of this section, we describe the methods and roles of direct replication with the same experimental subject (i.e., a truly single-case experiment). We open with this simplest case, and with an example, not only to illustrate how the strategy can be used, but also to deal more clearly with reservations about and limitations of the approach as well as how decisions about characteristics of the replicative process may be made. For our example, suppose that we want to measure the amount of a certain kind of food eaten after some period without food. We let our subject eat after 12 hours of fasting; suppose that she eats 250 grams. Direct replication of this observation would require that we do the same test again. One possible, but unlikely, result would be that she would eat 250 grams again, providing an exact replication. The amount eaten would more likely be slightly different, say 245 grams. We might then conduct another replication to see whether the trend toward eating less was replicable. Suppose on that occasion our subject eats 257 grams, making it less likely that there is a trend toward less ingestion with successive tests. We could repeat the process again and again. By repeatedly observing the amount eaten after a 12-hour fast, we gain more confidence with each successive measurement about how much our subject will eat of that particular food after 12 hours of not eating.

One thing that direct replication can provide, via a sequence of direct, intrasubject replications such as that just described, is a baseline. The left segment of Figure 7.4 shows that there appears to be a steady baseline amount of intake in our example. A question that might arise is how many observations are needed to establish a baseline, that is, to come up with a convincing assessment? The answer is that it depends. There is no rule or convention about how many replications are needed to render an outcome considered reliable in the eyes of the scientific community. One factor of importance is how much is already known. In some of the more advanced physical sciences, a single replication (usually by a different research team) might be adequate. In our example, the researcher might have conducted similar research previously, discovered that the baseline value does not change after 10 observations, and thus deemed 10 replications enough. The researcher who chooses replication as a strategy to determine reliability of findings, therefore, does not have the comfort of a set of conventions (akin to those available to investigators who use conventional levels of statistical significance) to decide whether an effect is reliable enough to warrant reporting to the scientific community. Instead, the investigator’s judgment plays a role, and his or her scientific reputation is dependent to some degree on
that judgment. One of the comforts of a set of conventions is that if a researcher abides by them and results are later discovered, via failed attempts at replication, not to be reliable, that researcher’s reputation suffers little. In contrast, one can argue that there are both advantages and disadvantages to relying on replication. Important advantages are having the benefit of informed judgment, especially of a seasoned investigator, and the fact that social pressure rides more directly on the researcher’s reputation. The disadvantage comes from the lack of an agreed-on set of conventions. Principled arguments about which is better for science can be made for both positions, but we favor the view that science, as a social–behavioral activity, will fare better, or at least no worse, if researchers are held more accountable for their conclusions about reliability and generality than for their adherence to a set of arbitrary, often misunderstood conventions.

Returning to the role of a baseline construed as a set of intrasubject replications, such baselines can serve as bases of comparison for effects of experimental changes. For example, after establishing a baseline of eating of the first food, we could change what the food is, perhaps less tasty or more or less calorie laden. The second set of points in Figure 7.4, which in essence depict measures from a second set of replications, have been chosen to indicate a decrease. The reliability of the effect is illustrated by the successive similarity of values, and judgments about how many replications are needed would be based on the same sorts of considerations as involved in the original baseline. A usual check would involve return to the original food, and the third set of points indicates a likely result, once again with a series of replications. The overall experiment, therefore, is an example of the ubiquitous A-B-A design (see Chapter 1, this volume). 

Figure 7.4. Hypothetical data from a series of observations of eating. The first 10 points and last six points are amounts eaten of Food 1. The middle six points are amounts eaten of Food 2. (Vertical axis: Grams Eaten, 0 to 300; horizontal axis: Successive Tests, 0 to 26; phases labeled Baseline - Food 1, Food 2, and Food 1.)

Replication, of course, need not refer only to a series of successive measurements under identical conditions to produce a baseline. If the type of finding summarized in Figure 7.4 were especially counterintuitive or at considerable odds with existing knowledge, one might well repeat the entire project, Food 1 to Food 2 to Food 1, and that, too, would constitute a direct intrasubject replication. In fact, the entire project could be carried out multiple times if, in the investigator’s judgment, such confirmation was necessary. Each successful replication increases confidence that the independent variable, change of food type, is responsible for the change in eating.
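The bookkeeping behind an A-B-A series of intrasubject replications can be sketched in a few lines of Python. The code below is our own illustration: only the first three baseline values (250, 245, and 257 grams) come from the example in the text, and every other number is invented to resemble the pattern described for Figure 7.4. The comparison of observed ranges is an arbitrary stand-in for the investigator's judgment, not a convention; as noted above, no such convention exists.

```python
from statistics import mean

# Grams eaten per test. Only 250, 245, and 257 come from the text's example;
# all other values are hypothetical.
phases = [
    ("Food 1", [250, 245, 257, 252, 248, 251, 249, 253, 250, 247]),
    ("Food 2", [180, 176, 183, 179, 181, 178]),
    ("Food 1 (return)", [249, 252, 246, 251, 248, 250]),
]

def describe(values):
    """Mean and observed range of a series of intrasubject replications."""
    return {"mean": mean(values), "low": min(values), "high": max(values)}

def ranges_overlap(a, b):
    """True if the observed ranges of two phases share any value."""
    return min(a) <= max(b) and min(b) <= max(a)

summaries = {label: describe(values) for label, values in phases}
for label, summary in summaries.items():
    print(label, summary)

baseline, treatment, reversal = (values for _, values in phases)
effect_separated = not ranges_overlap(baseline, treatment)
baseline_recovered = ranges_overlap(baseline, reversal)
print("Food 2 values fall outside the baseline range:", effect_separated)
print("Return to Food 1 overlaps the baseline range:", baseline_recovered)
```

Repeating the whole Food 1 to Food 2 to Food 1 sequence would simply add further phases to the list, each of which could be summarized and compared in the same way.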

Direct Replication: Between-Subjects Reliability and Generality

After all this work, an immediate limitation is that the findings, so far as we know, may well apply only to the one person studied. Our first result is based on intrasubject replication. If the goal of the research was to see whether the change in food can influence eating, then it may be the case that no further replication is needed. It is likely, however, that our interest extends beyond what is possible to what is common. In that case, additional replication is in order, which brings us to the next type of direct replication, replication with different subjects, or intersubject replication. Intersubject replication is used to examine generality, in this case across subjects, and in this single-case design N is extended to more than 1. Intersubject replication makes clear the fuzziness of the distinction between direct and systematic replication. The latter is generally defined as a replication with something changed (see below), and a new subject is certainly a change. We also suggest that systematic replication is a main strategy for assessing generality, and by studying a second subject, generality across individuals is on trial. It is even possible to suggest that most replications, even intrasubject replications, are, in fact, systematic. For example, in the intrasubject replication described above, time is different for successive observations, and the subject brings a different history to each observation period. It nevertheless has become standard to characterize replications in which the procedures are essentially the same as direct replications. As we outline shortly, systematic replications are characterized by changes in procedure or conditions that can be quite substantial. 
As noted in the section Significance Testing and Generality earlier in this chapter, an emphasis on replication with individual subjects approaches the issue of subject generality by increasing the number of subjects studied. Suppose, for the sake of our example, we study a second subject, performing the
entire experiment, baseline, new food, baseline, and the whole sequence, over again. There are two major classes of outcomes. One, we get the same effect. Two, we do not. Let us deal initially with the former possibility. The first issue is what we would accept as “same.” The second person’s baseline level would likely not be exactly the same, and in fact, it might be notably different, averaging, say, 200 grams. Should we count that as a failure to replicate? The answer is (again), it depends. If our major concern was the exact amount eaten and the factors contributing to that, then the result might well be considered a failure to replicate. We will hold off for a bit, however, on what to do in the face of such failures, and move forward on the assumption that we are not so much concerned with the exact amount eaten as with whether the change in food results in a change in amount eaten. In that case, we might replicate, with the second subject, the whole sequence of conditions, Food 1, Food 2, and back to Food 1. Two possibilities exist: The results are the same as for the first subject or they are not, and again, consequently, an important issue is what is meant by same. The results are unlikely, especially in behavioral science, to be identical quantitatively, and, in fact, if the baseline is different, the change in intake cannot be identical in both absolute and relative terms, so we are left to decide whether to focus on what is different or on what is similar. In this stage of the discussion, let us assume that intake decreased, as it had for the first subject. In that case, we might feel confident that an important feature of the data has been replicated. A next question, then, would be whether additional replication with other subjects is needed. In this particular example, the answer would most likely be yes, but as is generally the case, the real answer is that it depends on what the goals of the experiment are. 
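The claim that a change cannot match in both absolute and relative terms when baselines differ follows from simple arithmetic, sketched below in Python. The baselines of 250 and 200 grams come from the example; the Food 2 intake of 200 grams for the first subject and the helper name `changes` are our assumptions for illustration.

```python
def changes(baseline, new_level):
    """Absolute change (grams) and relative change (proportion of baseline)."""
    absolute = new_level - baseline
    return absolute, absolute / baseline

# Subject 1: 250 g baseline (from the text); 200 g under Food 2 is assumed.
abs_1, rel_1 = changes(250, 200)

# Subject 2: 200 g baseline (from the text). Two candidate "same" outcomes:
same_absolute = changes(200, 200 + abs_1)        # same change in grams
same_relative = changes(200, 200 * (1 + rel_1))  # same proportional change

print("Subject 1:", abs_1, rel_1)
print("Subject 2, same absolute change:", same_absolute)
print("Subject 2, same relative change:", same_relative)
```

Under these assumed numbers, a second subject who matches the first subject's 50-gram decrease shows a larger proportional decrease, and one who matches the proportional decrease shows a smaller decrease in grams. Which of these counts as "the same" result is a decision about the goals of the experiment, not something the arithmetic can settle.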
Behavioral scientists, by and large, tend to focus on similarities rather than differences, so if features of data reveal similarity across individuals, those similarities are likely to be pursued. Consider, therefore, a situation in which the data for the second subject are dissimilar, not only in quantitative terms but in qualitative ones as well. For example, suppose that for the second subject the change from Food 1 to Food 2 results in an increase in amount
eaten rather than a decrease. Here, there is no question that an important aspect of the first result has not been replicated. What is to be done then? The answer lies in the assumption of determinism that is at the core of behavioral science. If there is a difference observed between Subject 1 and Subject 2, that difference is the result of some other influence. That is, people do not differ for no reason. In fact, the failure to replicate the exact intake levels at baseline must also be a result of some factor. Failure to replicate, therefore, is an occasion on which to initiate a search for the variable or variables responsible for the differences in outcomes. Suppose, for example, that Subject 1 was female, and Subject 2 was male. Tests with other men and women (note the expansion of N) could reveal whether this factor was important in determining the outcome. Similarly, we have already assumed different baseline levels, so it might be the case that baseline level is related to the direction of change in intake, a hypothesis that can be examined by studying additional subjects. It is interesting that examination of this second possibility could be aided if the issue of different baselines between Subject 1 and Subject 2 had been assumed to be a failure to replicate. In that case, we would have focused on reasons for the difference and may have identified factors that determine baseline level. If that were so, it might be possible to control the baseline levels and to change them systematically, thus providing a direct method for studying the relation between baseline level and the effect of changing the food. Another possible reason that disparate effects are observed between subjects is differing sensitivity to the particular value of the independent variable used. In the example just described, the independent variable was characterized qualitatively as a change in food type, making sensitivity to it difficult to assess. 
If, however, the independent variable can be characterized quantitatively, for instance by carbohydrate content in our example, the technique of systematic replication, elaborated below, can be used to examine the possibility. An important issue in considering direct replication arises when intersubject replication succeeds but intrasubject replication does not. Taking our example, suppose that when the conditions were
changed back to Food 1 with our first subject (cf. Figure 7.4), eating remained at the lower level, which would prevent replication of the effect in Subject 1. Such a result indicates either that some variable other than the change of food was responsible for the decrease in eating or that the exposure to Food 2 has produced a long-lasting change in eating. Support for the second view can come from attempts at intersubject replication. If experiments with subsequent subjects reveal that a shift from Food 1 to Food 2 results in a relatively permanent decrease in eating, the effect is verified. When initial effects are not recaptured after intervening experience that produces a change, the change is said to be irreversible. Using replication to examine irreversible effects requires intersubject replication, so we have here another instance in which N = 1 does not mean that only one subject need be studied. Many effects in psychology are irreversible, for example, those that we call learning, so the individual subject approach requires that intersubject replication be used to assess the reliability of such effects, and in so doing the generality of the effect across subjects is automatically examined. A focus on each subject individually, of course, does not prevent the use of traditional data analysis approaches, should an investigator be so inclined (for inferential statistical analyses appropriate to single-case research designs, see Chapters 11 and 12, this volume). Some, for example, might want to present group averages so that actuarial predictions can be made. Standard techniques can be used simply by engaging in the usual sorts of data manipulation. An emphasis on the data from individuals, nevertheless, can be used to enhance the presentation. For example, consider a study by Dunn, Sigmon, Thomas, Heil, and Higgins (2008), who compared two conditions aimed at reducing cigarette smoking. 
In one, vouchers were given contingent on breath samples that indicated that no smoking had occurred, whereas in the other the vouchers were given independently of whether the subject had smoked. Figure 7.5 shows some of the results. The bars show group means, and the dots show data from each individual, illustrating the degree to which effects were replicable across patients and the representativeness of the group averages. Such a display of data provides considerably more useful information than do presentations that include only means or results of tests of statistical significance.

Figure 7.5. Number of days of continuous abstinence from smoking cigarettes in two groups of subjects. Circles are data from individuals. Open bars and brackets show the group means and standard errors of those means. Subjects represented by the left bar received vouchers contingent on abstinence, whereas those represented by the right bar received vouchers independent of their behavior. The top bracket and asterisk indicate that the mean difference was statistically significant at the .01 level. From “Voucher-Based Contingent Reinforcement of Smoking Abstinence Among Methadone-Maintained Patients: A Pilot Study,” by K. E. Dunn, S. C. Sigmon, C. S. Thomas, S. H. Heil, and S. T. Higgins, 2008, Journal of Applied Behavior Analysis, 41, p. 533. Copyright 2008 by the Society for the Experimental Analysis of Behavior, Inc. Reprinted with permission.

Systematic Replication: Parametric Experiments

To this point, our emphasis has been on the intra- and intersubject generality and reliability of effects, and we have argued that individual subject approaches can be effectively used to assess them. Generality of effects, however, is not limited to generality across individuals, and it is to other forms of generality, culminating with scientific generality, that we now turn. As noted earlier, systematic replication refers to replication with something changed, and, as also noted, a case can be made that replication with a new subject is a form of systematic replication in
that it is an experiment with something changed, namely the experimental subject. From such replications come assessments of the across-subject generality of effects. In this section, we discuss other sorts of changes between experiments that constitute systematic replication. To do so, let us begin again with our example of effects of food type on eating. Suppose that after obtaining the data in Figure 7.4, we perform a systematic replication of the study rather than a direct repetition. For example, we might notice that Food 2’s carbohydrate content is higher than that of Food 1. We decide, therefore, to alter the carbohydrate content of Food 2 (and let us assume, likely impossible, without changing the taste) so that it matches that of Food 1, and repeat the experiment. Such an experiment would examine the generality of Food 2’s effect on eating to a new carbohydrate level. If adjusting Food 2’s carbohydrate amount to equal that of Food 1 resulted in the switch in foods having no effect on eating, two things can be concluded. One, the original result was not replicated. In such cases, it is often wise to replicate the original experiment to determine whether unknown variables might have been responsible. Two, carbohydrate amount is identified as a likely important variable. Thus, systematic replication is not only a method for discovering generality of effects, it is also an approach that can lead to finding controlling variables. Continuing our description of types of systematic replication, let us assume we decide to examine more fully the role of carbohydrates in eating. Our original experiment may be conducted several times but with a different carbohydrate mix for Food 2 on each occasion. Each repetition of the experiment, then, constitutes a systematic replication because a new value of carbohydrate is used for each instance. 
Experiments that systematically vary the value of a variable are called parametric experiments, and they play an especially important role in assessing generality. Consider the data in Figure 7.6, which are constructed to emulate what might result if several intersubject replications of a parametric experiment were conducted.

Figure 7.6. Hypothetical data for three subjects showing the relationship between carbohydrate content and amount eaten.

Parametric examination provides a number of advantages when assessing the reliability and generality of results. First, had only a single value of the independent variable been assessed, we might have been less than impressed with the degree of intersubject replicability of the data. The results of parametric examination, however, reveal a good deal of similarity across the three subjects: All show the same basic relation. At low percentages, the amount eaten is roughly constant within each individual. As the percentage increases, the amount eaten decreases until the percentage reaches a value above which further increases are associated with no changes in amount eaten. Second, and this is a key characteristic of parametric evaluation, the data suggest that only a range of levels of the independent variable result in a change in behavior. That is, parametric experiments permit the identification of boundary conditions, or limiting conditions, outside of which a variable is relatively ineffective. As we show later when dealing with the issue of scientific generality, information about boundary conditions can be extremely important. Figure 7.6 also illustrates how parametric experiments can help deal with the problem of lack of intersubject replicability when a single value of an independent variable is examined. Recalling our original example of comparison of food types, consider what could have happened if our first two subjects were Subjects 1 and 3 of Figure 7.6 and Food 1 had contained 20% carbohydrate and Food 2 had contained 25%. Changing the food type would have produced a change for Subject 1 but not for Subject 3,
leading to a conclusion that we had failed to replicate the food change effect across subjects. The parametric examination, however, shows that both subjects are similar in how food intake was influenced by carbohydrate content, except that behavior of the two subjects was sensitive in a slightly different range. One of the most satisfying outcomes of parametric experiments is when they reveal similarities that cannot be judged when only a single value of an independent variable is tested. It is worth noting, too, that parametric experiments can reveal that apparent intersubject replicability can be misleading regarding how a variable influences behavior. It is possible that tests with a single value of an independent variable might lead to very similar quantitative results for several subjects, whereas a parametric analysis reveals that very different functions describing the relation between the independent variable and behavior happen to cross or come close together at the particular value of the independent variable evaluated.

Parametric experiments illustrate one of the strengths of being able to characterize independent variables quantitatively. Experiments that determine how much of this yields how much of that provide more information about generality than do experiments that simply test whether a particular value of an independent variable has an effect. They can identify similarity where none is evident with a single value of an independent variable, and they can also determine whether apparent similarity is unrepresentative. We should note that parametric experiments are not limited in application to only primary independent variables, such as that shown in our fictitious example. Any variable associated with an experiment can be systematically varied. 
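The difference between a single-value test and a parametric examination can be sketched with a hypothetical function relating carbohydrate content to intake. The Python code below is our own illustration of the pattern described for Figure 7.6, not a reconstruction of that figure: the piecewise-linear form of `intake` and every parameter value are assumptions, chosen so that the two subjects are sensitive in slightly different ranges.

```python
def intake(carb_percent, onset, offset, high, low):
    """Hypothetical function: intake is flat at `high` up to `onset` percent,
    declines linearly to `low` at `offset` percent, and is flat thereafter."""
    if carb_percent <= onset:
        return high
    if carb_percent >= offset:
        return low
    fraction = (carb_percent - onset) / (offset - onset)
    return high - fraction * (high - low)

# Assumed parameters for two subjects with the same basic relation.
subjects = {
    "Subject 1": dict(onset=15, offset=35, high=250, low=150),
    "Subject 3": dict(onset=25, offset=45, high=220, low=130),
}

# Single-value test: Food 1 at 20% carbohydrate versus Food 2 at 25%.
single_value_change = {
    name: intake(25, **params) - intake(20, **params)
    for name, params in subjects.items()
}
print("Change from 20% to 25%:", single_value_change)

# Parametric examination: the same relation across a range of values.
levels = range(10, 55, 5)
functions = {
    name: [intake(c, **params) for c in levels]
    for name, params in subjects.items()
}
for name, values in functions.items():
    print(name, values)
```

In this sketch the 20% versus 25% comparison yields a decrease for the first subject and no change for the second, an apparent failure to replicate. Across the full range of values, both subjects show a flat segment, a decline, and a second flat segment, which is the similarity that a single value cannot reveal.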
As an example, the experiment just described could be conducted under a range of temperatures, a range of degrees of hydration of the subjects, a range of times without food before the test, and any of several other variables. Those experiments, too, would provide information about the range of conditions under which the independent variable of carbohydrate content exerts its effects in the circumstances of the experiment. Parametric experiments, although very important, are not the only kind of systematic replication. One
other type involves using earlier findings as a starting point, or baseline, for examination of other variables. As an example, consider the phenomenon of false memory in the laboratory, produced by a procedure originally developed by Deese (1959) and later elaborated by Roediger and McDermott (1995). In these studies, subjects said they recalled or recognized words that were not presented. A great deal of research followed the original demonstrations, and these experiments varied procedural details, measurement techniques, subject characteristics, and so forth. In each instance, therefore, in which the false memory effect was reproduced, the reliability of the phenomenon was demonstrated and its generality extended. Using the reproduction of previous findings as a starting point for subsequent research, therefore, is a useful and productive technique for examining reliability and generality of research outcomes. Sidman (1960), in his characterization of techniques of systematic replication, described a type he called “systematic replication by affirming the consequent” (p. 127). Essentially, this approach is very similar to the idea of hypothesis testing because the systematic replication is not based on simply changing some aspect of the experiment to see whether effects can still be reproduced but rather on what the investigator sees to be the implications of previous results. That is, the replication may be based on the investigator’s interpretation of what the data mean. For example, consider our fictitious study of the effects of carbohydrate content on eating. That result, and perhaps those of other experiments, might suggest that the phenomenon is not specific to eating. Carbohydrate ingestion possibly leads to general lethargy or low motivation for voluntary behavior. If we suspect that, we might devise other experiments that could be viewed as systematic replications based on the possible implications of the previous findings. 
If the results were consistent with the lethargy interpretation, the view would gain in credence; if they were not, the view might well be abandoned. As Sidman (1960) noted, definite conclusions may not be drawn from successful replications by affirming the consequent, but, as he also noted, the approach is essential to science. The degree to which one’s confidence in an interpretation of data grows with successful replications
depends on many things, not the least of which is how counterintuitive the predicted outcome is.

Types of Generality Assessed and Established by Systematic Replication

Johnston and Pennypacker (2009) offered a useful characterization of the dimensions along which generality can be examined. They initially suggested a dichotomy between “generality of” and “generality across.” Generality across is simple to understand. As we have already noted, replication can be used to determine generality across subjects or situations, a type of generality usually of considerable interest. Systematic replication comes to the fore in the assessment of generality across species and across settings. By definition, systematic replication is an attempt at replication with something different, so if the species is changed, or if something (or a lot) about the setting is altered, the replication attempt is a systematic one. In both cases, the issue of what constitutes a successful replication may arise. Suppose, for example, that we decided to attempt a cross-species replication of our experiments with food types, and our new species was a mouse. Obviously, mice would eat considerably less, and therefore a precise, quantitative replication would not be possible. We might (actually, probably would), however, argue that the replication was successful if the relation between carbohydrate content and eating was replicated, that is, if at low concentrations there was little effect on eating, but as carbohydrate content increased, the amount eaten decreased until some level was reached above which further decreases were not seen (cf. Figure 7.6). What if the content values at which the decreases begin and end differ between the species? For example, mice may begin to show a decline when the food reaches 15% carbohydrate, whereas with the humans, decreases are not evident until the food contains 25% carbohydrate. Is that a failure to replicate? Again, the answer is yes and no. The business of science is to find regularities in nature, so emphasis is properly placed on similarities.
Differences virtually always exist, so they are easy to find. Nevertheless, they cannot be ignored entirely, but their main role is not to indicate that the similarities evident are somehow unimportant, but rather to
promote further research into the origins of the differences if the differences are judged to be important. The scientist and the scientific community make judgments about the need for further investigation of the differences that are always present in replications.

“Generality of” also plays an essential role in science. Johnston and Pennypacker (2009) described several categories of generality of, but here we focus on one in hopes of making the concept clear: generality of process. Our example is a behavioral process familiar to most psychologists, specifically the process of reinforcement of operant (purposive) behavior. Reinforcement refers to the increase in likelihood of behavior as a result of earlier instances being followed by certain consequences; that increase is the process. Systematic replications across an immense range of behavioral activities and a very large range of consequences have provided instances of the process. For example, in addition to the traditional lever press and key peck, activities ranging from the electrical activity of an imperceptible movement of the thumb (Hefferline, Keenan, & Harford, 1959), to vocal responses of chicks (Lane, 1960), to generalized imitation in children with developmental delays (Baer, Peterson, & Sherman, 1967), to the extensive range of activities described in the use of reinforcement in the treatment of behavior disorders (e.g., Martin & Pear, 2007; Ullman & Krasner, 1966) have all been shown to be instances of the process. Similarly, the range of events used as effective consequences to produce reinforcement is also broad. Consequences such as praise, food, intravenous drug administration, opening a window, reducing a loud noise, access to exercise, and many, many others have been effectively used to produce reinforcement. All the reports may be viewed as describing systematic replications of the earliest experiments on the process (e.g., Skinner, 1932; Thorndike, 1898).
This generality of process is what stands as the rationale for speaking of reinforcement theory. The argument is similar to that offered for the motion of objects. Whatever those objects are, and whether they are falling, floating, being ballistically projected, or orbiting in outer space, they can be subsumed under the notion of gravitational attraction, Newton’s theory of gravity. An even more dramatic example is provided by living things. All manner of plants and animals populate the earth, and their differences are obvious and virtually countless. What is less obvious but explains the variety is that all life can be considered to have developed from the operation of three processes: variation, selection, and retention (Darwin, 1859). The sameness of cellular architecture, including nuclear material (e.g., DNA and RNA), also attests to the similarity. Likewise, all the myriad instances of reinforcement suggest that considering them instances of a single process is reasonable. As noted earlier, an important goal of science is to discover uniformities. In fact, as Duhem (1954) noted, one of the key features of explanation is identification of the like in the unlike. Objects look different, are made of different substances, and may or may not be moving in a variety of ways, but they are similar in how they are affected by gravity. Behavioral activities take on many forms, and as just noted, so can the consequences of those activities. Nevertheless, they can (on many occasions) exhibit the phenomenon known theoretically as reinforcement, an instance of generality of process.
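The cross-species comparison discussed earlier in this section (the same relation between carbohydrate content and eating, with declines beginning at different content values) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the intake numbers are invented to match the fictitious 15% and 25% onsets, the 5% tolerance is arbitrary, and the function names `normalize`, `decline_onset`, and `shows_same_relation` are hypothetical.

```python
# Illustrative sketch only: all numbers are made up to mirror the
# fictitious carbohydrate experiment; none are real data.

def normalize(intake):
    """Express each intake as a proportion of the largest intake observed."""
    peak = max(intake)
    return [x / peak for x in intake]

def decline_onset(concentrations, intake, tolerance=0.05):
    """Return the first concentration at which normalized intake falls more
    than `tolerance` below its maximum, or None if it never does."""
    for c, p in zip(concentrations, normalize(intake)):
        if p < 1.0 - tolerance:
            return c
    return None

def shows_same_relation(intake, tolerance=0.05):
    """True if normalized intake never rises by more than `tolerance` from one
    concentration to the next and ends lower than it started."""
    p = normalize(intake)
    non_increasing = all(b <= a + tolerance for a, b in zip(p, p[1:]))
    return non_increasing and p[-1] < p[0] - tolerance

# Hypothetical percent-carbohydrate values and amounts eaten (grams).
concentrations = [5, 10, 15, 20, 25, 30, 35, 40]
human_grams = [500, 500, 495, 498, 440, 380, 330, 330]
mouse_grams = [4.0, 4.0, 3.4, 2.9, 2.5, 2.2, 2.2, 2.2]

human_onset = decline_onset(concentrations, human_grams)
mouse_onset = decline_onset(concentrations, mouse_grams)
both_match = shows_same_relation(human_grams) and shows_same_relation(mouse_grams)
print(human_onset, mouse_onset, both_match)
```

In this sketch the absolute amounts differ by two orders of magnitude and the onsets differ, yet the normalized relation has the same form in both species, which is the sense in which the replication answer is "yes and no."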

Scientific Generality

Another extremely important concept is scientific generality, a type of generality that has some counterintuitive characteristics. Scientific generality is important for at least two reasons. One, scientific generality speaks to scientists’ ability to reproduce their own findings and those of other scientists, as well. Two, scientific generality speaks directly to the possibility of effective application and translation of laboratory findings to the world at large, as discussed more fully in the last section of this chapter. Scientific generality is defined by knowledgeable reproducibility. That is, it is not characterized in terms of breadth of applicability, but instead in terms of identification of factors that are required for a phenomenon to occur. To illustrate the difference between scientific generality and, for example, generality across people, consider again the fictitious experiment on food types. Suppose that the original experiments were all performed with male subjects. On an attempt at replication with female subjects, it is discovered that food type, or carbohydrate
composition, has no effect at all on eating. That, of course, would be a clear indication of a limit to the across-subjects generality of the effect on eating. It would, however, represent an increase in scientific generality because it specifies more clearly the conditions required to produce the phenomenon of reduced food intake. As stated by Johnston and Pennypacker (2009), “A procedure can be quite valuable even though it is effective under a narrow range of conditions, as long as we know what those conditions are” (pp. 343–344). The vital role that systematic replication, and even failures of systematic replication, can play in establishing scientific generality therefore becomes evident. Scientific generality represents an understanding of the variables responsible for a phenomenon.

Generalization, Technology Transfer, and Translational Research

The function of any science is the acquisition of basic knowledge. A secondary benefit is often the possibility of applying that knowledge in ways that impart benefit to some element of the culture at large. For example, Galileo’s basic astronomic observations eventually led to improved navigation procedures with attendant benefits to the colonial powers of 17th-century Europe. Pasteur’s discovery in 1863 of the microorganisms that sour wine and turn it into vinegar, and the observation that heat would kill them, led eventually to the germ theory of disease and the development of vaccines. In the case of behavior analysis, a relatively young science, sufficient basic knowledge has been acquired to permit vigorous attempts at application. A discipline known as applied behavior analysis, discussed extensively elsewhere in Volume 2 of this handbook, is the primary result of these efforts, although applications of the findings of behavior analysis are to be found in a variety of other disciplines including medicine, education, and management, to name but a few.
In this section, we describe issues surrounding attempts to apply laboratory research findings in the world at large. Specifically, we discuss topics related to applying research findings from controlled
laboratory or therapeutic settings to new situations or less controlled environments. First, we describe the issue of generalization of behavioral treatment effects from treatment settings to real-world circumstances. Then we outline basic general strategies for effective transfer of technologies, taking into account the known scientific generality of behavioral processes. Finally, we offer comments on the notion of translational research, a matter of much contemporary interest.

Generalization of Applications

One of the earliest subjects of discussion that arose with the development of behavior therapy and behavior modification techniques was the issue referred to as generalization (e.g., Yates, 1970). Specifically, there was concern about whether improvements produced in a therapy setting would also appear in other, nontherapy (e.g., everyday life) situations. The term generalization was borrowed from a core behavioral process discovered by experimental psychologists, that after learning to respond in a particular way in the presence of a particular stimulus, say frequency of a tone, the same behavior may also occur in the presence of other more or less similar stimuli, say, other frequencies of the tone. It is an apparently simple logical step to suggest that behavior learned in a therapy environment might also appear in nontherapy, real-world environments, and when it does so, the result can be called generalization (but see Johnston, 1979, for problems with such a simple extrapolation). Because applied behavior analysis generally involves establishing conditions that alter behavior, the issue of whether those changes are restricted to the learning situations arranged or whether they become evident in other situations is usually important. For example, if a child who engages in aggressive behavior is exposed to a treatment program to reduce aggression, a goal would be to decrease aggression not only in the treatment setting but in all settings. In a seminal article, Stokes and Baer (1977) discussed the issue of generalization of treatment effects. A key contribution of their article was to indicate that in general, if effects of a treatment are to be manifested in a variety of circumstances, achieving that outcome must be considered in
designing the intervention intended to effect the change in behavior. That is, it is not always sufficient to simply arrange circumstances that produce a desired change in behavior in the circumscribed environment in which the treatment is undertaken. Instead, procedures should be used that increase the probability that the change will be enduring and manifested in those parts of a client’s environment in which the changes are useful. That insight has been followed by the development of general strategies to enhance the likelihood that behavior changes occur not only in the treatment environment but also in other appropriate ones. For example, Miltenberger (2008) described several general strategies that can be used to promote generalization of treatment effects. The most direct strategy is to arrange for rewards to occur immediately after instances of generalization occur. Such an approach essentially entails taking treatment to the environments in which it is hoped the new behavioral patterns will occur. That is, the training environment is not explicitly delimited. Such an approach is now widespread in applied behavior analysis partly as a consequence of an emphasis on analyzing reinforcement functions before implementing treatment (see Iwata, Dorsey, Slifer, Bauman, & Richman, 1982). This approach to problem behavior entails discovering whether the behavior is maintained by reinforcement, and if it is, identifying what the reinforcers are in the environments in which the problem behavior occurs. Once the reinforcers responsible for the maintenance of the problem behavior are identified, then procedures based on that knowledge are implemented in the situations in which the behavior occurs. A related second strategy identified by Miltenberger (2008) is consideration of the conditions operating in the environments in which the changed behavior would be occurring. 
The idea here is that behavior that is changed in the therapeutic setting, for example learning effective social skills, will lead, if performed, to more satisfying social interactions in the nontherapy environment, and those successes will help to solidify the gains made in the therapy sessions. In designing the therapeutic goals, therefore, consideration is given to what sorts of behavior
are most likely to be successful in the nontherapy environment. A less obvious strategy applies when the nontherapy environment appears to offer little or no support for the changed behavior. An example is when therapy is aimed at training an adolescent to walk away from aggressive provocation in a schoolyard. Behaving in such an aggression-thwarting manner is not likely to result in positive outcomes with peers, who are instead likely to provide taunts and jeers after such actions. In such a case, it may be prudent to try to change the normal consequences in the schoolyard by having teachers or other monitors provide positive consequences (perhaps in the form of privileges, praise, etc.) for such actions when they occur. That is, the strategy here involves altering the contingencies operating in the nontherapy environment. A fourth general strategy is to try to make the therapy setting more like the nontherapy environment in which the changed behavior is to occur. A study by Poche, Brouwer, and Swearingen (1981) illustrated this approach. They taught abduction prevention skills to preschool children, but in so doing incorporated a relatively large number of abduction lures in the training. The intent was that by including a wide variety of possible lures that might be used by would-be kidnappers, the training would be more effective in real-world situations than if it had not involved those variations. The general strategy in this case was to train with as many likely relevant situations as possible. Another way to view this strategy is that it involves incorporating stimuli that are present in the nontherapy environment into the training. A fifth approach is somewhat less intuitive, but research has suggested that it may be effective. 
The core idea is that if a variety of different forms of effective behavior are established by the therapy or training, the chance of effective behavior occurring in the nontherapy environment is better, and as a result the successful behavior will be supported and continue to occur. As a simple illustration, Miltenberger (2008) offered the example of teaching a shy person a variety of specific ways to ask for a date, which provides the person with several actions to try, some of which are likely to be successful outside of therapy.

In this section, we focused on particular strategies for ensuring that desired changes in behavior established through therapeutic methods occur and persist in nontraining or nontherapy environments, that is, in the everyday world. Employment of tactics emerging from the strategies described has yielded many successes, and the methods are part of the armamentarium of applied behavior analysts. These techniques to promote generalization of behavior changes have emerged from a consideration of fundamental behavioral processes that have been identified and analyzed in basic research and then subsequently validated as effective through applied research. They represent, consequently, what can be called successful transfer from basic science to effective technology, namely, an instance of what has come to be called technology transfer. In the next section, we discuss some general principles of effective technology transfer.

Technology Transfer

People often use the term technology to refer to the body of applied knowledge and practices that emanate from basic science. The term technology transfer refers to the process by which the discoveries or inventions of basic science actually make their way into the body of technology and become available for use outside the laboratory by any individual willing to undergo the expense of acquiring the technology. Technology transfer can occur with its basis in any science, so general principles exist that apply across scientific disciplines. The process is somewhat complex, and pausing to review some of the basic details that apply when a technology is brought to the commercial level will be helpful. The criteria, both legal and practical, that must be met for successful technology transfer set an upper limit against which applications of science can be evaluated. A discovery or invention is an item of intellectual property. It belongs to the inventor or discoverer or whoever sponsored the research that led to its existence. The transfer process usually involves a second party taking ownership of the property and thus the right to exploit it for commercial gain. It therefore must be protected, usually by means of a patent or copyright. Once ownership is secured, it can be
transferred in exchange for a lump sum payment or, more often, a license agreement whereby the licensee acquires the exclusive right to produce and distribute the technology and the licensor is entitled to a royalty based on sales. Thus, for example, the Quaker Oats Company executed a licensing agreement with the University of Florida that allowed the company to produce and distribute Gatorade exclusively in exchange for a royalty in the form of a percentage of revenue.

Requirements for Technology Transfer

For a candidate technology to meet the eligibility requirements for transfer, it must meet three criteria: quantification, repetition, and verification. These terms arise from the engineering literature (e.g., Hench, 1990); their counterparts in the behavioral literature will become obvious in a moment. Let us consider these characteristics in more detail and see how they conform to the products of behavior-analytic research. First, we discuss quantification. Behavior analysis has long used the measurement strategies of the natural sciences (Johnston & Pennypacker, 1980) with the result that, as Osborne (1995) stated, “Physical standards of measurement bind behavior analysis to the physical and natural sciences. Interpretation of dependent variables need not change from experiment to experiment. It is a feature of . . . idemnotic measures that response frequencies on a particular parameter of a fixed-ratio schedule of reinforcement can be compared validly within sessions and across sessions, within laboratories and across laboratories, within species and across species” (p. 249). We are therefore able to state precisely and unambiguously the quantitative characteristics of behavior resulting from application of a particular procedure. Repetition is the practical use of replication as discussed earlier. The phenomenon must be able to be reproduced at will for it to serve as a component of a transferrable technology. An early (late 1950s and early 1960s) example of this feature is the
application by pharmaceutical companies of the reproducible effects of reinforcement schedules in evaluating drugs. A standard approach was to establish a known baseline of performance by an animal using a particular schedule of reinforcement, then evaluate the perturbation, if any, of an experimental compound on that performance. If the perturbation could be reliably reproduced (repeated), the relation between the compound and its effect on behavior was affirmed (cf. McKearney, 1975; Sudilovsky, Gershon, & Beer, 1975). Verification is most akin to the concept of generality. In establishing the generality of a behavioral process or phenomenon, researchers seek to specify the range of conditions under which the phenomenon occurs while eliminating those that are irrelevant or extraneous. Similarly, when transferring a technology, the recipient must be afforded a complete specification of the necessary and sufficient conditions for reproduction of the effects of the technology. Extensive research to nail down the parameters of generality is the only way to achieve this objective. It cannot be obtained by appeal to the results of significance tests for all of the reasons detailed earlier. A simple yet elegant example of a well-established behavioral technology that has easily transferred is the Invisible Fence for animal control, which is a direct application of the principles of signaled avoidance. A wire is buried underground around the perimeter of the enclosed area in which the animals are to remain. The animal wears a collar that receives auditory signals (beeps) and delivers electric shocks through two electrodes that contact the animal’s neck. As the animal comes within a few feet of the wire, a beep sounds. If the animal proceeds, it receives a shock. The owner teaches the animal to withdraw on hearing the beep by carrying (or leading) the animal into the proximity of the beep, then shouting “No!” and carrying or leading the animal away from the beep. 
After several such trials, the animal is released and will eventually receive the shock. Its escape response has been well learned, and it will very likely never contact the shock again. Rather, it will avoid it when it hears the beep. A more elaborate example of a behavioral technology that has been successfully transferred is the
MammaCare Method of manual breast examination as a means of early detection of breast cancer (Pennypacker, 1986). This example is unusual because the basic research was conducted by the same individuals who eventually took the technology to market. A capsule history of the development of MammaCare and its subsequent transfer is available online (Pennypacker, 2008). In brief, a high-fidelity simulation of human breast tissue was created with the help of materials science engineers. Patent protection for this device was obtained. Basic psychophysical research using this simulation was conducted to determine the limits of detectability by human fingers of lifelike simulations of breast tumors and other abnormalities. Once these limits were known, behavioral studies isolated the components of a technique that allowed an examiner to approach the established psychophysical limits. Early translational research established that practicing the resulting techniques on the simulation enabled examiners to detect real lumps in live breast tissue at about twice their baseline accuracy. Extensive research was then undertaken to establish procedures for teaching the new examination technique, which became known as MammaCare. Technology transfer became possible when standards of performance were established and training methods were devised that could be readily repeated and whose results could be verified. As a result, individuals wanting to offer such training either to the public or to other medical professionals (people who routinely perform clinical breast examinations, e.g.) may now become certified and can operate independently. Technology transfer was greatly accelerated in the United States by the passage in 1980 of the Bayh–Dole Act, which made it possible for institutions conducting research with federal funds to retain the products of that research. 
Offices of licensing and technology soon appeared in all of the major research universities, which in turn began licensing to private organizations the products of their sponsored research. The resulting revenue in the form of fees and royalties has been of significant benefit to these institutions. Most of this activity, however, has taken place in the hard sciences, engineering, and medicine. Fields such as behavior
analysis were not ready to enjoy the stimulative effects of the Bayh–Dole Act at the time it became law. Analogous federal attention to the clinical and behavioral sciences has emerged in the form of the National Institutes of Health’s National Center for Research Resources, which makes Clinical and Translational Science Awards. Mace and Critchfield (2010) cited a statement by the acting director of the National Institutes of Health’s Office of Behavioral and Social Sciences to the effect that “its [the institute’s] mission is science in pursuit of fundamental knowledge about the nature and behavior of living systems and the application of that knowledge to extend healthy life and reduce the burdens on illness and disability” (p. 307). The aim is to accelerate the application of basic biomedical research to clinical problems. We now turn our attention to this type of research.

Translational Research

From our perspective, translational research is a node along the continuum from basic bench science to the sort of application that results from technology transfer. The continuum is abstractly defined by generality. The ultimate goal of translational research may be broadly seen as establishing the limits of generality of whatever variable, process, or procedure is of interest. Translational research is therefore a somewhat less stringent endeavor than full technology transfer to bridge the gap from bench to bedside. A distinguishing feature of this approach is that the basic scientist and clinician often collaborate in a synergistic manner. This practice will likely accelerate the development of candidate technologies because the applied aspect is undergoing constant examination and refinement. Lerman (2003) has provided an excellent overview of translational research in behavior analysis. She correctly observed that the bulk of the literature on applied behavior analysis consists of reports of translational research. She went on to describe a series of maturational stages of this endeavor, beginning with the early demonstrations of the generality of the process of reinforcement across species, individuals, and settings. From these emerged concerns with other basic processes such as extinction,
stimulus generalization, discrimination, and the effects of basic contingencies of reinforcement. As these types of demonstrations increasingly proved beneficial to clinical populations, a new dimension of this research emerged that focused on issues of training, maintenance of benefits, and even the implications of such practices for public policy. Concurrently, focus shifted from individual cases to larger entities such as schools, corporations, and even military units. At the same time, a small but growing body of translational research will explicitly hasten the development of mature technologies that can be transferred with the usual attendant financial and cultural benefits. One such effort is a study by St. Peter Pipkin, Vollmer, and Sloman (2010), who examined the effects of decreasing adherence to a schedule of differential reinforcement of alternative behavior, first in a laboratory setting and then, using the exact same reinforcement schedule parameters, in an educational setting with two individuals with educational handicaps. They explored the generality of a procedure across settings and populations and further documented the effects of deliberate failure to impose the differential reinforcement of alternative behavior schedule as specified, either by not delivering reinforcement when required or by “accidentally” allowing undesirable behavior to contact reinforcement at various times. These manipulations constitute an attempt to demonstrate the consequences of breakdowns in treatment integrity, which in some cases can be highly destructive and in others may be negligible. The type of translational research just described constitutes an important step toward the development of a technology that can be transferred in the sense discussed earlier. Treatment integrity is directly analogous to what the engineers call verification. St. Peter Pipkin et al. 
(2010) provided guidance for establishing a range of allowable treatment integrity failure within which effectiveness may be maintained, which is akin to specifying tolerances in a manufacturing process or allowable failure rates of components in complex equipment.
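The idea of an allowable range of treatment integrity failure can be sketched in code. This is a minimal illustration under stated assumptions, not the procedure used by St. Peter Pipkin et al. (2010): it assumes every alternative response is scheduled for reinforcement and no problem response is, it treats the two failure types named above as omission and commission errors, and the `Event` record, the function names, and the 0.75 tolerance are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    behavior: str      # "alternative" or "problem" (hypothetical labels)
    reinforced: bool   # whether a reinforcer actually followed the response

def integrity_summary(events):
    """Count departures from the schedule as specified.

    Assumed specification: reinforce every alternative response and
    never reinforce a problem response.
    """
    if not events:
        raise ValueError("at least one event is required")
    # Omission error: reinforcement was required but not delivered.
    omission = sum(1 for e in events
                   if e.behavior == "alternative" and not e.reinforced)
    # Commission error: problem behavior contacted reinforcement.
    commission = sum(1 for e in events
                     if e.behavior == "problem" and e.reinforced)
    correct = len(events) - omission - commission
    return {
        "omission": omission,
        "commission": commission,
        "integrity": correct / len(events),
    }

def within_tolerance(summary, minimum_integrity):
    """True if observed integrity meets a tolerance set by prior research."""
    return summary["integrity"] >= minimum_integrity

# Hypothetical session log: 6 alternative and 4 problem responses.
session = (
    [Event("alternative", True)] * 5 + [Event("alternative", False)]
    + [Event("problem", False)] * 3 + [Event("problem", True)]
)
summary = integrity_summary(session)
acceptable = within_tolerance(summary, minimum_integrity=0.75)
print(summary, acceptable)
```

The tolerance itself is the empirical question: as with manufacturing tolerances, it would have to come from research showing the integrity level below which effectiveness is lost, and it might differ for omission and commission errors.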

General Considerations for Translational Research

Mace and Critchfield (2010) offered an informative perspective on the current role of translational
research in behavior analysis. They stressed the importance of conducting research that can be of more or less immediate benefit to society if substantial societal support for such research is to occur. In our view, the likelihood of such research actually attaining that criterion would be augmented to the extent that researchers keep as their ultimate goal development of a technology that can be transferred as we have discussed. In fact, very few examples of commercially transferrable technology have yet emerged from behavior analysis (Pennypacker & Hench, 1997). There is, however, sufficient promise in the replicability of the discipline’s basic findings to encourage development of transferrable technologies, and the availability of substantial research support is critical. More translational research aimed at identifying and isolating the conditions under which specified procedures can be assured to consistently generate measurable and desirable effects on the behavior of individuals will hasten the emergence of such technologies.

Summing Up

The subdiscipline known as behavior analysis offers an alternative approach to assessing reliability and generality of research findings, that is, an approach that is different from that used by most psychological researchers today. The methods that provide avenues to assessing reliability and generality may be of interest to psychologists who approach the field of behavior (or mind) from perspectives other than those shared by behavior analysts. At this juncture in the history of behavioral science, the methods might be especially attractive to researchers who are coming into contact with the substantial limitations of traditional methods that rely on group averages and null-hypothesis significance testing. In the early sections of this chapter, we reiterated those weaknesses because it appears that many behavioral researchers are not aware of them.
Our main thrust in this chapter has been to describe and characterize types of replication and the roles that they play in determining the reliability and generality of research outcomes. We have especially emphasized the role replication can play in assessing the generality of research findings, both
across subjects and conditions and of theoretical assertions. We have argued, in fact, that replication, both direct and systematic, currently represents the only set of methods that can determine whether results are reliable and how general they are. Our claim is a strong one, and it is not that replication is an alternative set of methods but rather that it is the only way to determine reliability and generality given current knowledge. Replication has served the more developed sciences very effectively, and it is our contention that it can serve behavioral science, too. At the very least, we hope we have convinced the reader that paying greater attention to replication will advance behavioral science more surely and more rapidly than the methods currently in fashion. In the final section of the chapter, we focused on issues of applying science to problems the world faces. Once reliability and generality of research findings have been established to an appropriate degree, it is sometimes possible to take advantage of that knowledge for the betterment of people and society. There are guideposts about how best to do that, and we have discussed some of them. The other chapters in this handbook present a wide-ranging description of the research and application domains that constitute behavior analysis. We see those chapters as testament to the coherent science and technology that can be developed when the markers of reliability and generality have been established through research founded on direct and systematic replication.

References

Anscombe, F. J. (1973). Graphs in statistical analysis. American Statistician, 27, 17–21. doi:10.2307/2682899

Baer, D. M., Peterson, R. F., & Sherman, J. A. (1967). The development of imitation by reinforcing behavioral similarity to a model. Journal of the Experimental Analysis of Behavior, 10, 405–416. doi:10.1901/jeab.1967.10-405

Bakan, D. (1966). The test of significance in psychological research. Psychological Bulletin, 66, 423–437. doi:10.1037/h0020412

Bayh-Dole Act, Pub. L. 96-517, § 6(a), 94 Stat. 3018. (1980).

Bernard, C. (1957). An introduction to the study of experimental medicine. New York, NY: Dover. (Original work published 1865)

Carver, R. P. (1978). The case against statistical significance testing. Harvard Educational Review, 48, 378–399.

Cleveland, W. (1994). The elements of graphing data. Summit, NJ: Hobart Press.

Cohen, J. (1994). The earth is round (p < .05). American Psychologist, 49, 997–1003. doi:10.1037/0003-066X.49.12.997

Darwin, C. (1859). On the origin of species by means of natural selection, or the preservation of favoured races in the struggle for life. London, England: John Murray.

Deese, J. (1959). On the prediction of occurrence of particular verbal intrusions in immediate recall. Journal of Experimental Psychology, 58, 17–22. doi:10.1037/h0046671

Duhem, P. (1954). The aim and structure of physical theory (P. P. Wiener, Trans.). Princeton, NJ: Princeton University Press.

Dunn, K. E., Sigmon, S. C., Thomas, C. S., Heil, S. H., & Higgins, S. T. (2008). Voucher-based contingent reinforcement of smoking abstinence among methadone-maintained patients: A pilot study. Journal of Applied Behavior Analysis, 41, 527–538. doi:10.1901/jaba.2008.41-527

Falk, R., & Greenbaum, C. W. (1995). Significance tests die hard: The amazing persistence of a probabilistic misconception. Theory and Psychology, 5, 75–98. doi:10.1177/0959354395051004

Gigerenzer, G., Krauss, S., & Vitouch, O. (2004). The null ritual: What you always wanted to know about significance testing but were afraid to ask. In D. Kaplan (Ed.), The Sage handbook of quantitative methodology for the social sciences (pp. 391–408). Thousand Oaks, CA: Sage.

Haller, H., & Krauss, S. (2002). Misinterpretations of significance: A problem students share with their teachers? Methods of Psychological Research—Online, 7(1), 1–20.

Hefferline, R. F., Keenan, B., & Harford, R. A. (1959). Escape and avoidance conditioning in human subjects without their observation of the response. Science, 130, 1338–1339.

Hench, L. L. (1990, August). From concept to commerce: The challenge of technology transfer in materials. MRS Bulletin, pp. 49–53.

Iwata, B. A., Dorsey, M. F., Slifer, K. J., Bauman, K. E., & Richman, G. S. (1982). Toward a functional analysis of self-injury. Analysis and Intervention in Developmental Disabilities, 2, 3–20.

Johnston, J. M. (1979). On the relation between generalization and generality. Behavior Analyst, 2, 1–6.

Johnston, J. M., & Pennypacker, H. S. (1980). Strategies and tactics of human behavioral research. Hillsdale, NJ: Erlbaum.

Johnston, J. M., & Pennypacker, H. S. (2009). Strategies and tactics of behavioral research (3rd ed.). New York, NY: Routledge.

Kalinowski, P., Fidler, F., & Cumming, G. (2008). Overcoming the inverse probability fallacy: A comparison of two teaching interventions. Methodology: European Journal of Research Methods for the Behavioral and Social Sciences, 4, 152–158.

Lane, H. (1960). Control of vocal responding in chickens. Science, 132, 37–38. doi:10.1126/science.132.3418.37

Lerman, D. C. (2003). From the laboratory to community application: Translational research in behavior analysis. Journal of Applied Behavior Analysis, 36, 415–419. doi:10.1901/jaba.2003.36-415

Loftus, G. R. (1991). On the tyranny of hypothesis testing in the social sciences [Review of The empire of chance: How probability changed science and everyday life]. Contemporary Psychology, 36, 102–105.

Loftus, G. R. (1996). Psychology will be a much better science when we change the way we analyze data. Current Directions in Psychological Science, 5, 161–171. doi:10.1111/1467-8721.ep11512376

Mace, F. C., & Critchfield, T. S. (2010). Translational research in behavior analysis: Historical traditions and imperative for the future. Journal of the Experimental Analysis of Behavior, 93, 293–312. doi:10.1901/jeab.2010.93-293

Martin, G., & Pear, J. (2007). Behavior modification: What it is and how to do it (8th ed.). Upper Saddle River, NJ: Pearson.

McKearney, J. W. (1975). Drug effects and the environmental control of behavior. Pharmacological Reviews, 27, 429–436.

Meehl, P. E. (1967). Theory-testing in psychology and physics: A methodological paradox. Philosophy of Science, 34, 103–115.

Meehl, P. E. (1978). Theoretical risks and tabular asterisks: Sir Karl, Sir Ronald, and the slow progress of soft psychology. Journal of Consulting and Clinical Psychology, 46, 806–834. doi:10.1037/0022-006X.46.4.806

Miltenberger, R. G. (2008). Behavior modification: Principles and procedures (4th ed.). Belmont, CA: Thomson.

Nickerson, R. S. (2000). Null hypothesis significance testing: A review of an old and continuing controversy. Psychological Methods, 5, 241–301. doi:10.1037/1082-989X.5.2.241

Oakes, M. (1986). Statistical inference: A commentary for the social and behavioral sciences. Chichester, England: Wiley.

Osborne, J. G. (1995). Reading and writing about research methods in behavior analysis: A personal account of a review of Johnston and Pennypacker’s Strategies and Tactics of Behavioral Research (2nd ed.) and others. Journal of the Experimental Analysis of Behavior, 64, 247–255. doi:10.1901/jeab.1995.64-247

Pennypacker, H. S. (1986). The challenge of technology transfer: Buying in without selling out. Behavior Analyst, 9, 147–156.

Pennypacker, H. S. (2008). A funny thing happened on the way to the fortune, or lessons learned during 25 years of trying to transfer a behavioral technology. Behavioral Technology Today, 5, 1–31. Retrieved from http://www.behavior.org/resource.php?id=188

Pennypacker, H. S., & Hench, L. L. (1997). Making behavioral technology transferrable. Behavior Analyst, 20, 97–108.

Penston, J. (2005). Large-scale randomized trials—A misguided approach to clinical research. Medical Hypotheses, 64, 651–657. doi:10.1016/j.mehy.2004.09.006

Poche, C., Brouwer, R., & Swearingen, M. (1981). Teaching self-protection to young children. Journal of Applied Behavior Analysis, 14, 169–175. doi:10.1901/jaba.1981.14-169

Roediger, H., & McDermott, K. (1995). Creating false memories: Remembering words not presented in lists. Journal of Experimental Psychology: Learning, Memory, and Cognition, 21, 803–814. doi:10.1037/0278-7393.21.4.803

Rozeboom, W. W. (1960). The fallacy of the null hypothesis significance test. Psychological Bulletin, 57, 416–428. doi:10.1037/h0042040

Sidman, M. (1960). Tactics of scientific research. New York, NY: Basic Books.

Skinner, B. F. (1932). On the rate of formation of a conditioned reflex. Journal of General Psychology, 7, 274–286. doi:10.1080/00221309.1932.9918467

Smithson, M. (2003). Confidence intervals. London, England: Sage.

Stokes, T. F., & Baer, D. M. (1977). An implicit technology of generalization. Journal of Applied Behavior Analysis, 10, 349–367. doi:10.1901/jaba.1977.10-349

St. Peter Pipkin, C., Vollmer, T. R., & Sloman, K. N. (2010). Effects of treatment integrity failures during differential reinforcement of alternative behavior: A translational model. Journal of Applied Behavior Analysis, 43, 47–70. doi:10.1901/jaba.2010.43-47

Sudilovsky, A., Gershon, S., & Beer, B. (Eds.). (1975). Predictability in psychopharmacology: Preclinical and clinical correlations. New York, NY: Raven Press.

Thorndike, E. L. (1898). Animal intelligence: An experimental study of the associative processes in animals. Psychological Review, 11(4, Whole No. 8).

Tukey, J. W. (1969). Analyzing data: Sanctification or detective work? American Psychologist, 24, 83–91. doi:10.1037/h0027108

Tukey, J. W. (1977). Exploratory data analysis. Reading, MA: Addison-Wesley.

Ullman, L. P., & Krasner, L. (Eds.). (1966). Case studies in behavior modification. New York, NY: Holt, Rinehart & Winston.

Wilkinson, L., & Task Force on Statistical Inference. (1999). Statistical methods in psychology journals: Guidelines and explanations. American Psychologist, 54, 594–604. doi:10.1037/0003-066X.54.8.594

Williams, B. A. (2010). Perils of evidence-based medicine. Perspectives in Biology and Medicine, 53, 106–120. doi:10.1353/pbm.0.0132

Yates, A. J. (1970). Behavior therapy. New York, NY: Wiley.

Chapter 8

Single-Case Research Designs and the Scientist-Practitioner Ideal in Applied Psychology

Neville M. Blampied

I acknowledge my profound debt to Murray Sidman and thank him and Rita Sidman for their many kindnesses. I also acknowledge my enduring gratitude to the academic staff of the Department of Psychology, University of Auckland, 1964–1969, for their rigorous introduction to the science of behavior, and especially to John Irwin, who introduced me to the writings of B. F. Skinner. I am also grateful for the opportunity to discuss research methodology with my friend and colleague Brian Haig and for his comments on this chapter. I also acknowledge the assistance provided by an unpublished thesis by K. J. Newman (2000) in providing very useful background on the scientist-practitioner model of clinical psychology; helpful comments from Martin Dorhay; and many excellent suggestions from the editors, Kennon A. Lattal and Gregory J. Madden. Assistance in the preparation of this chapter was provided by Emma Marshall, who was supported by a Summer Scholarship jointly funded by the University of Canterbury and the Tertiary Education Commission of New Zealand.
DOI: 10.1037/13937-008
APA Handbook of Behavior Analysis: Vol. 1. Methods and Principles, G. J. Madden (Editor-in-Chief)
Copyright © 2013 by the American Psychological Association. All rights reserved.

Darwinism implies . . . an intense awareness that all categorical or essentialist claims about living things are overdrawn—anyone who says that all cases of this thing or that thing are naturally one way or another are saying something that isn’t so. . . . Repetition is the habit of nature, but variation is the rule of life. . . . Belief in the primacy of the single case is not an illusion nurtured by fancy but a hope quietly underscored . . . by science. The general case is the tentative abstract hypothesis; the case right there is the real thing. (Gopnik, 2009, pp. 197–198)

How to cope with variation within repetition; how to balance the abstract and the particular—these issues, so deftly stated by Gopnik (2009), challenge all science, including behavioral science. In this chapter, I consider how psychology responded to these challenges by adopting the now-dominant paradigm for research design and analysis within the discipline and how this strongly influenced the adoption of an ideal model—the scientist-practitioner model—for applying science. I then consider problems and difficulties that have arisen with both research and its application through adherence to this model and argue that the adoption of single-case research strategies provides an effective solution to many of these problems.

Psychology Defines Science—The Inference Revolution

John Arbuthnot (Gigerenzer, 1991; Kendall & Plackett, 1977), a physician and mathematician in the household of Queen Anne, is said to have proved the existence of God (a “wise creator”) in 1710 by way of a kind of significance test, but it was not until the middle years of the 20th century that psychology made significance testing into a kind of god. Before these developments, psychologists had used a range of tabular and especially graphical techniques for analyzing what were often complex data with large sample sizes (Smith, Best, Cylke, & Stubbs, 2000). Smith et al. (2000) noted that “history clearly shows that worthy contributions to psychology do not inherently depend on the benediction of p values. Well into the 20th century psychologists continued to produce lasting, even canonical, achievements without using inferential statistics” (p. 262). The changes in both research practice and data analysis that occurred in (Western) psychology in
the period from 1935 to 1955 have been called an inference revolution (Gigerenzer, 1991) and have been claimed by some to have had many of the properties of a Kuhnian paradigm shift (Rucci & Tweney, 1980). The rapidity and thoroughness of this revolution was remarkable, and it had profound and lasting consequences, not least because it comprehensively redefined, in an operational way, what “science” was and is and ought to be in the context of psychological research and, therefore, what the science part of a scientist-practitioner should be. The inference revolution was foreshadowed toward the end of the 19th century by Galton, the inventor of the correlation coefficient and the control group (Dehue, 2000; Gigerenzer et al., 1989), who wrote approvingly about “scientific men” who would “devise [statistical] tests by which the value of beliefs may be ascertained” and who would “discard contemptuously whatever may be found to be untrue” (as quoted in Schlinger, 1996, p. 72). The foundations of the revolution were, however, not laid until the 1920s when R. A. (Sir Ronald) Fisher, statistician, geneticist, and evolutionary biologist, published his profoundly influential works (Wright, 2009). Statistical Methods for Research Workers (Fisher, 1925) and The Design of Experiments (Fisher, 1935) gave the world factorial designs, randomization of research elements to treatment conditions, the analysis of variance, the analysis of covariance, the null hypothesis (i.e., the hypothesis to be nullified), and null-hypothesis significance tests (NHST; Yates & Mather, 1963). Although some fellow statisticians were suspicious (Yates & Mather, 1963), Fisher’s ideas spread rapidly, especially in biology (see Appendix 8.1 for further information on, and definition of terms for, NHST). Given that Fisher was working as an agricultural scientist and geneticist at the time he wrote these books, the spread of his ideas to biology was explicable. 
Rather more surprising was the rapid adoption of Fisher’s methods by psychologists. Rucci and Tweney (1980) identified 1935 as the first year psychological research using analysis of variance was published, with 16 further examples published by
1940. These early examples were largely of applied research, and it was from applied psychology that inferential statistics spread to experimental research (Gigerenzer et al., 1989). Fisher’s statistical tests were not the only practices adopted. His advocacy of factorial designs, that is, experiments investigating more than one independent variable at a time, was also influential, reflecting his view that experimental design and statistical analysis are “different aspects of the same whole” (Fisher, 1935, p. 3). There was a hiatus in the dissemination and adoption of Fisher’s statistical methods during World War II, but there was then an acceleration in the postwar years, so by the mid-1950s, Fisher’s statistical tests were widely reported in published articles across many psychology journals; many textbooks had been published to assist with the teaching of these methods; and leading universities were requiring graduate students to take courses in these methods (Hubbard, Parsa, & Luthy, 1997; Rucci & Tweney, 1980). Since that time, more than 80% of published articles in psychology journals have typically reported significance tests (Hubbard et al., 1997). Thus, by the time the inference revolution was complete—in the mid-1950s—psychology, or at least the academic, English-speaking domain of the discipline, had developed a consensus about what it meant to be a science and to do science.¹ The key attributes of this model of science are outlined in Exhibit 8.1. In a mutually strengthening cycle, they were taught in the methods courses, written about in the methods textbooks, practiced in the laboratory, required by editors, published in the journals, and imitated by other researchers. Until recently, this standard model of psychological science (henceforth termed the standard group statistical model, or standard model for short) has gone largely unchallenged in the mainstream of the discipline.
As it also happened, this consensus about scientific methods was achieved at about the same time as major developments were also occurring in the understanding of clinical psychology, the topic to which I now turn.

¹ This consensus adopted most of Fisher’s approach to statistics and research design but also incorporated, in an ad hoc and anonymous way, aspects of Jerzy Neyman and Egon Pearson’s perspective (Gigerenzer, 1991; Gigerenzer et al., 1989; Hubbard, 2004; see Appendix 8.1).

Exhibit 8.1
Key Characteristics of the Standard Model of Research in Psychology

Recruit as manyᵃ participantsᵇ as possible.
(Quasi)randomly allocate these participants to experimental conditions.
Acquire one or a few measures of the dependent variable from each participant.
Use measures of central tendency (mean, median, etc.) and variance to aggregate data for each dependent variable.
Compute inferential statistics one dependent variable at a time, comparing the aggregated measures for each group.
Operate under a null hypothesis (H0) for which the population mean differences are zero.
Use p values relative to a criterion to make an accept–reject decision concerning the null hypothesis and any alternative hypothesis.
Regard results of the study as being of scientific value only if H0 has been rejected at the criterion level.

ᵃ Power analysis permits the computation of a minimum sample size for a given level of power. Investigators are still encouraged to regard larger samples as preferable to smaller samples (e.g., Streiner, 2006).
ᵇ In principle, these participants are regarded as being drawn from some specified population, but in psychological research the population of interest from which the sample is considered to have been drawn is rarely specified with any precision (Wilkinson & Task Force on Statistical Inference, 1999), an omission that strictly negates the drawing of inferences from the sample.
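The steps listed in Exhibit 8.1 can be traced in a short simulation. The sketch below is illustrative only and is not drawn from the chapter. The function names (`min_n_per_group`, `permutation_p_value`, `decide`) and the example scores are hypothetical. The sample-size formula is the usual normal approximation for comparing two group means. A permutation test on the mean difference stands in for the t test or analysis of variance that a study in this tradition would ordinarily report, so that only the Python standard library is needed.

```python
import math
import random
import statistics
from statistics import NormalDist


def min_n_per_group(effect_size, alpha=0.05, power=0.80):
    """Smallest n per group for a two-sided, two-group comparison of means.

    Uses the normal approximation n = 2 * ((z_alpha + z_power) / d) ** 2,
    where d is the standardized mean difference (an assumed input).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


def permutation_p_value(group_a, group_b, n_permutations=5000, seed=0):
    """Two-sided permutation p value for the difference between group means."""
    rng = random.Random(seed)
    observed = abs(statistics.mean(group_a) - statistics.mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # re-allocate the scores to conditions at random
        diff = abs(statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:]))
        if diff >= observed:
            as_extreme += 1
    # Add-one correction keeps the estimated p value above zero.
    return (as_extreme + 1) / (n_permutations + 1)


def decide(p_value, alpha=0.05):
    """Accept-reject decision about H0 at the chosen criterion."""
    return "reject H0" if p_value < alpha else "fail to reject H0"


if __name__ == "__main__":
    print(min_n_per_group(0.5))  # 63 per group under the normal approximation
    control = [10, 11, 12, 13, 14, 15, 16, 17]    # hypothetical scores
    treatment = [20, 21, 22, 23, 24, 25, 26, 27]  # hypothetical scores
    # Aggregate each group with a mean and a variance, one measure per person.
    for name, scores in (("control", control), ("treatment", treatment)):
        print(name, statistics.mean(scores), statistics.variance(scores))
    print(decide(permutation_p_value(control, treatment)))
```

In this sketch the final decision depends only on the aggregated group values and the criterion; no individual participant's data enter it separately.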

Clinical Psychology and the Rise of the Scientist-Practitioner Ideal

O’Donnell (1985) argued that in the period from about 1870 until World War I, the emergent discipline of psychology had to surmount three major challenges: to differentiate itself from philosophy, to avoid becoming a branch of physiology, and to become seen as sufficiently useful to society that both the discipline and its practitioners garnered institutional, social, and economic support. From this matrix of historical processes, especially the last, emerged clinical psychology. Consistent with this argument, there is agreement among historians of psychology that clinical psychology had its immediate origins in the late 1800s (e.g., Benjamin, 2005; Bootzin, 2007; Korchin, 1983; O’Donnell, 1985; Reisman, 1966, 1991; Routh, 1998), although its origins can be traced back to antiquity (Porter, 1997; Routh, 1998). By the beginning of the 20th century, clinics in both Europe and the United States were devoted to the study of mental disease, mental retardation, and children’s educational and adjustment problems, in which a nascent clinical psychology was evident (Bootzin, 2007). The most unequivocally psychological of these was the clinic established by Lightner Witmer at the University of Pennsylvania in 1896 (McReynolds, 1997). Witmer’s clinic focused on children (McReynolds, 1997; Reisman, 1966), and it was a place where they could be assessed by psychologists (and social workers), drawing on the psychologists’ knowledge of sensory, perceptual, and cognitive systems and, where possible, remedial actions could be prescribed. Witmer presciently emphasized two points: one, as a teacher, the clinician was to conduct demonstrations . . . so that [students] would be instructed in the art and science of psychology; two, as a scientist, the clinician was to regard each case as in part a research experiment in which the effects of his procedures and recommendations were to be discovered. (Reisman, 1966, p. 86). Bootzin (2007) noted that these small experiments anticipated the development of single-case research methods in psychology, with Witmer himself emphasizing his “belief that we shall more profitably investigate these causes [of development] by the study of the individual” and that he studied “not the abnormal child but . . . the individual child” (Witmer, 1908, as quoted in McReynolds, 1997, p. 142). Witmer established courses for training graduate students in practical psychology at the University of Pennsylvania in 1904–1905 and founded a journal—The Psychological Clinic—in March 1907. In the initial issue of this journal, Witmer (1907/1996, p.
251) first used the term clinical psychology—“The methods of clinical psychology are . . . invoked wherever the status of an individual mind is determined by observation and experiment, and pedagogical treatment
applied to affect a change”—and thus signaled the founding of a new field of psychology. Other U.S. universities soon established clinics in the Witmer model; a few hardy professional psychologists worked outside academia, all of them in institutions serving people with intellectual disabilities; and at least one institution offered an internship (Reisman, 1966; Routh, 2000). So was clinical psychology born. The mature form with which psychologists and the public are familiar did not develop, however, until toward the end of World War II. As a profession, psychology had grown in numbers and influence during the war. The end of the war produced a rapidly growing gap between professional resources, especially the number of clinical psychologists, and the needs of postwar society. To meet these needs, more graduates in clinical and applied psychology in general were required. A committee of the American Psychological Association, chaired by David Shakow, worked to develop appropriate graduate curricula, accreditation processes, and internship opportunities and funding (Committee on Training in Clinical Psychology, 1947; Reisman, 1991). The committee defined clinical psychology as having “systematic knowledge of human personality and . . . principles and methods by which it may use this knowledge to increase the mental well being of the individual” (Committee on Training in Clinical Psychology, 1947, p. 540). The report further defined clinical psychology as “both a science and an art” and stated that the graduate must have “applied and theoretical knowledge in three major areas: diagnosis, therapy, and research” (Committee on Training in Clinical Psychology, 1947, p. 540). Clinical training programs should include psychological research methods and statistics as well as clinical courses and incorporate a research dissertation leading to a doctorate (Committee on Training in Clinical Psychology, 1947; Thorne, 1945). 
Between 1947 and 1949, the number of American Psychological Association–accredited doctoral programs in clinical psychology
in the United States doubled, and this increase continued apace in subsequent years (Reisman, 1991). This growth stimulated further institutional activity, and in August 1949 a famous conference, the Boulder Conference on Graduate Education in Clinical Psychology, was held at the University of Colorado in Boulder (Raimy, 1950). This conference endorsed the recommendations of the Shakow committee and gave an official imprimatur to what is now termed the Boulder model for the training of applied and clinical psychologists as scientist-practitioners (Bootzin, 2007). This scientist-practitioner ideal, although contested and disputed, has ever since remained the dominant ideal of the clinical psychologist throughout the English-speaking world (e.g., Martin, 1989; Shapiro, 2002). What this ideal entails has been specified in different ways, but Barlow, Hayes, and Nelson (1984) suggested that it specified three aspects or roles for practitioners as (a) consumers of new research, (b) empirical evaluators of their own practice, and (c) producers of scientifically credible research from their own practice setting.² Although developed specifically in regard to clinical psychology, the model in principle embraced a wide understanding of applied psychology, so that many forms of practice could be included (B. B. Baker & Benjamin, 2000; Raimy, 1950). It has been adopted by other applied areas, such as counseling psychology (e.g., Corrie & Callahan, 2000; Vespia & Sauer, 2006), health psychology (e.g., Sheridan et al., 1989), neuropsychology (e.g., Rourke, 1995), organizational and personnel psychology (e.g., Jex & Britt, 2008), and school (educational) psychology (e.g., Edwards, 1987).

² Consistent with the ethos of the time (the 1950s), those who formulated the scientist-practitioner model appear to have endorsed an essentially linear, or one-way, model of the link between basic science and applied science (Reich, 2008). In psychology, this model appears not to have been much debated or challenged, but in the history and philosophy of science, the debate has been extensive (Balaram, 2008). Stokes (1997) developed a multidimensional typology for characterizing the relationship between science and its application, and it has been suggested that psychology belongs in Stokes’s “Pasteurian quadrant” (Reich, 2008; see also Price & Behrens, 2003). This domain is characterized by high concurrent interest in both basic science and its applications (Reich, 2008) and clearly has links to the idea of translational research as well (Lerman, 2003).

Given the close coincidence in time of the completion of the inference revolution that operationally defined the scientific method in psychology, the affirmation of the scientist-practitioner ideal at the Boulder conference, and the rapid growth of graduate training programs accredited according to the Boulder model, it is hardly surprising that the
science part of the scientist-practitioner ideal came to be identified with the standard group statistical model (Aiken, West, & Millsap, 2008; Aiken, West, Sechrest, & Reno, 1990; Rossen & Oakland, 2008). Frank (1984) has suggested that the conspicuous fact that most of clinical practice had not been derived from science was what led the field to seize on method, as endorsed in academe by researchers, as the common element linking psychologists together. By the late 1960s, influential expositions of research designs in clinical research (e.g., Kiesler, 1971) endorsed group statistical research as the primary scientific method. Although the scientist-practitioner ideal has been criticized, disputed, modified, and lamented (e.g., Albee, 2000; Lilienfeld, Lynn, & Lohr, 2003; Peterson, 2003; Stricker, 1975; Tavaris, 2003), it continues to be vigorously affirmed as an enduring ideal in applied psychology (e.g., B. B. Baker & Benjamin, 2000; T. B. Baker, McFall, & Shoham, 2008; Belar & Perry, 1992; Kihlstrom & Kihlstrom, 1998; McFall, 2007; Soldz & McCullogh, 2000) as both informing and inspiring graduate training and professional practice. Thus, by the midpoint of the 20th century a convergence of two powerful movements within psychology had occurred. One defined what science was and how it should be done; the other specified that having each graduate personally become both a scientist and a practitioner through training and supervised practice in both research and application was the way in which professional applied psychology should be conducted. As it happened and still happens, training of graduate students in mainstream settings, particularly the major universities, whether of general experimentalists or of aspiring practitioners, emphasized the standard group statistical model as the primary and necessary pathway to pure and applied knowledge (Aiken et al., 1990, 2008; Rossen & Oakland, 2008). 
Some Consequences of This History

The preceding section, in brief, concerned how history unfolded for psychology in the 20th century, specifically in the combination of a prescription for doing science with an ideal form of applied science practitioner. How have things turned out in the
ensuing 50-plus years? In answering this question, I first reflect briefly on some of the consequences that followed from this development. Second, I consider whether there was any alternative to history as it happened, because if there was no alternative, then no matter the consequences, psychologists have had the best of all possible worlds. The first and most obvious consequence to note is that clinical science, the foundation for the scientist-practitioner, necessarily enjoyed both the benefits and the problems (if any) of the underpinning Fisherian model, because there can be no doubt that clinical research has overwhelmingly been conducted within this tradition. Problems, however, there were. Dar, Serlin, and Omer (1994) reported an extensive methodological review of the use of statistical tests in research published in the Journal of Consulting and Clinical Psychology in the three decades spanning from 1967 to 1988 and showed that this research almost without exception was conducted within the standard NHST paradigm. They also documented the occurrence of a large number and variety of misinterpretations of statistical tests in this research, including the abuse of p values as having meaning they do not have (see Chapter 7, this volume), the absence of confidence intervals, and the gross inflation of Type I error. Where trends existed over the decades, they were often toward the frequency of misinterpretation getting worse rather than better. Dar et al. (1994) concluded that there was “a growing misuse of null hypothesis tests” (p. 79) in psychotherapy research. In this, clinical research echoed the situation in psychological research as a whole, where widespread misinterpretation of statistical tests has been notorious for decades (Cohen, 1990, 1994; see Balluerka, Gomez, & Hidalgo, 2005, and Nickerson, 2000, for comprehensive reviews).
Given the centrality of conclusions based on the outcome of NHST for determining the scientific status of research and its likelihood of publication, this state of affairs is serious for all research, including clinical research. Has the situation improved more recently? Fidler et al. (2005) reported on a similar survey of empirical articles published in the same eminent journal for five time periods spanning from 1993 to 2001. NHST-based methods continued to dominate

Neville M. Blampied

research, but misinterpretations and misapplications of statistical methods also continued to be conspicuous. Some improvement over the Dar et al. (1994) survey was noted, but Fidler et al. (2005) commented tartly, “In a major journal dedicated to the research of psychotherapy . . . clinical significance [rather than statistical significance] should be relevant to more than 40% of articles” (p. 139). Similar surveys of other major research journals have suggested a common picture of continuing misinterpretation of Fisherian statistics and resistance to change (Schatz, Jay, McComb, & McLaughlin, 2005; Vacha-Haase, Nilsson, Reetz, Lance, & Thompson, 2000) despite much effort to improve statistical methods (e.g., Erceg-Hurn & Mirosevich, 2008; Rodgers, 2010; Wilkinson & Task Force on Statistical Inference, 1999). Given this lamentable state of affairs from the mid-1960s to the present, little reason exists to suppose that clinical science has been spared the deleterious outcomes some critics have suggested all psychological research has suffered as a result of using the dominant statistical paradigm (Michael, 1974; Rosenthal, 1995). These outcomes include serious lack of power to detect experimental effects (e.g., Cohen, 1962; Sedlmeier & Gigerenzer, 1989), the lack of genuinely cumulative knowledge in psychology (e.g., F. L. Schmidt, 1996), the conspicuous divergence of the methods of psychology from those used in other natural and social sciences (Blaich & Barreto, 2001; Meehl, 1978), and the failure of psychology to develop unambiguously falsifiable theories (e.g., Meehl, 1978). Indeed, Meehl (1978) said of the adoption of NHST that it was “a terrible mistake, is basically unsound, poor scientific strategy, and one of the worst things that ever happened in the history of psychology [emphasis added]” (p. 817).
A second, and perhaps less documented, consequence of the joining of clinical science with the standard research model has been the ubiquitous use of averaging across participants. One of the direct, remarkable, but little-commented-on features of the inference revolution was the almost universal and unquestioned adoption of experimental groups necessitating group averaging as almost the first step in any data analysis procedure. Most psychologists today and for the past 50-plus years would find it
extraordinarily difficult to imagine doing research that did not entail the computation of group averages as the initial step in any analysis (e.g., Cairns, 1986). There was nothing particularly remarkable about this use of averaging in the case of Fisher’s own research, dealing as he did with agricultural produce for which interindividual differences in the properties of individual grains of rice or kernels of wheat and so forth are not of particular interest, for which such items are so numerous as to preclude examination of each one anyway, and for which the commercial and economic system dealt with agricultural commodities in bulk. Given this, the development and use of sampling theory that permitted estimation of population parameters and the making of inferences from samples to populations made sense and continues to be appropriate when actuarial or population-level issues are at stake. Critically, however, for psychology, these practices in agricultural research meshed neatly with the pursuit by Quetelet, a century before Fisher, of “ideal” aspects of humankind according to the normal distribution (the normal law of error; Gigerenzer et al., 1989). Quetelet’s search for the ideal human led directly to Fisher’s inferential statistics (Johnston & Pennypacker, 1993). The question of whether this combination of teleological assumptions about perfect human types and a focus on the estimation of aggregate properties of commodities was a good model for clinical and other applied psychologies, in which the cases being served are often diverse and individual differences are potentially of great importance, seems not to have been asked during the inference revolution, during the Boulder conference, or since. Nevertheless, a range of scholars have criticized psychology’s addiction to averaging across cases.
The evolutionary biologist Stephen Jay Gould (1985), in “The Median Isn’t the Message,” a touching autobiographical memoir of his first cancer diagnosis, recounted how his realization that the distribution of survival time postdiagnosis was skewed beyond the median 8 months gave him hope. His purpose in writing about this experience, however, was to make a more general point: “I believe that the fallacy of reified variation – or failure to consider the ‘full house’ of all cases – plunges us into serious error

Single-Case Research Designs and the Scientist-Practitioner Ideal in Applied Psychology

again and again” (Gould, 1997, p. 46). The error Gould was referring to is to focus exclusively on the mean (or median) and ignore variability, forgetting that variability is as central a phenomenon in biological systems as is any central tendency. The same point had been made by the physiologist Claude Bernard (1865/1957) more than a century before (Thompson, 1984). Developmental researchers and theorists are notable among psychologists who have criticized averaging across cases (e.g., Bornstein & Lamb, 1992; Cairns, 1986), perhaps because their subject matter involves dynamic changes over time that have different trajectories in different individuals. Among these scholars, Valsiner (1986) has both noted the double standard prevailing in psychology (purported deep interest in individuals combined with the constant practice of averaging across individuals) and cogently explained its dangers. He wrote,

In psychological discourse (both scientific and applied), the individual subject is constantly being given high relevance. In contrast, the individual case is usually forgotten in the practice of psychological research because it is being replaced by samples of subjects that are assumed to represent some general population. Overwhelmingly, psychologists study samples of subjects and often proceed to generalise their findings to the ideal (generic, i.e., average or prototypical) abstract individual. Furthermore, characteristics of that abstracted individual may easily become attributed to particular, concrete individuals with whom [they] work and interact. The inductive inference from samples of concrete subjects to the abstract individual and from it (now already deductively) back to the multitude of concrete human beings is guided by a number of implicit assumptions . . . that obscure insight into the science and hamper its applications. (p. 2)

The inductive inference from the sample to the population is consistent with, and indeed mandated

by, the underlying statistical model because hypotheses are about populations, not samples or individuals (Hays, 1963, p. 248), and yield useful knowledge when it is the population that is of concern. As Valsiner (1986) noted, however, major difficulties arise with the (largely unacknowledged) deductive steps by which psychologists get back from the population to the individual, as they must do in applied work. The problem, another version of the uniformity myth identified by Kiesler (1966), is that the induction from the sample to the population is assumed rather than demonstrated to have generality to any individual case, but the assumption is often false unless the population mean has very small variance (Rorer, 1991), which impairs the validity of the subsequent deductions. Without this deductive step, however, individual application of group research–based scientific knowledge is not possible.

Was There Any Alternative to the Standard Model?

I turn now to the question of whether there were, and are, alternatives to the adoption of the standard model for research and its incorporation in the scientist-practitioner model. As noted earlier, Witmer’s conceptualization of both clinical science and clinical practice was focused on the individual, and an emphasis on the individual was evident in the early definitions of clinical psychology (Reisman, 1966, 1991; see the section Clinical Psychology and the Rise of the Scientist-Practitioner Ideal). Moreover, during the time of both the early adoption of statistical methods and the development of clinical psychology, Gordon Allport emphasized the need for psychology to be both nomothetic (concerned with general laws) and idiographic (the study of unique individuals), noting, “The application of knowledge is always to the single case” (Allport, 1942, p. 58; also see Molenaar, 2004). For the most part, however, these influences did not withstand the power of the inference revolution (Barlow & Nock, 2009).
There was an exception. Since the 1930s, B. F. Skinner and the science of behavior analysis he founded have sustained trenchant criticism of
averaging across cases and rejection of NHST while maintaining a commitment to experimentation and quantification (Gigerenzer et al., 1989; Hersen & Barlow, 1976). Skinner maintained an unwavering conviction that psychology should be a science of individual behavior—“Individual prediction is of tremendous importance, so long as the organism is to be treated scientifically as a lawful system” (Skinner, 1938, p. 444)—and eschewed averaging across cases. In his magisterial exposition of behavior-analytic research design principles, Sidman (1960) noted, Reproducible group data describe some kind of order in the universe, and as such may well form the basis of a science. It cannot, however, be a science of individual behavior except of the crudest sort. And it is not a science of group behavior in the sense that the term “group” is employed by the social psychologist. It is a science of averaged behavior of individuals who are linked together only by the averaging process itself. Where this science fits in the scheme of natural phenomena is a matter for conjecture. My own feeling is that it belongs to the actuarial statistician, and not to the investigator of behavioral processes. (pp. 274–275). Skinner (1938) and Sidman (1960) also maintained that any natural behavioral process must be demonstrable in the behavior of an individual (Sidman, 1960) and that group averaging risked creating synthetic rather than real phenomena. The principles and research practices developed by Skinner and other behavior analysts have clearly demonstrated the possibility of both a basic and an applied science of individual behavior (e.g., Cooper, Heron, & Heward, 2007; Mazur, 2009) without the use of group-average data or statistical inference procedures in the single-subject–single-case research tradition (many other terms have been used, including N = 1 and time-series designs; see Hayes, 1981). 
Use of the word case signals that the thing studied need not be a single individual but might be a group entity of some kind, such as a family, class, work group, organization, or community (Blampied, 1999; Valsiner, 1986).

Development of Applied Single-Case Research Designs

The experimental research designs that Skinner and his colleagues and students developed for their experimental laboratory investigations and that were systematized in Sidman (1960) are termed intrasubject replication designs. They were strongly influenced by the concept of the steady state in physical chemistry and experimental physiology (Bernard, 1865/1957; Skinner, 1956; Thompson, 1984). To understand the influence of environmental variables on behavior, the performance of an individual subject was measured repeatedly in a standardized apparatus until performance was judged to be stable. If variability persisted, the sources of this variability in the environment were investigated so that it could be minimized. Sources of variability were to be understood and experimentally reduced rather than statistically controlled (Sidman, 1960), thus generating an experimental phase called the baseline. Once baseline stability was achieved, an independent variable could be introduced and behavior observed until stability returned. Direct visual comparisons, by way of graphs, of the performance in the baseline and experimental conditions permitted the detection and quantification of the experimental effect. Replications (S. Schmidt, 2009) by way of return to baseline and subsequent reinstatement of the experimental variable demonstrated the reliability of the experimental control achieved and were the basis of inferences about the independent variable being the agent of the change. Further replications were performed with the same or a small number of other subjects in the same experiment to strengthen conclusions (see Chapter 7, this volume).
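The logic of the intrasubject replication design described above (measure until stable, introduce the variable, return to baseline, reinstate) can be sketched numerically. The following Python sketch is illustrative only: in the tradition described, stability and effects are judged by visual inspection of graphs, so the stability rule (`WINDOW`, `TOLERANCE`), the function names, and the session data are all hypothetical assumptions introduced here, not procedures taken from Sidman (1960).

```python
from statistics import mean

WINDOW = 3        # assumed number of final sessions inspected for stability
TOLERANCE = 0.10  # assumed allowable spread, as a proportion of the level


def steady_state_level(sessions, window=WINDOW):
    """Mean of the last `window` sessions of a phase."""
    return mean(sessions[-window:])


def is_stable(sessions, window=WINDOW, tolerance=TOLERANCE):
    """Hypothetical rule: each of the last `window` sessions lies within
    `tolerance` (proportionally) of their own mean."""
    if len(sessions) < window:
        return False
    level = steady_state_level(sessions, window)
    return all(abs(x - level) <= tolerance * abs(level)
               for x in sessions[-window:])


def replicated_effect(phases):
    """True if successive phase changes alternate in direction, as expected
    when an effect appears, reverses, and reappears in an A-B-A-B sequence."""
    levels = [steady_state_level(data) for _, data in phases]
    changes = [later - earlier for earlier, later in zip(levels, levels[1:])]
    return all(c1 * c2 < 0 for c1, c2 in zip(changes, changes[1:]))


# Invented responses-per-minute data for one subject (not real data).
phases = [
    ("A1 baseline",     [12, 15, 11, 10, 10, 10]),
    ("B1 intervention", [14, 19, 24, 25, 24, 25]),
    ("A2 baseline",     [20, 14, 11, 10, 11, 10]),
    ("B2 intervention", [16, 22, 25, 24, 25, 24]),
]

for label, data in phases:
    print(f"{label:16s} level={steady_state_level(data):6.2f} "
          f"stable={is_stable(data)}")
print("effect replicated:", replicated_effect(phases))
```

A numeric rule of this kind is only a stand-in for the trained visual judgment the design relies on; its value here is to make explicit that the comparison is made within one subject and that the inference rests on replication of the change rather than on a test statistic.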
Systematic replications of the procedure using parametric variations of the independent variable permitted a functional analysis of the relation between the independent variable and the target behavior (Sidman, 1960), further strengthening causal inferences. Using these procedures, important processes in behavior are revealed in a continuous, orderly, and
reproducible fashion. Concepts and laws derived from such data are immediately applicable to the behavior of the individual, and they should permit us to move on to the interpretation of behavior in the world at large with the greatest possible speed. (Skinner, 1953, p. 78) Beginning in the early 1950s, Skinner, assisted by Lindsley, extended research in operant behavior from investigations of animals such as rats and pigeons to that of adult humans, particularly those hospitalized with a diagnosis of schizophrenia (Lindsley & Skinner, 1954; Skinner, 1954). Other researchers, notably Bijou (Kazdin, 1978; Morris, 2008), began to deploy similar procedures to research and assess the behavior of children. As Kazdin (1978) noted, this research had an increasingly applied focus that moved it from basic research to evaluation, and then inexorably toward intervention, using basic techniques such as shaping, reinforcement, and extinction to change symptomatic and problem behavior (Kazdin, 1978). This focus generated the field now called applied behavior analysis (Baer, Wolf, & Risley, 1968). O’Donnell’s (1985) observation that “nearly every facet of applied work originally derived from . . . interest in the child” (p. 237) was as true of the emergence of applied behavior analysis as it had been of Witmer’s clinical psychology. In moving from basic operant research to clinical applications, behavior analysis adapted the intrasubject research designs used in the laboratory. Early studies often reported data in cumulative graph form (see Ullmann & Krasner, 1966, and Ulrich, Stachnik, & Mabry, 1966, for examples), but by the early 1960s reversal designs were being presented in what is now the conventional form (e.g., Ayllon & Haughton, 1964). Problems with reversal designs led to the development of other applied designs such as the multiple baseline (Baer et al., 1968; Kazdin, 1978). 
All these designs involve time-series data from one or a few participants, and all involve experimental phases designated baseline and intervention. They differ in the way in which replication is used within and across participants to establish the reliability of the behavior changes detected by

comparing baseline and intervention phases and to make causal inferences. Hersen and Barlow (1976) published the first book-length systematic exposition of these applied single-case designs, and only a little innovation, although much refinement, has occurred since (McDougall, 2005). Several things are notable about this sequence of events. First, the development of experimental single-case research occurred at almost the same time as the widespread adoption of Fisherian statistics in psychology in the immediate postwar years and was explicitly regarded by its protagonists as an alternative to the Fisherian tradition (Gigerenzer et al., 1989; Skinner, 1956). Although the first textbook for teaching Fisherian statistics to psychologists appeared in 1940 (Lindquist, 1940), most universities offering graduate degrees did not begin systematically teaching statistics to students until after World War II (Gigerenzer et al., 1989; Rucci & Tweney, 1980), by which time Skinner’s (1938) first book had appeared, single-case research was being published (albeit not as prolifically as it was after the founding of the Journal of the Experimental Analysis of Behavior in 1958), and Sidman’s (1960) Tactics of Scientific Research was only a decade away. The opportunity then was for the scientist part of the scientist-practitioner ideal to be based on single-case research rather than, or as well as (Jones, 1978), Fisherian inferential statistics. This alternative research design tradition has, however, had negligible influence on psychological research as a whole or on applied professional psychology in particular, and it has never been part of the graduate curriculum (Aiken et al., 1990, 2008; Rossen & Oakland, 2008). Why single-case research designs had such little influence on mainstream psychology and how science and practice might be different had they had more impact are interesting questions for historians and sociologists of science. Gigerenzer et al.
(1989) suggested that several factors led to the dominance of Fisher’s ideas in psychology in the critical period. Many leading North American experimentalists were taught by Fisher himself, and Fisher’s influence was enhanced by his holding a visiting professorship in the United States in the 1930s. The relatively small size of the community of experimental psychologists assisted the rapid dissemination of the new statistics,
a process facilitated by the publication of information about them in textbooks and influential journals (Rucci & Tweney, 1980). This process was enhanced when editors of important journals, such as the Journal of Experimental Psychology, made the use of inferential statistics a requirement for acceptance of manuscripts, which happened from about 1950 onward (Gigerenzer et al., 1989). Fisher’s writings were in English, whereas much of the critical scholarly work about statistics and research design was published in French and German (Gigerenzer et al., 1989) and was, therefore, less accessible to English-speaking workers, although Bernard’s work was available in English translation by 1949. It is also true that Skinner’s ideas about experimentation and inference, although broadly contemporaneous, were not as well developed or as effectively disseminated at the critical juncture, so by an initially narrow lead, Fisherian ideas had the opportunity to become dominant in psychology.

Second, in contrast to the inference revolution in psychology, in which the key ideas about research design and analysis were imported from other fields and then applied to the subject matter of the discipline, applied behavior analysis and applied single-case research designs grew organically and directly from the underpinning experimental research in the field, both with respect to the subject matter studied (the effects of the environment on the behavior of individuals) and the methods used. This is surely a significant virtue when one considers what science a scientist-practitioner may need by way of both the content and the methods of science and how single-case research designs may help resolve problems in the field.
What Single-Case Research Can Offer the Scientist-Practitioner

In their initial exposition of applied single-case designs, Hersen and Barlow (1976) stated that single-case research was highly relevant to the scientist-practitioner ideal and that it provided an alternative to the prevailing standard model of research. They drew on the work of Bergin and Strupp (1970, 1972), who claimed that among psychotherapy researchers, “there is a growing disaffection from traditional

experimental designs and statistical procedures which are held inappropriate to the subject matter under study” (Bergin & Strupp, 1970, p. 25). The evidence presented by Bergin and Strupp (1970, 1972) provided an opportunity for psychology in general, and clinical psychology in particular, to reevaluate its commitment to the standard model of research, but this did not happen other than within the single-case research tradition. Hersen and Barlow agreed that the scientist-practitioner ideal had largely failed in practice because of a growing split between science and practice and that the split was substantively, but not exclusively, the result of the mismatch between the realities of clinical practice and what scientific research methods based on the standard model could inherently accomplish. Instead of the standard Fisherian model of research, Hersen and Barlow suggested that single-case designs were much better suited to the needs of clinical research, especially the evaluation of interventions. The argument developed by Hersen and Barlow (1976) has been further extended by Barlow et al. (1984); Hayes, Barlow, and Nelson-Gray (1999); and Barlow, Nock, and Hersen (2009) and endorsed by others (e.g., Blampied, 1999, 2000, 2001; Hayes, 1981; Kazdin, 2008; Morgan & Morgan, 2001, 2009). If one poses the question “What benefits does single-case research offer the scientist-practitioner?” one can find answers by considering how single-case research obviates some of the major problems inherent in reliance on the standard model, and second, from considering further positive contributions that single-case research can make to applied science.

Single-Case Research Is an Alternative to NHST-Based Research

The fact that most psychologists past and present have ignored the enduring, cogent, and powerful criticisms of the standard NHST model of psychological research is not evidence that these criticisms do not matter (Harlow, Mulaik, & Steiger, 1997; Nickerson, 2000; F. L. Schmidt & Hunter, 1997). Even the defenders of NHST and related practices have acknowledged that the criticisms have substantive merit (e.g., Cortina & Dunlap, 1997; Frick, 1996; Hagen, 1997; Harlow et al., 1997; Wainer, 1999), and proponents of reform such as the Task
Force on Statistical Inference (Wilkinson & Task Force on Statistical Inference, 1999) have sought changes in research practice, data analysis, and reporting that acknowledge much of the criticism. Unfortunately, as noted earlier, the practice of research has been glacially slow in adopting these reforms. This slowness may, in part, be because one effect of the proposed reforms is to continue to embed group statistical procedures and NHST, with all of their inchoate aspects (Gigerenzer, 1991; Hubbard, 2004), into the heart of the research enterprise while requiring that extensive additional complex procedures be added to the research toolkit. This is demanding enough for researchers, and it is even more so for practitioners. Given that single-case research principles and practice resonate with many of the criticisms of the standard model, for instance in the emphasis on graphical techniques, replication, and the rejection of NHST-based inference (Sidman, 1960), it is odd that few if any of the critics or reformers have considered the potential of single-case research as an alternative approach (Blampied, 1999, 2000). Indeed, the task force did not consider single-case research until it replied to comments on its report (American Psychological Association Board of Scientific Affairs, 2000), and even then the response was brief and unenthusiastic. Notwithstanding, single-case research designs remain a complete, quantitative, coherent, internally consistent, and proven alternative to the standard model of experimentation.

Single-Case Research Is an Effective Alternative to Group-Based Research

As suggested earlier, psychology’s commitment to seeking ideal types by way of averaging over individuals is as deep rooted as its commitment to NHST-based inference. Yet this commitment poses profound difficulties, both conceptual and practical, for applied science, as Valsiner (1986) and others have so cogently noted. The conceptual difficulties (discussed earlier) relate to how abstract principles and hypotheses stated at the level of the population are to be properly applied to the highly variable individual cases with which applied scientists must deal. The practical difficulties arise from the demand
for large numbers of participants (Streiner, 2006). As the number of participants in a research project multiplies, so too do the cost, complexity, and time required to complete the research (a point made by Bergin & Strupp, 1972). Multisite, multi-investigator, multimillion-dollar randomized controlled trial research designs are now the gold standard for psychotherapy research (e.g., Beutler & Crago, 1991; Wessley, 2001). Participation in such research is largely out of the question for most scientist-practitioners (indeed, it is out of reach of most researchers), so it is hardly surprising that they do not actively produce research in their practice (Barlow et al., 1984; Hayes et al., 1999). Single-case research, in contrast, done by individual practitioners with individual cases, can make useful contributions to science.

Single-Case Research Avoids Confusing Clinical With Statistical Significance

As many critics have noted, the overwhelming focus on NHST p values has seriously distorted psychological research in general and clinical research in particular (e.g., Cohen, 1990, 1994; Dar et al., 1994; Fidler et al., 2005; Hubbard & Lindsay, 2008; Meehl, 1978). The obsession with statistical significance has distorted judgment about research findings and led to persistent confusion between statistical and clinical significance, fundamentally because p is a function of sample size and, with a sufficiently large sample, infinitesimally small group-mean differences may be statistically significant (Hubbard & Lindsay, 2008). Clinical significance, in contrast, is determined at the individual level, and rather than typically being achieved by small degrees of change, requires substantive alleviation of distress, restoration of function, and attainment of quality-of-life goals (Blampied, 1999; Jacobson & Truax, 1991). Clinically, psychologists need to know that interventions have produced effective change, but null-hypothesis–based statistics do not reliably indicate this. Although techniques such as the use of effect size estimates and meta-analysis of group research (e.g., Bornstein, 1998; Kline, 2004) are an improvement over p values, large effect sizes may be associated with little clinical change (Hayes et al., 1999). Techniques to compute
the magnitude of clinically significant change (e.g., Jacobson & Truax, 1991) require analysis at the level of the individual rather than the group, suggesting that determining clinical significance is an intractable problem for group research. Furthermore, contributing to the group mean may be individuals whose scores indicate large-magnitude change, those whose scores have changed only slightly, and those whose scores have deteriorated (Hersen & Barlow, 1976). For clinical work to progress, it is essential to know about changes that are more than negligible in a positive way but that are still less than clinically significant and about deterioration; otherwise, it is impossible to engage in systematic improvement of therapies or to move toward the oft-stated goal of matching therapies to clients (Hayes et al., 1999; Kazdin, 2008). Concealing a range of therapy outcomes within a small but statistically significant mean effect does not assist with this objective. Because single-case research does not amalgamate case data but keeps the data separate and identifiable throughout the research and because it has no equivalent to the ubiquitous p value, it cannot mistake statistical for clinical significance (although it can either over- or understate the import of its findings). It can relate the magnitude of change or lack of change to each individual participant relative to that participant’s own baseline. It heeds the admonition by Cohen (1994) that there is “no objective, mechanical ritual” (p. 1001) for determining the importance of research findings and substitutes the scientist-practitioner’s trained, professional judgment as to the importance of the outcomes achieved in the research (Hammond, 1996). 
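The point that a group mean can conceal improvement, no change, and deterioration can be illustrated with the reliable change index (RCI) of Jacobson and Truax (1991), which is computed for each individual as the pre-to-post difference divided by the standard error of the difference. The Python sketch below is a minimal illustration under stated assumptions: the client scores, the pretreatment standard deviation, and the reliability coefficient are invented, lower scores are assumed to mean fewer symptoms, and only the RCI part of the Jacobson and Truax approach (not their clinical cutoff criteria) is shown.

```python
import math
from statistics import mean


def reliable_change_index(pre, post, sd_pre, reliability):
    """RCI = (post - pre) / S_diff, where S_diff = sqrt(2 * SE**2) and
    SE = sd_pre * sqrt(1 - reliability)."""
    se_measurement = sd_pre * math.sqrt(1.0 - reliability)
    s_diff = math.sqrt(2.0 * se_measurement ** 2)
    return (post - pre) / s_diff


def classify(pre, post, sd_pre, reliability, criterion=1.96):
    """Label one case; assumes lower scores indicate fewer symptoms."""
    rci = reliable_change_index(pre, post, sd_pre, reliability)
    if rci <= -criterion:
        return "reliably improved"
    if rci >= criterion:
        return "reliably deteriorated"
    return "no reliable change"


# Invented (pre, post) symptom scores for six hypothetical clients.
clients = [(30, 12), (28, 14), (31, 29), (27, 26), (29, 31), (26, 37)]
SD_PRE = 7.5        # assumed pretreatment standard deviation of the measure
RELIABILITY = 0.80  # assumed test-retest reliability of the measure

labels = [classify(pre, post, SD_PRE, RELIABILITY) for pre, post in clients]
mean_change = mean(post - pre for pre, post in clients)

print(f"group mean change: {mean_change:.2f}")
for (pre, post), label in zip(clients, labels):
    print(f"pre={pre:3d} post={post:3d} -> {label}")
```

With these invented values the group mean shows a modest average reduction in symptoms, while the case-by-case labels show that the average is made up of very different individual outcomes, which is the information a practitioner needs.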
That the key data are presented graphically so that the changes achieved have to be obvious to visual inspection provides some assurance that only substantive changes will be reported and permits all other viewers to apply their judgment, thus subjecting conclusions to ongoing peer review and continuous revision as understanding of clinical significance changes with time and context.
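The earlier statement in this section that p is a function of sample size can be made concrete. The Python sketch below holds a very small standardized mean difference fixed and varies only the number of participants per group. It assumes a two-sided, two-sample z test with known and equal variances (a normal approximation chosen for simplicity, not a method taken from the sources cited), and the function name and values are hypothetical.

```python
import math


def two_sided_p(effect_size_d, n_per_group):
    """Two-sided p value for a standardized mean difference `effect_size_d`
    between two groups of size `n_per_group`, under a z test with known,
    equal variances (an assumed simplification)."""
    z = effect_size_d / math.sqrt(2.0 / n_per_group)
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) for the standard normal.
    return math.erfc(abs(z) / math.sqrt(2.0))


TINY_EFFECT = 0.05  # assumed difference of one twentieth of a standard deviation
sample_sizes = [50, 500, 5000, 50000]
p_values = [two_sided_p(TINY_EFFECT, n) for n in sample_sizes]

for n, p in zip(sample_sizes, p_values):
    print(f"n per group = {n:6d}   p = {p:.4g}")
```

The effect is identical in every row; only the sample size changes, yet p moves from clearly nonsignificant to far below conventional thresholds. Nothing in the p value indicates whether any individual changed to a clinically meaningful degree.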

Single-Case Research Enhances Ethics and Accountability

One of the persisting ethical problems with standard group research protocols in which interventions are
being assessed is the necessity of assigning sometimes large numbers of individuals to control conditions, which are, by definition, not expected to produce much in the way of therapeutic effect. At the same time, those in the treatment group are exposed to whatever risks are entailed in the novel treatment (Hersen & Barlow, 1976). Single-case research does not eliminate these risks, because it imposes baseline and treatment phases that are the equivalent of control and experimental conditions. It does so, however, with one or a few participants, and so the effects of the treatment can be assessed while neither withholding treatment from a large number of individuals nor exposing a substantial number of individuals to whatever risks are inherent in the research. For this reason, at the very least, ethical principles should insist that a novel therapy be shown to be effective in a series of well-conducted single-case studies before any large-scale randomized controlled trials are embarked on. In contrast, the general practice is typically to go from uncontrolled case studies to randomized controlled trials in one step. Note also that wait-list control designs can be re-formed as multiple-baseline-across-cases designs, with a consequent reduction in waiting time for therapy for most wait-listed cases and increases in internal validity (see Chapter 5, this volume, for a more comprehensive consideration of the multiple-baseline design). Moreover, Kratochwill and Levin (2010) have made recommendations about how single-case designs might be modified by the inclusion of randomization so as to make them more credible within the wider community of applied researchers. 
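The suggestion that a wait-list control design can be re-formed as a multiple-baseline-across-cases design can be sketched as a simple scheduling exercise. In the Python sketch below, each case starts intervention after a baseline that is staggered relative to the previous case, and the start sessions are compared with a hypothetical wait-list arrangement in which every wait-listed case begins only after a full trial has ended. The function names, the minimum baseline length, the stagger, and the trial length are all assumptions for illustration; it does not implement the randomization procedures recommended by Kratochwill and Levin (2010).

```python
def multiple_baseline_starts(cases, min_baseline=3, stagger=2):
    """Return {case: first intervention session}. The first case has
    `min_baseline` baseline sessions; each later case has `stagger` more."""
    return {case: min_baseline + 1 + i * stagger
            for i, case in enumerate(cases)}


def phase_at(session, start_session):
    """Phase a case is in at a given session number (1-indexed)."""
    return "baseline" if session < start_session else "intervention"


cases = ["case_1", "case_2", "case_3", "case_4"]  # hypothetical cases
starts = multiple_baseline_starts(cases)

# Hypothetical comparison: wait-listed cases begin after a 12-session trial.
WAITLIST_START = 13

for case, start in starts.items():
    saved = WAITLIST_START - start
    print(f"{case}: intervention from session {start} "
          f"({saved} sessions earlier than the assumed wait-list)")
```

Because intervention is introduced at a different time for each case while the remaining cases are still in baseline, a change that appears only when each case's intervention begins is replicated across cases, which is the source of the design's internal validity.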
Institutional, policy, and political factors have all long been recognized to strongly influence the conduct of research (e.g., Bergin & Strupp, 1970, 1972), often to the detriment of innovation and conceptual advances, especially through decisions about what research to fund, publish, and incorporate in policy and application (Rozin, 2009). Changes in research practice that would lead to the wider use and dissemination of single-case research within applied and clinical science will require attention to these institutional and political factors. The powerful contingencies residing in research funding, particularly involving the assessment of the scientific merit of
proposals incorporating single-case designs, will have to change. These changes clearly cannot be accomplished by individuals acting alone. They may require concerted action by larger groups representing both scientists and practitioners in advocating changes to institutions, funding agencies, and government. Perhaps it is time to consider another task force, sponsored jointly by the American Psychological Association, the Association for Psychological Science, and the Association for Behavior Analysis International to address these inherently political issues and to advocate for reform both within and beyond the discipline (Orlitzky, 2011). A further strength of single-case research is that it enhances accountability in research and, even more important, in practice (Acierno, Hersen, & Van Hasselt, 1996). The data for each individual participant can be examined by whoever wants to judge the outcomes claimed to have been achieved. Nobody, including any participants, critics, ethical review bodies, insurers, and family members, has to rely on understanding the possibly abstruse complexities of statistical analyses to understand what the treatment achieved (Blampied, 1999). This strength does not remove the possibility of conflict and disagreement, but it does change the character of any such debate, with transparency, openness, and comprehensibility likely to be of long-term benefit to all who are concerned about the ethics of research and practice, clients and practitioners above all (F. L. Newman & Tejeda, 1996). Wolf’s (1978) seminal work on the social validity of behavior analysis and the assessment of the acceptability of interventions to multiple constituencies, including clients, families, and other interested parties, has further extended the ways in which accountability can be achieved in both research and practice.

Single-Case Research Enhances Innovation and the Exploration of Generality

Implicit in the preceding argument is the utility of single-case research in the early phases of therapy innovation. Because of its small scale, it can be used to evaluate novel ideas and new procedures with low cost and little risk. This style of research is adaptable, and protocols can be changed as developments in the research warrant (Hayes, 1981). If

treatments fail in any case, that failure is detected, and new approaches can be tried. In addition to being ideally adapted to the initial phases of research and development, single-case research is ideally suited for what is increasingly being called translational research—the systematic, collaborative pursuit of applications of science linked to basic research (Lerman, 2003; Thompson & Hackenberg, 2009). Ironically, Olatunji, Feldner, Witte, and Sorrell (2004), in discussing the link between the scientist-practitioner model and translational research, recommended that even more rigorous training in the standard model of research is necessary before the scientist-practitioner model can embrace translational research. To the contrary, I would assert that training in single-case research is needed before translational research can be widely undertaken, because much translational research, as with other applied research, needs to be applied at the individual rather than the population level. There is no point in having translational research recapitulate the errors of the past through slavish adherence to the standard model. Single-case research designs also have an underappreciated contribution to make to the establishment of generality of treatment. Consider the situation that prevails after a successful randomized trial of some new therapy. As has been widely noted (e.g., Kazdin, 2008; Seligman, 1995), the very rigor of this research protocol, requiring as it does highly selected participants with a single, carefully established diagnosis, highly trained and monitored therapists normally working from a manual, and atypical therapy settings (e.g., university research clinics), limits the generality of any findings. 
Indeed, such research is now termed efficacy research to distinguish it from effectiveness research, that is, the research that establishes the utility of the therapy in typical therapy settings with typical therapists and typical clients, now also referred to as research into clinical utility (Howard, Moras, Brill, Martinovich, & Lutz, 1996). Although Chambless and Hollon (1998) acknowledged the role that single-case research might play in establishing the efficacy of psychotherapy, the focus of efficacy and effectiveness research has been almost exclusively on randomized trials (e.g., Kazdin, 2008; Kratochwill & Levin, 2010).

Neville M. Blampied

Using group-based randomized trials to explore systematically every dimension along which the participants, the therapy, the therapists, and the therapy context might be varied in the search for generality would entail a lifetime of research and a fortune just to establish the domain of general effectiveness of a single therapy. Staines (2008) argued that deficiencies in the way in which most standard psychological research is conducted severely limit the generality of its findings because of what he termed the generalization paradox, and he recommended the use of multiple studies and multiple methods as ways out of the paradox. He did not explicitly recommend the use of single-case research but might well have done so. Although still a potentially large undertaking, single-case research can be used to map the generality of any established therapy (see Drabman, Hammer, & Rosenbaum, 1979, for the initial idea of the generalization map). It does this through replication (see Chapter 7, this volume). Initially, direct replication, in which the attributes of the cases, therapy, and context are kept constant, is used to establish the reliability of the research findings. Generality can then be explored by systematic replication (Hayes et al., 1999; Sidman, 1960), in which various attributes of the cases (e.g., age, gender, symptom severity), of the treatment (e.g., number of sessions), and of the context of intervention (e.g., homes, classrooms) are systematically varied. As these dimensions are explored, the initial treatment outcome may be maintained or enhanced, or it may diminish or even disappear. Additional clinical replication, combining multicomponent treatment programs and clinically diverse clients and settings, further extends the evidence for generality (Barlow et al., 1984, 2009). As the generalization space is mapped in this way, the therapy can be adjusted for maximally effective combinations of participant, procedure, and context.
Replicated failures are also important because they mark possible boundary conditions and stimulate innovations and a new cycle of research. Note that for this reason, the reporting of such failures is to be encouraged, in contrast with the practice in the standard research tradition of not reporting failures to reject H0.

Single-Case Research Resolves the Double Standard Around Psychology’s Focus on the Individual

If one believes, as the founders of clinical and applied psychology clearly did, that “the individual is of paramount importance in the clinical science of human behavior change” (Hersen & Barlow, 1976, p. 1), then single-case research is essential because it delivers a scientific understanding of phenomena at the individual level and removes the need for the inductive–deductive contortions exposed by Valsiner (1986; see the section Some Consequences of This History). It can, in practice, remove the distinction between the scientist and the practitioner; science and practice become one coherent enterprise. This grounding of applied clinical science in the individual (both the individual client and the individual practitioner) is probably the most important contribution of single-case research to the scientist-practitioner model, because it gives the ideal model real substance (Barlow et al., 1984, 2009; Blampied, 1999, 2001; Hayes et al., 1999; Hersen & Barlow, 1976). “Repetition is the habit of nature, but variation is the rule of life” (Gopnik, 2009, p. 197), and whether the cases dealt with by a scientist-practitioner are individuals, families, or social groups or are larger and more complex entities such as classes, work groups, and organizations or even health systems (Hayes et al., 1999; Morgan & Morgan, 2009), each case will be inherently unique. Yet, no matter how exceptional and singular a particular case may be, single-case research designs permit a scientific analysis to be done, a claim that cannot be made about the standard model. This is why there is such goodness of fit between the scientist-practitioner model and single-case research and why the failure to recognize the synergies between them has been a seriously consequential missed opportunity for the discipline.
Conclusion

I can confidently conclude that the goodness of fit between single-case research principles and practice and the scientist-practitioner model of applied practice is multifaceted, substantial, and potentially


highly beneficial, although achieving these benefits will require many changes to the way psychologists teach students; fund, analyze, and publish research; and practice psychology (Blampied, 2001; Rozin, 2009). Historically, this alliance could have happened, but it did not, which can be seen as a very considerable misfortune for the discipline and its aspirations to enhance human welfare. Instead, the scientist-practitioner ideal has been distorted, distressed, and thwarted by attempting to pursue science with a scientific methodology inappropriate to the purpose. History is the story of what has happened, not what might have happened or what one wishes or thinks should have happened. The generation and adoption of the scientist-practitioner model for clinical and applied psychology is deservedly seen as an inspired and noble choice. It continues as an inspiring ideal to the present day. That the conceptualization of what the science part of the scientist-practitioner duality was thought to be should have been captured by the view of science prevailing in the wider discipline was probably inevitable. What is much more regrettable is how little consideration the discipline’s forebears gave to the aptness of their choice of scientific method, especially in the light of Bergin and Strupp’s (1970, 1972) research. Equally regrettable is how they, and we, have persistently ignored criticism of the chosen method and how fervent, enduring, and exclusive adherence to the method has been (Ziliak & McCloskey, 2008). Also regrettable has been the continuing blindness of those entrusted with the scientist-practitioner ideal to the mismatch between the methods of the adopted science and the needs of practice, despite the warnings of prophets such as Bergin and Strupp and Meehl and despite the existence of an alternative in the work of Skinner and those who developed and practiced applied single-case research. But it is not too late to change.
I’ve said this before (quoting Stricker, 2000; see Blampied, 2001), but I will say it again: Gandhi, once asked what he thought about Western civilization, replied that it was a good idea and that somebody should try it sometime. Even more so, an alliance between the scientist-practitioner model of clinical practice and single-case research is a

very good idea, and it should be adopted now, for “scientist-practitioners [will] have no difficulty finding interesting work in [the] future. They are trained to solve behavioral problems, and the world promises to provide no shortage of those” (Benjamin, 2005, p. 27).

Appendix 8.1: Terminology and Key Aspects of Null-Hypothesis Statistical Testing

For any set of numerical data, one can compute measures of central tendency (medians, means, etc.) and of variation (standard deviations, etc.) as well as other measures such as correlations. These measures are called descriptive statistics. If one takes the view that the individuals who provided these data are a sample from some population of interest (and one does not necessarily have to take this view), then one can regard these descriptive statistics as estimators of the corresponding “true” population scores and can compute further statistics, such as the standard error of the mean and a confidence interval, that tell one how good an approximation these estimates are. Going a step further, if one has two or more data sets and wants to know whether they represent samples from the same population or from different populations, one uses inferential statistics. As noted in the chapter’s introduction, thanks to the work of Fisher and others, the core of such inferential statistics is the null-hypothesis significance test (NHST), of which Student’s t is the prototype. The null hypothesis (termed H0) is normally (in psychology) a nil hypothesis, that is, the samples are from the same population and therefore the true mean difference is zero (nil), and is so to an infinite number of decimal places. NHST assumes H0 to be true and computes a probability (p), which is the theoretical probability that, had samples of the same size as those observed been drawn from the same population, the test statistic (e.g., the value of t) would be as large as (or larger than) it is observed to be.
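The quantities just described can be illustrated with a minimal sketch in Python. The two small data sets are invented for illustration and come from no study, and the function names (describe, student_t, simulated_p) are hypothetical. Rather than consulting the t distribution, the sketch estimates p by brute force: it repeatedly draws two samples of the observed sizes from a single normal population (that is, with H0 assumed true) and counts how often the simulated |t| is at least as large as the observed one. This simulation is a stand-in for the theoretical probability, not the way NHST is ordinarily computed.

```python
import math
import random
import statistics

# Invented scores for illustration only (not from any study).
control = [12, 15, 11, 14, 13]
treated = [9, 10, 8, 11, 7]


def describe(sample):
    """Return mean, sample standard deviation, and standard error of the mean."""
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)            # uses the n - 1 denominator
    sem = sd / math.sqrt(len(sample))
    return mean, sd, sem


def student_t(a, b):
    """Student's t for two independent samples, using the pooled variance."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    se_diff = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (statistics.mean(a) - statistics.mean(b)) / se_diff


def simulated_p(a, b, draws=2000, seed=1):
    """Estimate a two-sided p by sampling both groups from ONE population (H0)."""
    rng = random.Random(seed)
    observed = abs(student_t(a, b))
    hits = 0
    for _ in range(draws):
        sim_a = [rng.gauss(0.0, 1.0) for _ in a]
        sim_b = [rng.gauss(0.0, 1.0) for _ in b]
        if abs(student_t(sim_a, sim_b)) >= observed:
            hits += 1
    return hits / draws


print("control mean, SD, SEM:", describe(control))
print("treated mean, SD, SEM:", describe(treated))
print("t =", student_t(control, treated))
print("simulated p =", simulated_p(control, treated))
```

Because t is unaffected by the location and scale of the population, drawing the simulated samples from a standard normal population loses nothing here. For these invented data the mean difference is 4 and its standard error is 1, so t can be checked by hand.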
For Fisher, the p value was one important piece of evidence, to be considered along with other evidence, in the judgment made by the experimenter as to the implausibility of the observations assuming H0. If the p value suggests that the data are rare


(under H0), this constitutes inductive evidence against H0 and H0 can be rejected, and the smaller the value of p (other things being equal), the stronger this inductive evidence (cf. Wagenmakers, 2007). Fisher came to accept that p < .05 was a convenient (but not sanctified) value for experimenters to use in making such judgments in single experiments, but he also agreed that facts were best established by multiple experiments that permitted consistent rejection of H0 (Wright, 2009). The alternative Neyman–Pearson paradigm is termed hypothesis testing because it involves the identification of two hypotheses, H0 and an alternative hypothesis, HA. Fisher bitterly contested this viewpoint and never accepted the alternative paradigm, but it is the version that has become dominant in psychology, generally without reference to the initial controversy. HA is generally an assertion that some known factor or treatment was responsible for the observed mean difference. In this paradigm, if the obtained value of p is smaller than some long-run, prespecified error rate criterion called alpha (e.g., α = .05), then H0 may be rejected, a result that is said to be statistically significant at the alpha level, and one may accept HA. With two hypotheses available, two errors may be made. One may reject H0 when it is true (Type I error, the long-run probability of which is alpha), or one may fail to reject H0, and so fail to accept HA, when H0 is false (Type II error), the probability of which is beta (β). The power of a test to accept HA given that it is true or, better stated, the power to detect an experimental effect, is 1 − β. Neyman (1950) argued that control of Type I errors was more important than control of Type II errors, hence the emphasis in psychology on alpha level and continuing indifference to the power of studies (Cohen, 1994; Sedlmeier & Gigerenzer, 1989). The Neyman–Pearson paradigm does not permit inductive inference about hypotheses to be made.
Rather, it permits inductive behavior, that is, the making of decisions on the basis of evidence from statistical tests, these decisions being akin to those made in industrial quality-control contexts to control the long-run rate of production of defective products (Hubbard, 2004; Hubbard & Lindsay, 2008; see also Nickerson, 2000, and Wagenmakers, 2007).
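The relation among alpha, beta, and power can be made concrete with a short calculation. The sketch below assumes a two-sided, two-sample z test with a known common standard deviation, a simplification chosen because the normal distribution is available in Python's standard library; it is not the t test discussed above. The function name power_two_sample_z, the effect size of 0.5 standard deviations, and the group sizes are all hypothetical choices made for illustration.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def power_two_sample_z(effect_size, n_per_group, alpha=0.05):
    """Power (1 - beta) of a two-sided, two-sample z test with known SD.

    effect_size is the true mean difference in SD units; 0 means H0 is true.
    """
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    # Expected value of the z statistic when the true difference is effect_size.
    shift = effect_size * math.sqrt(n_per_group / 2)
    # Probability that the statistic falls in either rejection region.
    upper = 1 - STD_NORMAL.cdf(z_crit - shift)
    lower = STD_NORMAL.cdf(-z_crit - shift)
    return upper + lower


for n in (20, 64):
    power = power_two_sample_z(0.5, n)
    print(f"n per group = {n}: power = {power:.2f}, beta = {1 - power:.2f}")

# With no true effect, the long-run rejection rate is the Type I error rate.
print(f"effect 0: rejection rate = {power_two_sample_z(0.0, 20):.3f}")
```

Holding alpha fixed, the probability of a Type II error falls only as the sample size or the true effect grows. This is why fixing alpha while ignoring power, the practice criticized by Cohen (1994) and Sedlmeier and Gigerenzer (1989), can leave beta large.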

References

Acierno, R., Hersen, M., & van Hasselt, V. B. (1996). Accountability in psychological treatment. In V. B. van Hasselt & M. Hersen (Eds.), Sourcebook of psychological treatment manuals for adult disorders (pp. 3–20). New York, NY: Plenum Press.

Aiken, L. S., West, S. G., & Millsap, R. E. (2008). Doctoral training in statistics, measurement, and methodology in psychology: Replication and extension of Aiken, West, Sechrest, and Reno’s (1990) survey of PhD programs in North America. American Psychologist, 63, 32–50. doi:10.1037/0003-066X.63.1.32

Aiken, L. S., West, S. G., Sechrest, L., & Reno, R. R. (1990). Graduate training in statistics methodology and measurement in psychology. American Psychologist, 45, 721–734. doi:10.1037/0003-066X.45.6.721

Albee, G. W. (2000). The Boulder model’s fatal flaw. American Psychologist, 55, 247–248. doi:10.1037/0003-066X.55.2.247

Allport, G. W. (1942). The use of personal documents in psychological science. New York, NY: Social Science Research Council.

American Psychological Association Board of Scientific Affairs, Task Force on Statistical Inference. (2000). Narrow and shallow. American Psychologist, 55, 965–966. doi:10.1037/0003-066X.55.8.965

Ayllon, T., & Haughton, E. (1964). Modification of symptomatic verbal behavior of mental patients. Behaviour Research and Therapy, 2, 87–97. doi:10.1016/0005-7967(64)90001-4

Baer, D. M., Wolf, M. M., & Risley, T. R. (1968). Some current dimensions of applied behavior analysis. Journal of Applied Behavior Analysis, 1, 91–97. doi:10.1901/jaba.1968.1-91

Baker, B. B., & Benjamin, L. T. (2000). The affirmation of the scientist-practitioner: A look back at Boulder. American Psychologist, 55, 241–247. doi:10.1037/0003-066X.55.2.241

Baker, T. B., McFall, R. M., & Shoham, V. (2008). Current status and future prospects of clinical psychology: Toward a scientifically principled approach to mental and behavioral health care. Psychological Science in the Public Interest, 9, 67–103.

Balaram, P. (2008). Science, invention, and Pasteur’s quadrant. Current Science, 94, 961–962.

Balluerka, N., Gomez, J., & Hidalgo, D. (2005). The controversy over null hypothesis significance testing revisited. Methodology: European Journal of Research Methods for the Behavioural and Social Sciences, 1, 55–70. doi:10.1027/1614-1881.1.2.55

Barlow, D. H., Hayes, S. C., & Nelson, R. O. (1984). The scientist practitioner: Research and accountability in educational settings. New York, NY: Pergamon Press.


Barlow, D. H., & Nock, M. K. (2009). Why can’t we be more idiographic in our research? Perspectives on Psychological Science, 4, 19–21. doi:10.1111/j.1745-6924.2009.01088.x

Barlow, D. H., Nock, M. K., & Hersen, M. (2009). Single-case experimental designs: Strategies for studying behavior change (3rd ed.). Boston, MA: Pearson.

Belar, C. D., & Perry, N. W. (1992). The National Conference on Scientist-Practitioner Education and Training for the Professional Practice of Psychology. American Psychologist, 47, 71–75. doi:10.1037/0003-066X.47.1.71

Benjamin, L. T. (2005). A history of clinical psychology as a profession in America (and a glimpse of its future). Annual Review of Clinical Psychology, 1, 1–30. doi:10.1146/annurev.clinpsy.1.102803.143758

Bergin, A. E., & Strupp, H. H. (1970). New directions in psychotherapy research. Journal of Abnormal Psychology, 76, 13–26. doi:10.1037/h0029634

Bergin, A. E., & Strupp, H. H. (1972). Changing frontiers in the science of psychotherapy. New York, NY: Aldine.

Bernard, C. (1957). An introduction to the study of experimental medicine (H. C. Green, Trans.). New York, NY: Dover. (Original work published 1865)

Beutler, L. E., & Crago, M. (Eds.). (1991). Psychotherapy research: An international review of programmatic studies. Washington, DC: American Psychological Association. doi:10.1037/10092-000

Blaich, C. F., & Barreto, H. (2001). Typological thinking, statistical significance, and the methodological divergence of experimental psychology and economics. Behavioral and Brain Sciences, 24, 405.

Bornstein, M. H., & Lamb, M. E. (1992). Development in infancy: An introduction (3rd ed.). New York, NY: McGraw-Hill.

Cairns, R. B. (1986). Phenomena lost: Issues in the study of development. In J. Valsiner (Ed.), The individual subject and scientific psychology (pp. 97–111). New York, NY: Plenum Press.

Chambless, D. L., & Hollon, S. D. (1998). Defining empirically supported therapies. Journal of Consulting and Clinical Psychology, 66, 7–18. doi:10.1037/0022-006X.66.1.7

Cohen, J. (1962). The statistical power of abnormal-social psychological research: A review. Journal of Abnormal and Social Psychology, 65, 145–153. doi:10.1037/h0045186

Cohen, J. (1990). Things I have learned (so far). American Psychologist, 45, 1304–1312. doi:10.1037/0003-066X.45.12.1304

Cohen, J. (1994). The earth is round (p < .05). American Psychologist, 49, 997–1003. doi:10.1037/0003-066X.49.12.997

Committee on Training in Clinical Psychology. (1947). Recommended graduate training program in clinical psychology. American Psychologist, 2, 539–558. doi:10.1037/h0058236

Cooper, J. O., Heron, T. E., & Heward, W. L. (2007). Applied behavior analysis (2nd ed.). Upper Saddle River, NJ: Pearson.

Corrie, S., & Callahan, M. M. (2000). A review of the scientist-practitioner model: Reflections on its potential contribution to counselling psychology within the context of current health care trends. British Journal of Medical Psychology, 73, 413–427. doi:10.1348/000711200160507

Blampied, N. M. (1999). A legacy neglected: Restating the case for single-case research in cognitive-behaviour therapy. Behaviour Change, 16, 89–104. doi:10.1375/bech.16.2.89

Cortina, J. M., & Dunlap, W. P. (1997). On the logic and purpose of significance testing. Psychological Methods, 2, 161–172. doi:10.1037/1082-989X.2.2.161

Blampied, N. M. (2000). Comment: Single-case research designs: A neglected alternative. American Psychologist, 55, 960. doi:10.1037/0003-066X.55.8.960

Dar, R., Serlin, R. C., & Omer, H. (1994). Misuse of statistical tests in three decades of psychotherapy research. Journal of Consulting and Clinical Psychology, 62, 75–82. doi:10.1037/0022-006X.62.1.75

Blampied, N. M. (2001). The third way: Single-case research, training, and practice in clinical psychology. Australian Psychologist, 36, 157–163. doi:10.1080/00050060108259648

Bootzin, R. R. (2007). Psychological clinical science: Why and how we got to where we are. In T. R. Treat, R. R. Bootzin, & T. B. Baker (Eds.), Psychological clinical science (pp. 1–28). New York, NY: Taylor & Francis.

Bornstein, M. (1998). The shift from significance testing to effect size estimation. In N. R. Schooler (Ed.), Comprehensive clinical psychology: Vol. 3. Research and methods (pp. 313–349). Amsterdam, the Netherlands: Elsevier.

Dehue, T. (2000). From deception trials to control reagents: The introduction of the control group about a century ago. American Psychologist, 55, 264–268. doi:10.1037/0003-066X.55.2.264

Drabman, R. S., Hammer, D., & Rosenbaum, M. S. (1979). Assessing generalization in behavior modification with children: The generalization map. Behavioral Assessment, 1, 203–219.

Edwards, R. (1987). Implementing the scientist-practitioner model: The school psychologist as data-based problem solver. Professional School Psychology, 2, 155–161. doi:10.1037/h0090541


Erceg-Hurn, D. M., & Mirosevich, V. M. (2008). Modern robust statistical methods: An easy way to maximize the accuracy and power of your results. American Psychologist, 63, 591–601. doi:10.1037/0003-066X.63.7.591

Fidler, F., Cumming, G., Thomason, N., Pannuzzo, D., Smith, J., Fyffe, P., . . . Schmitt, R. (2005). Toward improved statistical reporting in the Journal of Consulting and Clinical Psychology. Journal of Consulting and Clinical Psychology, 73, 136–143. doi:10.1037/0022-006X.73.1.136

Fisher, R. A. (1925). Statistical methods for research workers. London, England: Oliver & Boyd.

Fisher, R. A. (1935). The design of experiments. London, England: Oliver & Boyd.

Frank, G. (1984). The Boulder model: History, rationale, and critique. Professional Psychology: Research and Practice, 15, 417–435. doi:10.1037/0735-7028.15.3.417

Frick, R. W. (1996). The appropriate use of null hypothesis testing. Psychological Methods, 1, 379–390. doi:10.1037/1082-989X.1.4.379

Gigerenzer, G. (1991). From tools to theories: A heuristic of discovery in cognitive psychology. Psychological Review, 98, 254–267. doi:10.1037/0033-295X.98.2.254

Gigerenzer, G., Swijtink, Z., Porter, T., Datson, L., Beatty, J., & Kruger, L. (1989). The empire of chance: How probability changed science and everyday life. Cambridge, England: Cambridge University Press.

Gopnik, A. (2009). Angels and ages: A short book about Darwin, Lincoln, and modern life. London, England: Quercus.

Gould, S. J. (1985). The median isn’t the message. Discover, 6, 40–42.

Gould, S. J. (1997). Life’s grandeur. London, England: Vintage.

Hagen, R. L. (1997). In praise of the null hypothesis statistical test. American Psychologist, 52, 15–24. doi:10.1037/0003-066X.52.1.15

Hammond, G. (1996). The objections to null hypothesis testing as a means of analysing psychological data. Australian Journal of Psychology, 48, 104–106. doi:10.1080/00049539608259513

Harlow, L. L., Mulaik, S. A., & Steiger, J. H. (Eds.). (1997). What if there were no significance tests? Mahwah, NJ: Erlbaum.

Hayes, S. C. (1981). Single-case research designs and empirical clinical practice. Journal of Consulting and Clinical Psychology, 49, 193–211. doi:10.1037/0022-006X.49.2.193

Hayes, S. C., Barlow, D. H., & Nelson-Gray, R. O. (1999). The scientist practitioner: Research and accountability in the age of managed care (2nd ed.). Boston, MA: Allyn & Bacon.

Hays, W. L. (1963). Statistics for psychologists. New York, NY: Holt, Rinehart & Winston.

Hersen, M., & Barlow, D. H. (1976). Single-case experimental designs: Strategies for studying behavior change. Oxford, England: Pergamon Press.

Howard, K. I., Moras, K., Brill, P. L., Martinovich, Z., & Lutz, W. (1996). Evaluation of psychotherapy: Efficacy, effectiveness, and patient progress. American Psychologist, 51, 1059–1064. doi:10.1037/0003-066X.51.10.1059

Hubbard, R. (2004). Alphabet soup: Blurring the distinctions between p’s and α’s in psychological research. Theory and Psychology, 14, 295–327. doi:10.1177/0959354304043638

Hubbard, R., & Lindsay, R. M. (2008). Why p values are not a useful measure of evidence in statistical significance testing. Theory and Psychology, 18, 69–88. doi:10.1177/0959354307086923

Hubbard, R., Parsa, R. A., & Luthy, M. R. (1997). The spread of statistical testing in psychology. Theory and Psychology, 7, 545–554. doi:10.1177/0959354397074006

Jacobson, N. S., & Truax, P. (1991). Clinical significance: A statistical approach to defining meaningful change in psychotherapy research. Journal of Consulting and Clinical Psychology, 59, 12–19. doi:10.1037/0022-006X.59.1.12

Jex, S. M., & Britt, T. W. (2008). Organizational psychology: A scientist-practitioner approach (2nd ed.). Hoboken, NJ: Wiley.

Johnston, J. M., & Pennypacker, H. S. (1993). Readings for strategies and tactics of behavioral research (2nd ed.). Hillsdale, NJ: Erlbaum.

Jones, R. R. (1978). A review of: Single-case experimental designs: Strategies for studying behavior change by Michel Hersen and David H. Barlow. Journal of Applied Behavior Analysis, 11, 309–313. doi:10.1901/jaba.1978.11-309

Kazdin, A. E. (1978). History of behavior modification: Experimental foundations of contemporary research. Baltimore, MD: University Park Press.

Kazdin, A. E. (2008). Evidence-based treatment and practice: New opportunities to bridge clinical research and practice, enhance the knowledge base, and improve patient care. American Psychologist, 63, 146–159. doi:10.1037/0003-066X.63.3.146

Kendall, M. G., & Plackett, R. L. (Eds.). (1977). Studies in the history of statistics and probability (Volume 2). High Wycombe, England: Griffin.

Kiesler, D. J. (1966). Some myths of psychotherapy research and the search for a paradigm. Psychological Bulletin, 65, 110–136. doi:10.1037/h0022911


Kiesler, D. J. (1971). Experimental designs in psychotherapy research. In A. E. Bergin & S. L. Garfield (Eds.), Handbook of psychotherapy and behavior change (pp. 36–74). London, England: Wiley.

Kihlstrom, J. F., & Kihlstrom, L. C. (1998). Integrating science and practice in an environment of managed care. In D. K. Routh & R. J. DeRubes (Eds.), The science of clinical psychology: Accomplishments and future directions (pp. 281–293). Washington, DC: American Psychological Association. doi:10.1037/10280-012

Kline, R. B. (2004). Beyond significance testing: Reforming data analysis methods in behavioral research. Washington, DC: American Psychological Association. doi:10.1037/10693-000

Korchin, S. J. (1983). The history of clinical psychology: A personal view. In M. Hersen, A. E. Kazdin, & A. S. Bellack (Eds.), The clinical psychology handbook (pp. 5–19). New York, NY: Pergamon Press.

Kratochwill, T. R., & Levin, J. R. (2010). Enhancing the credibility of single-case intervention research: Randomization to the rescue. Psychological Methods, 15, 124–144. doi:10.1037/a0017736

Lerman, D. C. (2003). From the laboratory to community application: Translational research in behavior analysis. Journal of Applied Behavior Analysis, 36, 415–419. doi:10.1901/jaba.2003.36-415

Meehl, P. E. (1978). Theoretical risks and tabular asterisks: Sir Karl, Sir Ronald, and the slow progress of soft psychology. Journal of Consulting and Clinical Psychology, 46, 806–834. doi:10.1037/0022-006X.46.4.806

Michael, J. (1974). Statistical inference for individual organism research: Mixed blessing or curse. Journal of Applied Behavior Analysis, 7, 647–653. doi:10.1901/jaba.1974.7-647

Molenaar, P. C. M. (2004). A manifesto on psychology as idiographic science: Bringing the person back into scientific psychology, this time forever. Measurement: Interdisciplinary Research and Perspective, 2, 201–218.

Morgan, D. L., & Morgan, R. K. (2001). Single-participant research design: Bringing science to managed care. American Psychologist, 56, 119–127. doi:10.1037/0003-066X.56.2.119

Morgan, D. L., & Morgan, R. K. (2009). Single-case research methods for the behavioral and health sciences. Los Angeles, CA: Sage.

Morris, E. K. (2008). Sidney W. Bijou: The Illinois years, 1965–1975. Behavior Analyst, 31, 179–203.

Newman, F. L., & Tejeda, M. J. (1996). The need for research that is designed to support decisions in the delivery of mental health services. American Psychologist, 51, 1040–1049. doi:10.1037/0003-066X.51.10.1040

Lilienfeld, S. O., Lynn, S. J., & Lohr, J. M. (2003). Science and pseudoscience in clinical psychology. In S. O. Lilienfeld, S. J. Lynn, & J. M. Lohr (Eds.), Science and pseudoscience in clinical psychology (pp. 1–14). New York, NY: Guilford Press.

Newman, K. J. (2000). The current implementation status of the Boulder model. Unpublished master’s thesis, University of Canterbury, Christchurch, New Zealand.

Lindquist, E. F. (1940). Statistical analysis in educational research. Boston, MA: Houghton Mifflin.

Neyman, J. (1950). First course in probability and statistics. New York, NY: Holt.

Lindsley, O. R., & Skinner, B. F. (1954). A method for the experimental analysis of the behavior of psychotic patients. American Psychologist, 9, 419–420.

Nickerson, R. S. (2000). Null hypothesis significance testing: A review of an old and continuing controversy. Psychological Methods, 5, 241–301. doi:10.1037/1082-989X.5.2.241

Martin, P. R. (1989). The scientist-practitioner model and clinical psychology: Time for a change? Australian Psychologist, 24, 71–92. doi:10.1080/00050068908259551

Mazur, J. (2009). Learning and behavior. Upper Saddle River, NJ: Prentice Hall.

McDougall, D. (2005). The range-bound changing criterion design. Behavioral Interventions, 20, 129–137. doi:10.1002/bin.189

McFall, R. M. (2007). On psychological clinical science. In T. A. Treat, R. R. Bootzin, & T. B. Baker (Eds.), Psychological clinical science: Papers in honor of Richard M. McFall (pp. 363–396). New York, NY: Psychology Press.

McReynolds, P. (1997). Lightner Witmer: His life and times. Washington, DC: American Psychological Association. doi:10.1037/10253-000

O’Donnell, J. M. (1985). The origins of behaviorism: American psychology, 1870–1920. New York, NY: New York University Press. Olatunji, B. O., Feldner, M. T., Witte, T. H., & Sorrell, J. T. (2004). Graduate training of the scientistpractitioner: Issues in translational research and statistical analysis. Behavior Therapist, 27, 45–50. Orlitzky, M. (2011). How can significance tests be deinstitutionalized? Organizational Research Methods. Advance online publication. doi:10.1177/1094428 111428356 Peterson, D. R. (2003). Unintended consequences: Ventures and misadventures in the education of professional psychologists. American Psychologist, 58, 791–800. doi:10.1037/0003-066X.58.10.791 Porter, R. (1997). The greatest benefit to mankind. London, England: HarperCollins. 195

Neville M. Blampied

Price, R. H., & Behrens, T. (2003). Working Pasteur’s quadrant: Harnessing science and action for community change. American Journal of Community Psychology, 31, 219–223. doi:10.1023/A:1023950402338

Raimy, V. C. (Ed.). (1950). Training in clinical psychology (Boulder conference). New York, NY: Prentice Hall.

Reich, J. W. (2008). Integrating science and practice: Adopting the Pasteurian model. Review of General Psychology, 12, 365–377. doi:10.1037/1089-2680.12.4.365

Reisman, J. M. (1966). The development of clinical psychology. New York, NY: Appleton-Century-Crofts.

Reisman, J. M. (1991). A history of clinical psychology (2nd ed.). New York, NY: Taylor & Francis.

Rodgers, J. L. (2010). The epistemology of mathematical and statistical modeling: A quiet methodological revolution. American Psychologist, 65, 1–12. doi:10.1037/a0018326

Rorer, L. G. (1991). Some myths of science in psychology. In D. Cicchetti & W. M. Grove (Eds.), Thinking clearly about psychology: Vol. 1. Matters of public interest (pp. 61–87). Minneapolis: University of Minnesota Press.

Rosenthal, R. (1995). Progress in clinical psychology: Is there any? Clinical Psychology: Science and Practice, 2, 133–150. doi:10.1111/j.1468-2850.1995.tb00035.x

Rossen, E., & Oakland, T. (2008). Graduate preparation in research methods: The current status of APA-accredited professional programs in psychology. Training and Education in Professional Psychology, 2, 42–49. doi:10.1037/1931-3918.2.1.42

Rourke, B. P. (1995). The science of practice and the practice of science: The scientist-practitioner model in clinical neuropsychology. Canadian Psychology/Psychologie Canadienne, 36, 259–277.

Routh, D. K. (1998). Hippocrates meets Democritus: A history of psychiatry and clinical psychology. In A. S. Bellack & M. Hersen (Eds.), Comprehensive clinical psychology: Vol. 1. Foundations (pp. 2–48). Oxford, England: Elsevier.

Routh, D. K. (2000). Clinical psychology training: A history of ideas and practices prior to 1946. American Psychologist, 55, 236–241. doi:10.1037/0003-066X.55.2.236

Rozin, R. (2009). What kind of empirical research should we publish, fund, and reward. Perspectives on Psychological Science, 4, 435–439. doi:10.1111/j.1745-6924.2009.01151.x

Rucci, A. J., & Tweney, R. D. (1980). Analysis of variance and the “second discipline” of scientific psychology: A historical account. Psychological Bulletin, 87, 166–184. doi:10.1037/0033-2909.87.1.166

Schatz, P., Jay, K. A., McComb, J., & McLaughlin, J. R. (2005). Misuse of statistical tests in Archives of Clinical Neuropsychology publications. Archives of Clinical Neuropsychology, 20, 1053–1059. doi:10.1016/j.acn.2005.06.006

Schlinger, H. D. (1996). How the human got its spots. Skeptic, 4, 68–76.

Schmidt, F. L. (1996). Statistical significance testing and cumulative knowledge in psychology: Implications for training of researchers. Psychological Methods, 1, 115–129. doi:10.1037/1082-989X.1.2.115

Schmidt, F. L., & Hunter, J. E. (1997). Eight common but false objections to the discontinuation of significance testing in the analysis of research data. In L. L. Harlow, S. A. Mulaik, & J. H. Steiger (Eds.), What if there were no significance tests? (pp. 38–64). Mahwah, NJ: Erlbaum.

Schmidt, S. (2009). Shall we really do it again? The powerful concept of replication is neglected in the social sciences. Review of General Psychology, 13, 90–100. doi:10.1037/a0015108

Sedlmeier, P., & Gigerenzer, G. (1989). Do studies of statistical power have an effect on the power of studies? Psychological Bulletin, 105, 309–316. doi:10.1037/0033-2909.105.2.309

Seligman, M. E. P. (1995). The effectiveness of psychotherapy: The Consumer Reports study. American Psychologist, 50, 965–974. doi:10.1037/0003-066X.50.12.965

Shapiro, D. (2002). Renewing the scientist-practitioner model. Psychologist, 15, 232–234.

Sheridan, E. P., Perry, N. W., Johnson, S. B., Clayman, D., Ulmer, R., Prohaska, T., . . . Beckman, L. (1989). Research and practice in health psychology. Health Psychology, 8, 777–779. doi:10.1037/h0090321

Sidman, M. (1960). Tactics of scientific research. New York, NY: Basic Books.

Skinner, B. F. (1938). The behavior of organisms. New York, NY: Appleton-Century-Crofts.

Skinner, B. F. (1953). Some contributions of an experimental analysis of behavior to psychology as a whole. American Psychologist, 8, 69–78. doi:10.1037/h0054118

Skinner, B. F. (1954). A new method for the experimental analysis of the behavior of psychotic patients. Journal of Nervous and Mental Disease, 120, 403–406.

Skinner, B. F. (1956). A case history in scientific method. American Psychologist, 11, 221–233. doi:10.1037/h0047662

Smith, L. D., Best, L. A., Cylke, V. A., & Stubbs, A. D. (2000). Psychology without p values: Data analysis at the turn of the 19th century. American Psychologist, 55, 260–263. doi:10.1037/0003-066X.55.2.260


Single-Case Research Designs and the Scientist-Practitioner Ideal in Applied Psychology

Soldz, S., & McCullogh, L. (Eds.). (2000). Reconciling empirical knowledge and clinical experience: The art and science of psychotherapy. Washington, DC: American Psychological Association. doi:10.1037/10567-000

Staines, G. L. (2008). The causal generalization paradox: The case of treatment outcome research. Review of General Psychology, 12, 236–252. doi:10.1037/1089-2680.12.3.236

Stokes, D. E. (1997). Pasteur’s quadrant: Basic science and technological innovation. Washington, DC: Brookings Institution Press.

Streiner, D. L. (2006). Sample size in clinical research: When is enough enough? Journal of Personality Assessment, 87, 259–260. doi:10.1207/s15327752jpa8703_06

Stricker, G. (1975). On professional schools and professional degrees. American Psychologist, 30, 1062–1066. doi:10.1037/0003-066X.30.11.1062

Stricker, G. (2000). The scientist-practitioner model: Gandhi was right again. American Psychologist, 55, 253–254. doi:10.1037/0003-066X.55.2.253

Tavaris, C. (2003). The widening scientist-practitioner gap. In S. O. Lilienfeld, S. J. Lynn, & J. M. Lohr (Eds.), Science and pseudoscience in clinical psychology (pp. ix–xviii). New York, NY: Guilford Press.

Thompson, T. (1984). The examining magistrate for nature: A retrospective review of Claude Bernard’s An introduction to the study of experimental medicine. Journal of the Experimental Analysis of Behavior, 41, 211–216. doi:10.1901/jeab.1984.41-211

Thompson, T., & Hackenberg, T. D. (2009). Introduction: Translational science lectures. Behavior Analyst, 32, 269–271.

Thorne, F. C. (1945). The field of clinical psychology, past, present, future [Editorial]. Journal of Clinical Psychology, 1, 1–20.

Ullmann, L. P., & Krasner, L. (Eds.). (1966). Case studies in behavior modification. New York, NY: Holt, Rinehart & Winston.

Ulrich, R., Stachnik, T., & Mabry, J. (1966). Control of human behavior. Glenview, IL: Scott, Foresman.

Vacha-Haase, T., Nilsson, J. E., Reetz, D. R., & Thompson, B. (2000). Reporting practices and APA editorial policies regarding statistical significance and effect size. Theory and Psychology, 10, 413–425. doi:10.1177/0959354300103006

Valsiner, J. (1986). Where is the individual subject in scientific psychology? In J. Valsiner (Ed.), The individual subject and scientific psychology (pp. 1–14). New York, NY: Plenum Press.

Vespia, K. M., & Sauer, E. M. (2006). Defining characteristic or unrealistic ideal: Historical and contemporary perspectives on scientist-practitioner training in counselling psychology. Counselling Psychology Quarterly, 19, 229–251. doi:10.1080/09515070600960449

Wagenmakers, E. (2007). A practical solution to the pervasive problems of p values. Psychonomic Bulletin and Review, 14, 779–804.

Wainer, H. (1999). One cheer for null hypothesis significance testing. Psychological Methods, 4, 212–213. doi:10.1037/1082-989X.4.2.212

Wessley, S. (2001). Randomised controlled-trials: The gold standard. In C. Mace, S. Moorey, & B. Roberts (Eds.), Evidence in the psychological therapies (pp. 46–60). Hove, England: Brunner-Routledge.

Wilkinson, L., & Task Force on Statistical Inference. (1999). Statistical methods in psychology journals: Guidelines and explanations. American Psychologist, 54, 594–604. doi:10.1037/0003-066X.54.8.594

Witmer, L. (1996). Clinical psychology. American Psychologist, 51, 248–251. (Original work published 1907) doi:10.1037/0003-066X.51.3.248

Wolf, M. M. (1978). Social validity: The case for subjective measurement or how applied behavior analysis is finding its heart. Journal of Applied Behavior Analysis, 11, 203–214. doi:10.1901/jaba.1978.11-203

Wright, D. B. (2009). Ten statisticians and their impacts for psychologists. Perspectives on Psychological Science, 4, 587–597. doi:10.1111/j.1745-6924.2009.01167.x

Yates, F., & Mather, K. (1963). Ronald Aylmer Fisher. Biographical Memoirs of Fellows of the Royal Society of London, 9, 91–120.

Ziliak, S. T., & McCloskey, D. N. (2008). The cult of statistical significance. Ann Arbor: University of Michigan Press.


Chapter 9

Visual Analysis in Single-Case Research

Jason C. Bourret and Cynthia J. Pietras

DOI: 10.1037/13937-009 APA Handbook of Behavior Analysis: Vol. 1. Methods and Principles, G. J. Madden (Editor-in-Chief) Copyright © 2013 by the American Psychological Association. All rights reserved.

The visual analysis, or inspection, of graphs showing the relation between environmental (independent) variables and behavior is the principal method of analyzing data in behavior analysis. This chapter is an introduction to such visual analysis. We begin by describing the components and construction of some common types of graphs used in behavior analysis. We then describe some techniques for analyzing graphic data, including graphs from common single-subject experimental designs.

Types of Graphs and Their Construction

Of the many ways to graph data (see Harris, 1996), the graph types most frequently used by behavior analysts are cumulative frequency graphs, bar graphs, line graphs, and scatterplots. Each of these is described in more detail in this section.

Cumulative Frequency Graphs

Cumulative frequency graphs show the cumulative number of responses across observation periods. The earliest, and most common, such graph used by behavior analysts is the cumulative record. In the other graph types discussed in this section, measures of behavior during an observation period are collapsed into a single quantity (e.g., mean response rate during a session) that is represented on a graph by a single data point. By contrast, cumulative records show each response and when it occurred during an observation period. Thus, cumulative records provide a detailed picture of within-session behavior patterns (see Ferster & Skinner, 1957/1997).

An example of a cumulative record is shown in Figure 9.1. On a cumulative record, equal horizontal distances represent equal lengths of time; equal vertical distances represent equal numbers of responses. The slope of the curve in a cumulative record indicates the rate of responding. Researchers sometimes include an inset scale on cumulative records to indicate the rate of responding represented by different slopes, although usually more precise calculations of rate are also provided. Small lines oblique to the prevailing slope of the curve, traditionally called pips, typically indicate reinforcer deliveries. When the response pen reaches the top of the page, it resets to the bottom, producing a straight vertical line. Researchers may also program the response pen to reset at designated times (e.g., when a schedule change occurs) to visually separate data collected under different conditions.

[Figure 9.1 image omitted. Labeled features: higher rate, "smooth" responding, "grainy" responding, lower rate, no responding, pips, event pen. Vertical axis: Responses. Horizontal axis: Time.]

Figure 9.1. Example of patterns that may be observed on a cumulative record. Cumulative responses are shown on the vertical axis, and time is shown on the horizontal axis. Each response moves the response pen a constant distance in the vertical direction, and the paper moves horizontally at a constant speed. Shown (left to right) are smooth curves indicating constant rates of responding, grainy curves indicating irregular patterns of responding, shallow curves indicating low rates of responding, and steep curves indicating high rates of responding. Flat portions indicate no responding. Pips (downward deflections of the response pen) usually indicate reinforcer deliveries. Movements of the event pen are used to signal changes in experimental contingencies or stimulus conditions. Data are hypothetical.

Cumulative records were traditionally generated by now-obsolete, specially designed machines (cumulative recorders). More recently, computer software that records and plots each response as it occurs has been used to generate these records. Cumulative records can also be constructed after data collection is complete, if the time of occurrence of each response and all other relevant events during a session are recorded. Although the cumulative record was one of the most commonly used graphs in the early years of the experimental analysis of behavior, it has since fallen out of favor as researchers increasingly present data averaged across a single session or multiple sessions. It remains useful in the experimental analysis of behavior, not as a primary means of data analysis but as a means of monitoring within-session performance. Another notable exception to the decline of the cumulative record is research published by Gallistel and his colleagues (e.g., Gallistel et al., 2007). In this research line, cumulative records (some of them quite creatively constructed with more than simple responses on the vertical axis) are used extensively to better understand the process by which organisms allocate their behavior between concurrently available sources of food.
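As a rough illustration of constructing a cumulative record after data collection, the following Python sketch builds one from a list of recorded response times and reads the local response rate from the slope between two time points. The function names, the 1-s sampling interval, and the response times are all hypothetical choices made for this example, not part of any standard package.

```python
from bisect import bisect_right

def cumulative_record(response_times, session_length, bin_width=1.0):
    """Return (time, cumulative responses) pairs sampled every bin_width seconds."""
    times = sorted(response_times)
    n_bins = int(session_length / bin_width)
    record = []
    for i in range(n_bins + 1):
        t = i * bin_width
        # Number of responses that have occurred at or before time t.
        record.append((t, bisect_right(times, t)))
    return record

def local_rate(record, start, end):
    """Slope of the record between two sampled times (responses per second)."""
    lookup = dict(record)
    return (lookup[end] - lookup[start]) / (end - start)

# Hypothetical response times (s): steady responding, a pause, then a burst.
responses = [1, 2, 3, 4, 5, 11, 11.5, 12, 12.5, 13, 13.5, 14, 14.5, 15]
record = cumulative_record(responses, session_length=15)

print(local_rate(record, 0, 5))    # moderate slope
print(local_rate(record, 5, 10))   # flat portion: no responding
print(local_rate(record, 10, 15))  # steeper slope: higher rate
```

Plotting the pairs in `record` with time on the horizontal axis and cumulative responses on the vertical axis would reproduce the kind of curve described for Figure 9.1.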

Bar Graphs

Bar graphs show discrete categories (i.e., a nominal scale; Stevens, 1946) along the horizontal (x) axis and values of the dependent variable on the vertical (y) axis (Shah, Freedman, & Vekiri, 2005). In behavior analysis, bar graphs are often used to show percentages (e.g., of correct responses) or average performance across stable sessions or conditions. As shown in Figure 9.2, bar graphs facilitate comparisons of performances (i.e., the height of each bar) across conditions. Typically, bars are separated from each other, but related bars may be grouped together. One variation of the standard (vertical) bar graph is a horizontal bar graph, in which the categorical variable is plotted on the y-axis. On these graphs, the length of the bar along the x-axis shows the value of the dependent variable. Bar graphs may also be drawn so that bars can deviate in either direction from a center line. Such graphs may be used, for example, to show increases or decreases from baseline values that are represented by a center horizontal line at zero. Bar graphs are similar to histograms, but histograms (in which the bars touch each other) have interval x-axis values and typically show frequency distributions (see Figure 9.3).

[Figure 9.2 image omitted. Two bar graphs (top and bottom), each plotting Percent Correct (0 to 100) for three conditions: No Timeout, 5-s Timeout, and 20-s Timeout.]

Figure 9.2. Examples of bar graphs. Data are hypothetical.

[Figure 9.3 image omitted. Histogram plotting Proportion of Responses (0.00 to 0.32) against Interresponse Time Bins (s) from 0 to 9 and >10.]

Figure 9.3. Example of a histogram. Data are hypothetical.

In bar graphs, the y-axis scale usually begins at the lowest possible value (e.g., zero) but may begin at a higher value if the low range would be devoid of data. Constraining the lower range of y-axis values will make differences across conditions appear bigger than they would have been had the range extended to zero, a factor to consider when evaluating the clinical (or other) importance of the visually apparent difference. When bar graphs show measures of central tendency (e.g., means), error bars (the vertical lines in Figure 9.2) should be included to depict the variability in the data (e.g., between-session differences in percentage correct).

Figure 9.4. A bar graph that shows mean performance and the performance of individuals making up the mean. The height of the bar shows the mean percentage of urine samples negative for cocaine and opiates, and the closed circles show the percentage of negative samples for each individual undergoing employment-based abstinence reinforcement treatment for cocaine dependence. From “Employment-Based Abstinence Reinforcement as a Maintenance Intervention for the Treatment of Cocaine Dependence: A Randomized Controlled Trial,” by A. DeFulio, W. D. Donlin, C. J. Wong, and K. Silverman, 2009, Addiction, 104, p. 1535. Copyright 2009 by John Wiley & Sons, Ltd. Used with permission.

When group statistical designs are used, bar graphs frequently summarize the differences in group mean performances. Reliance on statistical analyses of grouped data may lead to the omission of error bars from such graphs, a practice that obscures the size of individual differences within groups. Figure 9.4 shows a variation on the between-groups bar graph that displays the performance of individual participants (individual data points) while maintaining the ease of comparing measures of central tendency (height of the bars). Graphs constructed in this way make it possible to evaluate to what extent the height of the bar describes the behavior of individual participants. A second advantage of this type of bar graph is that it allows readers to evaluate whether the data within each group are normally distributed, an important factor when evaluating the appropriateness of the statistical analyses used. Finally, the graphing conventions of Figure 9.4 encourage researchers to consider those individuals in the treatment group for whom the treatment produced no positive effect. As Sidman (1960) often noted, the data from these individuals show that the behavior is not fully understood, and further investigation of the factors affecting these individuals’ behavior should lead to more effective interventions.
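The effect of truncating the y-axis, noted earlier in this section, can be made concrete with a small arithmetic sketch in Python. The function name and the percentages are hypothetical; the sketch simply compares the ratio of the drawn bar heights when the axis starts at zero with the ratio when it starts at a higher value.

```python
def drawn_height_ratio(value_a, value_b, axis_min=0.0):
    """Ratio of the drawn heights of bar B to bar A when the y-axis starts at axis_min."""
    if axis_min >= min(value_a, value_b):
        raise ValueError("axis_min must lie below both plotted values")
    return (value_b - axis_min) / (value_a - axis_min)

# Hypothetical percent-correct values for two conditions.
baseline, treatment = 80, 90

print(drawn_height_ratio(baseline, treatment))               # axis starts at 0
print(drawn_height_ratio(baseline, treatment, axis_min=70))  # axis starts at 70
```

With the axis starting at zero, the second bar is drawn 1.125 times as tall as the first; with the axis starting at 70, the same data produce a bar drawn twice as tall, although the underlying difference is still 10 percentage points.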

Line Graphs (Time Series and Relational)

Line graphs are used to depict behavior across time (time-series line graphs) or relations between two variables (relational line graphs; see Tufte, 1983). With time-series line graphs (e.g., see Figure 9.5), the horizontal axis (referred to as the abscissa or the x-axis) shows the time point at which each data point was collected, and behavior is plotted at each of these time points on the vertical axis (the ordinate or y-axis). On relational line graphs, the x-axis shows values of the independent variable, and the y-axis shows a central tendency measure of the dependent variable. Data points in both types of line graphs are connected with straight lines.

Figure 9.5 shows the parts of a time-series line graph. Axis labels indicate the variables plotted. If multiple graphs appear in a figure with the same axes, then, to reduce the amount of text in the figure, only the x- and y-axes on the bottom and leftmost graphs, respectively, are labeled (for an example, see Figure 9.6). The scales of both the x- and y-axes are divided into intervals. Successive intervals on an axis are equally sized and are marked with short lines, called tick marks, that intersect the axis. Tick marks can point inward or outward, but they should point outward if inward-pointing ticks would interfere with the data. Tick marks are labeled to indicate the interval value. Tick marks should be frequent enough that a reader can determine the value of a data point, but not so frequent that the axis becomes cluttered. Intervals between major ticks may be marked with unlabeled minor ticks. In Figure 9.5, for example, every fifth interval is labeled, and minor tick marks indicate intervals between them. If multiple graphs appear in a figure with the same axis scale, then only the tick marks on the bottom and left graphs, respectively, are labeled.

Figure 9.5. Diagram of parts of a time-series line graph. Data are hypothetical.

Figure 9.6. Examples of linear and logarithmic (log) scales. The upper two graphs show data plotted using linear y-axis scales. The lower two graphs show the same data, but plotted using log (base 10) y-axis scales. Data are hypothetical.

The x- and y-axis scales should be large enough to encompass the full range of the data; however, they should not greatly exceed the data range, or the graph will contain empty space and compress the data unnecessarily. Starting the axis scales at values slightly below the minimum of the data range may make data points at the minimum value (e.g., zero) more visible by preventing them from falling directly on the x- or y-axis. As with bar graphs, line graphs normally start at zero, but if the data range is great, there may be a break in the axis with the larger numbers indicated after the break. When plotting data for individual participants in separate graphs within a single figure, it is sometimes not realistic to represent data for each participant within a single range on the y-axis (e.g., the range for one participant may be between one and 10 responses and for another, between 400 and 450 responses). It is always better to use the same ranges on the y-axis, but when this is not possible, different ranges for different participants may be used, and this deviation must be noted.

The aspect ratio, or the height-to-width ratio of the y- and x-axes, should not distort the data. Too great an aspect ratio (i.e., a very tall graph) may magnify variability, or small effects, and too small an aspect ratio (i.e., a very long graph) may obscure important changes in behavior or variability in a data set (Parsonson & Baer, 1986). A 2:3 y:x aspect ratio (Parsonson & Baer, 1978) or a 1.0:1.618 aspect ratio (Tufte, 1983) has been recommended. Breaks on the y-axis may be used if there are outlier data points. Outliers are idiosyncratic data points that far exceed the range of other data (e.g., data points more than 3 standard deviations from the mean). Breaks on the x-axis may be used if there are breaks in data collection.

Data points are marked with symbols, and a data path is created by connecting the data points with straight lines. When multiple data paths are shown on a graph, each data type is represented by a distinct symbol, and a central figure legend provides a concise description of each path. A common graphing convention in applied behavior analysis is to describe each data path with text and an arrow pointing from the text to the corresponding data path. Using a central legend, as in Figure 9.5, facilitates the transmission of information because scanning the graph is not required to find figure legend information. A second advantage of the central legend is that it avoids the possibility that arrows pointing to a particularly high or low data point may influence the visual analysis of the data.
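The outlier example given above (data points more than 3 standard deviations from the mean) can be sketched in Python as follows. This is only one possible reading of that rule of thumb: the function name, the use of the sample standard deviation, and the session values are assumptions made for illustration.

```python
import statistics

def flag_outliers(values, criterion=3.0):
    """Return indices of points more than `criterion` sample SDs from the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        return []  # no variability, so nothing can be flagged
    return [i for i, v in enumerate(values) if abs(v - mean) / sd > criterion]

# Hypothetical responses per session; the final session is aberrant.
sessions = [10, 12, 11, 9, 10, 11, 12, 10, 9, 11,
            10, 12, 11, 9, 10, 11, 10, 12, 9, 100]
outlier_indices = flag_outliers(sessions)
print(outlier_indices)

# The same rule applied to a short phase flags nothing, even with an extreme value.
print(flag_outliers([1, 2, 3, 4, 100]))
```

One caution follows from the arithmetic of the sample standard deviation: with 10 or fewer observations, no single point can lie more than 3 sample standard deviations from the mean, so in short phases this criterion will never flag a point and visual judgment is needed instead.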
Phase changes in time-series line graphs are denoted by vertical lines extending the length of the y-axis. Phase-change lines are placed between the last data point of a condition and the first data point of the new condition. Data paths are broken across phase-change lines to avoid the appearance that behavior change produced by an independent variable occurred before the change in condition. Descriptive phase labels are centered at the top of the space allocated to each phase. Figure legends and phase labels are usually placed within the rectangular space created by the x- and y-axes. Figure captions are placed below graphs and describe what is plotted, the axes, any abbreviations or symbols that appear in the graph, and any axis breaks.

Linear interval scales are the most common scales used in time-series line graphs, but logarithmic (log) interval scales and semi-log interval scales (in which the x- or y-axis is log scaled and the other is linearly scaled) are also used. Log scales are helpful in more normally distributing data sets that are skewed toward large values (Cleveland, 1994), transforming curvilinear data into linear data (Shull, 1991), and showing proportional changes in behavior (Cooper, Heron, & Heward, 1987). Because the logarithm of zero is undefined, log scales have no zero. Log base 10, base 2, and base e (natural logs) are the most common log scales (see Cleveland, 1994, for some recommendations for the use of various log bases).

Illustrations of data plotted on both a linear scale and a semi-log (base 10) scale are shown in Figure 9.6. In the upper left graph, response rates in Phase B appear to increase more quickly between sessions than in Phase A. This difference, however, may be attributed to the greater absolute value of the response rate. When the data are plotted on a log scale (lower left graph), it is visually apparent that the rate of change is similar in both phases. In the upper right graph, there appears to be a large shift in performance from Phase A to Phase B. The arithmetic scale of the y-axis, however, compresses the low rates in Phase A. When the data are plotted on a log scale, the low rates are more visible and the increase in responding in Phase B can be seen to be part of an increasing trend that began in Phase A.
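The point that a log scale shows proportional change can be checked with a short Python sketch. The two phases below are hypothetical values chosen so that responding doubles from session to session in both; they are not the data of Figure 9.6, and the function names are introduced only for this example.

```python
import math

def successive_differences(values):
    """Session-to-session change, i.e., the vertical step between adjacent points."""
    return [b - a for a, b in zip(values, values[1:])]

def log10_values(values):
    """Base-10 logs of the data; zero and negative values cannot be placed on a log scale."""
    if any(v <= 0 for v in values):
        raise ValueError("log scales cannot represent zero or negative values")
    return [math.log10(v) for v in values]

# Hypothetical response rates that double each session in both phases.
phase_a = [2, 4, 8, 16]
phase_b = [20, 40, 80, 160]

linear_a = successive_differences(phase_a)             # steps on a linear axis
linear_b = successive_differences(phase_b)
log_a = successive_differences(log10_values(phase_a))  # steps on a log axis
log_b = successive_differences(log10_values(phase_b))

print(linear_a, linear_b)
print(log_a, log_b)
```

On the linear axis the steps in the second phase are ten times larger than those in the first, whereas on the log axis every step in both phases equals log10(2), about 0.30, which is why the two data paths would appear parallel on a semi-log graph.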

Scatterplots

Scatterplots present a dependent variable in relation to either an independent variable (in which case the graph may be described as a relational graph; see Tufte, 1983) or another dependent variable. When both measures are dependent variables, either measure can be plotted on the horizontal axis, although if one measure is conceptualized as a predictor variable, it is plotted on the x-axis, and the other, the criterion variable, is plotted on the y-axis.

Figure 9.7. Example of a scatterplot. Data are hypothetical.

An example of a scatterplot is shown in Figure 9.7. In this figure, which shows data from a hypothetical experiment investigating choice between two concurrently available reinforcement schedules, the log of the ratio of response rates on the right and left alternatives is plotted on the y-axis and the log of the ratio of reinforcement rates is plotted on the x-axis. In scatterplots, data points are not connected with lines, usually because measures are independent of each other (e.g., they are data points from different conditions or participants) or because they are not sequentially related. Sometimes, however, lines or curves are fit to data on scatterplots to indicate the form of the relation between the two variables (see Interpreting Relational Graphs section). The line in Figure 9.7 shows the best-fitting linear regression line. That data points fall near this line indicates a linear relation (matching) between response rates and reinforcement rates.
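A minimal Python sketch of fitting a straight line to log-ratio data of this kind is given below. It assumes ordinary least squares as the fitting method, which the text does not specify, and the function names and log-ratio values are hypothetical rather than taken from Figure 9.7.

```python
import math

def log_ratio(right, left):
    """Base-10 log of the ratio of two rates (e.g., right over left alternative)."""
    return math.log10(right / left)

def fit_line(xs, ys):
    """Ordinary least squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical log reinforcement-rate ratios (x) and log response-rate ratios (y),
# one pair per condition.
log_reinforcer_ratios = [-0.9, -0.3, 0.0, 0.3, 0.9]
log_response_ratios = [-0.8, -0.25, 0.05, 0.2, 0.8]

slope, intercept = fit_line(log_reinforcer_ratios, log_response_ratios)
print(round(slope, 3), round(intercept, 3))
```

For these made-up values the fitted slope is 0.875 and the intercept is essentially zero. A slope near 1 with an intercept near 0 would correspond to response ratios closely tracking reinforcement ratios, and how far the individual points fall from the fitted line is what a reader would judge visually on the scatterplot.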

Other Types of Graphs

The types of graphs we have discussed do not represent an exhaustive list of the types of graphs used by behavior analysts and subjected to visual analysis. For example, sometimes examining data across time and within individual sessions is useful, in which case a three-dimensional graph would be appropriate, with the dependent variable on the y-axis, within-session time on the x-axis, and successive sessions on the third (z) axis (e.g., Cançado & Lattal, 2011). Three-dimensional graphs may also be used to show other types of interactions, such as changes in interresponse time distributions on a reinforcement schedule across sessions (Gentry, Weiss, & Laties, 1983) or effects of different drug doses on response run length distributions (Galbicka, Fowler, & Ritch, 1991).

Other graphing techniques have been used to depict specific kinds of relations. Staddon and Simmelhag (1971), for example, used detailed flow charts to graphically show the conditional probabilities of different responses given an initial response. Davison and Baum (2006) depicted the number of responses to different alternatives in a choice experiment as different-sized circles (bubbles). This technique could also be useful in showing, for example, time allocated to playing with multiple toys by a child across successive time periods. These examples are but a few of the specialized graphs that may be useful in enhancing the visual depiction of specific data sets or aspects of data sets. For a more complete description of graph types, see Harris (1996). Cleveland and McGill (1984, 1985) offered some useful advice on how to choose graph types to show data with maximum clarity. In undertaking a graphical analysis of data, there are no immutable rules concerning which graphs to use for depicting what. Use is based on precedent, but investigators also need to think outside the axes (so to speak) in using graphs to tell the story of their data.

General Recommendations for Graph Construction

Many features of a graph influence a reader’s reaction to the data. Even small details such as tick marks, axis scaling, data symbols, aspect (y:x) ratio, and so forth can affect a graph’s impact, and poor graphing methods can lead to misinterpretations of results (Cleveland, 1994). Creating graphs that are accurate, meaningful, rich in information, yet readily interpretable, therefore, requires planning, experimenting, reviewing, and close attention to detail (Cleveland, 1994; Parsonson & Baer, 1992). For some additional recommendations on producing useful graphs, see Baron and Perone (1998), Cleveland (1994), Johnston and Pennypacker (1993), Parsonson and Baer (1978, 1986), and Tufte (1983). For graphs being prepared for publication, the Publication Manual of the American Psychological Association (American Psychological Association, 2010) also offers valuable advice.

Interpreting Graphical Data

In the sections that follow, we describe some strategies for visually analyzing graphical data presented in cumulative records, bar graphs, time-series line graphs, and scatterplots. We also discuss the visual analysis of graphical data generated by some commonly used single-subject experimental designs. If the graph is published, the first step in visual analysis is to determine what is plotted by reading all of the text describing the graph, including the axis labels, condition labels, figure legend, and figure caption. The next step is the analysis of patterns in the graphical data.

Interpreting Cumulative Records

In cumulative records, changes in rate of responding and variability in responding are analyzed by examining changes in the slope of the records (Johnston & Pennypacker, 1993). Several patterns that may be distinguished in cumulative records are shown in Figure 9.1, a hypothetical cumulative record. The first smooth curve shows responding occurring at a steady, constant rate, whereas the second shows grainy responding, or responding occurring in unsystematic bouts of high and low rates separated by varying periods of not responding. The flat portion of the third curve indicates no responding. The greater slope of the fourth curve compared with the third curve indicates a higher rate of responding. Cumulative records also allow an analysis of responding at a more local level. For example, in Figure 9.1, the second curve from the left, between the third and fourth pips, shows that responding occurred first at a low rate, then rapidly increased, then gradually decreased again before the reinforcer delivery. Such a fine-grained analysis is not possible with other graph types.

Interpreting Bar Graphs When visually analyzing bar graphs, the relative heights of bars are compared across conditions (see
Figure 9.2). When making this comparison, attention should be given to the y-axis scale to determine whether the range has been truncated. In bar graphs depicting average performance within a phase, a critical element to evaluate is the length of the error bars. Very long error bars suggest that the performance may not have been stable, and so it will be important to evaluate the stability criterion used. If the range of the data is used, long error bars may also occur if an outlying data point was included in the data set depicted in the graph. If so, then the average value depicted by the height of the bar may not represent most of the data; in such cases, the median would be a better measure of central tendency. Error bars also indicate the overlap of data points across conditions. For example, Figure 9.2 shows results from a hypothetical experiment that evaluated the effects of three time-out durations after incorrect responses on match-to-sample performance. The height of the bar is the mean, and the error bar shows the standard deviation. In the top graph, the error bars are long, and the mean of the 20-second condition overlaps with the variance in the 5-second condition. Thus, differences between the 5- and 20-second time-out duration conditions are less convincing than the difference between the no time-out and the 20-second conditions. Error bars provide no information about trends in the data, however, and a reader must look to the text of the article or to other graphs for evidence that the performances plotted in a bar graph represent stable responding. Care should also be taken to consider which measure of variability is represented by the error bars. The standard deviations plotted in Figure 9.2 quantify the average deviation of each data point from the condition mean and, therefore, are an appropriate measure of variability when mean values are reported (interquartile ranges usually accompany medians). 
Some error bars will depict the standard error of the mean, and readers should interpret these with caution. The standard error of the mean is used to estimate the standard deviation among many different means sampled from a normally distributed population of values. As such, it tells one less about the variability in the data than does the standard deviation. Moreover, the standard error of the mean is calculated by dividing the sample standard deviation by the square root of n (i.e., the number of values used to calculate the mean); thus, error bars depicting the standard error of the mean will become increasingly narrower than those depicting the standard deviation as the number of data points included in the data set increases. If the standard error of the mean had been plotted in Figure 9.2 instead of the standard deviation, the visually apparent difference between all three conditions would seem greater even with no change in the data set plotted. The general strategies we have outlined (i.e., consider the difference in the measure of central tendency in light of the variability in the data to evaluate how convincing the difference is) are formalized by common inferential statistical tests. Behavior analysts wanting to reach a broader audience of scientists (including extramural grant reviewers), professionals, and public policymakers may wish to use these tests in addition to conducting a careful visual analysis of the data.
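The relation between the standard deviation and the standard error of the mean can be illustrated with a short Python sketch. The scores are hypothetical and the function name is an assumption; the point is only that the same spread of scores yields a shorter standard-error bar when more data points are included.

```python
from math import sqrt
from statistics import mean, stdev

def standard_error(values):
    """Standard error of the mean: sample standard deviation divided by sqrt(n)."""
    return stdev(values) / sqrt(len(values))

# Hypothetical percent-correct scores from one condition.
short_phase = [70, 80, 75, 85, 90, 80]
# The same scores observed across a phase twice as long.
long_phase = short_phase * 2

for label, scores in (("n = 6", short_phase), ("n = 12", long_phase)):
    print(label, round(mean(scores), 1), round(stdev(scores), 2), round(standard_error(scores), 2))
```

The standard deviation changes little between the two phases, whereas the standard error shrinks as n grows, which is why the choice of error bar affects how distinct two conditions appear.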

Analyzing Time-Series Data in Line Graphs The analysis of time-series data is the most prevalent visual inspection practice in behavior analysis. Basic and applied behavior analysts use visual analysis techniques to determine when behavior has stabilized within a phase, to judge whether behavior has changed between phases, and to evaluate the evidence that the experimental variable has affected individual behavior. Evaluating stability. Once an experiment is underway, one of the first decisions a researcher must make is, When should a phase change be made? In most cases, this question is answered by evaluating the stability (i.e., consistency) of responding over time (see Chapter 5, this volume). If behavior is not stable before the condition change (e.g., there is a trend in the direction of the anticipated treatment effect), then attributing subsequent shifts in responding to the experimental manipulation will be unconvincing (Johnston & Pennypacker, 1993). Moreover, an unstable baseline (i.e., one containing a great deal of between-session variability) will inhibit one’s ability to detect small but clinically important effects of an experimental manipulation (Sidman, 1960). Thus, whenever possible, conditions should remain unchanged until stability is achieved. Stability of time-series data may be assessed by visual inspection or quantitative criteria (see Perone, 1991; Sidman, 1960; Chapter 5, this volume). Both will evaluate bounce and trend, with the former catching patterns that may be missed by the quantitative criterion. Bounce refers to unsystematic changes in behavior across successive data points, whereas trend refers to a systematic (directional) change (Perone, 1991). Figure 9.8 shows baseline data for two participants. The baseline depicted in the top panel has considerably more between-session variability than

Figure 9.8. Hypothetical baseline data with added mean lines (solid horizontal lines), range lines (long dashed horizontal lines), trimmed range lines (short dashed horizontal lines), and regression lines (solid trend lines).


that depicted in the lower panel. The extent to which the researcher will be concerned with this bounce in the data will depend on how large the treatment effect is likely to be. If a very large effect is expected, then the intervention data should fall well outside of the baseline range, and therefore the relatively weak experimental control established in the baseline phase would be acceptable. If, however, a smaller effect is anticipated, then the intervention data are unlikely to completely separate from the range of data in the baseline, making detection of an intervention effect impossible. Under these conditions, the researcher would be well served to further identify the source of variability in the baseline. Indeed, if the researcher succeeds in this endeavor, a potent behavior-change variable may be identified. Visually analyzing bounce may be facilitated by the use of the horizontal lines shown in Figure 9.8. The solid horizontal line shows the mean of the entire phase (i.e., the mean level of the data path) and allows one to see graphically how much each data point deviates from an ostensibly appropriate measure of central tendency.1 The dashed lines furthest from the mean line illustrate the range of the data (i.e., they are drawn through the single data point furthest from the mean), whereas the dashed lines within these dashed lines show a trimmed range in which the furthest data point from the mean is ignored (see Morley & Adams, 1991). Drawing range and trimmed range lines may be useful when considering how large the intervention effect will have to be to discriminate a difference between the baseline and intervention data. Clearly, to produce a visually apparent difference, the intervention implemented in the top panel will have to produce a much larger effect than that implemented in the bottom panel. Neither range lines nor trimmed range lines will depict changes in variability within a condition, however. 
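The mean, range, and trimmed range lines can be computed directly from the data in a phase. The Python sketch below is illustrative: the function names and baseline values are hypothetical, and it assumes, as described above, that trimming means ignoring only the single data point furthest from the phase mean.

```python
from statistics import mean

def range_lines(data):
    """Lower and upper range lines: the minimum and maximum value in the phase."""
    return min(data), max(data)

def trimmed_range_lines(data):
    """Range lines after ignoring the single data point furthest from the mean.

    If two points are equally far from the mean, the earlier one is dropped.
    """
    if len(data) < 3:
        raise ValueError("need at least three data points to trim one")
    center = mean(data)
    furthest = max(range(len(data)), key=lambda i: abs(data[i] - center))
    kept = data[:furthest] + data[furthest + 1:]
    return min(kept), max(kept)

# Hypothetical baseline (responses per minute) containing one outlying session.
baseline = [30, 32, 29, 31, 46, 30]
mean_line = mean(baseline)
full = range_lines(baseline)
trimmed = trimmed_range_lines(baseline)
```

Here the full range is much wider than the trimmed range, which suggests that a single session accounts for most of the apparent bounce.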
To visualize changes in variability within conditions, Morley and Adams (1991) suggested plotting trended range lines. To construct these, the data in a condition are divided in half along the x-axis, and the middle x-axis value of each half is located. For each half, the
minimum and maximum y-axis data points are located, and those values are marked at the corresponding x-axis midpoint. Finally, two lines on the graph are drawn connecting the two minimum data points from each half and the two maximum data points from each half. Converging lines suggest decreasing variability (bounce) across the phase, diverging lines suggest increasing variability, and parallel lines suggest that the variability is constant. The next characteristic of the baseline data to be considered, when deciding when to change phases, is the extent to which there is a trend in the data. A first step can be to plot a line of best fit (i.e., linear regression) through the baseline data. Any graphing software package will suffice. Researchers should be aware, however, that a line of best fit can be unduly affected by outliers (Parsonson & Baer, 1992). One alternative to linear regression that was recommended by Cleveland (1994) is the curve-smoothing loess technique. The loess technique is less sensitive to outliers and does not assume that data will conform to any particular function (e.g., a straight line). This technique smoothes data and makes patterns more visible by plotting, for each x-axis value, an estimate of the center of the distribution of y values falling near that x value (akin to a moving average; for descriptions, see Cleveland, 1994; Cleveland & McGill, 1985). Linear regression, however, has the advantage of being a more widely used technique, and it quantifies the linear relation between the two variables (i.e., estimates the slope and y-intercept). In the upper panel of Figure 9.8, the line of best fit indicates an upward trend in the baseline data, suggesting that if no intervention is implemented, the rate of response will continue to increase over time. This is problematic if one expects the intervention to increase response rates. In the lower panel of Figure 9.8, the trend line is horizontal and overlaps with the mean line. 
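The construction of trended range lines and a line of best fit can be sketched in Python as follows. This is a minimal illustration rather than Morley and Adams's procedure in full: the function names and data are hypothetical, the middle data point is left out when a phase has an odd number of sessions, and the midpoint of each half is taken as the average of its first and last x-axis values. Both choices are assumptions.

```python
from statistics import mean

def trended_range_lines(xs, ys):
    """Return the lower and upper trended range lines, each as two (x, y) endpoints."""
    if len(xs) != len(ys) or len(xs) < 4:
        raise ValueError("need at least four paired observations")
    half = len(xs) // 2  # assumption: with an odd count, the middle point is omitted
    first_x, first_y = xs[:half], ys[:half]
    second_x, second_y = xs[-half:], ys[-half:]
    x1 = (first_x[0] + first_x[-1]) / 2    # middle x-axis value of the first half
    x2 = (second_x[0] + second_x[-1]) / 2  # middle x-axis value of the second half
    lower = ((x1, min(first_y)), (x2, min(second_y)))
    upper = ((x1, max(first_y)), (x2, max(second_y)))
    return lower, upper

def variability_change(lower, upper):
    """Classify the two lines by comparing the gap between them at each midpoint."""
    start_gap = upper[0][1] - lower[0][1]
    end_gap = upper[1][1] - lower[1][1]
    if end_gap < start_gap:
        return "converging"  # bounce decreases across the phase
    if end_gap > start_gap:
        return "diverging"   # bounce increases across the phase
    return "parallel"        # bounce is constant

def least_squares_line(xs, ys):
    """Slope and intercept of the ordinary least-squares line of best fit."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx

# Hypothetical baseline in which bounce shrinks over sessions.
sessions = [1, 2, 3, 4, 5, 6, 7, 8]
rates = [20, 40, 25, 38, 30, 34, 31, 33]
lower, upper = trended_range_lines(sessions, rates)
pattern = variability_change(lower, upper)
slope, intercept = least_squares_line(sessions, rates)
```

A least-squares fit is used here because the text mentions it as the common first step; it remains sensitive to outliers, and a smoother such as loess is not shown.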
For the lower-panel data, then, in the absence of an experimental manipulation, the best prediction about future behavior is that it will remain stable with little between-session variability. Baseline data need not be completely free of trends before a phase

1. Plotting a mean line is appropriate only if the data in the phase are free of extreme values that will pull the mean line away from the center of the distribution. When extreme values are present, a median line would be a better visual analysis tool.

is ended and the intervention is begun. If the baseline data are trending down (up), and the intervention is anticipated to increase (decrease) responding, then the baseline trend is of little concern. A modest trend in the direction of the anticipated intervention effect is also acceptable as long as the intervention proves to produce a large change in trend and mean level. Finally, continuing a baseline until it is free of trends and the bounce is minimal is sometimes impractical. In applied settings, it may be unethical to continue a baseline until stability has been achieved because to do so delays the onset of treatment. These practical and ethical concerns, however, must be balanced with the goal of constraining the effects of extraneous variables so that orderly effects of independent variable manipulations can be observed in subsequent conditions. It may be of little use (and may also be considered unethical) to conduct an intervention if the data are so variable that it is impossible to interpret the effects of treatment. Thus, researchers should be especially concerned with achieving stability when treatment effects are unknown or are expected to be small (i.e., when one is conducting research rather than practice). Visual inspection of trends within a data set sometimes reveals nonlinear repetitive patterns, or cycles. Some cycles result from feedback loops created by self-regulating behavior–environment interactions (Baum, 1973; Sidman, 1960), whereas others result from extraneous variables. Identifying the source of cyclical patterns is sometimes necessary to produce behavior change. In Figure 9.9, for example, every other data point is higher than the preceding one. Such a pattern could be the result of

Figure 9.9. Example of a figure showing a cyclical pattern. Data are hypothetical.

different experimenters conducting sessions, changes in levels of food deprivation, or perhaps practice effects if two sessions are conducted each day. Cycles may be difficult to detect if there is a good deal of between-session variability, but plotting data in various formats may help reveal cyclical patterns. For example, plotting each data point as a deviation from the mean using a vertical bar graph can make patterns in the variability more apparent (see Morley & Adams, 1991). The same strategies for evaluating the stability of baseline data are used to evaluate the stability of data in an intervention phase. Figure 9.10 repeats the data from Figure 9.8, but adds data from an intervention phase. In the upper panel, the line of best fit reveals an upward trend in the intervention phase, although the final four data points suggest that the behavior may have asymptoted. The researcher collecting these data should continue the intervention phase to determine whether the performance has reached an asymptote or will increase further given continued exposure to the intervention.

Figure 9.10. Hypothetical baseline and intervention data with added mean lines (solid horizontal lines), range lines (long dashed horizontal lines), trimmed range lines (short dashed horizontal lines), and regression lines (solid trend lines). Baseline data are the same as in Figure 9.8.


In the lower panel, a similar upward trend may be observed in the intervention phase, but over the final 10 sessions of the phase, the performance has stabilized because there is little bounce around the mean line and no visually apparent trend. Evaluating differences across phases. The second use of visual analysis of time-series data involves comparing the baseline and intervention data to determine whether the difference makes a compelling case that behavior has changed between phases. Determining whether behavior change was an effect of the intervention (assuming a compelling difference is observed) is a different matter, and one that we consider in more detail next. Five characteristics of the data should control the evaluation of behavior change across phases. The first is the change in level. Level refers to the immediate change in responding from the end of one phase to the beginning of the next (Kazdin, 1982). Level is assessed by comparing the last data point from a condition to the first data point of the subsequent condition. In the top graph of Figure 9.10, the change in level was a decrease from about 46 responses per minute to about 26 per minute. In the lower panel, the level increased from about 30 to 48 responses per minute. Level may be used to evaluate the magnitude of treatment effect. Large changes in level suggest a potent independent variable, but only when the data collected in the remainder of the intervention phase continue at the new level, as in the lower panel of Figure 9.10. The level change in the upper panel of Figure 9.10 is inconsistent with most of the remaining intervention data and, therefore, appears to be another instance of uncontrolled between-session variability. As this example illustrates, a level change is neither necessary nor sufficient to conclude that behavior changed in the intervention phase. The second, related characteristic that will affect judgments of treatment effects is latency to change. 
Latency to change is the time required for change in responding to be detected after the onset of a new experimental condition (Kazdin, 1982). To evaluate latency to change, a researcher must examine multiple data points after the condition change to determine whether a consistent change in level or a
change in trend occurs (at least three data points are required to detect a trend). A short latency to change indicates that the experimental manipulation produced an immediate effect on behavior, whereas a long latency to change indicates either that an extended exposure to the change in the independent variable is required before behavior changes (such as during extinction) or that the change is caused by an extraneous variable. Again, we consider the question of the causal relation between the behavior change and the intervention later in the chapter. In the top panel of Figure 9.10, approximately six sessions were required before the trend and mean level in the intervention phase appear distinguishable from baseline. In the lower graph, changes in trend and mean level were observed in the first three sessions after the phase change, showing more clearly that the data in the two phases are distinct. Although short latencies to change suggest that behavior has changed across phases, this change may be temporary and, therefore, additional observations should be made until one is convinced that the change is enduring. How many additional observations are necessary will be affected by factors such as baseline variability (as in the top panel of Figure 9.10) and how novel the finding is (skeptical scientists prefer to have many observations when the intervention is novel). Under most conditions, an intervention that produces a large but temporary behavior change is of limited utility. The third characteristic of time-series data that is used when visually evaluating differences across phases is the mean shift (Parsonson & Baer, 1992). Mean shift refers to the amount by which the means differ across phases. In both panels of Figure 9.10, there is an upward mean shift from baseline to intervention. The bottom graph, however, illustrates a shift that is visually more compelling. 
The reason for this takes us to the fourth characteristic controlling visual analysis activities: between-phase overlap. In the upper panel of Figure 9.10, as the range lines illustrate, five of eight data points in the intervention condition fall within the range of the preceding baseline data, and, therefore, the difference is not convincing. Perhaps, in the top graph, if additional data were collected during the intervention phase, and assuming responding remained at
the upper plateau characterizing the final intervention sessions, the difference might be compelling. In the lower graph of Figure 9.10, the level change, mean shift, and limited between-phase overlap in range make the difference visually apparent. The fifth characteristic of the data that will affect visual evaluation of between-phase differences is trend. As noted earlier, if the baseline data are trending up (or down) and responding increased (or decreased) during the intervention phase (upper graph of Figure 9.10), then the mean shift will not be convincing unless the trend line is much steeper in the intervention phase than at baseline. In the upper graph of Figure 9.10, the baseline data show a slight upward trend. Data in the subsequent intervention phase show a steeper trend. The greater the difference in trend is, the clearer it is that the mean shift in the intervention is not simply a continuation of the baseline trend. When evaluating mean shifts, floor or ceiling effects must be considered. These effects occur when performance has reached a minimum or maximum, respectively, beyond which it cannot change further. For example, if baseline response rates are low and an intervention is expected to decrease responding, mean shifts may be small because response rates have little room to further decrease. Readers skeptical of visual analysis practices may be unsettled by the use of terms and phrases such as judgment, visually apparent, and much steeper. How much steeper is “much steeper”? Although any interpretation of data requires that the researcher make a variety of judgment calls (e.g., which statistic to use, how to handle missing data), Fisher, Kelley, and Lomas (2003) sought to reduce the number of judgments by developing the conservative dual-criterion (CDC) technique to aid the visual analysis of single-case data.
Figure 9.11. Example of the conservative dual-criterion technique applied to hypothetical intervention data. In the baseline phase, solid horizontal lines are mean lines, and solid trend lines are regression lines. To analyze intervention effects, these lines are superimposed onto the intervention phase and raised by 0.25 standard deviation from the baseline means. See text for details.

The CDC method, illustrated in Figure 9.11, involves extending the baseline mean and trend lines into the intervention phase and raising both of these lines by 0.25 standard deviation (or lowering the lines by 0.25 standard deviation, if the intervention is anticipated to decrease responding). A difference across conditions is judged as meaningful when the number of intervention-phase data points falling above both lines (or below both lines in the case of an intervention designed to decrease a behavior) exceeds a criterion based on the binomial equation (i.e., exceeds the number that would be expected by chance). Fisher et al. found that the use of CDC procedures improved agreement on visual inspection (data were hypothetical, and intervention effects were computer generated). Figure 9.11 shows the CDC applied to the data shown in Figure 9.10. In the top panel, three data points in the intervention phase fall above the two criterion lines. Following the table presented in Fisher et al. (2003) for treatment conditions with eight data points, seven data points should be above both lines to conclude that a compelling difference exists between phases. In the lower panel, the CDC requires that 12 of the 15 data points in the intervention condition appear above both lines, a criterion easily met, so the researcher may conclude that behavior changed across phases. Although the CDC method appears to improve the accuracy of judgments of behavior change, only
a few studies have yet investigated this technique (Fisher et al., 2003; Stewart, Carr, Brandt, & McHenry, 2007). The Fisher et al. (2003) procedure is, of course, but one technique for making visual assessment of data more objective. It is ultimately incumbent on the investigator or therapist to provide convincing evidence of an effect, whether through some formal set of rules as illustrated by Fisher et al. or by amplifying the effect to the point at which reasonable people agree on it, through increased control over both independent and extraneous variables. Assuming that appropriate decisions were made about stability and a visually apparent behavior change was observed from baseline to the intervention phase, the next task is to evaluate the role of the intervention in that behavior change. Evaluating the causal role of the intervention requires that an appropriate experimental design be used, a topic falling under the scope of Chapter 5, this volume. Here, we largely confine our discussion to the visual analysis techniques appropriate to the most commonly used single-case research designs. In these sections of the chapter, the visual analysis focuses on answering the question, “Did the intervention change behavior?”
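The CDC logic described earlier can be sketched in Python. This is not Fisher et al.'s implementation. The least-squares trend line, the use of the sample standard deviation of the baseline, and a one-tailed binomial criterion with p = .5 and alpha = .05 are all assumptions; the last was chosen because it reproduces the two table values quoted in the text (7 of 8 and 12 of 15). Function names and data are hypothetical.

```python
from math import comb
from statistics import mean, stdev

def least_squares_line(xs, ys):
    """Slope and intercept of the ordinary least-squares line through (xs, ys)."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx

def binomial_criterion(n, p=0.5, alpha=0.05):
    """Smallest count k with P(X >= k) <= alpha for X ~ Binomial(n, p).

    Returns n + 1 when even n of n points would not reach alpha (assumed rule).
    """
    tail = 0.0
    for k in range(n, -1, -1):
        tail += comb(n, k) * p ** k * (1 - p) ** (n - k)
        if tail > alpha:
            return k + 1
    return 0

def cdc(baseline, intervention, expect_increase=True):
    """Return (points beyond both criterion lines, points required, criterion met)."""
    sessions = list(range(1, len(baseline) + 1))
    slope, intercept = least_squares_line(sessions, baseline)
    shift = 0.25 * stdev(baseline)  # raise (or lower) both lines by 0.25 SD
    if not expect_increase:
        shift = -shift
    mean_criterion = mean(baseline) + shift
    count = 0
    for session, value in enumerate(intervention, start=len(baseline) + 1):
        trend_criterion = slope * session + intercept + shift
        if expect_increase:
            count += int(value > mean_criterion and value > trend_criterion)
        else:
            count += int(value < mean_criterion and value < trend_criterion)
    needed = binomial_criterion(len(intervention))
    return count, needed, count >= needed

# Hypothetical data: a stable baseline followed by a clear increase.
baseline = [10, 12, 11, 13, 12, 11]
intervention = [20, 21, 19, 22, 20, 21, 20, 22]
result = cdc(baseline, intervention)
```

The function reports the count alongside the required number so that a reader can see how close a marginal data set comes to the criterion rather than receiving only a yes or no answer.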

Comparison designs. The data shown in the lower panel of Figure 9.11 come from a comparison design (or A-B design). There is evidence of a convincing change in behavior across conditions, level and mean level differ, variability and overlap of the data points across conditions are not interfering, and the latency to change is short. Despite stable data, one cannot conclude that the intervention produced the visually apparent behavior change. Although the rapid level change suggests an intervention effect, one cannot rule out extraneous variables that may have changed at the same time that the intervention was introduced (e.g., in addition to the intervention, Participant 2 may have been informed that if his productivity did not improve, his job would be in jeopardy). When visually analyzing data, a difference in behavior between two phases is insufficient evidence that the intervention, and not extraneous variables, produced the change. Reversal designs. In a reversal design, the experimental variable is introduced and removed, and systematic behavior changes with each manipulation provide evidence for a causal relation. Figure 9.12 shows the previously considered data set now extended to include a second baseline and a second intervention condition. The bottom graph is easily interpreted. The visually apparent difference between the first baseline phase and the first intervention phase is reversed in the return to baseline. In the second baseline phase, responding was well outside the range in which data should have fallen had no experimental manipulation been implemented. Further evidence for an intervention effect is that responding returned to the level observed in the original baseline. The reintroduction of the intervention (fourth phase) reverses the downward trend in the second baseline, yielding a striking level shift, mean shift, and minimal variability. There is very little overlap in the data across conditions, the latency to change is short, and there are no trends that make interpretation difficult. The mean level is close to the mean level obtained in the first exposure to the intervention, thus replicating the effect. These data thus make a strong case for the intervention as an effective means of influencing behavior.

Figure 9.12. Example of a reversal design. Data from the first baseline and intervention phases are the same as shown in Figures 9.8 and 9.10. Data are hypothetical.


The upper panel of Figure 9.12 tells a different story. When the baseline conditions are reestablished in the third phase, there is a precipitous downward trend in behavior. Although this behavior change is consistent with the removal of an effective intervention, the between-session variability in the preceding condition renders unconvincing the argument for a between-phase behavior change. Clearly, the hypothetical researcher who collected these data failed to continue the first intervention phase long enough for a stable pattern of behavior to develop. If responding had stabilized in the upper plateau reached at the end of the first intervention phase, the sharp reduction in responding in the second baseline might have been more compelling. When the intervention is again introduced, the downward trend levels off, but the data points overlap considerably with the data points for the preceding baseline condition. Furthermore, the mean level in the second intervention phase did not closely replicate the mean level of the first intervention phase.


Multielement designs. Figure 9.13 shows data from three hypothetical multielement experimental designs (Barlow & Hayes, 1979). In this design, conditions alternate (often after every session), and consistent level changes are suggestive of a functional (causal) relation between variables arranged in the condition and the behavior of interest. Visual analysis of multielement design data requires evaluation of sequence effects in addition to variability, mean shift, trend, overlap, and latency to change. Detection of sequence effects requires close attention to the ordering of conditions and patterning in the data (e.g., whether responding in one condition is consistently elevated when preceded by one of the other conditions). The data in Figure 9.13 represent response rate in three different conditions: two interventions and a no-intervention control condition. The top graph is easily interpreted. The mean level in Intervention 1 is higher than in the other two conditions, and there is no difference in mean level between Intervention 2 and the no-intervention condition. The data are relatively stable (i.e., there is little variability), and there are no trends to complicate interpretation. Thus, the difference in behavior between the

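[Figure 9.13: three panels (A, B, and C) plotting Responses per Min against Sessions for the Intervention 1, Intervention 2, and No Intervention conditions.]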

Figure 9.13. Examples of data from multielement designs. Data are hypothetical.

conditions is obvious. The effects of each experimental manipulation on response rate are reliable (each repetition of a condition allows a test of reliability), and it would be extremely unlikely that some extraneous variable would happen to produce increases in response rate in each Intervention 1 session and none in any other session. Finally, the effect of Intervention 1 does not appear to be dependent on the prior condition, and in at least one case, the effect lasts when two consecutive Intervention 1 sessions are completed. These data provide compelling evidence that Intervention 1 is responsible for producing higher response rates than either Intervention 2 or the no-intervention condition. The middle graph contains more within-condition variability. The mean level is higher in the Intervention 1 condition; however, there is considerable overlap in the range of response rates observed in each condition. Because it is not clear that behavior is distinct across conditions, the question of causation is moot. In the third graph, the mean level of the data in Intervention 1 is high, the mean level in the no-intervention condition is low, but the data in Intervention 2 are more difficult to interpret. During some sessions, the response rate is low; during others, it is high. This graph shows a hypothetical sequence effect. Each time an Intervention 2 session follows an Intervention 1 session, the response rate is high; otherwise, the response rate is low. The rate during the Intervention 2 sessions is affected by the preceding condition, which complicates interpretation of the effects of Intervention 2. If the high-rate Intervention 2 sessions were merely a carry-over effect of Intervention 1, then the no-intervention sessions that follow Intervention 1 sessions should show comparable high rates. A researcher who obtains findings of this sort will conduct further experimentation to clarify the processes responsible for the sequence effect.

Multiple-baseline designs. Multiple-baseline designs are frequently used in applied settings, either when it would be unethical to remove the treatment or because the treatment is expected to produce an irreversible effect. The design involves a series of comparison designs in which the researcher implements the treatment variable at different times across participants, behaviors, or contexts. A researcher visually analyzing data from a multiple-baseline design must evaluate whether there are convincing changes in mean level from baseline to treatment conditions, whether the effects are replicated across baselines, and whether changes in behavior occur only when the treatment is implemented for each baseline. Figure 9.14 illustrates data from a multiple-baseline design. In the top panel, a brief baseline precedes the intervention. The intervention produces level changes and mean shifts easily discriminated as behavior change. There is no latency to change, and there are no trends or overlap between data points across conditions to complicate data interpretation. As noted in the Comparison

[Figure 9.14: three panels (A, B, and C) plotting Responses per Min against Sessions across Baseline and Intervention phases.]

Figure 9.14. Example of data from a multiple-baseline design. Data are hypothetical.

Designs section, however, these data alone are insufficient to support causal statements. The behavior change could also have been caused by an extraneous variable that changed at the same time as the intervention (e.g., a change in classroom contingencies). If the latter were true, then one might expect this variable to affect behavior in the other baselines. To evaluate this, one examines the other baselines for behavior change that corresponds with the introduction of the intervention in the first graph (i.e., at Session 5). Figure 9.14 shows no evidence of this, which strengthens the case that the intervention produced the behavior change observed in the top panel. Further evidence that the intervention is related to the behavior change must be gathered in the remaining
panels of Figure 9.14 because the intervention is implemented at different points in time across the baselines. In the second graph, the data in baseline show an upward trend. Because the effect of the intervention is a decrease in the mean level, it does not negatively affect identifying the change in behavior in the second phase. In examining the third graph, these baseline data were unaffected by the phase change depicted in the second graph, which provides further evidence that the behavior change observed in the first two graphs is a function of the intervention and not some uncontrolled variable. In the third graph, the baseline is relatively stable, and there is a large, immediate reduction in response rate after implementation of the intervention, which replicates the effects observed in the first two graphs, providing strong evidence of the effects of the intervention. Changing-criterion designs. In changing-criterion designs, a functional relation between the intervention and behavior change is established by (a) periodically changing the contingency specifying which responses (e.g., those with a force between 20 and 30 g) will lead to experimenter-arranged consequences and (b) observing that behavior approximates the specified criterion (Hall & Fox, 1977; Hartmann & Hall, 1976). Typically, the criterion in graphs of changing-criterion designs is indicated by horizontal lines at each phase. Visual analysis of changing-criterion designs, as with that of other designs, requires an assessment of variability, level, mean shift, trend, overlap, and latency to change but also requires an assessment of the relation between behavior and the criterion. Figure 9.15 shows data from Hartmann and Hall (1976), who used a changing-criterion design to assess the effectiveness of a smoking cessation program. 
During the intervention, the participant was fined a small amount of money for smoking above a criterion number of cigarettes and earned a small amount of money for smoking fewer cigarettes. In the top graph, the number of cigarettes smoked per day is shown across successive days of the intervention. In baseline, the number of cigarettes smoked per day was stable over the first 6 days but fell precipitously on Day 7. Ideally, the researchers would not have begun the intervention on Day 8, as

Figure 9.15. Example of data from a changing-criterion design. The figure shows the number of cigarettes smoked per day. Solid horizontal lines depict the criterion for each phase. From “The Changing Criterion Design,” by D. P. Hartmann and R. V. Hall, 1976, Journal of Applied Behavior Analysis, 9, p. 529. Copyright 1976 by the Society for the Experimental Analysis of Behavior, Inc. Used with permission.

they did, because if a trend line were drawn through these baseline data, the subsequent decreases in smoking would be predicted to occur in the absence of an intervention. Had the researchers collected more baseline data, they would likely have found that Day 7 was uncharacteristic of this individual’s rate of smoking and could have more clearly established the stable rate of baseline smoking. In subsequent phases (B–G), the criterion number of cigarettes was systematically decreased, as indicated by the horizontal line in each phase. Changes in the criterion tended to produce level changes, with many subsequent data points falling exactly on the criterion specified in that phase. Each phase establishes a new mean level approximating the criterion. There are no long latencies to change, and the variability in the data and overlap in data points across conditions are not sufficient to cause concern. Finally, with the exception of Phase F, there is no downward trend in any phase, suggesting that if the criterion remained unchanged, smoking would remain at the currently depicted level. Thus, the visual analysis of these data raises concerns about the downward trend in the baseline, but these concerns are largely addressed by the repeated demonstrations of control over smoking rate in each condition. If the study were ongoing and concerns remained, the
researchers could set the next criterion (Phase H) above the last one. If smoking rate increased to the new criterion, then additional evidence for intervention control would be established while nullifying concerns about the downward trend in baseline.
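The relation between behavior and the criterion, which visual analysis of a changing-criterion design must assess, can also be summarized numerically as a supplement to the graph. The sketch below is only an illustration: the phase labels, criteria, and daily counts are hypothetical (they are not the Hartmann and Hall data), and the helper name `summarize` is an assumption.

```python
from statistics import mean

# Hypothetical data: phase label -> (criterion, cigarettes smoked on each day).
phases = {
    "B": (46, [46, 46, 45, 46, 44, 46]),
    "C": (43, [43, 43, 42, 43, 43]),
    "D": (40, [40, 39, 40, 40, 38]),
}

def summarize(criterion, counts):
    """Mean level, mean absolute distance from the criterion, and the
    proportion of days on which the count was at or below the criterion."""
    return {
        "mean": mean(counts),
        "mean_abs_dev": mean([abs(c - criterion) for c in counts]),
        "at_or_below": sum(c <= criterion for c in counts) / len(counts),
    }

summary = {label: summarize(crit, counts) for label, (crit, counts) in phases.items()}
for label, stats in summary.items():
    print(label, stats)
```

A small mean absolute distance in every phase, together with mean levels that step down as the criterion steps down, is the numerical counterpart of data points that fall on or near the horizontal criterion lines.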

Interpreting Relational Graphs

Researchers who conduct time-series research may report their outcomes using relational graphs. In these cases, each data point represents the mean (or another appropriate measure of central tendency) of steady-state responding from a condition. When evaluating these data, measures of variability, such as error bars, are also assessed to help determine whether responding was stable (see Interpreting Bar Graphs section). Data on relational graphs are evaluated by analyzing the clustering and trend of the data points. Data that appear horizontal across all values of the x-axis indicate that the independent or predictor variable has no effect on behavior or that there is no correlation between the two dependent variables. Sometimes behavior changes in a linear fashion across the range of x-axis values of the independent variable. When both axes of the graph are scaled linearly, a linear relation indicates that changing the independent variable produces a constant increase or decrease in behavior. Nonlinear relations indicate that the behavioral effect of the independent variable changes across x-axis values. Figure 9.16 provides an example. Here, the subjective value of a $10 reward is plotted as a function of the delay to its delivery. Both axes are linear, and the relation between the variables is nonlinear. Relational graphs, when properly constructed, allow the researcher to quickly evaluate the relation between variables. It is common, however, to evaluate relational data more precisely with quantitative methods, including curve-fitting techniques (e.g., linear and nonlinear regression), Pearson’s correlation coefficient, and quantitative models (see Chapters 10 and 12, this volume). Curve-fitting techniques clarify the form of the relation between the independent and dependent variables, and Pearson’s correlation coefficient quantifies the relation between two dependent variables.
Quantitative models may describe more complex behavior–environment relations and are used to make predictions about behavior. Even

Figure 9.16. Example of curve fitting. From “Delay or Probability Discounting in a Model of Impulsive Behavior: Effect of Alcohol,” by J. B. Richards, L. Zhang, S. H. Mitchell, and H. de Wit, 1999, Journal of the Experimental Analysis of Behavior, 71, p. 132. Copyright 1999 by the Society for the Experimental Analysis of Behavior, Inc. Used with permission.

when quantitative methods are used to describe data, however, visual analysis is used as a supplement. For example, visual analysis can help researchers choose which type of curve to fit to the data, evaluate whether data trends are linear and thus appropriate for calculating Pearson correlation coefficients, or determine whether the data have systematic deviations from the fit of a quantitative model. Figure 9.16 illustrates the last of these uses: It shows the best fits of both exponential and hyperbolic models to the subjective value of delayed money, and the exponential model systematically predicts a lower y-axis value than that obtained at the highest x-axis value. Reference lines may be added to relational graphs (or other graph types) to provide a point of comparison to the data. For instance, in a graph showing discrete-trial performances, such as matching-to-sample, reference lines may be plotted at values expected by chance. In a graph depicting choice data, reference lines might be plotted at values indicative of indifference.

Conclusion

Graphs provide clear and detailed summaries of research findings that can guide scientific decisions
and efficiently communicate research results. Visual analysis, as with any form of data analysis, requires training and practice. The use of visual analysis as a method of data interpretation requires graph readers to make sophisticated decisions, taking into account numerous aspects of the data. This complexity can make the task seem daunting or subjective; however, visual analysis in conjunction with rigorous experimental procedures is a proven, powerful, and flexible method for generating scientific knowledge. The development of effective behavioral technologies provides evidence of the ultimate utility of the visual analysis techniques used in behavior-analytic research. Data analyzed by means of visual inspection have contributed to a technology that produces meaningful behavior change in individuals across a wide range of skill domains and populations, including individuals with no diagnoses and those with diagnoses including attention deficit/hyperactivity disorder, autism, an array of developmental disabilities, pediatric feeding disorders, and schizophrenia, to name a few (Didden, Duker, & Korzilius, 1997; Lundervold & Bourland, 1988; Weisz, Weiss, Han, Granger, & Morton, 1995). Because of its history of effective application and advantages for the study of the behavior of individuals, behavior analysts remain committed to visual inspection as a primary method of data analysis.

References

American Psychological Association. (2010). Publication manual of the American Psychological Association (6th ed.). Washington, DC: Author.
Barlow, D. H., & Hayes, S. C. (1979). Alternating treatments design: One strategy for comparing the effects of two treatments in a single subject. Journal of Applied Behavior Analysis, 12, 199–210. doi:10.1901/jaba.1979.12-199
Baron, A., & Perone, M. (1998). Experimental design and analysis in the laboratory study of human operant behavior. In K. A. Lattal & M. Perone (Eds.), Handbook of research methods in human operant behavior (pp. 45–91). New York, NY: Plenum Press.
Baum, W. M. (1973). The correlation-based law of effect. Journal of the Experimental Analysis of Behavior, 20, 137–153. doi:10.1901/jeab.1973.20-137
Cançado, C. R. X., & Lattal, K. A. (2011). Resurgence of temporal patterns of responding. Journal of the Experimental Analysis of Behavior, 95, 271–287.
Cleveland, W. S. (1994). The elements of graphing data (rev. ed.). Summit, NJ: Hobart Press.
Cleveland, W. S., & McGill, R. (1984). Graphical perception: Theory, experimentation, and application to the development of graphical methods. Journal of the American Statistical Association, 79, 531–554. doi:10.2307/2288400
Cleveland, W. S., & McGill, R. (1985). Graphical perception and graphical methods for analyzing scientific data. Science, 229, 828–833. doi:10.1126/science.229.4716.828
Cooper, J. O., Heron, T. E., & Heward, W. L. (1987). Applied behavior analysis. Columbus, OH: Merrill.
Davison, M., & Baum, W. M. (2006). Do conditional reinforcers count? Journal of the Experimental Analysis of Behavior, 86, 269–283. doi:10.1901/jeab.2006.56-05
DeFulio, A., Donlin, W. D., Wong, C. J., & Silverman, K. (2009). Employment-based abstinence reinforcement as a maintenance intervention for the treatment of cocaine dependence: A randomized controlled trial. Addiction, 104, 1530–1538. doi:10.1111/j.1360-0443.2009.02657.x
Didden, R., Duker, P. C., & Korzilius, H. (1997). Meta-analytic study on treatment effectiveness for problem behaviors with individuals who have mental retardation. American Journal on Mental Retardation, 101, 387–399.
Ferster, C. B., & Skinner, B. F. (1997). Schedules of reinforcement. Acton, MA: Copley. (Original work published 1957) doi:10.1037/10627-000
Fisher, W. W., Kelley, M. E., & Lomas, J. E. (2003). Visual aids and structured criteria for improving inspection and interpretation of single-case designs. Journal of Applied Behavior Analysis, 36, 387–406. doi:10.1901/jaba.2003.36-387
Galbicka, G., Fowler, K. P., & Ritch, Z. J. (1991). Control over response number by a targeted percentile schedule: Reinforcement loss and the acute effects of d-amphetamine. Journal of the Experimental Analysis of Behavior, 56, 205–215. doi:10.1901/jeab.1991.56-205
Gallistel, C. R., King, A. P., Gottlieb, D., Balci, F., Papachristos, E. B., Szalecki, M., & Carnone, K. S. (2007). Is matching innate? Journal of the Experimental Analysis of Behavior, 87, 161–199. doi:10.1901/jeab.2007.92-05
Gentry, G. D., Weiss, B., & Laties, V. G. (1983). The microanalysis of fixed-interval responding. Journal of the Experimental Analysis of Behavior, 39, 327–343. doi:10.1901/jeab.1983.39-327
Hall, R. V., & Fox, R. G. (1977). Changing-criterion designs: An alternative applied behavior analysis procedure. In B. C. Etzel, J. M. LeBlanc, & D. M. Baer (Eds.), New developments in behavioral research: Theory, method, and application (pp. 151–166). Hillsdale, NJ: Erlbaum.
Harris, R. L. (1996). Information graphics: A comprehensive illustrated reference. Atlanta, GA: Management Graphics.
Hartmann, D. P., & Hall, R. V. (1976). The changing criterion design. Journal of Applied Behavior Analysis, 9, 527–532. doi:10.1901/jaba.1976.9-527
Johnston, J. M., & Pennypacker, H. S. (1993). Strategies and tactics of behavioral research (2nd ed.). Hillsdale, NJ: Erlbaum.
Kazdin, A. E. (1982). Single-case research designs: Methods for clinical and applied settings. New York, NY: Oxford University Press.
Lundervold, D., & Bourland, G. (1988). Quantitative analysis of treatment of aggression, self-injury, and property destruction. Behavior Modification, 12, 590–617. doi:10.1177/01454455880124006
Morley, S., & Adams, N. I. (1991). Graphical analysis of single-case time series data. British Journal of Clinical Psychology, 30, 97–115. doi:10.1111/j.2044-8260.1991.tb00926.x
Parsonson, B. S., & Baer, D. M. (1978). The analysis and presentation of graphic data. In T. R. Kratochwill (Ed.), Single-subject research: Strategies for evaluating change (pp. 101–165). New York, NY: Academic Press.
Parsonson, B. S., & Baer, D. M. (1986). The graphic analysis of data. In A. Poling & R. W. Fuqua (Eds.), Research methods in applied behavior analysis: Issues and advances (pp. 157–186). New York, NY: Plenum Press.
Parsonson, B. S., & Baer, D. M. (1992). The visual analysis of data, and current research into the stimuli controlling it. In T. R. Kratochwill & J. R. Levin (Eds.), Single-case research design and analysis: New directions for psychology and education (pp. 15–40). Hillsdale, NJ: Erlbaum.
Perone, M. (1991). Experimental design in the analysis of free-operant behavior. In I. H. Iversen & K. A. Lattal (Eds.), Experimental analysis of behavior, Parts 1 and 2. Techniques in the behavioral and neural sciences (Vol. 6, pp. 135–171). New York, NY: Elsevier.
Richards, J. B., Zhang, L., Mitchell, S. H., & de Wit, H. (1999). Delay or probability discounting in a model of impulsive behavior: Effect of alcohol. Journal of the Experimental Analysis of Behavior, 71, 121–143. doi:10.1901/jeab.1999.71-121
Shah, P., Freedman, E. C., & Vekiri, I. (2005). The comprehension of quantitative information in graphical displays. In P. Shah & A. Miyake (Eds.), The Cambridge handbook of visuospatial thinking (pp. 426–476). New York, NY: Cambridge University Press.
Shull, R. L. (1991). Mathematical description of operant behavior: An introduction. In I. H. Iversen & K. A. Lattal (Eds.), Experimental analysis of behavior, Parts 1 and 2. Techniques in the behavioral and neural sciences (Vol. 6, pp. 243–282). New York, NY: Elsevier.
Sidman, M. (1960). Tactics of scientific research. Oxford, England: Basic Books.
Staddon, J. E. R., & Simmelhag, V. L. (1971). The “superstition” experiment: A reexamination of its implications for the principles of adaptive behavior. Psychological Review, 78, 3–43. doi:10.1037/h0030305
Stevens, S. S. (1946). On the theory of scales of measurement. Science, 103, 677–680. doi:10.1126/science.103.2684.677
Stewart, K. K., Carr, J. E., Brandt, C. W., & McHenry, M. M. (2007). An evaluation of the conservative dual-criterion method for teaching university students to visually inspect AB-design graphs. Journal of Applied Behavior Analysis, 40, 713–718.
Tufte, E. R. (1983). The visual display of quantitative information. Cheshire, CT: Graphics Press.
Weisz, J. R., Weiss, B., Han, S. S., Granger, D. A., & Morton, T. (1995). Effects of psychotherapy with children and adolescents revisited: A meta-analysis of treatment outcome studies. Psychological Bulletin, 117, 450–468. doi:10.1037/0033-2909.117.3.450

Chapter 10

Quantitative Description of Environment–Behavior Relations

Jesse Dallery and Paul L. Soto

In 1687, Sir Isaac Newton invented a profound new way of thinking about how objects move across space and time. Newton tackled familiar empirical facts such as the observation that objects fall when dropped, but the description of falling objects was elevated to a new realm of precise, quantitative analysis. Among other feats, his analysis produced the universal law of gravitation, which yielded considerable predictive and practical advantages over previous accounts (Kline, 1959). The stunning successes of classical mechanics, of landing rockets on the moon, would not have been possible before Newton’s framework. In addition to its predictive and practical advantages, the universal law of gravitation unified seemingly disparate phenomena: It described moving bodies both here on Earth and in the heavens. Newton’s account also provided a foundation for attempts to explain gravity. As sciences have matured, from astronomy to zoology, so too has their use of quantitative analysis and their ability to describe, unify, and explain. The field of behavior analysis has witnessed similar transformations in how people describe an organism’s behavior across space and time. Fortunately, one does not need Newton’s calculus to understand and appreciate these exciting advances in behavioral science. In this chapter, we explain key techniques involved in quantitative analysis. We describe how quantitative models are evaluated and compared. To provide some theoretical backbone to our explication of techniques, we use a model of choice known as matching theory as a case study in quantitative analysis. Although we have selected matching theory as our case study because of its widespread application in behavior analysis, the techniques and issues we discuss could be generalized to any quantitative model of environment–behavior relations, and where appropriate we highlight these extensions.

On the Utility of Quantitative Models

Models specify relations among dependent variables and one or more independent variables. These relations can be specified using words (a verbal model) or mathematics (a mathematical or quantitative model). Verbal models can be just as useful as quantitative models in describing the causes of behavior. For instance, an applied behavior analyst does not need quantitative models to assess determinants of self-injurious behavior or to generate verbal behavior for a child with autism. However, there are also examples in which quantitative models are useful in the applied realm (Critchfield & Reed, 2009; Fisher & Mazur, 1997; Mazur, 2006; McDowell, 1982). Quantitative models of choice can help the analyst tease apart controlling variables in a naturalistic setting (McDowell, 1982; see Fuqua, 1984, for some caveats to this assertion), or they may be useful in terms of evaluating preferences for treatment options (Fisher & Mazur, 1997). Even if the benefits are not immediate, knowledge of quantitative accounts could lead to alternative treatments

We thank Rachel Cassidy and Bethany Raiff for comments on an earlier version of this chapter. DOI: 10.1037/13937-010 APA Handbook of Behavior Analysis: Vol. 1. Methods and Principles, G. J. Madden (Editor-in-Chief) Copyright © 2013 by the American Psychological Association. All rights reserved.

(McDowell, 1982), or it could inspire new insights into the determinants of problem behavior (Critchfield & Kollins, 2001; Fisher & Mazur, 1997; Nevin & Grace, 2000; see Volume 2, Chapters 5 and 7, this handbook). Behavioral scientists should appreciate and understand a wide variety of analytic methods to understand the determinants of behavior. Quantitative analysis entails a rich and powerful set of tools. From 1970 to 2000, basic behavior-analytic science saw an increase from 10% to almost 30% in articles published in the Journal of the Experimental Analysis of Behavior that used equations to describe behavior (Mazur, 2006). In addition to understanding the advances described in these articles, other intellectual and practical payoffs are derived from knowledge of quantitative analysis. These benefits may be delayed, but they can be profound. The English philosopher Roger Bacon (c. 1214–1294, as quoted in Kline, 1959) noted,

Mathematics is the gate and the key of the sciences. . . . Neglect of mathematics works injury to all knowledge, since he who is ignorant of it cannot know the other sciences or the things of this world. And what is worse, men who are thus ignorant are unable to perceive their own ignorance and so do not seek a remedy. (p. 1)

Quantitative models have at least four benefits. First, quantitative models force a greater degree of precision than their verbal counterparts. Assume several reinforcer rates have been experimentally arranged, from low to high, and the response rates obtained at each reinforcer rate have been measured. A verbal description might assert that at low reinforcer rates, increases in reinforcer rate produce rapid increases in response rate, but that response rate eventually reaches a maximum at higher reinforcer rates. The same description of the relation between response rate and reinforcer rate is embodied by Herrnstein’s hyperbolic equation (described in more detail later). Herrnstein’s equation states succinctly and precisely how behavior will change with increases in reinforcement. Moreover, Herrnstein’s equation precisely asserts how reinforcement from the experimentally arranged schedule of
reinforcement interacts with other, background sources of reinforcement. Assumptions about and descriptions of interactions between variables can be particularly troublesome and obscure if formulated in verbal terms. Quantitative models force specificity about the nature of these interactions. For example, Staddon (1984) analyzed a model of social dynamics by expressing the model quantitatively. Staddon’s quantitative analysis revealed several inconsistencies (and even contradictions) in the corresponding verbal formulation. Staddon concluded that “unaided verbal reasoning is almost useless in the analysis of the dynamics of interactions, human or mechanical” (p. 507). Second, the predictions of a quantitative model are more specific than the predictions of a verbal description, which allows one to observe minor but systematic deviations from the predictions. For instance, one could verbally describe where a projectile might land after being launched into the air (e.g., “over there”), or one could use an equation to make a precise prediction (e.g., 157 feet, 2 inches, due north). Any deviation from the latter location may be precisely measured, and records may be kept to determine whether the equation makes systematic errors (e.g., it consistently overestimates the distance traveled). Similarly, one could hypothesize verbally about how much reinforcement would be necessary to establish some level of appropriate behavior (e.g., one might say one needs to identify a “powerful” reinforcer or deliver the reinforcer “frequently”). As with the estimate of the projectile, a hypothesis might be qualitatively correct, but it will be more precise if one uses established equations of reinforced responding. Admittedly, this degree of precision may not be necessary in an applied context. Nevertheless, a hallmark of behavioral science is the predictive power of its explanations, and, as these examples imply, quantitative models surpass verbal models in making predictions. 
Third, to the extent that the predictions, and the conditions under which one should observe them, are more precise, the theory becomes more falsifiable. As the philosopher of science Karl Popper (1962) noted, falsifiability is a virtue of scientific theory. Consider Sir Arthur Eddington’s test of Albert Einstein’s theory of general relativity. One of
general relativity theory’s critical predictions is that gravity bends light. To test this prediction, Eddington measured the amount of shift in light from stars close to the sun (Silver, 1998). The sun’s powerful gravitational field, according to general relativity theory, should produce measurable shifts in the light emanating from the nearby stars. Eddington waited for an eclipse (the eclipse of 1919), when the stars adjacent to the sun were observable, and he measured the amount of shift in light emanating from those stars. He found not only that the light from these stars did indeed bend around the sun, but also that the exact amount of shift was as predicted by Einstein’s theory. An observation (or more realistically, a series of observations) to the contrary would have falsified the theory. Similarly, Herrnstein (1974) noted that certain predictions made by his quantitative theory of reinforced responding were especially vulnerable to “empirical confirmation or refutation” (p. 164). All good scientific theories, whether verbal or quantitative, are falsifiable. One virtue of most quantitative theories is that the conditions under which one can refute them are relatively clear (McDowell, 1986). Fourth, quantitative modeling encourages researchers to unify diverse phenomena; to see similarities in the determinants of behavior across settings, cultures, and species (Critchfield & Reed, 2009; Lattal, 2001; Mazur, 2006; see Chapter 7, this volume). Another way to put this is that a goal of a science of behavior is to discover invariance, or regularity in nature. For example, people see the diversity of structure in the animal kingdom as a result of evolutionary processes and the diversity of geological events as a result of tectonic processes. The mathematician Bell (1945, p. 
420) defined invariance as “changelessness in the midst of change, permanence in a world of flux, the persistence of configurations that remain the same despite the swirl and stress of countless transformations” (also quoted in Nevin, 1984, p. 422). Skinner (1950/1972) also saw the discovery of invariance as a worthy goal:

Beyond the collection of uniform relationships lies the need for a formal representation¹ of the data reduced to a minimal number of terms. A theoretical construction may yield greater generality than any assemblage of facts. It will not stand in the way of our search for functional relations because it will arise only after relevant variables have been found and studied. Though it may be difficult to understand, it will not be easily misunderstood. (p. 100)

¹ A formal representation means a mathematical representation.

Skinner’s (1950/1972) assessment of theory, however, was tempered by a recommendation that psychologists must first establish an experimental analysis of how relevant variables affect behavior. Ultimately, however, quantitative theory increases the precision and generality of explanations and improves the ability to predict and influence behavior.

Structure and Function of Quantitative Models

Although quantitative models can generate new predictions and explanations of behavior, they are often developed inductively on the basis of descriptions of empirical facts, and this is where we start. After rigorous, parametric, experimental analysis, researchers plot a dependent variable (e.g., response rate, interresponse times, ratio of time spent engaged in one activity over another activity) as a function of an independent variable (e.g., reinforcer rate, time in session, reinforcers delivered for one activity relative to another activity) in graphical space. They examine the shape of the relation. Is it described by a straight line or a curve? In behavioral science, relations characterized by straight lines or monotonic curves (curves that change in one direction) are common. Researchers may also model behavioral processes that show a bitonic relation, which is a curve that changes in two directions (e.g., a curve that increases and then decreases or vice versa). Although these shapes are not complicated, the equations that describe them may appear to be. If the equations are intimidating at first, start with the shapes and the specific environment–behavior
relations described by the shapes. Therefore, use careful visual inspection (see Chapter 9, this volume) of the data. The importance of visual analysis is illustrated nicely by Anscombe’s quartet (Anscombe, 1973), which is a series of four datasets that are shown in Figure 10.1. Across the four panels of the figure, the mean and variance of the data points in each panel are equivalent, as is the linear equation describing the relation between the independent (x-axis) and dependent (y-axis) variables: y = 0.5x + 3. Even a casual visual analysis, however, reveals that the shape of each dataset is remarkably distinct and that only the data in the upper left panel are appropriately described by a linear equation. A purely quantitative analysis is not sufficient. Careful use of quantitative analysis involves careful visual inspection of the data (Parsonson & Baer, 1992; see Chapter 9, this volume), not to mention rigorous experimental analysis (Perone, 1999; Sidman, 1960; see Chapter 5, this volume). After a visual analysis of the relation between independent and dependent variables, one must
dissect the anatomy of the equation that describes the relation. In behavioral and psychology journals, equations may appear in the introduction of an article. Thus, dissecting the equation may require some detective work and skipping ahead in the article to find a graph that shows the shape described by the equation. Comparing the equation with the graph is useful. Here is an example of a common equation in behavioral science:

$$R = k\left(\frac{r}{r + r_e}\right). \tag{1}$$

This equation, known as Herrnstein’s hyperbolic equation, describes a hyperbolic relation between reinforcer rate, r, and response rate, R. Figure 10.2 shows two examples of this hyperbolic shape described by the equation. The first step to understanding a new equation is to identify the variables in the equation, or the environmental and behavioral events that are measured directly in the experiment. In the case of Herrnstein’s hyperbolic equation, the experimenter measures response rate,

Figure 10.1. Anscombe’s (1973) quartet. The four graphs show different datasets that are described by the same linear equation, which is indicated by the straight line in each panel. The datasets also have the same mean and variance. From “Graphs in Statistical Analysis,” by F. J. Anscombe, 1973, American Statistician, 27, pp. 19–20. Reprinted with permission from The American Statistician. Copyright 1973 by the American Statistical Association. All rights reserved.
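The numerical side of Figure 10.1 can be checked directly. The sketch below uses the values published in Anscombe (1973), which are not reproduced in this chapter, and fits an ordinary least-squares line to each dataset with the standard library only. The helper name `least_squares_line` is an assumption made for illustration.

```python
from statistics import mean, variance

# Values as published in Anscombe (1973); datasets I-III share the same x values.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I":   (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II":  (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV":  ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
            [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

def least_squares_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx

fits = {name: least_squares_line(xs, ys) for name, (xs, ys) in quartet.items()}
for name, (xs, ys) in quartet.items():
    slope, intercept = fits[name]
    print(f"{name}: mean y = {mean(ys):.2f}, var y = {variance(ys):.2f}, "
          f"y = {slope:.2f}x + {intercept:.2f}")
```

The four summaries agree to about two decimal places, so nothing in these numbers distinguishes the panels. Only plotting the points reveals the curvature, the outlier, and the single high-leverage point.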

Figure 10.2. Two examples of Herrnstein’s hyperbolic equation, which predicts how response rate changes with increases in reinforcer rate. The only difference between the two curves is the value of re.

R, as a function of reinforcer rate, r. For example, assume one measured rates of engaging in disruptive behavior as a function of rates of social attention for this behavior. The dependent variables (e.g., response rates) will always appear on the left side of the equation, and the independent variables (e.g., reinforcer rates) will always appear on the right side of the equation. Next, identify the parameters. The parameters are the quantities in the equation that are not directly measured; rather, they are estimated statistically (see Evaluating Linear Models: Linear Regression and Evaluating Nonlinear Models: Nonlinear Regression sections). The parameters affect the shape and position of the curve (or line), and a computer program that conducts regression analyses will find values for the parameters so that the curve (or line) comes as close to the data as possible (called best-fit parameter values). To illustrate how the value of a parameter affects the shape of the curve in Figure 10.2, we used two values for the parameter re and the same value for the parameter k. If one considers just the top curve, for example, the curve predicts response rates given one value of k, one value of re, and all reinforcer rates between zero and 200 reinforcers per hour. The curve can also predict response rates for higher reinforcer rates, but for practical purposes, we do not show these in the graph. The value of re is small in the top curve. In terms of the equation, the value of re does not produce a large impact in the denominator, and so
response rates increase rapidly with increases in reinforcer rates. As re increases, however, as in the bottom curve, the effects of increases in reinforcer rates are essentially dampened. Because r and re are summed in the denominator, smaller changes occur in R with increases in reinforcer rates. Parameters usually reflect the operation of environmental, behavioral, or physiological processes. For example, the parameter re in Herrnstein’s hyperbolic equation is thought to reflect the rate of background, extraneous reinforcers. In our example, these extraneous reinforcers represent all of the reinforcers in the child’s environment (video games, food, etc.) except for the reinforcers delivered as a consequence of disruptive behavior. The degree to which the interpretations of parameters are confirmed empirically is an interesting and exciting feature of quantitative modeling. As we discuss in more detail in the Evaluating Nonlinear Models: Nonlinear Regression section, even if an equation describes a dataset extremely well (e.g., the curve comes very close to the obtained data), it does not necessarily mean that the interpretations of the parameters are supported. Eventually, such inferences should be supported by direct experimental analysis of the controlling variables, whether biological or environmental (Dallery & Soto, 2004; Shull, 1991). For instance, if re is thought to measure extraneous reinforcers, its value should increase or decrease as reinforcers are added or subtracted, respectively, from the environment in which the target behavior is measured. One way to understand an equation is to plot the equation using multiple parameter values as in Figure 10.2. Creating multiple plots of an equation can easily be done using a spreadsheet program such as Microsoft Excel (at the time of this writing, Excel is perhaps the most common spreadsheet program in use). 
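The same exploration can be sketched in a few lines of code instead of a spreadsheet. The sketch assumes the standard form of Herrnstein’s hyperbola, R = kr/(r + re), with k = 100 as in Figure 10.2. The two re values (10 and 50 reinforcers per hour) and the function name are illustrative assumptions, not the values used to draw the figure.

```python
def herrnstein(r, k, r_e):
    """Predicted response rate R = k * r / (r + r_e)."""
    return k * r / (r + r_e)

K = 100.0                   # asymptote, as in Figure 10.2
R_E_VALUES = (10.0, 50.0)   # hypothetical small and large r_e
reinforcer_rates = [0, 25, 50, 100, 200]  # reinforcers per hour

# Print one row of predicted response rates per value of r_e.
for r_e in R_E_VALUES:
    predicted = [round(herrnstein(r, K, r_e), 1) for r in reinforcer_rates]
    print(f"r_e = {r_e}: {predicted}")
```

With the smaller re, the predicted rates approach k much sooner, which is the pattern shown by the top curve.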
As described earlier, Figure 10.2 illustrates that the smaller the value of re, the steeper the curve rises toward its asymptote, k. Although not shown, plotting Equation 1 with different values of k illustrates how increases in k increase the maximum value reached by the curve (100 in Figure 10.2). Understanding how the shape of the function changes as the parameter values change is essential because changes in the shape reflect changes in

Dallery and Soto

behavior. In an applied context, increasing re as depicted in Figure 10.2 would mean that rates of disruptive behavior would decrease, even if reinforcer rates for disruptive behavior remain the same (compare the predicted response rates between curves when the reinforcer rate is 100 reinforcers/hour; McDowell, 1986). There is obviously more to dissecting and digesting a quantitative model than examining the shape defined by the equation. Researchers need to ask whether they have enough data to evaluate the model (e.g., a general rule of thumb is to have twice as many data points as the number of parameters), whether there are known properties of behavior or alternative models that they need to consider in evaluating the model, and whether they need to consider statistical or theoretical requirements for estimated parameter values (we discuss these considerations in more detail in the Evaluating Linear Models: Linear Regression and Evaluating Nonlinear Models: Nonlinear Regression sections). This brief introduction provides a starting point from which to approach equations in behavioral science. All equations in behavioral science share the same general structural and functional characteristics. The tactics presented in this section represent a broad, yet effective strategy to dissect equations. They may be summarized as follows: Visually inspect the shape of the dataset when plotted in graphical space, decompose the variables and parameters in the equation, explore how the plot of the equation changes as the parameter values change, and consider what these changes mean in terms of behavior and its causes.

Introduction to a Quantitative Description of Choice

One purpose of quantitative analysis is to understand the complex interaction between the determinants of behavior (e.g., biological, environmental, pharmacological) and some measure of behavior (e.g., response rate, latency).
For instance, researchers may be interested in when and why a bee travels to a new field of flowers for nectar, why a shopper on a diet picks up some candy in the checkout line, or why a pigeon in an operant chamber impulsively

pecks a lighted key for an immediately delivered small amount of grain instead of waiting for a larger amount. These examples of choice, of choosing one activity over another, are amenable to quantitative analysis. One common way in which psychologists study choice in the operant laboratory is by using concurrent schedules of reinforcement. In the pigeon example, two concurrent, mutually exclusive alternatives were available (pecking for small payoffs or waiting for larger payoffs). This situation is also common in naturalistic, clinical settings: A child may choose to engage in disruptive behavior rather than appropriate behavior (Borrero & Vollmer, 2002); a smoker may choose to light up or forgo a cigarette (Dallery & Raiff, 2007). The relative rates at which the behavior occurs—in the laboratory or in the world outside the laboratory—can be powerfully governed by relative rates of reinforcement earned for engaging in them. To quantitatively model the relation between relative responding and reinforcement over time, the response and reinforcer rates need to be denoted in mathematical terms. In the case of the pigeon responding on a two-alternative concurrent schedule of reinforcement, the rate of responding on one alternative can be represented by R1, and the rate of responding on the other by R2. The rate at which reinforcers are delivered on the two alternatives can be represented by r1 and r2. Some authors use different letters, but they signify the same quantities. In 1961, Richard Herrnstein examined the relation between rates of reinforcement and responding in a seminal study. His subjects were pigeons, and they could peck two lighted keys that were mounted on the front panel (one on the left and one on the right) of an experimental operant chamber. Pecking the keys sometimes resulted in brief access to food. 
Specifically, the rates of food access varied according to variable-interval (VI) schedules of reinforcement, which delivered a reinforcer for the first response after some interval had elapsed. For example, the left key may have been scheduled to provide access to food every 30 seconds on average, and the right key may have only provided access to food once every 60 seconds on average. After varying these rates of reinforcement across a wide range of values, Herrnstein plotted the proportion of responses allocated to one

Figure 10.3. Examples of matching and deviations from matching. The left panel shows perfect matching. The right panel shows two common deviations from perfect matching. The dashed curve is an example of bias, and the dotted curve represents an example of insensitivity.

key (in our example, responding on the left key is designated as R1) as a function of the proportion of reinforcers delivered on that key. Herrnstein found that the shape of the data path was a straight line. See the diagonal line in the left panel of Figure 10.3. Data points indicate hypothetical obtained response and reinforcer proportions. The straight line predicts that the proportion of responses should equal, or match, the proportion of reinforcers obtained at that alternative. So, for example, if the proportion of reinforcers on the left key equaled .25, then the proportion of behavior allocated to that side would also equal .25. This shape can be described by an equation:

$$\frac{R_1}{R_1 + R_2} = \frac{r_1}{r_1 + r_2}. \tag{2}$$

Equation 2 describes a straight line passing through the origin (x = 0, y = 0) with a slope of 1—a relation termed perfect matching. Herrnstein’s (1961) study, and Equation 2, has inspired hundreds of studies on choice. The applications have ranged from the laboratory to clinical settings (Borrero & Vollmer, 2002; Murray & Kollins, 2000; St. Peter et al., 2005; Taylor, Lincoln, & Foster, 2010; see Volume 2, Chapter 7, this handbook), social dynamics (Borrero et al., 2007; Conger & Killeen, 1974), and education (Martens & Houk, 1989). The matching law has also served as a basis for studies in behavioral ecology (Houston, 1986) and behavioral neuroscience (Lau & Glimcher, 2005).
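As a small worked instance of Equation 2, the sketch below uses the earlier example of keys that provide food every 30 and 60 seconds on average. It assumes, for illustration only, that the obtained reinforcer rates equal the scheduled rates; the function name is hypothetical.

```python
def matching_proportion(r1, r2):
    """Equation 2: predicted proportion of responses on alternative 1."""
    return r1 / (r1 + r2)

# One reinforcer per 30 s or 60 s on average is 120 or 60 reinforcers per hour.
left_rate, right_rate = 3600 / 30, 3600 / 60
print(round(matching_proportion(left_rate, right_rate), 3))  # 0.667
```

Under perfect matching, then, about two thirds of the pecks would be allocated to the left key.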

As the predictions of Equation 2 were compared with behavior, however, researchers soon discovered that perfect matching was actually a special case of a more generalized equation (Baum, 1974; Staddon, 1968). Obtained data tended to deviate in two ways from the straight line shown in the left panel of Figure 10.3. First, animals sometimes showed a bias for one alternative over another. Perhaps one response key was easier to peck or, in a naturalistic setting, the attention derived from disruptive behavior was “better” than the minor attention for appropriate behavior. In the right panel of Figure 10.3, the impact of such asymmetries on behavior is shown by the dashed curve. Because the curve is bent upward, one can deduce in the context of our example that left responses are occurring more frequently than predicted by Equation 2 (the solid line). Note that this bias toward the left alternative is not caused by the rates of reinforcement associated with the left and right alternatives. Second, animals also showed some degree of insensitivity to changes in the rates of reinforcement. The dotted curve shows an example of such insensitivity; note how the curve bends toward indifference, or a proportion of .5. The curve is above the straight line of equality at x-axis values less than .5 and below the straight line at x-axis values above .5. A single equation describing sensitivity to relative reinforcer rate across the two alternatives and bias toward one alternative over the other (not

resulting from differences in reinforcer rate) would be useful. Unfortunately, bias and sensitivity are hard to detect visually and to quantify when proportions are used. They are hard to detect because a plot of an animal’s behavior may show biased responding (e.g., the plot tends to bend upward) and insensitivity (the plot also bends toward indifference) at the same time. Also, data from live organisms usually demonstrate more variability than our graphical example, which further complicates visual detection of bias and insensitivity. Staddon (1968) found that plotting ratios instead of proportions aided in visually detecting deviations from strict matching. The ratio form of matching is algebraically equivalent to the proportional form (to obtain the ratio form from the proportional form, invert both sides of Equation 2, separate the terms involved, and subtract one from both sides), and it is written

$$\frac{R_1}{R_2} = \frac{r_1}{r_2}. \tag{3}$$
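The algebra summarized in the parenthetical above can be written out step by step. Starting from Equation 2 and assuming that none of the rates is zero, invert both sides, separate the terms, subtract one from both sides, and invert once more:

$$
\begin{aligned}
\frac{R_1 + R_2}{R_1} &= \frac{r_1 + r_2}{r_1}, \\
1 + \frac{R_2}{R_1} &= 1 + \frac{r_2}{r_1}, \\
\frac{R_2}{R_1} &= \frac{r_2}{r_1}, \\
\frac{R_1}{R_2} &= \frac{r_1}{r_2}.
\end{aligned}
$$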

Figure 10.4. An example of the generalized matching equation. The line represents a best-fitting line obtained via least-squares regression to the obtained response and reinforcer rates. See the text for the equation. The data points in columns E and F from Table 10.2 are plotted.

Baum (1974) modified Equation 3 to account for deviations from matching by adding two parameters, a and b:

$$\frac{R_1}{R_2} = b\left(\frac{r_1}{r_2}\right)^{a}, \tag{4}$$

where a quantifies sensitivity to the ratio of reinforcer rates on alternatives 1 and 2, and b quantifies bias. Equation 4 is known as the generalized matching equation, or power–function matching, because the independent variable is raised to a power, a. Traditionally, Equation 4 is evaluated by plotting the logarithm of the response ratio as a function of the logarithm of the reinforcer ratio. Logarithmic transformation² of Equation 4 results in a straight line when the equation’s predictions are plotted in graphical space.

² The logarithm of the product of two numbers is the sum of the logarithms of the individual numbers, and the logarithm of an exponentiated number is the product of the exponent and the logarithm of the number.

The logarithmic form of the generalized matching equation is

$$\log\left(\frac{R_1}{R_2}\right) = a\log\left(\frac{r_1}{r_2}\right) + \log b, \tag{5}$$

where a is the slope, log b is the intercept, and the logged response ratio and logged reinforcer ratios are the dependent and independent variables, respectively. This straight line is different from the line described by Equation 2 in that its slope and intercept can vary; perfect matching specifies a slope of 1 and an intercept of 0. Because the logarithmic form of the generalized matching equation is a straight line, one can evaluate the equation’s descriptive accuracy by fitting a line to the data. Bias is indicated by the intercept of the straight line, log b, and sensitivity is indicated by the slope of the line, a. One example of a line described by Equation 5 is shown in Figure 10.4. First, note that the slope is less than 1.0. This means that for every unit change in the reinforcer ratio, there is less than a one-unit change in the response ratio. Considering what the value of the slope means in terms of unit changes in the dependent and independent variables is often useful. Second, note that the line is elevated above the origin, which shows some bias for the response alternative in the numerator, which is R1, or the left key, in our example. Of course, the line could be steeper than 1.0, and it could intersect the ordinate below the origin. Consider what this

would mean in terms of behavior. It would mean that the change in response ratios is more than one unit for every unit change in reinforcer ratios (called overmatching; Baum, 1979) and that there is a bias toward the response alternative in the denominator, the right key. Now one has a single, elegant equation to describe how two variables (i.e., reinforcer rates on two alternatives), other asymmetries in the properties of the responses or reinforcers (i.e., changes in the intercept or bias), and the degree of insensitivity to changes in reinforcer rates (i.e., change in the slope or sensitivity) affect choice. One log transforms the ratios to evaluate the generalized matching equation for several reasons. The main reason, as implied earlier, is that it is easier to detect departures from strict matching. The log transform also turns the curves produced by bias and insensitivity into straight lines (McDowell, 1989), and thus bias and insensitivity are reflected by changes in intercept and slope, respectively. Indeed, quantitative analysis has several useful data transformations, which are used for a variety of reasons, for example, to increase interpretability of a graph or to meet assumptions for statistical analysis (e.g., normality). Some balk at the idea of transforming behavioral data; after all, one does not observe logarithms of behavior. This argument misses the point of transforms, which is to magnify the ability to describe and explain environment–behavior relations. For instance, a logarithmic transform might help one to see the spread of data across the range of values of the independent variable, especially when that range is very large. That is, when plotting logarithms of the data, the spacing of data between numbers 1 and 10 is the same as the spacing between the numbers 10 and 100. Indeed, even a simple measure of response rate is a transformation. Transformations can have advantages, much as the metric system has advantages over the U.S. 
system of measurements (see Shull, 1991, and Tukey, 1977, for further discussion of transformations).

Evaluating Linear Models: Linear Regression

The generalized matching equation states that the relation between log reinforcer and log response

ratios can be described by a straight line—but is it? If visual inspection suggests that the data are orderly, sensible, and linear, one can evaluate this theory of choice statistically by performing linear regression. More generally, when researchers want to evaluate a relation between an independent and a dependent variable, and the relation between them can be graphed as a straight line, they use linear regression (if the variables are not linearly related, they use nonlinear regression, which is covered in the Evaluating Nonlinear Models: Nonlinear Regression section). When the two variables are not causally related, researchers use a correlation coefficient. For example, one would not want to perform regression to describe the relation between response rate and response latency. As a rule of thumb when deciding between regression and correlation, ask whether the independent variable plotted along the abscissa is causing the changes in the dependent variable. If it is, use regression. If the variable plotted along the abscissa is a dependent variable (response latency), and so is the variable plotted along the ordinate (e.g., response rate), then a correlation is more appropriate. Also, before conducting a so-called ordinary regression analysis, whether linear or nonlinear, it is important to ensure that the assumptions of ordinary regression are not violated. For convenience, these assumptions are listed in Table 10.1. Consider the data shown in Table 10.2. These data show how a human might interact during a discussion with two confederates. The dependent variable is how much the individual interacts with each confederate or, more specifically, the rate of words per minute directed toward each confederate (columns C and D). The independent variable is the rate of social reinforcers per hour (e.g., agreement statements, nods; columns A and B) delivered by each of the confederates. The social reinforcers are delivered according to VI schedules. 
The rates of reinforcement are varied across 10 conditions. Thus, when the reinforcer rate is 5.97 reinforcers per hour for Confederate 1 and 4.63 reinforcers per hour for Confederate 2, the rates of words per minute directed toward each confederate are 6.84 and 4.66, respectively. It is important to remember the units in which the variables are expressed; they are important in understanding any quantitative model.

Table 10.1 Assumptions of Ordinary Least-Squares Regression

■ The values of the independent variable are known precisely. All of the variability in the data is in the values of the dependent variable.
■ The variability in the dependent variable, the error, follows a normal bell-shaped distribution.
■ The degree of error, or scatter, in the dependent variable is the same at all points along the line (or curve). Another way to say this is that the standard deviation of the scatter must be the same at all points along the line (or curve), which is known as homoscedasticity.
■ The observations are independent.

Note. From Fitting Models to Biological Data Using Linear and Non-Linear Regression: A Practical Guide to Curve Fitting (p. 57), by H. Motulsky and A. Christopoulos, 2004, Oxford, England: Oxford University Press. Copyright 2004 by Oxford University Press. Adapted by permission of Oxford University Press, Inc.

Table 10.2 Regression Calculations for a Linear Model

Column key: A = rnf rate by Conf. 1; B = rnf rate by Conf. 2; C = rsp rate to Conf. 1; D = rsp rate to Conf. 2; E = log rnf ratio; F = log rsp ratio (Y); G = predicted log rsp ratio (Y′); H = Y − Y′; I = (Y − Y′)².

Condition      A      B      C      D     E     F     G      H       I
 1          5.97   4.63   6.84   4.66   .11   .17   .19  −0.02  0.0004
 2          2.35   5.46   4.40   3.77  −.37   .07  −.19   0.26  0.0676
 3          7.09   2.31   6.29   1.55   .49   .61   .49   0.12  0.0144
 4          1.55  11.15   2.50  11.38  −.86  −.66  −.58  −0.08  0.0064
 5         11.08   2.20   8.60   2.01   .70   .63   .66  −0.03  0.0009
 6          2.42   3.55   1.20   1.96  −.17  −.21  −.03  −0.18  0.0324
 7          4.96   1.17   2.06   0.55   .63   .57   .60  −0.03  0.0009
 8          2.46   4.04   1.80   2.07  −.22  −.06  −.07   0.01  0.0001
 9          4.57   2.55   2.48   1.42   .25   .24   .30  −0.06  0.0036
10          1.13   5.29   0.91   2.48  −.67  −.44  −.43  −0.01  0.0001

Note. Columns A–D show response and obtained reinforcer rates for a subject engaging in speech directed at Conf. 1 or 2. Columns E and F present the logged rsp. and rnf. rates, which are the data used to perform linear regression. Column G shows the predicted log rsp. ratios calculated by Equation 5 with a = .80 and log b = .10. Column H shows the residuals, and column I shows the squared residuals. The sum of the squared residuals is SSREG = 0.13, the sum of squared deviations from the mean is SSTOT = 1.78. The r² is calculated as 1 − (SSREG/SSTOT) = .93. Conf = confederate; rnf = reinforcement; rsp = response.

Columns E and F show the logged reinforcer ratios (r1/r2) and response ratios (R1/R2), respectively. The data from columns E and F are the data plotted graphically in Figure 10.4. The independent variable, the logged reinforcer ratio, is always plotted along the abscissa; it is important that the independent variable be plotted along the abscissa and the dependent variable be plotted on the ordinate when performing linear regression.
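Columns E and F can be recomputed from the raw rates in columns A–D. The sketch below assumes base-10 logarithms, which reproduce the tabled values to two decimal places; the variable names are illustrative.

```python
import math

# Columns A-D of Table 10.2 (reinforcers per hour and words per minute).
rnf_1 = [5.97, 2.35, 7.09, 1.55, 11.08, 2.42, 4.96, 2.46, 4.57, 1.13]
rnf_2 = [4.63, 5.46, 2.31, 11.15, 2.20, 3.55, 1.17, 4.04, 2.55, 5.29]
rsp_1 = [6.84, 4.40, 6.29, 2.50, 8.60, 1.20, 2.06, 1.80, 2.48, 0.91]
rsp_2 = [4.66, 3.77, 1.55, 11.38, 2.01, 1.96, 0.55, 2.07, 1.42, 2.48]

# Column E: log10(r1/r2); column F: log10(R1/R2).
log_rnf = [math.log10(a / b) for a, b in zip(rnf_1, rnf_2)]
log_rsp = [math.log10(c / d) for c, d in zip(rsp_1, rsp_2)]

for cond, (x, y) in enumerate(zip(log_rnf, log_rsp), start=1):
    print(f"Condition {cond:2d}: log rnf ratio = {x:+.2f}, log rsp ratio = {y:+.2f}")
```

Plotting `log_rsp` against `log_rnf` gives the data points of Figure 10.4.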

Linear regression may be calculated with a statistics program or by an algorithm in a spreadsheet program. These programs start with initial values of the slope and intercept and then adjust these values over a series of iterations so that the line is as close to the obtained data as possible.³

³ The parameters of a line can also be calculated directly from formulas derived from linear regression theory rather than by minimizing the residual sum of squares. We chose to illustrate the curve-fitting approach to calculating the parameter values because the logic should be easier to understand in the context of a more familiar function form (i.e., a line) relative to less familiar forms (i.e., curves) and because this method can be applied to both linear and nonlinear regression.

The initial values used by programs may be unknown to the user. If one uses a spreadsheet with regression capabilities, for example, one needs to enter the initial values

into the spreadsheet. For now, for pedagogical purposes, assume the starting values must be entered. The starting values will depend on the particular dataset being evaluated. One shortcut in determining starting values is to inspect the plotted data. If the slope of the line is positive, one can use 1 as its starting value. If it is negative, one can use −1. One can also examine where the line will intersect the ordinate at zero on the abscissa, and then enter that value as the starting value for the intercept. Once these values are entered, the equation makes predictions about the response ratios. For example, let us use an intercept of 0 and a slope of 1 and predict the logged response ratio when the logged reinforcer ratio is 0.11. Insert these values into the right side of Equation 5:

$$
\begin{aligned}
\log\left(\frac{R_1}{R_2}\right) &= a\log\left(\frac{r_1}{r_2}\right) + \log b, \\
\log\left(\frac{R_1}{R_2}\right) &= 1(0.11) + 0, \text{ and} \\
\log\left(\frac{R_1}{R_2}\right) &= 0.11.
\end{aligned}
$$

The predicted log response ratio is 0.11. The prediction is not too far off from what we observed (0.17; first row in column F). Because the regression analysis has already been conducted, however, we know the prediction could be better. Note that one could, of course, select values of a and b such that the prediction at one x value exactly equaled the obtained y value, but the key is to bring the fitted line as close as possible to all the data points. The vertical distances or deviations between the calculated predictions and the obtained data points are called residuals. We do not show all of the calculations here, but there will be residuals for all of the data points; for our single data point the residual is 0.17 − 0.11 = 0.06. At first, given the initial parameter values, the residuals will be large because the algorithm has not yet adjusted the parameter values to make these distances smaller, which is the same thing as making the line fall closer to the data. How does the algorithm make these distances smaller?
Whether the distances are positive or negative does not matter; one can square the residuals and then sum them. (There is another important reason to square the residuals, which we come back to later in the Rationale for Squaring the Sum of Squared Deviations in Regression Analysis section.) We want this sum of squared residuals to be as small as possible, so the algorithm seeks to minimize this sum. The procedure is as follows: The program increases or decreases the parameter values and asks, is the sum of squared residuals smaller than on the last iteration? The program continues to change the values to make the sum smaller. At some point, when changes in the parameter values produce very small changes in the sum of squared residuals, the algorithm stops and reports the best-fitting parameter values. The note to Table 10.2 lists the best-fitting parameter values obtained via least-squares regression (which is called least squares because of the process of minimizing the sum of squared deviations between the line and the data). In column G, we show the predicted data points, labeled Y′. Equation 5, the generalized matching equation, predicts these data points; they can be calculated by hand by inserting the slope and intercept values given in the table note and any one of the log reinforcer ratios in Table 10.2. The deviation between the obtained data, Y, and the predicted data, Y′, is shown in column H. If each of these values is squared, we obtain the squared deviations, shown in column I. The sum of the squared deviations in column I is what the algorithm minimizes to obtain the best-fitting parameter values. To show the accuracy of the regression model visually, vertical lines can be drawn between each data point and the predicted line (these lines represent the residuals), as shown in the left panel of Figure 10.5. The distances are very small. In linear (and nonlinear) regression, these distances are compared with the distances given by another model, the null hypothesis.
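The iterative search described above can be sketched as follows. This is one simple strategy (a coordinate search with a shrinking step size), chosen for readability; it is not the algorithm used by any particular statistics or spreadsheet program, and the function names, step size, and tolerance are illustrative assumptions. Base-10 logarithms are assumed because they reproduce columns E and F of Table 10.2.

```python
import math

# Columns A-D of Table 10.2.
rnf_1 = [5.97, 2.35, 7.09, 1.55, 11.08, 2.42, 4.96, 2.46, 4.57, 1.13]
rnf_2 = [4.63, 5.46, 2.31, 11.15, 2.20, 3.55, 1.17, 4.04, 2.55, 5.29]
rsp_1 = [6.84, 4.40, 6.29, 2.50, 8.60, 1.20, 2.06, 1.80, 2.48, 0.91]
rsp_2 = [4.66, 3.77, 1.55, 11.38, 2.01, 1.96, 0.55, 2.07, 1.42, 2.48]
x = [math.log10(a / b) for a, b in zip(rnf_1, rnf_2)]  # column E
y = [math.log10(c / d) for c, d in zip(rsp_1, rsp_2)]  # column F

def sum_sq_residuals(slope, intercept):
    """Sum of squared vertical distances between the data and the line."""
    return sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))

def fit_line(slope=1.0, intercept=0.0, step=0.1, tol=1e-6):
    """Nudge one parameter at a time and keep any change that lowers the sum."""
    best = sum_sq_residuals(slope, intercept)
    while step > tol:
        improved = False
        for d_slope, d_int in ((step, 0), (-step, 0), (0, step), (0, -step)):
            trial = sum_sq_residuals(slope + d_slope, intercept + d_int)
            if trial < best:
                slope, intercept, best = slope + d_slope, intercept + d_int, trial
                improved = True
        if not improved:
            step /= 2  # no nudge helped, so search with a finer step
    return slope, intercept, best

a, log_b, ss_reg = fit_line()
print(f"a = {a:.2f}, log b = {log_b:.2f}, sum of squared residuals = {ss_reg:.2f}")
```

Starting from a slope of 1 and an intercept of 0, the search should settle close to the values reported in the note to Table 10.2 (a = .80, log b = .10).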
The null model asserts that the mean of the obtained data can account for (predict) the results better than the more complicated two-parameter model. The null model is more parsimonious. There are no parameters, just the calculated mean of the data. The accuracy of this model is shown in the right panel of Figure 10.5. To quantitatively characterize this visual comparison

Figure 10.5. A visual representation of how linear regression is a matter of model comparison. The plot on the left shows the regression model, the alternative hypothesis. The plot on the right shows the null model. In linear regression, one compares the vertical distances between the data points and the line on the left to the vertical distances between the data points and the horizontal line on the right.

between the two models, sum the squared residuals in the regression model (denoted SSREG), and sum the squared deviations in the null model (denoted SSTOT; TOT stands for total deviation from the mean). The ratio of these quantities is subtracted from 1, which provides the first index of whether the model is accurate:

$$r^2 = 1 - \frac{SS_{REG}}{SS_{TOT}}. \tag{6}$$
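Equation 6 can be checked against the data in Table 10.2. The sketch below assumes base-10 logarithms and takes the slope and intercept from the table note rather than refitting them; the variable names are illustrative.

```python
import math

# Columns A-D of Table 10.2.
rnf_1 = [5.97, 2.35, 7.09, 1.55, 11.08, 2.42, 4.96, 2.46, 4.57, 1.13]
rnf_2 = [4.63, 5.46, 2.31, 11.15, 2.20, 3.55, 1.17, 4.04, 2.55, 5.29]
rsp_1 = [6.84, 4.40, 6.29, 2.50, 8.60, 1.20, 2.06, 1.80, 2.48, 0.91]
rsp_2 = [4.66, 3.77, 1.55, 11.38, 2.01, 1.96, 0.55, 2.07, 1.42, 2.48]
x = [math.log10(a / b) for a, b in zip(rnf_1, rnf_2)]  # column E
y = [math.log10(c / d) for c, d in zip(rsp_1, rsp_2)]  # column F

A_FIT, LOG_B_FIT = 0.80, 0.10  # best-fit values from the note to Table 10.2

predicted = [A_FIT * xi + LOG_B_FIT for xi in x]  # column G (Equation 5)
mean_y = sum(y) / len(y)                          # the null model's prediction

ss_reg = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))  # regression model
ss_tot = sum((yi - mean_y) ** 2 for yi in y)                  # null model
r_squared = 1 - ss_reg / ss_tot                               # Equation 6

print(f"SS_REG = {ss_reg:.2f}, SS_TOT = {ss_tot:.2f}, r^2 = {r_squared:.2f}")
```

The results should agree, within rounding, with the sums and the r² of .93 reported in the table note.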

The r² resulting from the linear regression analysis is shown in Table 10.2. If the value of SSREG is close to SSTOT, then the value of the ratio will be close to 1, which will result in an r² close to 0. If SSREG is very small compared with SSTOT, the ratio will be small and the r² value will be close to 1. Because r² is calculated from a ratio rather than by squaring a number, it can be negative when the null model predicts the data better than the regression model. In other words, a negative value is obtained when SSREG is larger than SSTOT. Some authors designate the r² in different ways, such as by reporting the variance accounted for in terms of a percentage (r² multiplied by 100), abbreviated VAF or VAC. Whether r², variance accounted for, or even variance explained, all of these represent a relative comparison of distances between the data and the predicted line (or curve; see the section Evaluating Nonlinear Models: Nonlinear Regression later in this chapter)

and the distances between the data and a horizontal line at the mean. In other words, r² is an index of how well the model predicts the data relative to the simpler null model. The r² is just the first step in evaluating the model’s accuracy. Examine the left panel in Figure 10.5, particularly the patterns of the data points around the line. One can see that about the same number of data points are above or below the line, that the data points are approximately the same distance from the line, and that the deviations, positive or negative, are not systematically related to the x-axis values, a condition known as homoscedasticity. A violation of homoscedasticity, a condition known as heteroscedasticity, means that there is a serious problem with the data or the model, even if one has a high r². To understand why, examine Figure 10.6. The left and middle panels show different examples of heteroscedasticity. In the left panel, the model (the line) obviously and systematically misses the data. When the log reinforcer ratio is negative, the line overpredicts and then underpredicts the data. When the log reinforcer ratio is positive, the line tends to overpredict the data. The systematic pattern of deviations violates the requirement of homoscedasticity. In the middle panel, it is harder to see how the data depart from homoscedasticity, which is often the case in quantitative modeling. Although the r² may be high, it does not tell the whole story. Careful visual

Figure 10.6. Two examples of fits of a linear equation to data showing heteroscedasticity (left and middle panels) or homoscedasticity (right panel). The left panel shows that the model underpredicts the data when reinforcer ratios are smaller. The middle panel shows a more subtle example, and it may be difficult to see heteroscedasticity. The right panel shows what appears to be a homoscedastic pattern, in which the data are evenly spread along the range of the fit.

inspection reveals that more data points are above the line and that within a certain range of the x-axis, approximately −0.8 to −0.2, all the data points fall above the line. Finally, in the right panel, one can see that the number of points above and below the line are approximately equal, that the deviations above and below are roughly equal, and that there is no systematic relation between positive and negative deviations and x value, in keeping with the requirement of homoscedasticity. To be sure whether the model produces systematic departures from homoscedasticity, one needs to perform a finer grained analysis, which is called residual analysis. Most programs routinely report residual analyses, and we recommend careful inspection and reporting of these results. Residual analysis can be performed in several ways. All of them use the same general procedure: Evaluate the pattern of residuals as a function of the independent variable or as a function of the predicted data (e.g., each Y′ in column G of Table 10.2) to determine whether the residuals are evenly and randomly distributed. For example, one of the most basic ways to perform residual analysis is to plot the raw residuals (i.e., the difference between the predicted and obtained values) as a function of the independent variable. The residuals from Figure 10.6 are plotted in Figure 10.7 as a function of their corresponding reinforcer ratios. The order of the panels corresponds to the panels in Figure 10.6. Thus, the left panel shows obvious departures from homoscedasticity. The middle panel in Figure 10.7

shows more clearly than that of Figure 10.6 that the data are heteroscedastic. The right panel shows a desirable pattern of residuals, one that shows homoscedasticity. To perform a residual analysis using the data from Table 10.2, plot the data in column H, the raw residuals, as a function of the independent variable shown in column E, the logged reinforcer ratios. Residual analysis can be performed in other ways (Cook & Weisberg, 1982). The standardized residual could be plotted as a function of the independent variable. The standardized residual is calculated by dividing each raw residual by the standard deviation of residuals. Standardized residuals allow one to more easily detect the presence of outliers (e.g., a residual ±2 standard deviations from the mean of the residuals). To detect patterns in the residuals, one could use inferential statistics to supplement visual analysis. For example, in some cases a correlation between the residual and the independent variable could be calculated (e.g., Dallery, Soto, & McDowell, 2005; McDowell, 2005). A significant correlation would indicate systematic departures from homoscedasticity, and a nonsignificant correlation (which is desirable) indicates homoscedasticity. There are more advanced techniques to detect patterns in the residuals, which involve fitting curves to the residuals. Some authors use cubic, quadratic, and other polynomial equations to detect these deviations from homoscedasticity (e.g., see Sutton, Grace, McLean, & Baum, 2008). Additionally, there are techniques that involve examining the number


Figure 10.7. Residual plots of data shown in Figure 10.6. The order of panels reflects the order in Figure 10.6. See text for further details.

of residuals that have a positive or negative sign and the number of consecutive residuals of the same sign, called runs of residuals (McDowell, 2004; Motulsky & Christopoulos, 2004). Given the logic of residual analysis, a runs test should show that roughly equal numbers of residuals fall above and below the line and that there are no long runs of one sign or the other. There are several potential consequences if nonrandom residuals are obtained. First, it may mean that the assumption of random scatter is violated and that using an ordinary least-squares regression technique may not be appropriate. We explain why this is the case in the Rationale for Squaring the Deviations in Regression Analysis section. Second, the equation might be incorrect. Perhaps the equation form is incorrect, a parameter is missing from the equation, or the parameters have not been adjusted properly (see Motulsky & Christopoulos, 2004, on troubleshooting bad fits). Third, something may have gone wrong with the experiment. Perhaps some uncontrolled variable was responsible for the systematic departures, or an error was made in transcription or data analysis. Before dismissing a model because of nonrandom residuals, one should be sure that an error in the experiment or analysis is not the source of the nonrandom residuals. Although residual analysis is critical in evaluating the accuracy and validity of a particular model (e.g., the generalized matching equation), it can also be used to compare models. Sutton et al. (2008) used several datasets to distinguish two models of choice and performed what might be called a meta-analysis of residuals. Because residuals can show deviations at a finer grained level

than overall measures of model accuracy such as r², they allow researchers to more easily evaluate model accuracy (see also Dallery et al., 2005). Sutton et al. concluded that although the r² values of the two models were equivalent, analysis of the residuals yielded subtle, yet important distinctions between the models. Thus far, we have considered three essential steps in performing linear regression: careful visual inspection of the data, calculating r², and performing residual analysis. Researchers must be sure that a line describes the data. If it does not, they may need to evaluate a different model, but only if the alternative model makes theoretical sense when applied to the data. This issue requires critical analysis. There are almost always theoretical reasons for choosing a particular model, and thus regression analysis in particular and quantitative modeling in general are not simply a matter of obtaining a good fit.

Additional Questions When Evaluating Linear Regression

Do the Parameter Values Make Sense? Estimated parameter values must be plausible, both in terms of precedent and in relation to the dataset to which the equation is fitted. For example, in the case of linear regression, the y-intercept should be plausible given the obtained data in the experiment. What should one expect to see if the value of the independent variable is 0? In the case of generalized matching, the independent variable is 0 when the reinforcer rates are equal. Using the example from the preceding section, assume each confederate delivered 60 reinforcers per hour; thus, 60/60 is 1, and the log of 1 is 0. A safe initial assumption,


therefore, is that one would observe equal response rates on the two alternatives (i.e., a log response ratio close to 0), and some deviation might be expected if the response alternatives differ in some way (e.g., one confederate is more genuine or the quality of feedback is different). That is, one might expect some bias. If the obtained parameter value is widely discrepant from the existing literature, then something is most likely wrong. Solving the discrepancy may require some detective work. Perhaps the problem is lack of experimental control or human or measurement error, or some variable other than reinforcer rate is exerting an effect.

Is There Error in the Values of the Independent Variable? When researchers perform regression analysis, they are predicting the values of the dependent variable at specific values of the independent variable, which means that they need to obtain precise numbers for their obtained data. In our earlier example, this would mean precise reinforcer rates. For example, if one uses an averaged value of 6.0 reinforcers per hour as one of the independent variable values in linear regression, one must be sure that the reinforcer rates going into this average are extremely close to 6.0 reinforcers per hour. If the measurement of a particular value of the independent variable has a lot of error or variability, one cannot make precise predictions. Regression analysis assumes that there is no, or very little, error variance in the independent variable. In the case of a generalized matching analysis, there will be some variability in the independent variable, the obtained log reinforcer ratio, because the obtained log reinforcer ratio is plotted and depends in part on the rate of responding at each alternative. (One could plot the experimentally programmed reinforcer ratio, which has no error, but the programmed ratio may not be what the animal experiences.) Because the schedules for each alternative are VI schedules, the variability in reinforcer rates will be minimal compared with the variability in response rates, and thus the resulting reinforcer ratio will have minimal variability. For example, under a VI 60-second schedule, a pigeon could press once a minute or 100 times a minute and receive about the same number of reinforcers.

Therefore, the amount of variability is not enough to invalidate the use of ordinary linear regression techniques. If the variability in the independent variable is high, alternative regression approaches are available (Brace, 1977; Riggs, Guarnieri, & Addelman, 1978; although the same considerations apply for nonlinear regression, these approaches apply only to linear regression). To our knowledge, there are no specified rules about how much variability is too much to warrant an alternative approach.

Is Another Model More Appropriate? Even when the model accounts for the variability in the data points (i.e., a high r²), the residuals are homoscedastic, and the parameter values are sensible, there may be other questions regarding the model that cannot be answered by linear regression analysis. For example, one may ask whether another model of concurrent choice can account for more of the variance in the data. In the case of concurrent schedules of reinforcement, in addition to the generalized matching law, alternative models can account for choice (e.g., Dallery, McDowell, & Soto, 2004; McDowell & Kessel, 1979). One may also ask whether a more parsimonious model can account for the same amount of variance. In general, a more parsimonious model would be one that contained fewer parameters. To our knowledge, no current model can account for the data and is more parsimonious than generalized matching. In other areas of research, however, questions about parsimony and accuracy are still quite relevant (e.g., Mazur, 2001; McKerchar et al., 2009). Questions about accuracy and parsimony can be addressed using model comparison approaches, which we discuss later (see also Mazur, 2001, for an example of model comparison). The underlying assumptions of the selected model should also be examined. Indeed, a highly accurate model may be based on faulty assumptions. There are no tidy rules for discovering these assumptions; it takes critical thinking and a thorough understanding of the mathematical model. One assumption of generalized matching, for example, is that sensitivity to reinforcer rates (i.e., a in Equation 5) should be constant for an individual subject across changes in reinforcer magnitude or quality. By constant, we mean that a should not


change if, for instance, a is estimated using small food pellets or large food pellets. If sensitivity to reinforcer rates does change when different reinforcer magnitudes are used, then the assumptions underlying the generalized matching equation have been violated, and the application of the equation is invalid. At least one study has shown that sensitivity to reinforcer rates does change depending on reinforcer magnitude (Elliffe, Davison, & Landon, 2008). Elliffe et al.’s (2008) findings question a matching-based understanding of how multiple independent variables (reinforcer rate and magnitude) interact and govern choice. Whether these findings will be confirmed by future research is an important issue, and such work may inspire new models that will yield new insights into the determinants of choice.

Rationale for Squaring the Deviations in Regression Analysis

The main feature of the regression analysis is minimizing the sum-of-squared deviations. The story behind the reason researchers seek to minimize the squared deviations, or perform so-called least-squares regression, is an interesting one. In 1801, at the age of 23, Carl Friedrich Gauss used the method of least squares to predict, with unprecedented precision, the path of a dwarf planet named Ceres. He made about 22 observations over a 40-day period before Ceres went behind the sun. To predict where it would reappear almost half a year later, he developed the method of least squares. The particular equation he used to predict the path of the planet need not concern us. What made the method work was Gauss’s assumption that any error in the observations followed a normal distribution. To illustrate why, we provide a more familiar example than planetary motion. Consider one of the more highly controlled experimental preparations in behavioral science: a pigeon pecking a lighted key for brief access to grain. After every 20 pecks, access is granted for a brief period.
As response rate is measured from session to session, the numbers will vary. One day the response rate is higher, the next it is lower. However, all of the numbers revolve around some average. The distribution of the numbers is the key: the scatter of response rates is normally distributed, or Gaussian. Most of the numbers will be close to the average, but some scatter will also be observed further away from the average. In other words, scatter close to the mean is more likely than scatter far away from the mean. Assume the average is known because the pigeon has been observed for 100 consecutive days. The average is 50 responses per minute. On Days 101 and 102, one student takes measurements, and on Days 103 and 104, another student takes measurements. The first student obtains 45 and 55, and the second obtains 49 and 59. What if the same students made two more measurements each? Which pair of observations would be more likely the second time around? This is a question about predicting behavior and how to maximize accuracy. To quantify accuracy, calculate deviations from the mean. Smaller deviations mean better accuracy. What if the researcher just sums the absolute deviations from the mean for each pair? Ten units would be obtained for both pairs of observations (5 + 5 = 10 and 1 + 9 = 10). The researcher would conclude that both pairs are equally likely because they have equal total deviations from the mean. However, these calculations do not take into account a property of the deviations around the mean: the normal, Gaussian distribution. To take this property into account, the deviations are squared, that is, 5² + 5² = 50 and 1² + 9² = 82. On the basis of these calculations, the first pair of observations is more likely because its sum of squared deviations is smaller (see also Motulsky & Christopoulos, 2004). The reason that squaring the deviations produces the most accurate predictions can be seen in Figure 10.8. The abscissa shows the values of the observations, and the ordinate shows the frequency of their occurrence. Consistent with the example, behavior has been measured for 100 days.
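The two-student example can be checked numerically. The sketch below is only an illustration under stated assumptions: the mean of 50 comes from the text, but the standard deviation of 5 is a hypothetical value chosen for the demonstration, and the function names are ours. It shows that the summed absolute deviations cannot separate the two pairs, whereas the Gaussian joint likelihood ranks them in the same order as the summed squared deviations.

```python
import math

MEAN_RATE = 50.0    # mean response rate given in the text
ASSUMED_SD = 5.0    # hypothetical standard deviation, chosen only for illustration


def log_density(x, mean=MEAN_RATE, sd=ASSUMED_SD):
    """Natural log of the Gaussian probability density at x."""
    return -0.5 * math.log(2 * math.pi * sd ** 2) - (x - mean) ** 2 / (2 * sd ** 2)


def joint_log_likelihood(observations):
    """Log-likelihood of independent observations under the Gaussian model."""
    return sum(log_density(x) for x in observations)


def sum_abs_dev(observations, mean=MEAN_RATE):
    return sum(abs(x - mean) for x in observations)


def sum_sq_dev(observations, mean=MEAN_RATE):
    return sum((x - mean) ** 2 for x in observations)


first_pair = (45, 55)    # first student's observations
second_pair = (49, 59)   # second student's observations

for label, pair in (("first", first_pair), ("second", second_pair)):
    print(label, sum_abs_dev(pair), sum_sq_dev(pair), joint_log_likelihood(pair))
```

Under a Gaussian model the joint log-likelihood equals a constant minus the sum of squared deviations divided by 2σ², so the pair with the smaller sum of squares is the more likely one whatever positive value is assumed for the standard deviation.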
On the basis of the distribution, one can see how the observations made by Student A are more likely; the likelihood of Student A’s observations of 45 and 55 is greater than that of the observations made by Student B. If the observational error in a system under study shows a Gaussian distribution, whether planetary motion or a pigeon’s pecking, then least-squares regression will be most accurate in generating the best parameter values, the values that make the best


Figure 10.8. An example of a Gaussian distribution. The graph shows the frequency of observations, in this case response rates, over the course of 100 sessions. The distribution has a mean of 50. Two pairs of observations are taken, one by Student A (indicated by the As in the graph) and one by Student B (indicated by the Bs in the graph).

predictions. The error could be intrinsic to the system itself, or perhaps the variation is the result of some extrinsic factor such as measurement error or how much the experimenter feeds the pigeon between sessions. Regardless of the source, the observations over time, such as response rates over the course of 100 sessions, should show a Gaussian distribution. It is important to understand how and why the sum-of-squared deviations incorporate the Gaussian distribution in making predictions. Other methods to perform regression do not assume a Gaussian distribution. Usually, least-squares regression is a justified practice to assess a model, but there are times when the distribution of observations does not show Gaussian scatter: sometimes the scatter of observations increases as the value of the independent variable increases, sometimes the scatter is wider than a Gaussian distribution, and sometimes the distribution is skewed and therefore non-Gaussian. For example, distributions of interresponse times are often skewed, and therefore using linear regression on mean interresponse times would not be appropriate in these cases. Techniques are available to perform regression on the basis of these distributions that involve different methods of weighting the residuals when calculating the best fit.

We do not present these techniques because of space limitations (see Motulsky & Christopoulos, 2004), but we do want to emphasize why least-squares regression would not be appropriate for these distributions. In a nutshell, it is because some residuals will have undue influence on the fit (similar to the effect of an extremely large or small value on the average of a set of values). The logic also explains why outliers require some special consideration in regression, even if one is confident that the scatter of observations follows a Gaussian distribution. Because the deviation to the outlier will be large, it will have a large impact when one attempts to fit the curve (or line). Consequently, the outliers will have a disproportionate impact on the values of the parameters, which would be unfortunate if the outlier was a fluke observation (e.g., the food dispenser stopped working one day and so did the animal). There are methods to reduce the influence of outliers, and these methods are called robust least-squares regression. Computer programs will differ in which robust methods are offered, so we advise a search for these robust methods if outliers are present (and the reason for the outlier has been critically analyzed and including the outlier in the regression analysis is justified).
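The disproportionate influence of a single outlier on a least-squares fit can be illustrated with a small sketch. Everything here is hypothetical: the data are invented (a perfect line with one fluke observation added), and the Theil-Sen median-of-slopes estimator is simply one well-known robust alternative chosen for illustration; it is not necessarily one of the robust methods the cited sources have in mind.

```python
from statistics import median


def ols_line(xs, ys):
    """Ordinary least-squares slope and intercept for a straight line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def theil_sen_slope(xs, ys):
    """Median of all pairwise slopes (a robust alternative, used here as an assumption)."""
    slopes = [
        (ys[j] - ys[i]) / (xs[j] - xs[i])
        for i in range(len(xs))
        for j in range(i + 1, len(xs))
        if xs[j] != xs[i]
    ]
    return median(slopes)


# Hypothetical data: y = 2x + 1 exactly, then one fluke value 30 units too high.
xs = list(range(10))
clean_ys = [2 * x + 1 for x in xs]
fluke_ys = clean_ys[:-1] + [clean_ys[-1] + 30]

clean_slope, _ = ols_line(xs, clean_ys)
fluke_slope, _ = ols_line(xs, fluke_ys)
robust_slope = theil_sen_slope(xs, fluke_ys)

print("OLS slope, clean data:      ", clean_slope)
print("OLS slope, with one outlier:", fluke_slope)
print("Theil-Sen slope, outlier:   ", robust_slope)
```

Because the squared deviation to the fluke point dominates the sum of squares, the least-squares slope is pulled well away from 2, whereas the median of pairwise slopes is unaffected by a single aberrant point in this example.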


An Extension of Matching Theory’s Quantitative Description of Choice

Just as the development of linear models often begins with data, so too does the development of a nonlinear model. For example, imagine we have arranged a laboratory situation in which a human has one person to talk to, not two, and the rates of reinforcer delivery for talking about a certain topic, say politics, are varied across conditions. In this case, reinforcers consist of brief periods of eye contact (Beardsley & McDowell, 1992). Eye contact is provided according to several VI schedules of reinforcement (manipulated across conditions), but only when politics are discussed. We find that the rates of political speech increase with reinforcer rates. The relation is not a straight line, however; it is a curve. (Two examples of such curves are depicted in Figure 10.2.) At first, responding increases rapidly when reinforcer rates increase, and eventually responding levels off. The equation, of course, will reflect this hyperbolic shape in the observed data, and in that sense it does not take us beyond our observations. Our model, however, should extend our observations in several ways. Can it unify our observations of human behavior with the behavior of other organisms? Can it make novel and testable assertions about why the data take the shape they do? Before answering these questions, let us sketch the development of the hyperbolic equation. In 1968, Catania and Reynolds published an enormous and elegant dataset on how pigeons’ rate of pecking changed with changes in rate of VI reinforcement. The relation, as in our example of social dynamics, appeared hyperbolic. Thus, researchers knew that any quantitative account of reinforced responding on a single VI alternative must be consistent with a hyperbolic form. In 1970, Herrnstein demonstrated that one can obtain a hyperbolic model from the original matching equation, Equation 2.
In deriving a hyperbolic equation from Equation 2, Herrnstein made several assumptions. First, he assumed that even when only one alternative is specifically arranged, other extraneous, unarranged behavioral alternatives exist. Consider the pigeon’s situation in Catania and Reynolds. In addition to pecking the lighted key, the pigeon may engage in other behavior such as walking around, wing flapping, and so on, and presumably, each behavior produces some natural reinforcer. Similarly, a human engaged in political speech could talk about a different topic, fidget, or stare into space. Therefore, a single alternative arrangement is really a concurrent schedule, in which one choice is the target alternative and the other is the aggregate of all extraneous behaviors. Responding in a single-alternative situation can be expressed as

$$\frac{R}{R + R_e} = \frac{r}{r + r_e}, \tag{7}$$

where Re and re represent the rates of extraneous responding and reinforcement, respectively (see Herrnstein, 1974, for a further discussion). In other words, Re and re represent any responding and reinforcement extraneous to the target alternative. The subscripts have been dropped from the target alternative because only one schedule of reinforcement is explicitly arranged. Second, Herrnstein (1970) also assumed that R (target responding) and Re (extraneous responding) are exhaustive of the total amount of behavior possible in a given environment. In other words, R + Re is all an animal can do in a given situation. Equation 7 can be solved for R by letting k = R + Re and then rearranging, which produces the familiar Equation 1:

$$R = k\left(\frac{r}{r + r_e}\right).$$
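For readers who want the intermediate algebra, the rearrangement can be written out step by step. This is a sketch of the steps implied by the text, using R_e and r_e for the extraneous response and reinforcer rates written as Re and re in the prose; the last two lines are consequences we add for illustration and follow directly from Equation 1.

```latex
\begin{align*}
\frac{R}{R + R_e} &= \frac{r}{r + r_e}
  && \text{(Equation 7)} \\
\frac{R}{k} &= \frac{r}{r + r_e}
  && \text{(substituting } k = R + R_e \text{)} \\
R &= \frac{k\,r}{r + r_e}
  && \text{(multiplying by } k \text{; Equation 1)} \\
R_e = k - R &= \frac{k\,r_e}{r + r_e}
  && \text{(extraneous responding takes up the remainder)} \\
R\,\big|_{\,r = r_e} &= \frac{k\,r_e}{2\,r_e} = \frac{k}{2}
  && \text{(half the asymptote when } r = r_e \text{)}
\end{align*}
```

The final line shows why the extraneous reinforcement parameter can be read off a graph as the reinforcer rate at which responding reaches half of its asymptote k.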

The parameter k is now interpreted as total behavior possible in a situation. By assuming that all behavior is choice and that the total amount of behavior in a situation is constant, Herrnstein (1970) derived Equation 1 from Equation 2. Herrnstein’s (1970) theory consists of a quantitative statement about how reinforcer rate affects response rate (i.e., hyperbolically). More important, the theory also explains why reinforcement affects behavior in the manner specified by Equation 1: Reinforcement alters the distribution of a constant amount of behavior (Herrnstein, 1970, 1974). Now, at least conceptually, one can see how Herrnstein’s model links responding in situations in which only


one choice is experimentally arranged and situations in which two choices are experimentally arranged. Both represent choice situations. Indeed, Herrnstein went so far as to say that all behavior is choice. A pigeon’s key pecking and a person’s talking can be described by the same equation. The theory also explains why one feature of the hyperbolic shape is seen: asymptotic responding. Once enough reinforcers are delivered for some activity, the limits of the distribution of behavior are presumably reached (i.e., exclusive responding on the arranged alternative), and thus asymptotic levels of responding are observed.

Evaluating Nonlinear Models: Nonlinear Regression

The preceding conceptual analysis highlights some of the surprising and interesting consequences of quantitative models. Researchers can unify phenomena and explain why they see similar functional relations across situations and species. These consequences are moot unless the model is accurate, however, and as such we must return to analysis. Table 10.3 shows empirical observations in the single-alternative situation involving social interaction and political speech, and it follows the same general format as Table 10.2. Figure 10.9 shows a plot of the data in columns A and B, the response

and reinforcer rates, and the curve represents the least-squares fit of the hyperbolic equation. Least-squares regression for nonlinear models works in the same way as it does for linear models. As before, start with some initial parameter values for k and re. Here, it may be more difficult to guess where to start for the initial values; it is easier if one examines a graph of the data. Inspect the data points in Figure 10.9. The value of the asymptote k is measured in the units of the dependent variable, usually responses per minute. Because k reflects the asymptote, one could start with the maximum response rate observed, or 32 responses per minute. Mathematically, the parameter re represents the reinforcer rate necessary to obtain half the asymptotic response rate, k. The reinforcer rate for the maximum response rate was 180 reinforcers per hour, so half of this reinforcer rate could be used as the initial value of re. We should note that, by definition, re represents the aggregate rate of reinforcement from sources extraneous to the target alternative. These extraneous reinforcers are typically not measured directly; rather, they are estimated in terms of the reinforcer delivered on the instrumental alternative. For example, if political speech were reinforced with eye contact, then an re of 10 would be read as 10 eye contacts per hour (the time unit for re is usually reinforcers per hour). In performing nonlinear regression, having a sense of what the parameters mean allows the researcher to use the data to determine initial parameter values. Then, the computer program (or algorithm) inserts the initial estimates (32 responses per minute and 90 reinforcers per hour) into Herrnstein’s (1970) equation to obtain a predicted response rate for each of the measured reinforcer rates. The computer program adjusts the parameter values to minimize the sum-of-squared residuals. Table 10.3 shows the best-fitting parameter values of k and re obtained via least-squares regression. The predicted data based on these values are shown in column C. The first index of the model’s accuracy is the R² (by convention, results of nonlinear regression are reported using R, and results of linear regression are reported using r). Similar to what was depicted in Figure 10.5, the R² compares the vertical distances of the data to the curve with the vertical distances of the data to a horizontal line at the mean. More specifically, a ratio of the sum-of-squared residuals to the total sum-of-squares is calculated.

Table 10.3
Regression Calculations for a Nonlinear Model

A: rnf/hr    B: rsp/min (Y)    C: predicted rsp/min (Y′)    D: Y − Y′    E: (Y − Y′)²
180          32.0              31.46                         0.54         0.2916
138.6        30.1              30.45                        −0.35         0.1225
98           28.0              28.78                        −0.78         0.6084
22.5         18.8              17.68                         1.12         1.2544
15           16.3              14.14                         2.16         4.6656
9             6.7              10.09                        −3.39        11.4921

Note. Columns A and B show reinforcer and response rates, and column C shows predicted response rates calculated by Equation 1 with k = 35.41 and re = 22.57. Other conventions are identical to those in Table 10.1. Regression calculations: SSTOT = 478.43, SSREG = 18.43, R² = 0.96. rnf = reinforcer; rsp = response.

Figure 10.9. The curve shows Herrnstein’s (1970) hyperbolic equation fitted via least-squares regression to the data in Table 10.3.
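The fitting and goodness-of-fit calculations described here can be sketched in a few lines. This is a minimal illustration, not the iterative algorithm that regression programs use: because the hyperbola is linear in k once re is fixed, the sketch (our simplification) searches a grid of candidate re values and computes the best k for each in closed form. The data are columns A and B of Table 10.3; the function names and grid limits are assumptions made for the example.

```python
# Data from Table 10.3: reinforcers per hour (column A), responses per minute (column B).
REINFORCER_RATES = [180, 138.6, 98, 22.5, 15, 9]
RESPONSE_RATES = [32.0, 30.1, 28.0, 18.8, 16.3, 6.7]


def hyperbola(r, k, re):
    """Herrnstein's hyperbola (Equation 1): predicted response rate."""
    return k * r / (r + re)


def sum_sq_residuals(k, re):
    return sum(
        (y - hyperbola(r, k, re)) ** 2
        for r, y in zip(REINFORCER_RATES, RESPONSE_RATES)
    )


def best_k_given_re(re):
    """For a fixed re the model is linear in k, so the least-squares k is closed form."""
    g = [r / (r + re) for r in REINFORCER_RATES]
    return sum(y * gi for y, gi in zip(RESPONSE_RATES, g)) / sum(gi * gi for gi in g)


def fit_hyperbola(re_low=0.1, re_high=200.0, step=0.01):
    """Grid search over re (hypothetical limits), keeping the smallest sum of squares."""
    best_k, best_re, best_ss = None, None, float("inf")
    n_steps = int(round((re_high - re_low) / step))
    for i in range(n_steps + 1):
        re = re_low + i * step
        k = best_k_given_re(re)
        ss = sum_sq_residuals(k, re)
        if ss < best_ss:
            best_k, best_re, best_ss = k, re, ss
    return best_k, best_re, best_ss


k_hat, re_hat, ss_res = fit_hyperbola()
mean_y = sum(RESPONSE_RATES) / len(RESPONSE_RATES)
ss_tot = sum((y - mean_y) ** 2 for y in RESPONSE_RATES)
r_squared = 1 - ss_res / ss_tot
residuals = [
    y - hyperbola(r, k_hat, re_hat)
    for r, y in zip(REINFORCER_RATES, RESPONSE_RATES)
]

print("k =", round(k_hat, 2), "re =", round(re_hat, 2))
print("SS residual =", round(ss_res, 2), "SS total =", round(ss_tot, 2))
print("R squared =", round(r_squared, 2))
print("residuals:", [round(e, 2) for e in residuals])
```

The residual sum of squares computed from unrounded predictions may differ slightly from the tabled value, which is based on predictions rounded to two decimals. Plotting the residuals list against the reinforcer rates gives the kind of residual plot used to check for systematic departures from the model.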
In our example, SSREG is 18.43, and SSTOT is 478.43. Subtracting the ratio of SSREG to SSTOT from 1 yields an R² of .96. The R² tells only part of the story, so a residual analysis is also performed to assess whether there are systematic departures from homoscedasticity. Again, one could perform several different kinds of residual analysis (see earlier), but for simplicity the residual is plotted as a function of the independent variable, which is shown in Figure 10.10. One can see that the scatter of residual values is relatively even across the range of reinforcer rates. More data may be needed to assess the assumptions of least-squares regression, but the analysis suggests that there are no obvious departures from the assumptions.

Figure 10.10. Plot of the residuals of the fit of Herrnstein’s (1970) hyperbolic equation. The residual plot does not suggest systematic deviations from homoscedasticity.

Additional Questions When Evaluating Nonlinear Regression

Do the Parameters Make Sense? To answer this question, as in linear regression, one has to know what the parameters mean and in what units they are measured. If researchers obtain an estimate of k of 1,000, and they have measured response rates in units of words per minute, this estimate of asymptotic response rates would not be feasible. Moreover, even if the maximum observed response rate were 100 or so, one should be suspicious of such a parameter value because it exceeds the observations by orders of magnitude. That is, if the parameter value exceeds what is reasonable on the basis of observed values, the researchers should think carefully about the obtained parameter value. It is indeed possible to obtain very large ks, but this


is usually because the researcher has not collected enough data under high reinforcer rates (i.e., in which response rate is close to asymptote). In choosing the VI schedules, the researchers have not sampled the parameter space effectively. For example, one would say the parameter space has not been sampled adequately if only the three lowest reinforcer rates shown in Figure 10.9 (three leftmost data points) were experimentally arranged. Even visually, using these three data points, the researchers would not be able to make a reasonable guess about the asymptote of the function. Their guess might be large (and uncertain). Similarly, the estimate produced by the regression program might be large (and uncertain, as measured by the standard error of the parameter estimate, for example; Motulsky & Christopoulos, 2004). Given that the parameter space has been adequately sampled, as shown in Figure 10.9, the obtained k of 35.41 responses per minute makes sense given the observations. The obtained re of 22.57 reinforcers per hour also makes sense. Although it may seem odd that extraneous reinforcers are measured in units of the delivered reinforcer, eye contacts, the value of eye contacts is plausible given the observed data.

Is There Error in the Values of the Independent Variable? As noted for linear regression, when one performs nonlinear regression analysis, one is predicting the values of the dependent variable at specific values of the independent variable. We do not go into this issue in detail because the arguments are the same as outlined in the Evaluating Linear Models: Linear Regression section. Briefly, recall that if there is a lot of variability in the independent variable, one cannot answer with precision the question of how the dependent variable changes with changes in the independent variable.

Is Another Model More Appropriate? As with linear models, when evaluating a nonlinear model, one should consider whether there is evidence that questions the assumptions underlying the model or evidence for other candidate models that seek to describe the same environment–behavior relation. For example, although Equation 7

typically accounts for more than 90% of the variance in response rates under VI schedules in laboratory, applied, and naturalistic settings (see de Villiers & Herrnstein, 1976; Fisher & Mazur, 1997; McDowell, 1988; and Williams, 1988, for reviews), there are some reasons to think it may not be wholly correct. As noted by Shull (1991), “The fact that an equation of particular form describes a set of data well does not mean that the assumptions that gave rise to the equation are supported” (p. 247). The assumptions underlying Herrnstein’s (1970) hyperbola have been examined in terms of how the parameters change with manipulations of environmental variables (see Dallery & Soto, 2004, for a review). According to the equation, re should vary directly with changes in the extraneous reinforcer rate, and this prediction has some empirical support (Belke & Heyman, 1994; Bradshaw, Szabadi, & Bevan, 1976). Other studies, however, have questioned this prediction (Bradshaw, 1977; Soto, McDowell, & Dallery, 2005; White, McLean, & Aldiss, 1986). Similarly, there are conflicting findings for the prediction of a constant k as reinforcer properties are manipulated. Recall that this is required by Herrnstein’s (1974) theory: Reinforcement alters the distribution of a constant amount of behavior among the available alternatives (de Villiers & Herrnstein, 1976, p. 1151; Herrnstein, 1974). Thus, the evidence has suggested some limitations to the assumptions underlying the hyperbola, and these limitations are independent of the criteria by which one evaluates the appropriateness of the model in describing a particular data set. Moreover, other models account for the same hyperbolic relations as Herrnstein’s model and predict the conditions under which k should remain constant or vary (Dallery, McDowell, & Lancaster; 2000; Dallery et al., 2004; McDowell, 1980; McDowell & Dallery, 1999). There is another reason to question the validity of Herrnstein’s equation. 
Recall that the original, strict matching equation, on which Herrnstein’s equation is based, was modified to account for bias and insensitivity. If bias and insensitivity are ubiquitous in concurrent schedules, they should be incorporated into Herrnstein’s (1974) account of single-alternative responding. We do not develop the equation that incorporates these parameters into Herrnstein’s original equation (see McDowell, 2005,


for the derivation). As one might imagine, incorporating these parameters makes the equation appear more complicated. However, the equation describes the same hyperbolic shape, the same environment–behavior relation. Evaluating the equation involves the same steps outlined earlier. Indeed, in comparing the family of equations that make up the so-called modern theory of matching with the classic theory of matching, McDowell and colleagues (Dallery et al., 2005; McDowell, 2005) made extensive use of residual analysis to detect subtle, yet important deviations from the original, classic theory of matching. Although Dallery et al. (2005) and McDowell (2005) concluded that the modern theory represented a superior theory, alternative quantitative accounts also provide useful, general, and informative descriptions of the same environment–behavior relations (e.g., Killeen & Sitomer, 2003; Rachlin, Battalio, Kagel, & Green, 1981).

Model Comparison

Evaluating the accuracy and generality of a single quantitative theory is an important first step in describing environment–behavior relations. Comparing the model with other accounts is also a critical feature of the scientific process. Indeed, making distinctions between models can help further isolate controlling variables, develop critical empirical tests to distinguish the models (Mazur, 2001), and potentially lead to new insights about the processes responsible for behavior change in applied settings (Nevin & Grace, 2000). In the next section, we present two techniques to compare models. One technique, the extra sum-of-squares F test (Motulsky & Christopoulos, 2004), is based on conventional inferential statistics, and the other technique, Akaike’s information criterion (AIC; Akaike, 1974; Motulsky & Christopoulos, 2004), is based primarily on information theory (Burnham & Anderson, 2002).
The objective of each technique is to provide a quantitative basis on which to judge whether one model provides a better account of the data than does another. Both methods evaluate goodness of fit relative to model complexity (i.e., the number of parameters). This is because the more parameters a model has, the better it will describe the data, and thus any improvement in goodness of fit relative to another model must be considered in the context of this increase in the number of parameters.

Before comparing models using either the extra sum-of-squares F test or AIC, researchers must determine whether both models provide a reasonable and accurate account of the data: (a) Do they describe the data well (i.e., high r² or R²), (b) are the residuals randomly distributed, and (c) are the parameter values sensible? If the answer to each of these three questions is yes, then researchers can proceed to model comparison with the F test or AIC. If the answer to one of the questions is no, then researchers must either resolve the issue (e.g., if nonsensible parameters are the result of experimental error) or choose between the models on the basis of the residuals or goodness of fit (i.e., if one model has nonrandom residuals or does a poor job of describing the data, then choose the other model).

In addition, researchers must ensure that all datasets are expressed in the same units. They need to compare apples to apples. If the dependent variable in one data set is expressed in responses per second and in another it is expressed in responses per minute, the sum-of-squared deviations will be much larger in the latter than in the former (rates per minute are 60 times larger, so squared deviations are 3,600 times larger). A comparison with different units is not appropriate. The researcher must either reexpress the data so that the units are identical (e.g., multiply or divide by 60 in the previous example) or find a way to standardize, or normalize, the data before making the comparison.

The appropriate model comparison test depends on whether the models are nested. Models are nested when one model is a simpler case of the other model. For instance, the original (Equation 2) and generalized matching (Equation 5) equations are nested models because Equation 5 can be reduced to Equation 2 when a and b are equal to 1.
If the models are nested, they can be compared with an extra sum-of-squares F test or with the AIC. If they are not nested, they can only be compared using AIC (or an alternative statistical approach using parametric or nonparametric methods).
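The point about units can be illustrated with a minimal sketch. The response rates and predictions below are hypothetical values invented for this illustration, and `sum_of_squares` is a helper name introduced here, not something from the chapter. The sketch only shows that reexpressing the same data per minute rather than per second multiplies the sum-of-squared deviations by 60 squared.

```python
def sum_of_squares(observed, predicted):
    """Sum of squared deviations between observed and predicted values."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted))


# Hypothetical response rates (responses per second) and model predictions.
obs_per_s = [0.50, 0.80, 1.10, 1.30]
pred_per_s = [0.55, 0.78, 1.05, 1.34]

# The same data and predictions reexpressed in responses per minute.
obs_per_min = [60 * v for v in obs_per_s]
pred_per_min = [60 * v for v in pred_per_s]

ss_per_s = sum_of_squares(obs_per_s, pred_per_s)
ss_per_min = sum_of_squares(obs_per_min, pred_per_min)

# The ratio is 60 ** 2 = 3600 (up to floating-point error).
print(ss_per_s, ss_per_min, ss_per_min / ss_per_s)
```

Because the fit itself is unchanged, the difference in the sum-of-squares here reflects only the choice of units, which is why the datasets must share units before any comparison.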

Extra Sum-of-Squares F Test to Compare Nested Models

Let us use strict matching and generalized matching as an example of model comparison using the data in

Quantitative Description of Environment–Behavior Relations

Table 10.2. Perfect matching is described by Equation 5 with a and b set to 1. Generalized matching is also described by Equation 5, but the parameters are free to vary. Assume that both models describe the data well, have randomly distributed residuals, and have sensible parameter values. The logic of the extra sum-of-squares F test is relatively simple. Goodness of fit is compared along with the complexity of each model. Goodness of fit is measured by the sum-of-squared residuals. Complexity is measured by the number of parameters in each model. More parameters that are free to vary mean more complexity. The rule of parsimony asserts that simpler models should be preferred, so any increase in complexity should be more than offset by an increase in accuracy. Before going into the specific calculations, note that the F ratio compares accuracy (sum-of-squares) and complexity (parameters). If the calculated F ratio is high enough (and the corresponding p value is low enough), then the more complicated model can be accepted (generalized matching, in which the parameters are free to vary). If not, the simpler model is accepted (strict matching, in which the parameters are not free to vary). To calculate the extra sum-of-squares F test, one begins by calculating the sum-of-squared residuals for each model. The sum-of-squared residuals for strict matching in this case is 0.34; for generalized matching, it is 0.13 (see Table 10.4). A smaller sum-of-squares is expected for generalized matching, and in general for models with more parameters that are free to vary. Next, the degrees of freedom for each model are calculated as the number of data points minus the number of parameters. For strict matching, the degrees of freedom are 10 (10 data points minus 0 free parameters); for generalized matching, the degrees of freedom are 8 (10 data points minus 2 parameters).
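The sums-of-squares and degrees of freedom just described can be reproduced with a short sketch. It assumes that Equation 5 has the log-linear form log(response ratio) = a × log(reinforcer ratio) + log b, which is consistent with the parameter values reported for Table 10.4 but is not restated in this section. The function names are introduced here for illustration, and ordinary least squares stands in for whatever fitting routine was actually used.

```python
# Logged reinforcer ratios (column A) and response ratios (column B), Table 10.4.
log_rnf = [0.11, -0.37, 0.49, -0.86, 0.70, -0.17, 0.63, -0.22, 0.25, -0.67]
log_rsp = [0.17, 0.07, 0.61, -0.66, 0.63, -0.21, 0.57, -0.06, 0.24, -0.44]


def residual_ss(x, y, a, log_b):
    """Sum of squared residuals for log(rsp) = a * log(rnf) + log_b."""
    return sum((yi - (a * xi + log_b)) ** 2 for xi, yi in zip(x, y))


def fit_line(x, y):
    """Ordinary least-squares slope (a) and intercept (log b)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    a = sxy / sxx
    return a, mean_y - a * mean_x


n = len(log_rnf)

# Strict matching: a = 1 and log b = 0 are fixed, so no parameters are free.
ss_strict = residual_ss(log_rnf, log_rsp, 1.0, 0.0)
df_strict = n - 0

# Generalized matching: a and log b are both estimated from the data.
a_hat, log_b_hat = fit_line(log_rnf, log_rsp)
ss_general = residual_ss(log_rnf, log_rsp, a_hat, log_b_hat)
df_general = n - 2

print(f"strict:      SS = {ss_strict:.2f}, df = {df_strict}")
print(f"generalized: a = {a_hat:.2f}, log b = {log_b_hat:.2f}, "
      f"SS = {ss_general:.2f}, df = {df_general}")
```

Under these assumptions the sketch should recover values close to those reported in the text (sums-of-squares near 0.34 and 0.13, a near 0.80, and log b near 0.10).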
Table 10.4
Obtained Logged Reinforcer and Response Ratios and Predicted Log Response Ratios and Corresponding Regression Calculations for Strict Matching and Generalized Matching

A: log rnf ratio   B: log rsp ratio   C: predicted log rsp ratio   D: predicted log rsp ratio
                                      from strict matching         from generalized matching
0.11               0.17               0.11                         0.19
−0.37              0.07               −0.37                        −0.19
0.49               0.61               0.49                         0.49
−0.86              −0.66              −0.86                        −0.58
0.70               0.63               0.70                         0.66
−0.17              −0.21              −0.17                        −0.03
0.63               0.57               0.63                         0.60
−0.22              −0.06              −0.22                        −0.07
0.25               0.24               0.25                         0.30
−0.67              −0.44              −0.67                        −0.43

Regression calculations
SSTOT              1.78
SSREG                                 0.34                         0.13
r²                                    0.81                         0.93

Note. Columns A and B show the logged reinforcer and response ratios, and columns C and D show the predicted log response ratios from strict matching (Equation 5 with a and b equal to 1) and generalized matching (Equation 5) with a = 0.80 and log b = 0.10 and their respective regression calculations. rnf = reinforcer; rsp = response.

Now, the relative difference in sums-of-squares and the relative difference in degrees of freedom are calculated. The relative difference in sums-of-squares is calculated by subtracting the alternative hypothesis sum-of-squares from the simpler hypothesis⁴ sum-of-squares, and then dividing that difference by the alternative hypothesis sum-of-squares. Similarly, the relative difference in degrees of freedom is calculated by subtracting the alternative hypothesis degrees of freedom from the simpler hypothesis degrees of freedom, and then dividing that difference by the alternative hypothesis degrees of freedom. These numbers are summarized in Table 10.5. The relative difference in the sums-of-squares is 1.62 ([0.34 − 0.13]/0.13) and the relative difference in the degrees of freedom is 0.25 ([10 − 8]/8). The F ratio is the ratio of the relative difference in the sum-of-squares to the relative difference in the degrees of freedom, and it is a measure of how much better the alternative model describes the data relative to how many more degrees of freedom the alternative model uses. In this case, the value is 6.48 (1.62/0.25) and the associated p value is .02. The p value can be found in an F ratio statistics table or by using an online calculator. Remember to use the correct degrees of freedom when looking up or calculating a p value. In this case, the degrees of freedom for the numerator are 2 (the difference in degrees of freedom of our simpler and alternative hypotheses, 10 and 8, respectively), and the degrees of freedom for the denominator are 8 (the alternative hypothesis degrees of freedom). Typically, the criterion alpha value is set at .05, so in this case, one would reject the simpler hypothesis (strict matching) in favor of the alternative hypothesis (generalized matching).

Table 10.5
Calculations Involved in the Extra Sum-of-Squares F Test for Model Comparison of the Descriptions of Strict Matching and Generalized Matching Accounts of the Data in Table 10.4

Model                                           SS      df
Simpler hypothesis (strict matching)            0.34    10.00
Alternative hypothesis (generalized matching)   0.13    8.00
Difference                                      0.21    2.00
Relative difference                             1.62    0.25
Ratio (F)                                       6.48
p                                               .02

Note. Strict matching = Equation 5 with a and b equal to 1; generalized matching = Equation 5.

⁴ In model comparison, we prefer the term simpler hypothesis to null hypothesis. We agree with Rodgers (2010) that the term null hypothesis has too much baggage and often refers to no effect or to chance processes. The modeling approach does not posit a null (or nil) effect; rather, it entails a specific, but simpler, hypothesis.
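The extra sum-of-squares F test calculation can be sketched in a few lines. The function names are introduced here for illustration. The p value uses a closed-form expression for the upper tail of the F distribution that holds only when the numerator degrees of freedom equal 2, as they do in this example; it is a shortcut chosen to keep the sketch dependency-free, not a method described in the chapter.

```python
def extra_ss_f_ratio(ss_simple, df_simple, ss_alt, df_alt):
    """Extra sum-of-squares F ratio for two nested models."""
    rel_ss = (ss_simple - ss_alt) / ss_alt    # relative difference in SS
    rel_df = (df_simple - df_alt) / df_alt    # relative difference in df
    return rel_ss / rel_df


def p_value_two_numerator_df(f_ratio, df_denominator):
    """Upper-tail p value of the F distribution when the numerator df is 2.

    For 2 numerator degrees of freedom the tail area has the closed form
    (1 + 2F / df_denominator) ** (-df_denominator / 2).
    """
    return (1 + 2 * f_ratio / df_denominator) ** (-df_denominator / 2)


# Values from Table 10.5 (strict vs. generalized matching).
f_ratio = extra_ss_f_ratio(ss_simple=0.34, df_simple=10, ss_alt=0.13, df_alt=8)
p = p_value_two_numerator_df(f_ratio, df_denominator=8)
print(f"F(2, 8) = {f_ratio:.2f}, p = {p:.3f}")
```

Because the sketch does not round the relative difference in sums-of-squares to 1.62 before dividing, it gives an F ratio of about 6.46 rather than 6.48; the p value still rounds to .02. For other numerator degrees of freedom, a statistics library would be needed (e.g., the survival function of SciPy's F distribution).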

Akaike’s Information Criterion to Compare Nested Models

Another technique used for comparing models is AIC. As noted earlier, AIC is based on information theory. The basis of the AIC calculation is beyond the scope of this chapter (for a brief discussion of the basis of AIC, see Motulsky & Christopoulos, 2004; for a more detailed discussion, see Burnham & Anderson, 2002). The AIC calculation allows one to determine which model is more likely to be correct (the one with the lower AIC value) and how much more likely it is to be correct. Unlike the extra sum-of-squares F test, AIC can be used to compare nested

or nonnested models. We use our previous example to illustrate the use of AIC for model comparison. The logic is the same as in the extra sum-of-squares F test: Accuracy and complexity are compared. Increases in complexity must be offset by increases in accuracy. The first step in using AIC for model comparison is to calculate the AIC for each model using the equation

\[ \mathrm{AIC} = N \ln\left(\frac{SS_{\mathrm{REG}}}{N}\right) + 2K, \tag{8} \]

where N is the number of data points, K is the number of parameters in the model plus 1, and SSREG is the model’s residual sum-of-squares. Consider Equation 8. The AIC value depends both on how well the model describes the data (how small the residual sum-of-squares is; SSREG) and on how many parameters the model has (K). When comparing two models, the model more likely to be correct is the one with the lower AIC value because a lower AIC value represents a better balance between goodness of fit and number of parameters. When the number of data points is small relative to the number of parameters, which may often be the case in behavioral experiments, AIC values will be too small and should be corrected (Motulsky & Christopoulos, 2004). The corrected AIC formula is

\[ \mathrm{AIC}_C = \mathrm{AIC} + \frac{2K(K + 1)}{N - K - 1}. \tag{9} \]

In the previous example, the residual sum-of-squares was 0.34 for strict matching and 0.13 for generalized matching. The number of data points is the same for each model (10), and K is 1 for strict matching and 3 for generalized matching. Thus, one AIC value for each model can be calculated. The corrected AIC for strict matching is −31.31, and the corrected AIC for generalized matching is −33.43. Because the corrected AIC value for generalized matching (designated as AICB) is lower than the corrected AIC value for strict matching (designated as AICA), one can say that generalized matching is more likely to be correct. Some authors just present the individual AICs to make comparisons between models; the model with a smaller AIC is preferred. Another approach is to


calculate the probability that one model is more likely by using the following equation:

\[ \text{Probability} = \frac{e^{-0.5(\mathrm{AIC}_B - \mathrm{AIC}_A)}}{1 + e^{-0.5(\mathrm{AIC}_B - \mathrm{AIC}_A)}}. \tag{10} \]

If the corrected AIC for generalized matching, AICB (−33.43), and the corrected AIC for strict matching, AICA (−31.31), are substituted into Equation 10, the probability that generalized matching is correct is found to be approximately .74. Alternatively, it can be said that the chance that generalized matching is correct is 74%. Note, the difference could be calculated the other way and would result in a probability of approximately .26 that strict matching is correct relative to generalized matching. Finally, an evidence ratio can be calculated that quantifies how much more likely one model is to be correct relative to another model by dividing the probability that one model is correct by the probability that the other model is correct. For the current example, the probability that generalized matching is correct (.74) could be divided by the probability that strict matching is correct (.26), resulting in an evidence ratio of 2.85, which means that generalized matching is 2.85 times more likely to be correct than strict matching. These calculations are summarized in Table 10.6.
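The AIC calculations in this example (Equations 8 through 10 and the evidence ratio) can be sketched as follows. The function names are introduced here for illustration; the inputs are the sums-of-squares, number of data points, and parameter counts given in the text.

```python
import math


def aic_corrected(ss_res, n_points, n_free_params):
    """Corrected AIC (Equations 8 and 9); K is the parameter count plus 1."""
    k = n_free_params + 1
    aic = n_points * math.log(ss_res / n_points) + 2 * k
    return aic + (2 * k * (k + 1)) / (n_points - k - 1)


def probability_b_correct(aicc_a, aicc_b):
    """Equation 10: probability that model B is correct relative to model A."""
    weight = math.exp(-0.5 * (aicc_b - aicc_a))
    return weight / (1 + weight)


# Model A = strict matching (0 free parameters), model B = generalized (2).
aicc_strict = aic_corrected(ss_res=0.34, n_points=10, n_free_params=0)
aicc_general = aic_corrected(ss_res=0.13, n_points=10, n_free_params=2)

p_general = probability_b_correct(aicc_strict, aicc_general)
p_strict = 1 - p_general
evidence_ratio = p_general / p_strict

print(f"AICc strict = {aicc_strict:.2f}, AICc generalized = {aicc_general:.2f}")
print(f"P(generalized) = {p_general:.2f}, evidence ratio = {evidence_ratio:.2f}")
```

The corrected AIC values and the probability agree with the values in the text. The evidence ratio comes out at about 2.88 rather than 2.85 because the sketch divides unrounded probabilities instead of .74 by .26.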

Akaike’s Information Criterion to Compare Nonnested Models

When comparing models that are not nested, when neither model is a simpler version of the other model, researchers must use AIC. They must scrutinize each model (i.e., r², residuals, parameters) before subjecting them to the AIC. The extra sum-of-squares F test cannot be used with nonnested models. We do not provide an example of using the AIC with nonnested models because the calculations described in the previous section are exactly the same whether AIC is used for comparing nested or nonnested models. When evaluating the probability that the more likely model is correct relative to another model (Equation 10), subtract the AIC for the less likely model (i.e., the model with the larger AIC value) from the AIC for the more likely model (i.e., the model with the smaller AIC value). One area in which AIC could be applied to model comparison is in research on temporal discounting. Temporal discounting equations describe the rate at which a reinforcer loses value as the delay to its receipt increases (Critchfield & Kollins, 2001; Mazur, 1987). Several nonnested quantitative models (e.g., exponential decay, hyperbolic decay) have been proposed that account for high proportions of the variance in responding (for a recent comparison of four discounting models, see McKerchar et al., 2009). Assuming that residual analysis does not reveal departures from randomness, nonnested AIC analysis may be useful in making comparisons between these models.
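As a rough illustration of a nonnested comparison, the sketch below fits a one-parameter exponential and a one-parameter hyperbolic discounting function to the same data and compares their corrected AIC values. Everything specific here is an assumption made for illustration: the indifference points are hypothetical values, the functional forms (value = e^(−kD) and value = 1/(1 + kD), with value expressed as a proportion of the delayed amount) are common textbook forms rather than equations from this chapter, and a simple grid search stands in for a proper nonlinear regression routine.

```python
import math

# Hypothetical indifference points (proportion of the delayed amount) at six
# delays in days; these values are invented for illustration only.
delays = [1, 7, 30, 90, 180, 365]
values = [0.95, 0.80, 0.55, 0.35, 0.22, 0.15]


def exponential(k, delay):
    return math.exp(-k * delay)


def hyperbolic(k, delay):
    return 1 / (1 + k * delay)


def fit_rate(model, xs, ys):
    """Grid-search the rate k (1e-5 to 10, log-spaced) that minimizes SS."""
    best_k, best_ss = None, float("inf")
    for step in range(-5000, 1001):
        k = 10 ** (step / 1000)
        ss = sum((y - model(k, x)) ** 2 for x, y in zip(xs, ys))
        if ss < best_ss:
            best_k, best_ss = k, ss
    return best_k, best_ss


def aic_corrected(ss_res, n_points, n_free_params):
    """Corrected AIC (Equations 8 and 9); K is the parameter count plus 1."""
    k = n_free_params + 1
    aic = n_points * math.log(ss_res / n_points) + 2 * k
    return aic + (2 * k * (k + 1)) / (n_points - k - 1)


n = len(delays)
k_exp, ss_exp = fit_rate(exponential, delays, values)
k_hyp, ss_hyp = fit_rate(hyperbolic, delays, values)
aicc_exp = aic_corrected(ss_exp, n, 1)
aicc_hyp = aic_corrected(ss_hyp, n, 1)

# Equation 10, taking the model with the smaller AICc as model B.
aicc_b, aicc_a = min(aicc_exp, aicc_hyp), max(aicc_exp, aicc_hyp)
weight = math.exp(-0.5 * (aicc_b - aicc_a))
p_preferred = weight / (1 + weight)

preferred = "hyperbolic" if aicc_hyp < aicc_exp else "exponential"
print(f"preferred model: {preferred}, probability = {p_preferred:.2f}")
```

Because both models in this sketch have the same number of free parameters, the AIC comparison reduces to a comparison of residual sums-of-squares; with models of different complexity, the penalty terms would also matter. As the text stresses, the residuals and parameter values of each fit should be examined before the AIC values are interpreted.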

Table 10.6
Calculations Involved in Akaike’s Information Criterion for Comparison of Strict (Equation 5 With a and b Equal to 1) and Generalized Matching (Equation 5) Accounts of the Data in Table 10.4

Model                                           SSREG   N    Parameters   Akaike’s information criterion   Probability   Evidence ratio
Simpler hypothesis (strict matching)            0.34    10   0            −31.31                           .26
Alternative hypothesis (generalized matching)   0.13    10   2            −33.43                           .74           2.85

Model Comparison to Evaluate Whether an Intervention Changed Behavior

The model comparison approach can also be used to assess whether some intervention (e.g., an environmental or pharmacological intervention) produced a significant change in behavior. For example, the model comparison could be used to evaluate whether a drug or some other intervention produced

a change in the dose–response curve in a behavioral pharmacology experiment. Similarly, one could ask whether a curve, say a demand curve (which relates consumption to response requirement), changed as a function of a manipulation such as income (number of reinforcers available per day) or degree of food deprivation. Here, the comparison is not a change in the value of a single data point, as in an analysis of variance comparing means; the comparison is between two models (i.e., equations) of the relation across many data points (e.g., comparing a fit of the generalized matching law under control and drug conditions). One could also assess whether a particular parameter value changed as a result of some intervention. For example, one could evaluate whether the sensitivity parameter in the generalized matching equation varies as a function of some manipulation such as drug administration. Similarly, researchers in behavioral pharmacology may be interested in whether a drug alters a parameter value in a quantitative model (e.g., Pitts & Febbo, 2004). Such changes may indicate that certain behavioral mechanisms (e.g., changes in sensitivity to reinforcer amount or delay) are responsible for a particular drug effect (e.g., an increase in impulsive choice). In the interest of brevity, we provide a detailed example of model comparison and only the general logic and procedure for parameter comparison.

The central question in model comparison is, did an intervention produce a significant change in the function describing the relation between the independent and dependent variables? Consider a case in which the researcher has collected two datasets, one obtained before and one obtained after an intervention. The model comparison approach asks whether separate fits of an equation to the data, pre- and postintervention, for example, are better than a single fit of the equation to both datasets. If the separate fits are not better, it suggests that the intervention had no effect on behavior. Note that the basic logic of all model comparison is entailed by this procedure: A more complicated model (one equation fitted separately to each dataset) is being compared with a more parsimonious model (one equation fitted to both datasets). An example of this comparison is shown graphically in Figure 10.11. The left panel shows the alternative hypothesis, and the right panel shows the simpler hypothesis (more details about the figure are discussed later). Either the F test or the AIC can be used to determine whether a particular intervention changed behavior, but we present only the F test. (For the AIC, one would need to calculate the sum-of-squared residuals, N and K, for each model. The calculations for the AIC were given earlier.) Motulsky and Christopoulos (2004) provided a succinct discussion of some factors that may lead a researcher to favor one test over the other.

Figure 10.11. A model comparison approach for nested models. The left panel shows the more complicated alternative hypothesis. The alternative hypothesis is that separate fits of the equation provide a better account of the data. The more parsimonious, simpler hypothesis, shown in the right panel, is that one equation can account for all of the data. The data points represent the logged response and reinforcer ratios presented in Table 10.7.

In this example, assume we have conducted an experiment with pigeons in which we have manipulated the presence of discriminative stimuli. The discriminative stimuli, such as different colored lights, signal the availability of reinforcement at each of the two alternatives. If behavior is more sensitive to changes in reinforcer rates when discriminative stimuli are used, we can make a prediction: A fit of the generalized matching equation with discriminative stimuli will differ from the fit of the generalized matching equation without discriminative stimuli. The first dataset, obtained without discriminative stimuli, and the second dataset, obtained with discriminative stimuli, are presented in Table 10.7 and plotted in the left and right panels of Figure 10.11. The simpler hypothesis is that the discriminative stimuli have no effect on behavior and that a single fit describes all the data. Thus, we fit the generalized matching equation to all the data pooled together, which is also called global fitting. The global fit is indicated by the solid line in the right panel of Figure 10.11. The values of the parameters for the global fit are a = 1.09 and log b = 0.09.

Table 10.7
Log Response and Reinforcer Ratios Obtained Under Conditions With and Without Discriminative Stimuli

Without discriminative stimuli      With discriminative stimuli
Log rnf ratio    Log rsp ratio      Log rnf ratio    Log rsp ratio
0.11             0.17               −0.30            −0.22
−0.37            0.07               −0.25            −0.12
0.49             0.61               1.00             1.60
−0.86            −0.66              −0.95            −0.99
0.70             0.63               0.43             0.62
−0.17            −0.21              −0.84            −1.20
0.63             0.57               −0.77            −0.83
−0.22            −0.06              0.43             0.43
0.25             0.24               −0.85            −0.85
−0.67            −0.44              0.06             0.08

Note. rnf = reinforcer; rsp = response.

The alternative hypothesis is that the discriminative stimuli did indeed change behavior and that separate fits of the generalized matching equation are better (one fit for

each data set). Thus, we also fit the generalized matching equation to each data set separately. The alternative hypothesis is shown graphically by the dashed and dotted lines in the left panel of Figure 10.11. The parameter values for our dataset without discriminative stimuli are a = 0.80 and log b = 0.10; the parameter values for our dataset with discriminative stimuli are a = 1.28 and log b = 0.11. As always, before moving on to model comparison using an F test or the AIC, one must evaluate each model separately in terms of visual inspection, R², residuals, and the other criteria listed earlier. It would not be appropriate to perform model comparison if one of the models produces nonrandom residuals or unrealistic parameter values (e.g., Dallery et al., 2005). Because no problems occurred with either the simpler or the alternative models (not shown), we can move on to statistical comparison. We calculate the residual sum-of-squares for the two-parameter (one fit with two parameters) global model and the more complicated four-parameter (two fits each with two parameters) alternative model. The total residual sum-of-squares for the simpler hypothesis, that a single global fit provides a better account of the data, is 0.71. The alternative hypothesis, that two separate fits of the equation provide a better account of the data, has a sum-of-squares of 0.34. The value of 0.34 was calculated by summing the individual fit sums-of-squares of 0.13 and 0.21. Next, we calculate the degrees of freedom for each model. The degrees of freedom for the global fit is 18 (20 data points minus two parameters); for the alternative hypothesis, it is 16 (20 data points minus four parameters, i.e., two parameters for each fit). As before, we can now calculate the relative difference in the sums-of-squares ([0.71 − 0.34]/0.34 = 1.09) and degrees of freedom ([18 − 16]/16 = 0.125) to allow the calculation of an F ratio.
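The comparison of a single global fit with separate fits can be sketched directly from the data in Table 10.7. As before, the sketch assumes that the generalized matching equation is a straight line in logarithmic coordinates and uses ordinary least squares; the variable and function names are introduced here for illustration, and the closed-form p value is valid only because the numerator degrees of freedom equal 2.

```python
# Table 10.7 data: (log reinforcer ratios, log response ratios).
without_sd = ([0.11, -0.37, 0.49, -0.86, 0.70, -0.17, 0.63, -0.22, 0.25, -0.67],
              [0.17, 0.07, 0.61, -0.66, 0.63, -0.21, 0.57, -0.06, 0.24, -0.44])
with_sd = ([-0.30, -0.25, 1.00, -0.95, 0.43, -0.84, -0.77, 0.43, -0.85, 0.06],
           [-0.22, -0.12, 1.60, -0.99, 0.62, -1.20, -0.83, 0.43, -0.85, 0.08])


def fit_residual_ss(x, y):
    """Least-squares line through (x, y): slope, intercept, residual SS."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    return slope, intercept, ss


# Simpler hypothesis: one global fit to the pooled data (2 parameters).
pooled_x = without_sd[0] + with_sd[0]
pooled_y = without_sd[1] + with_sd[1]
a_global, log_b_global, ss_global = fit_residual_ss(pooled_x, pooled_y)
df_global = len(pooled_x) - 2

# Alternative hypothesis: a separate fit for each data set (4 parameters).
ss_separate = fit_residual_ss(*without_sd)[2] + fit_residual_ss(*with_sd)[2]
df_separate = len(pooled_x) - 4

f_ratio = ((ss_global - ss_separate) / ss_separate) / (
    (df_global - df_separate) / df_separate)
# Closed-form upper-tail p value, valid because the numerator df is 2.
p = (1 + 2 * f_ratio / df_separate) ** (-df_separate / 2)

print(f"global fit: a = {a_global:.2f}, log b = {log_b_global:.2f}")
print(f"SS global = {ss_global:.2f}, SS separate = {ss_separate:.2f}")
print(f"F(2, {df_separate}) = {f_ratio:.2f}, p = {p:.4f}")
```

Because the sketch carries unrounded sums-of-squares through the calculation, its F ratio is close to, but not exactly, the value obtained from the rounded sums-of-squares in the text; the conclusion at a threshold of .05 is the same.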
Table 10.8
Extra Sum-of-Squares F Test Comparison of a Single Global Fit of the Generalized Matching Equation (Equation 5) With Both Data Sets in Table 10.7 Versus Two Separate Fits of the Generalized Matching Equation, One for Each Data Set

Model                                                    SS      df
Simpler hypothesis (one global fit)                      0.71    18.000
Alternative hypothesis (separate fit for each data set)  0.34    16.000
Difference                                               0.37    2.000
Relative difference                                      1.09    0.125
Ratio (F)                                                8.72
p                                                        .003

The F ratio is 8.72 (1.09/0.125), with a numerator degree of freedom of 2 (the simpler hypothesis degree of freedom minus the alternative hypothesis degree of freedom) and a denominator degree of freedom of 16 (the alternative hypothesis degree of freedom), which has a p value of approximately .003. These calculations are summarized in Table 10.8. Assuming a threshold value of .05, we reject the simpler hypothesis and conclude that separate fits of the equation provide a better account of

the data than a single global fit. We can also see that the slope increased when discriminative stimuli were present and, as suggested by the difference in the intercepts, perhaps there were some small biases for certain colors over others.

A model comparison approach could also be used to assess whether a particular parameter value changed as a result of some intervention (for a more in-depth discussion of this technique, see Chapter G of Motulsky & Christopoulos, 2004). In the example just described using discriminative stimuli, we could have asked whether just the sensitivity parameter changed as opposed to the entire function. The general procedure is similar to what we have described in this section. Construct two models, a simpler and an alternative hypothesis, and ask whether the residuals are random, whether the parameters are sensible, and so on. In constructing the simpler model, again use a global fitting procedure, but with one key difference: Fit the equation to each dataset separately with the constraint that the value of the parameter of interest is shared between the two fits (e.g., the sensitivity parameter, a), and the other parameters are allowed to vary across the two fits. Least-squares regression in this case involves finding one value of the shared parameter and separate values of the other parameters, one for each data set, that minimize the total residual sum-of-squares. As before, the alternative model involves completely separate fits of the equation to each dataset. Because a parameter value must be shared in the simpler model, this technique requires a program that can calculate least-squares regression with shared parameters, such as GraphPad Prism (GraphPad Software, Inc., La Jolla, CA) or Microsoft Excel. Finally, as with the previous model comparison example, compare the sum-of-squared residuals and the complexity (degrees of freedom) obtained for both models, calculate the F ratio, and evaluate significance using an F table. Following the same procedure outlined earlier, the AIC could also be used to evaluate whether an intervention changed the value of a parameter.

Conclusion

In this chapter, we introduced and developed the mechanics of how quantitative models are assessed and compared. This knowledge should be coupled with an appreciation of historical, conceptual, and philosophical issues concerning quantitative modeling in science in general and behavioral science in particular (e.g., Critchfield & Reed, 2009; Kline, 1959; Marr, 1989; Moore, 2008, 2010; Nevin, 1984; Rodgers, 2010; Smith, 1990). Space limitations preclude us from discussing these philosophical issues beyond acknowledging them, but we hope the reader is now better equipped to consider quantitative modeling from a conceptual and philosophical perspective and thereby bring quantitative modeling into the broader context of how science is conducted. Indeed, just as models of phenomena in other sciences have evolved on the basis of new empirical evidence, models in behavioral science have and will evolve with the accumulation of empirical evidence. We have emphasized that the precise description and critical evaluation of behavior’s controlling variables are at the heart of quantitative analysis in behavioral science. Evaluating quantitative models requires clear and critical thinking and some specialized tools such as residual analysis and model comparison approaches. The benefits of quantitative models are considerable; they yield new insights into the causes of behavior and allow for a high degree of precision, falsifiability, and generality.

References

Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19, 716–723. doi:10.1109/TAC.1974.1100705
Anscombe, F. J. (1973). Graphs in statistical analysis. American Statistician, 27, 17–21. doi:10.2307/2682899
Baum, W. M. (1974). On two types of deviation from the matching law: Bias and undermatching. Journal of the Experimental Analysis of Behavior, 22, 231–242. doi:10.1901/jeab.1974.22-231
Baum, W. M. (1979). Matching, undermatching, and overmatching in studies of choice. Journal of the Experimental Analysis of Behavior, 32, 269–281. doi:10.1901/jeab.1979.32-269
Beardsley, S. D., & McDowell, J. J. (1992). Application of Herrnstein’s hyperbola to time allocation of naturalistic human behavior maintained by naturalistic social reinforcement. Journal of the Experimental Analysis of Behavior, 57, 177–185. doi:10.1901/jeab.1992.57-177
Belke, T. W., & Heyman, G. M. (1994). Increasing and signaling background reinforcement: Effect on the foreground response-reinforcer relation. Journal of the Experimental Analysis of Behavior, 61, 65–81. doi:10.1901/jeab.1994.61-65
Bell, E. T. (1945). The development of mathematics. New York, NY: McGraw-Hill.
Borrero, J. C., Crisolo, S. S., Tu, Q., Rieland, W. A., Ross, N. A., Francisco, M. T., & Yamamoto, K. Y. (2007). An application of the matching law to social dynamics. Journal of Applied Behavior Analysis, 40, 589–601.
Borrero, J. C., & Vollmer, T. R. (2002). An application of the matching law to severe problem behavior. Journal of Applied Behavior Analysis, 35, 13–27. doi:10.1901/jaba.2002.35-13
Brace, R. A. (1977). Fitting straight lines to experimental data. American Journal of Physiology: Physiology and Comparative Physiology, 233, 94–99.
Bradshaw, C. M. (1977). Suppression of response rates in variable-interval schedules by a concurrent schedule of reinforcement. British Journal of Psychology, 68, 473–480. doi:10.1111/j.2044-8295.1977.tb01617.x
Bradshaw, C. M., Szabadi, E., & Bevan, P. (1976). Human variable-interval performance. Psychological Reports, 38, 881–882. doi:10.2466/pr0.1976.38.3.881
Burnham, K. P., & Anderson, D. R. (2002). Model selection and multimodel inference: A practical information-theoretic approach (2nd ed.). New York, NY: Springer-Verlag.
Catania, A. C., & Reynolds, G. S. (1968). A quantitative analysis of the responding maintained by interval schedules of reinforcement. Journal of the Experimental Analysis of Behavior, 11, 327–383. doi:10.1901/jeab.1968.11-s327

Conger, R., & Killeen, P. (1974). Use of concurrent operants in small group research: A demonstration. Pacific Sociological Review, 17, 399–416.
Cook, D. R., & Weisberg, S. (1982). Residuals and influence in regression. New York, NY: Chapman & Hall.
Critchfield, T. S., & Kollins, S. H. (2001). Temporal discounting: Basic research and the analysis of socially important behavior. Journal of Applied Behavior Analysis, 34, 101–122. doi:10.1901/jaba.2001.34-101
Critchfield, T. S., & Reed, D. D. (2009). What are we doing when we translate from quantitative models? Behavior Analyst, 32, 339–362.
Dallery, J., McDowell, J. J., & Lancaster, J. S. (2000). Falsification of matching theory’s account of single-alternative responding: Herrnstein’s k varies with sucrose concentration. Journal of the Experimental Analysis of Behavior, 73, 23–43. doi:10.1901/jeab.2000.73-23
Dallery, J., McDowell, J. J., & Soto, P. L. (2004). The measurement and functional properties of reinforcer value in single-alternative responding: A test of linear system theory. Psychological Record, 54, 45–65.
Dallery, J., & Raiff, B. R. (2007). Delay discounting predicts cigarette smoking in a laboratory model of abstinence reinforcement. Psychopharmacology, 190, 485–496. doi:10.1007/s00213-006-0627-5
Dallery, J., & Soto, P. (2004). Herrnstein’s hyperbola and behavioral pharmacology: Review and critique. Behavioural Pharmacology, 15, 443–459. doi:10.1097/00008877-200411000-00001
Dallery, J., Soto, P. L., & McDowell, J. J. (2005). A test of the formal and modern theories of matching. Journal of the Experimental Analysis of Behavior, 84, 129–145. doi:10.1901/jeab.2005.108-04
de Villiers, P. A., & Herrnstein, R. J. (1976). Toward a law of response strength. Psychological Bulletin, 83, 1131–1153. doi:10.1037/0033-2909.83.6.1131
Elliffe, D., Davison, M., & Landon, J. (2008). Relative reinforcer rates and magnitudes do not control concurrent choice independently. Journal of the Experimental Analysis of Behavior, 90, 169–185. doi:10.1901/jeab.2008.90-169
Fisher, W. W., & Mazur, J. E. (1997). Basic and applied research on choice responding. Journal of Applied Behavior Analysis, 30, 387–410. doi:10.1901/jaba.1997.30-387
Fuqua, R. W. (1984). Comments on the applied relevance of the matching law. Journal of Applied Behavior Analysis, 17, 381–386. doi:10.1901/jaba.1984.17-381
Herrnstein, R. J. (1961). Relative and absolute strength of response as a function of frequency of reinforcement. Journal of the Experimental Analysis of Behavior, 4, 267–272. doi:10.1901/jeab.1961.4-267

Herrnstein, R. J. (1970). On the law of effect. Journal of the Experimental Analysis of Behavior, 13, 243–266. doi:10.1901/jeab.1970.13-243
Herrnstein, R. J. (1974). Formal properties of the matching law. Journal of the Experimental Analysis of Behavior, 21, 159–164. doi:10.1901/jeab.1974.21-159
Houston, A. (1986). The matching law applies to wagtails’ foraging in the wild. Journal of the Experimental Analysis of Behavior, 45, 15–18. doi:10.1901/jeab.1986.45-15
Killeen, P. R., & Sitomer, M. T. (2003). MPR. Behavioural Processes, 62, 49–64. doi:10.1016/S0376-6357(03)00017-2
Kline, M. (1959). Mathematics and the physical world. New York, NY: Dover.
Lattal, K. A. (2001). The human side of animal behavior. Behavior Analyst, 24, 147–161.
Lau, B., & Glimcher, P. W. (2005). Dynamic response-by-response models of matching behavior in rhesus monkeys. Journal of the Experimental Analysis of Behavior, 84, 555–579. doi:10.1901/jeab.2005.110-04
Marr, M. J. (1989). Some remarks on the quantitative analysis of behavior. Behavior Analyst, 12, 143–151.
Martens, B. K., & Houk, J. L. (1989). The application of Herrnstein’s law of effect to disruptive and on-task behavior of a retarded adolescent girl. Journal of the Experimental Analysis of Behavior, 51, 17–27. doi:10.1901/jeab.1989.51-17
Mazur, J. E. (1987). An adjusting amount procedure for studying delayed reinforcement. In M. L. Commons, J. E. Mazur, J. A. Nevin, & H. Rachlin (Eds.), Quantitative analyses of behavior: The effects of delay and of intervening events on reinforcement value (pp. 55–73). Hillsdale, NJ: Erlbaum.
Mazur, J. E. (2001). Hyperbolic value addition and general models of animal choice. Psychological Review, 108, 96–112. doi:10.1037/0033-295X.108.1.96
Mazur, J. E. (2006). Mathematical models and the experimental analysis of behavior. Journal of the Experimental Analysis of Behavior, 85, 275–291. doi:10.1901/jeab.2006.65-05
McDowell, J. J. (1980). An analytic comparison of Herrnstein’s equations and a multivariate rate equation. Journal of the Experimental Analysis of Behavior, 33, 397–408. doi:10.1901/jeab.1980.33-397
McDowell, J. J. (1982). The importance of Herrnstein’s mathematical statement of the law of effect for behavior therapy. American Psychologist, 37, 771–779. doi:10.1037/0003-066X.37.7.771
McDowell, J. J. (1986). On the falsifiability of matching theory. Journal of the Experimental Analysis of Behavior, 45, 63–74. doi:10.1901/jeab.1986.45-63
McDowell, J. J. (1988). Matching theory in natural human environments. Behavior Analyst, 11, 95–109.

McDowell, J. J. (1989). Two modern developments in matching theory. Behavior Analyst, 12, 153–166. McDowell, J. J. (2004). A computational model of selection by consequences. Journal of the Experimental Analysis of Behavior, 81, 297–317. doi:10.1901/jeab. 2004.81-297 McDowell, J. J. (2005). On the classic and modern theories of matching. Journal of the Experimental Analysis of Behavior, 84, 111–127. doi:10.1901/jeab.2005.59-04 McDowell, J. J., & Dallery, J. (1999). Falsification of matching theory: Changes in the asymptote of Herrnstein’s hyperbola as a function of water deprivation. Journal of the Experimental Analysis of Behavior, 72, 251–268. doi:10.1901/jeab.1999.72-251 McDowell, J. J., & Kessel, R. (1979). A multivariate rate equation for variable-interval performance. Journal of the Experimental Analysis of Behavior, 31, 267–283. doi:10.1901/jeab.1981.36-9 McKerchar, T. L., Green, L., Myerson, J., Pickford, T. S., Hill, J. C., & Stout, S. C. (2009). A comparison of four models of delay discounting in humans. Behavioural Processes, 81, 256–259. doi:10.1016/ j.beproc.2008.12.017 Moore, J. (2008). A critical appraisal of contemporary approaches in the quantitative analysis of behavior. Psychological Record, 58, 641–664. Moore, J. (2010). Philosophy of science, with special consideration given to behaviorism as the philosophy of the science of behavior. Psychological Record, 60, 137–150. Motulsky, H., & Christopoulos, A. (2004). Fitting models to biological data using linear and non-linear regression: A practical guide to curve fitting. Oxford, England: Oxford University Press. Murray, L. K., & Kollins, S. H. (2000). Effects of methylphenidate on sensitivity to reinforcement in children diagnosed with attention deficit hyperactivity disorder: An application of the matching law. Journal of Applied Behavior Analysis, 33, 573–591. doi:10.1901/ jaba.2000.33-573 Nevin, J. A. (1984). Quantitative analysis. Journal of the Experimental Analysis of Behavior, 42, 421–434. 
doi:10.1901/jeab.1984.42-421 Nevin, J. A., & Grace, R. C. (2000). Behavioral momentum and the law of effect. Behavioral and Brain Sciences, 23, 73–90. doi:10.1017/S0140525X00002405 Parsonson, B. S., & Baer, D. M. (1992). The visual analysis of data, and current research into the stimuli controlling it. In T. R. Kratochwill & J. R. Levin (Eds.), Single-case research design and analysis: New directions for psychology and education (pp. 15–40). Mahwah, NJ: Erlbaum. Perone, M. (1999). Statistical inference in behavior analysis: Experimental control is better. Behavior Analyst, 22, 109–116.

Pitts, R. C., & Febbo, S. M. (2004). Quantitative analyses of methamphetamine’s effects on self-control choices: Implications for elucidating behavioral mechanisms of drug action. Behavioural Processes, 66, 213–233. doi:10.1016/j.beproc.2004.03.006

Popper, K. R. (1962). Conjectures and refutations. London: Routledge.

Rachlin, H., Battalio, R., Kagel, J., & Green, L. (1981). Maximization theory in behavioral psychology. Behavioral and Brain Sciences, 4, 371–417. doi:10.1017/S0140525X00009407

Riggs, D. S., Guarnieri, J. A., & Addelman, S. (1978). Fitting straight lines when both variables are subject to error. Life Sciences, 22, 1305–1360. doi:10.1016/0024-3205(78)90098-X

Rodgers, J. L. (2010). The epistemology of mathematical and statistical modeling: A quiet methodological revolution. American Psychologist, 65, 1–12. doi:10.1037/a0018326

Shull, R. L. (1991). Mathematical description of operant behavior: An introduction. In I. H. Iversen & K. A. Lattal (Eds.), Experimental analysis of behavior (Vol. 2, pp. 243–282). New York, NY: Elsevier.

Sidman, M. (1960). Tactics of scientific research. New York, NY: Basic Books.

Silver, B. L. (1998). The ascent of science. New York, NY: Oxford University Press.

Skinner, B. F. (1950). Are theories of learning necessary? Psychological Review, 57, 193–216. doi:10.1037/h0054367

Smith, L. D. (1990). Models, mechanisms, and explanation in behavior theory: The case of Hull versus Spence. Behavior and Philosophy, 18, 1–18.

Soto, P. L., McDowell, J. J., & Dallery, J. (2005). Effects of adding a second reinforcement alternative: Implications for Herrnstein’s interpretation of re. Journal of the Experimental Analysis of Behavior, 84, 185–225. doi:10.1901/jeab.2005.09-05

Staddon, J. E. R. (1968). Spaced responding and choice: A preliminary analysis. Journal of the Experimental Analysis of Behavior, 11, 669–682. doi:10.1901/jeab.1968.11-669

Staddon, J. E. R. (1984). Social learning theory and the dynamics of interaction. Psychological Review, 91, 502–507.

St. Peter, C. C., Vollmer, T. R., Bourret, J. C., Borrero, C. S., Sloman, K. N., & Rapp, J. T. (2005). On the role of attention in naturally occurring matching relations. Journal of Applied Behavior Analysis, 38, 429–443. doi:10.1901/jaba.2005.172-04

Sutton, N. P., Grace, R. C., McLean, A. P., & Baum, W. M. (2008). Comparing the generalized matching law and contingency discriminability model as accounts of concurrent schedule performance using residual meta-analysis. Behavioural Processes, 78, 224–230. doi:10.1016/j.beproc.2008.02.012

Taylor, D., Lincoln, A. J., & Foster, S. L. (2010). Impaired behavior regulation under conditions of concurrent variable schedules of reinforcement in children with ADHD. Journal of Attention Disorders, 13, 358–368.

Tukey, J. W. (1977). Exploratory data analysis. Reading, MA: Addison-Wesley.

White, K. G., McLean, A. P., & Aldiss, M. F. (1986). The context for reinforcement: Modulation of the response-reinforcer relation by concurrently available extraneous reinforcement. Animal Learning and Behavior, 14, 398–404. doi:10.3758/BF03200085

Williams, B. A. (1988). Reinforcement, choice, and response strength. In R. C. Atkinson, R. J. Herrnstein, G. Lindzey, & R. D. Luce (Eds.), Stevens’ handbook of experimental psychology (2nd ed., Vol. 2, pp. 167–244). New York, NY: Wiley.

Chapter 11

Time-Series Statistical Analysis of Single-Case Data

Jeffrey J. Borckardt, Michael R. Nash, Wendy Balliet, Sarah Galloway, and Alok Madan

DOI: 10.1037/13937-011 APA Handbook of Behavior Analysis: Vol. 1. Methods and Principles, G. J. Madden (Editor-in-Chief) Copyright © 2013 by the American Psychological Association. All rights reserved.

For years, time-series designs were almost entirely limited to the behavior analysis movement begun by B. F. Skinner (Jones, Vaught, & Weinrott, 1977; Michael, 1974; Morgan & Morgan, 2001; Sidman, 1960). Contrary to the current dominant approach in psychology, luminaries of scientific psychology based their work and theory on systematic observation of one organism at a time (Ebbinghaus, 1913; Kohler, 1925; Pavlov, 1927; Skinner, 1938, 1956; Watson, 1925; for a review, see Morgan & Morgan, 2001). In the view of these pioneering researchers, behavioral processes occur at the level of the individual, and the large-N paradigm dominant in modern psychology obscures a proper analysis of these processes. More explicitly, a top-down approach in which data are aggregated across large samples and generalized to an individual might fail to describe the behavioral process of individuals in the sample and may therefore have limited applicability to individuals (see Chapter 7, this volume). From a bottom-up approach, however, when one measures the ebb and flow of an individual’s behavior before, during, and after an intervention, one can test whether, when, and sometimes even why an intervention is effective.

For several decades, researchers such as Barlow and Hersen (1984), Bergin and Strupp (1970), and Kazdin (1982, 1992) have persuasively argued that practitioner-generated case-based time-series designs are true experiments and, as such, ought to stand alongside the more commonly accepted group designs (e.g., randomized controlled trial). Many researchers have been interested in whether and how laboratory-validated interventions translate to practical settings (Jacobson & Christensen, 1996; Westen & Bradley, 2005; Westen, Novotny, & Thompson-Brenner, 2004). The American Psychological Association’s Division 12 Task Force for Promotion and Dissemination of Psychological Procedures (1995) explicitly recognized time-series designs as important methodological approaches that can test treatment efficacy (Chambless & Hollon, 1998; Chambless & Ollendick, 2001). Westen and Bradley (2005) suggested that psychotherapy researchers “would do well to use clinical practice as a natural laboratory for identifying promising treatment approaches” (p. 267). Hayes, Barlow, and Nelson-Gray (1999) provided a book-length argument suggesting that single-case research designs are better suited to the daily task of evaluating the efficacy of interventions used by clinical psychologists than are the large-N research designs in which these clinicians were trained during graduate school. In accord with these arguments, the APA Presidential Task Force on Evidence-Based Practice (2005) endorsed systematic single-case studies as contributing to effective psychological practice. The clinical field seems to be recognizing that assaying aggregate effect is not the only empirical window from which to view the nature of behavioral change; in addition, the systematic observation of one or a few individuals can be scientifically sound and instructive.

Unfortunately, no upsurge of empirically grounded case studies has occurred in clinical psychology. In fact, many practitioners still despair over the relevance of psychotherapy research to practice, perhaps because no therapy has been designated as efficacious on the weight of time-series data as prescribed by American Psychological Association task forces. Moreover, arguments about efficacy are framed almost exclusively in terms of group designs (Howard, Krause, Caburnay, Noel, & Saunders, 2001; Jacobson & Christensen, 1996; Kotkin, Daviet, & Gurin, 1996; Morrison, Bradley, & Westen, 2003; Nathan, Stuart, & Dolan, 2000; Nielsen et al., 2004; Seligman, 1995; VandenBos, 1996; Westen & Morrison, 2001). Hence, despite calls for its resurrection, the time-series design in psychotherapy outcome research remains underused.

Statistical Analysis of Clinical Single-Case Data

Unsubstantiated case reports justifiably exert little influence among skeptical scientists. Social scientists who receive little formal training in single-case research designs may lump the results of studies using these designs into a class that includes unsubstantiated case reports. As such, these researchers (and they are probably mostly psychologists) will dismiss the outcomes of single-case studies. Statistical analysis of single-case data has the potential to provide evidence of treatment effects that is more widely accepted among social scientists than are visual inspections of time-series data. Single-case data may be subject to inferential statistical analyses. Behavioral outcomes may be objectively quantified along a number of dimensions and summarized using familiar descriptive statistics. Error variability can be calculated, permitting inference about whether change after treatment onset is a chance outcome. When researchers can assign a very low probability that the observed behavior change is a chance outcome, they can be more comfortable entertaining alternative explanations.
Of course, inferential statistics alone tell one nothing about the nature of the relation between an intervention and observed changes in behavior, and thus researchers look to the quality of the research design to permit more meaningful inferences. Because the family of time-series designs is large, a survey of these designs is beyond the scope of this chapter (see Barlow & Hersen, 1984; Kazdin, 1982; and Chapter 5, this volume, for comprehensive descriptions). Here we focus primarily on a simple preintervention–postintervention design because it is the most fundamental unit of inferential analysis across time-series designs, and hence the most relevant.

Fundamentals of Single-Case Research

Proper single-case research design and good data-sampling strategies can go a long way toward enhancing the validity of the inferences that are ultimately available to the researcher. Observing change across time is the most fundamental feature of single-case time-series outcome designs. At the outset, the researcher must make critical decisions about the data that are to be collected. In a typical time-series study, the behavior of a patient or research subject is measured repeatedly on several variables across baseline, treatment, and follow-up phases. The measures chosen are, of course, determined by the nature of the research questions, the opportunity for measurement, and the soundness of the measures themselves. Individual data points are the basic units of a time-series data set, and they typically represent observations, ratings, or counts of some predefined behavioral unit over time. The person making the observations can be the patient, the research subject, the researcher, significant others, or a clinician. The content of the measures could be symptoms, specific behavior, physical status, or medication requirements, to name a few. The single-case study is enriched when observations are collected across more than one source and more than one variable (Strupp, 1996). For example, when investigating a clinical case involving treatment for pain, the patient might report his or her daily level of pain, engagement in predefined pain behavior, involvement in healthy activities, and amount of daily as-needed medication used. In addition, the spouse might report the patient’s level of activity and frequency of pain-related complaints.

When considering the data points in a sequence over time, one can think of this set as a data stream. This stream can be divided into phases on the basis of the hypotheses and actions of the researcher. For instance, a baseline phase can encompass the stream of observations of a particular behavior during the period of time before the introduction of an intervention. Subsequently, an intervention is implemented, and the researcher can continue to observe the data stream that would now belong to the postintervention phase. This example illustrates the classic A-B design, wherein data are collected during Phase A (the baseline or preintervention phase) and collection continues after the onset of an intervention (i.e., Phase B). This simple design is the basic building block of most available time-series research designs.

In time-series data, behavior change requires a metric anchored in time. Therefore, observations should be repeated evenly across time (e.g., daily, weekly) so that the interval between measurements is the same throughout the study (i.e., consistent temporal resolution); otherwise, statistical artifacts can occur. Whether the focus is heart rate, blood pressure, hair pulling, itching, medication usage, or self-mutilation, repeated observations sampled consistently over time and phase establish the topography of change. The number of observations for each phase can be different, but the interval between observations (e.g., hours, days, weeks) should ideally be the same for all phases of the study. For example, mood ratings might be collected once per day from a patient during the baseline phase of a study, and thus these ratings should also be collected daily after treatment onset (not reduced to weekly or increased to three times per day).
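As a rough illustration of the data-stream idea, the sketch below stores an A-B stream as dated observations, checks that the sampling interval is constant, and splits the stream by phase. The names `Observation`, `is_evenly_spaced`, and `split_phases`, the start date, and the ratings are hypothetical choices made for this example only; they do not come from the chapter.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Observation:
    day: date      # when the rating was recorded
    phase: str     # "A" = baseline, "B" = postintervention
    rating: float  # e.g., a daily mood or pain rating


def is_evenly_spaced(stream):
    """True if all consecutive observations are separated by the same interval."""
    gaps = {later.day - earlier.day for earlier, later in zip(stream, stream[1:])}
    return len(gaps) <= 1


def split_phases(stream):
    """Group ratings by phase label, preserving time order within each phase."""
    phases = {}
    for obs in stream:
        phases.setdefault(obs.phase, []).append(obs.rating)
    return phases


# Hypothetical daily ratings: 7 baseline days followed by 8 treatment days.
baseline = [8, 7, 8, 9, 7, 8, 8]
treatment = [6, 5, 5, 4, 3, 4, 3, 3]
start = date(2024, 1, 1)  # arbitrary start date
stream = [
    Observation(start + timedelta(days=i), "A" if i < len(baseline) else "B", r)
    for i, r in enumerate(baseline + treatment)
]

print(is_evenly_spaced(stream))
print({phase: len(values) for phase, values in split_phases(stream).items()})
```

The phases may differ in length, as in this example, but a missed day would make `is_evenly_spaced` return False, which is the situation the text warns can produce statistical artifacts.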

Another data collection strategy involves counting the frequency of a response or response class during a predefined set of time periods and examining changes in rates. However, potentially valuable information about the variability of a behavior over time is lost with this approach if the periods during which the counts take place are too long, and thus it may not be appropriate for answering questions about the change process over time.

Frequency of observation is also pertinent to time-series designs. Although frequent observations (e.g., minute to minute or hourly) can yield large data sets, analytic problems related to serial dependence can be amplified by such high-frequency data collection practices (discussed in the Statistical Analytic Techniques for Single-Case Data section). Conversely, sampling infrequently (e.g., weekly, monthly) may yield data streams that more closely resemble data sets made up of independent observations, but these streams tend to be too short to permit rigorous statistical analysis. Moreover, when an investigator does not make observations frequently enough (insufficient temporal resolution) and then attempts to infer trends regarding the underlying distribution from which the data points are drawn, aliasing is more likely and the chance for inferential error increases (see Figure 11.1). Aliasing occurs when the sampling rate is insufficient to accurately represent the underlying pattern of the behavior of interest. Because the researcher is only able to access his or her sample of observations, if the sampling frequency is insufficient to fully capture the underlying pattern, trends may emerge in the sample that do not accurately reflect the behavioral patterns in question.

Figure 11.1. Example of a series of hypothetical behavioral ratings over time (y-axis: Behavioral Ratings; x-axis: Time; plotted series: Actual Behavioral Pattern and Insufficient/Inconsistent Sampling). When sampled inconsistently and infrequently, a different pattern can emerge in the sample that is not reflective of the actual underlying behavioral pattern. This pattern is known as aliasing.

Selecting the right sampling frequency to adequately address the research question of interest is no trivial task, but in clinical settings we have found daily measurement to be well tolerated by patients and well suited to the statistical requirements of several time-series analytic techniques. The clinical researcher is often interested in knowing how the data map against intervention phases (e.g., baseline, treatment, follow-up); therefore, the total number of observations in the entire data set as well as the number of observations in each phase is important. Statistically, the usual time-series study encountered in naturalistic clinical practice has about 10 to 20 total data points (Center, Skiba, & Casey, 1985; Jones et al., 1977; Sharpley, 1987).
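The aliasing problem described above can be made concrete with a small sketch. It assumes a purely hypothetical behavior whose rating follows a 7-day sinusoidal cycle (the function `true_rating` and its constants are invented for illustration) and compares daily sampling with sparser sampling schedules.

```python
import math


def true_rating(day, period=7.0):
    """Hypothetical underlying pattern: a rating that cycles once every `period` days."""
    return 4.0 + 3.0 * math.sin(2.0 * math.pi * day / period)


def sample(days):
    """Record the rating (rounded to 2 decimals) on each of the given days."""
    return [round(true_rating(d), 2) for d in days]


daily = sample(range(0, 50))            # one observation per day
weekly = sample(range(0, 50, 7))        # one observation per cycle
every_6_days = sample(range(0, 50, 6))  # slightly faster than one per cycle

print("daily range:  ", min(daily), "to", max(daily))
print("weekly sample:", weekly)
print("every 6 days: ", every_6_days)
```

Under these assumptions, the weekly samples always land on the same point of the cycle, so the sampled stream looks flat even though the daily ratings swing widely. Sampling every 6 days instead produces a slow apparent drift that is not present in the underlying pattern.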

Baseline Observations

Obtaining a meaningful number of baseline observations is a critical aspect of quality time-series design. Understandably, clinical patients object to delays in treatment, although such delays are likely to be less of an issue in laboratory studies. It is possible, however, for clinicians to record a number of potential dependent variables related to the patient’s presenting problem and proposed treatment plan at an intake session. At the end of an intake interview, the patient can be told that within a few days he or she will receive a packet of customized daily rating sheets to track symptom status (e.g., number of panic symptoms) along with a telephone call from the therapist to schedule a second appointment. Within 2 days after the intake appointment, the therapist can mail customized response sheets consisting of three to four dependent variables to the patient. The patient can hand in the rating sheet (typically covering 2–6 days) on arriving for the next appointment. During this meeting, the therapist and patient review the results of previous testing, elaborate on the nature and scope of the clinical problem, complete any further psychological testing if indicated, define the treatment plan, and schedule the first treatment session. Hence, when the patient returns for the first treatment session, he or she has completed seven daily ratings in addition to the two to six previously completed. In this manner, nine to 13 baseline observations are realized before treatment begins.

In laboratory settings, the researcher typically has more flexibility and can design studies to permit collection of many more baseline data points than is feasible in clinical practice. Although extended baselines are not strictly necessary to construct a statistical model of the data set, stable baselines with many observations over a long period of time are conceptually preferable. When researchers have the luxury of long baselines, they have more confidence about attributing any observed change in behavior to the intervention itself rather than to random error or chance.

Supplementing With a Standardized Outcome Measure in Clinical Settings

In addition to rating sheets, researchers may use standardized measures to obtain data from subjects. In clinical settings, standardized outcome measures (e.g., the Outcomes Questionnaire 45, the Beck Depression Inventory, the Symptom Checklist–90) can be administered once at intake and periodically throughout treatment. Results obtained from these measures across larger time intervals (once a month, once a week) can enrich the more fine-grained analysis of the continuously monitored measures (for a review of outcome assessment measures, see Maruish, 2004). Although statistical modeling of these data at the individual patient level is not viable given the small number of data points, if the trend of standardized scores (e.g., Beck Depression Inventory total scores) across time tracks reasonably well against the improvement indexed by daily observations, the validity of the daily measures is enhanced. Figure 11.2 shows a hypothetical example of daily depression ratings using a simple 10-point numeric rating scale (on which 1 = not sad at all and 10 = extremely sad) graphed alongside periodic Beck Depression Inventory scores. The two measures are highly correlated over time and follow the same trend, which supports the validity of the numeric rating scale used to capture mood on a daily basis in the hypothetical patient. Supplemental assessment on a standardized measure also provides a common metric, allowing the clinical researcher to compare the extent and relevance of the patient’s therapeutic gains with those of patients in other studies (e.g., changes in Beck Depression Inventory scores over time can be compared against data from other studies using this standardized measure).

Figure 11.2. Hypothetical daily depression ratings using a simple 10-point numeric rating scale on which 1 = not sad at all and 10 = extremely sad graphed alongside periodic Beck Depression Inventory scores (y-axis: Patient-Reported Scores; x-axis: Time; plotted series: Daily Depression Ratings and Beck Depression Inventory Scores).

Single-Case Data: Unique Properties and Considerations

Single-subject data are fundamentally different from group-level data, and several aspects of such data deserve special consideration. In the sections that follow, we provide a theoretical backdrop for understanding why single-subject data need special treatment and discuss some of the unique characteristics and statistical challenges associated with their analysis.

Data Fluctuation in Group Designs

The generic outcome question for a randomized controlled trial hinges on a conditional probability: If the mean symptom status of the treated group and the mean symptom status of the control group were actually drawn from the same population of means, how likely is it that the treatment group’s aggregate symptom status would surpass that of the control group by the amount actually observed (or more)? In other words, how viable is the notion that mere random sampling fluctuation (e.g., error variance) accounts for whatever benefit is observed in the treatment group relative to that of the control group? Scientific psychology has at its disposal a formidable array of parametric and nonparametric statistics designed to detect nonrandom shifts in population parameters.

Data Fluctuation in Time-Series Studies

In a case-based time-series study, dispatching the sampling fluctuation explanation is more complex because from the outset there are not one but two types of problem fluctuation in a time-series data stream: one random and one not. There is of course the intrasubject counterpart to the sampling error encountered with group designs. After all, few patients score an 8 on all of their baseline mood ratings and a 3 on every treatment rating. There is, however, another source of variability peculiar to time-series designs: fluctuation resulting from monotonic trends, periodicity, or behavioral drifts in the data occurring across time (Suen, 1987). This source of predictable fluctuation in the data is termed autocorrelation or serial dependence and is encountered in most other areas of natural science (e.g., soil erosion, menses, economic recovery) in which time is a key element of the research question. The presence of autocorrelation violates the fundamental assumption of most conventional parametric and nonparametric statistics: independence of observations.
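To see why the independence assumption matters, the following sketch simulates data streams with no treatment effect at all and counts how often a conventional pooled two-sample t test comparing the first and second halves of the stream declares a "significant" phase difference. Everything here is an assumption made for illustration: the first-order autoregressive (AR(1)) data-generating model, the 20 observations per phase, the hard-coded two-tailed critical value of 2.024 for 38 degrees of freedom, and the function names.

```python
import math
import random
import statistics


def ar1_stream(n, phi, rng):
    """n observations from a stationary AR(1) process (no treatment effect); needs |phi| < 1."""
    x = rng.gauss(0.0, 1.0) / math.sqrt(1.0 - phi ** 2)  # start in the stationary distribution
    out = []
    for _ in range(n):
        out.append(x)
        x = phi * x + rng.gauss(0.0, 1.0)  # today's value depends on yesterday's
    return out


def pooled_t(a, b):
    """Conventional independent-samples t statistic with pooled variance."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(sp2 * (1 / na + 1 / nb))


def false_positive_rate(phi, n_per_phase=20, n_sims=2000, t_crit=2.024, seed=1):
    """Share of null data streams in which |t| exceeds the .05 two-tailed critical value."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        s = ar1_stream(2 * n_per_phase, phi, rng)
        if abs(pooled_t(s[:n_per_phase], s[n_per_phase:])) > t_crit:
            hits += 1
    return hits / n_sims


print("independent data (phi = 0):   ", false_positive_rate(0.0))
print("autocorrelated data (phi = .6):", false_positive_rate(0.6))
```

With independent observations the rejection rate should sit near the nominal 5%, whereas with positively autocorrelated observations it should be several times larger, even though no intervention effect exists in either case.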

Autocorrelation

For observations to be independent, each and every datum must be its own unique source of information. That is, the datum must be unrelated to preceding or subsequent observations (as in a series of coin flips). In group research, this assumption is often relatively secure. In time-series studies in which, for example, a researcher is investigating a patient’s mood, the patient’s mood on Day 1 has the potential to partially determine the patient’s mood on Day 2, and subsequently the patient’s mood on Day 2 similarly becomes the point of departure for mood on Day 3, and so on: The data points are not independent of each other. Because the data points are drawn from the same person over time, the variability from data point to data point is attenuated to some extent and constrained within the confines of the single individual generating the data. For this reason, when conventional inferential group statistics (e.g., t, F, chi-square, and sign tests) are mistakenly applied to autocorrelated data sets, variability is underestimated, and subsequently the effect or variability ratio is inflated. Spuriously high ts, Fs, and rs are generated, and as a result researchers may conclude that an effect exists when such a conclusion is not justified (Hibbs, 1974; Sharpley & Alavosius, 1988).

Autocorrelation can be quantified for time-series data sets and is typically expressed across different lags of the data. For example, the correlation of pairs of data points immediately adjacent to each other in time from a single time-series data set represents lag 1 (i.e., data points at time i, correlated with data points at time i + 1). Similarly, data points at time i can be correlated with data points at time i + 2, which would represent the lag-2 autocorrelation coefficient, and so on. The autocorrelation coefficient (AR) for a stream of data can be calculated as

\[
AR = \frac{\dfrac{1}{n-k}\displaystyle\sum_{i=1}^{n-k}\left(x_i-\bar{x}\right)\left(x_{i+k}-\bar{x}\right)}{\dfrac{1}{n}\displaystyle\sum_{i=1}^{n}\left(x_i-\bar{x}\right)^{2}},
\]

where n = the number of data points, k = lag, x_i = the data point value at time i, and \bar{x} = the mean of the data stream.

In laboratory and applied work, the incidence of autocorrelation in behavioral time-series data is generally viewed as sufficient to cause serious inferential bias if conventional statistics or even visual analytic techniques are used (Busk & Marascuilo, 1988; Jones, Weinrott, & Vaught, 1978; Matyas & Greenwood, 1990; Suen, 1987; but see Huitema, 1985; Huitema & McKean, 1998; Huitema, McKean, & McKnight, 1999). Moreover, it does not matter if the autocorrelation coefficient itself is statistically significant. What matters is “the degree of distortion visited upon the t and F statistics when the autocorrelated data are analyzed via those procedures” (Sharpley & Alavosius, 1988, p. 246). For instance, whether it is significant or not, an autocorrelation coefficient of .10 can inflate t and F values 110% and 200%, respectively, when the autocorrelation coefficient is .6. For this reason, time-series designs require special statistical treatment.

Although it is a statistical nuisance, by its nature serial dependence reflects the momentum and gradualism of physiological, behavioral, and emotional repair. Because it is an index of serial dependence, autocorrelation can reveal something about the ebb and flow of behavioral change over time. For this reason, autocorrelation is the natural subject matter of a behavioral science. Whatever inferential statistic is applied to single-case time-series data, we believe it should approach autocorrelation not as noise that obscures change, but as music that accompanies it. Put differently, the preferred statistic gauges the occurrence of change while preserving its structure.

Statistical Analytic Techniques for Single-Case Data

In this section, we discuss several approaches to analyzing single-case time-series data. Although numerous approaches have been developed and used for examining time-series data, each model possesses limitations and optimally operates under certain assumptions. When selecting a technique, it is important to consider elements of the research design and characteristics of the data stream, because inherent properties of single-case time-series designs (i.e., autocorrelation and possibly data-stream length) can pose challenges to accurate parameter estimation.
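Because every technique discussed in this section has to contend with autocorrelation, it may help to see the lag-k autocorrelation coefficient from the Autocorrelation section written out. The sketch below is one direct reading of that formula (covariance term averaged over the n − k available pairs, variance term averaged over all n points); the function name `autocorrelation` and the example streams are hypothetical.

```python
def autocorrelation(x, k=1):
    """Lag-k autocorrelation coefficient of a data stream x (a list of numbers)."""
    n = len(x)
    if not 0 < k < n:
        raise ValueError("lag k must satisfy 0 < k < len(x)")
    mean = sum(x) / n
    # Average cross-product of deviations for points k steps apart (n - k pairs).
    numerator = sum((x[i] - mean) * (x[i + k] - mean) for i in range(n - k)) / (n - k)
    # Average squared deviation over the whole stream.
    denominator = sum((xi - mean) ** 2 for xi in x) / n
    return numerator / denominator  # raises ZeroDivisionError if the stream is constant


steady_trend = [1, 2, 3, 4, 5]   # each point resembles its neighbor
alternating = [1, -1, 1, -1]     # each point is the opposite of its neighbor

print(autocorrelation(steady_trend, k=1))
print(autocorrelation(alternating, k=1))
```

For the short trending stream the lag-1 coefficient works out to 0.5, and for the alternating stream it is −1, which matches the intuition that adjacent points are similar in the first case and opposed in the second.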

Ordinary and Generalized Least-Squares Regression

Ordinary least squares (OLS) is a method for estimating the unknown parameters in a linear regression model. This method minimizes the sum of squared distances between the observed responses in the data set and the responses predicted by the linear approximation. An independent measure (e.g., phase) can be numerically coded and used to predict a series of observations collected over time. However, adequate performance of OLS methods assumes independence and normality of the residuals of the fitted model, and the independence assumption is typically violated in time-series data. When autocorrelation is present, one may use generalized least-squares (GLS) estimation rather than OLS. With GLS, the estimated OLS residuals are used to estimate the error covariance matrix for the model. GLS estimation then minimizes the sum of squares of the residuals weighted by the inverse of the sample covariance matrix, which may permit more robust performance in the face of autocorrelated residuals. These approaches are widely used, and OLS and GLS regression output can be obtained from commercial statistical software packages.

As noted earlier, to ensure the accuracy and validity of OLS regression, assumptions of a normal distribution and independence of the residuals from the fitted OLS model must be met. Unfortunately, these assumptions are often violated when working with single-case data. It is difficult to make a compelling case that data collected using a single-case design are independent even if the autocorrelation coefficient for a data stream is not significantly different from zero, because one would expect a priori that data points collected from the same subject over time should be related to each other and not independent. The power to detect significant autocorrelation coefficients is influenced by the number of observations, and the number of data points typically collected, at least in clinical time-series research, is often small. Thus, with short data streams, even if the autocorrelation coefficient is not statistically significantly different from zero, one should avoid application of OLS regression techniques to single-case data. Moreover, GLS (which may be capable of minimizing inferential error associated with autocorrelated residuals) requires a large number of data points to accurately estimate the parameters necessary to yield valid and reliable results (see Ferron, 2002). As with almost all traditional regression approaches (including GLS), one should have at least 30 data points per phase (preferably 50) to ensure that the output is robust and valid (Robey, Schultz, Crawford, & Sinner, 1999), which is often untenable in clinical case research.
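To make the OLS-to-GLS step concrete, the following is a minimal sketch rather than a definitive implementation. It assumes a first-order autoregressive error structure and uses a single Cochrane-Orcutt quasi-differencing step, one common feasible-GLS shortcut that stands in here for estimating a full error covariance matrix. The function names (`ols_fit`, `lag1_autocorr`, `cochrane_orcutt`) and the 0/1 phase coding are illustrative assumptions, not part of any package discussed in this chapter.

```python
def ols_fit(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def lag1_autocorr(resid):
    """Lag-1 autocorrelation of zero-mean residuals (0.0 if all are zero)."""
    denom = sum(e * e for e in resid)
    if denom == 0:
        return 0.0
    return sum(a * b for a, b in zip(resid[1:], resid[:-1])) / denom


def cochrane_orcutt(phase, y):
    """One Cochrane-Orcutt step; returns (rho, ols_slope, adjusted_slope)."""
    a, b = ols_fit(phase, y)
    resid = [yi - (a + b * xi) for xi, yi in zip(phase, y)]
    rho = lag1_autocorr(resid)
    # Quasi-difference both series using rho; the first observation is dropped.
    x_star = [phase[t] - rho * phase[t - 1] for t in range(1, len(y))]
    y_star = [y[t] - rho * y[t - 1] for t in range(1, len(y))]
    _, b_star = ols_fit(x_star, y_star)
    return rho, b, b_star


if __name__ == "__main__":
    # Hypothetical ratings: 10 baseline points (phase 0), 10 treatment (phase 1).
    phase = [0] * 10 + [1] * 10
    ratings = [3, 3, 4, 4, 5, 4, 3, 3, 2, 3, 6, 7, 7, 8, 8, 7, 6, 6, 7, 8]
    print(cochrane_orcutt(phase, ratings))
```

With data streams as short as this hypothetical one, the estimate of rho is itself unreliable, which is the limitation described in the paragraph above.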

Randomization and Permutation Tests

To overcome some of the limitations associated with the application of OLS regression for time-series analysis, several nonparametric tests have been proposed that involve randomization and permutation of time-series data sets (Ferron & Onghena, 1996; Ferron & Sentovich, 2002; Manly, 1997; Mielke & Berry, 1994). Randomization tests are statistical tests that involve generating numerous iterations of random assignment of data points to treatments. Permutation of the order of the data in a time-series data set can permit determination of whether the same results would have been obtained if the data were assigned to rearranged placements. That is, data points representing a behavioral outcome measure of interest spanning two phases of intervention can be randomly ordered several times, and the impact of the phase intervention can be determined by evaluating the importance of data-point ordering in achieving the observed differences between phases. With this approach, a single-case researcher can determine how likely the pattern of a dependent measure is to occur if the data are randomly ordered across intervention phases. The extent to which the patterns of the dependent measure change when the data points are randomly assigned to different intervention phases determines the probability that the data patterns are uniquely tied to the intervention. Although randomization tests were once seen as computationally excessive, modern computers can conduct them quickly and efficiently.

The most important advantage of randomization tests is that they are nonparametric and consequently are not based on distributional assumptions (Arndt et al., 1996; Hooton, 1991; Ludbrook, 1994; Recchia & Rocchetti, 1982; Wilson, 2007). Randomization tests are also fairly straightforward, are easy to apply, and may be used with very short data streams (Edgington & Onghena, 2007). More important, the experimenter must designate the time points at which the treatment can be administered and then randomly assign each time point to a treatment. This random assignment not only enhances the study's internal validity, it also justifies the application of randomization tests (Todman & Dugard, 1999). Ferron, Foster-Johnson, and Kromrey (2003) investigated randomization tests with and without random assignment and concluded that the absence of random assignment calls into question the legitimacy of using a randomization test (p. 285) and that randomization tests need to be based on permutations that mirror the random assignment of subjects to conditions used in the experiment. This is only a minor constraint for researchers, and attention to it can actually enhance the internal validity of single-case research.

Although some have suggested that randomization tests may not be significantly influenced by the effects of autocorrelation in the data stream (Ferron et al., 2003), parametric and nonparametric statistics alike assume that data points are drawn from independent sources. This assumption is, of course, untenable in single-case research. Good (1994) stated that "all hypothesis-testing methods rely on the independence and/or exchangeability of the observations" (p. 149). True significance probability values are underestimated for positively autocorrelated residuals (Franklin, Allison, & Gorman, 1996, p. 172). Thus, more work is needed on randomization approaches to ensure validity and accuracy when applied to autocorrelated single-case data before these approaches can be recommended without qualification.
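The requirement that permutations mirror the random assignment actually used can be illustrated with a small sketch. It assumes one particular design that the text does not prescribe: a two-phase (A-B) study in which the intervention start point was chosen at random from all start points that leave at least `min_phase_len` observations in each phase. The function name, the one-sided mean-difference statistic, and the default minimum phase length are illustrative assumptions.

```python
def mean(values):
    return sum(values) / len(values)


def start_point_randomization_test(series, actual_start, min_phase_len=3):
    """One-sided randomization test for an A-B design.

    series        -- observations in time order
    actual_start  -- index of the first intervention-phase observation
    Returns (observed mean difference, p value).
    """

    def effect(start):
        # Statistic: intervention-phase mean minus baseline-phase mean.
        return mean(series[start:]) - mean(series[:start])

    # Every start point that could have been drawn under the assumed design.
    candidates = range(min_phase_len, len(series) - min_phase_len + 1)
    if actual_start not in candidates:
        raise ValueError("actual_start is not an admissible start point")

    observed = effect(actual_start)
    # Count admissible start points whose statistic is at least as large.
    as_extreme = sum(1 for s in candidates if effect(s) >= observed - 1e-12)
    return observed, as_extreme / len(candidates)
```

Because the reference distribution contains only the admissible start points, the smallest attainable p value is 1 divided by their number; with 20 observations and a minimum of 3 per phase that floor is 1/15, so a design of this kind cannot reach p < .05 without more candidate start points.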

Autoregression and Autoregressive Integrated Moving Average

Autoregressive integrated moving average (ARIMA) and autoregressive models can be fitted to time-series data to better understand the data, to predict future points in the series, and to determine the effects of an independent variable on the dependent measure over and above the influence of autocorrelation and moving-average components. The ARIMA model is generally referred to as an ARIMA(p, d, q) model, where p, d, and q are integers greater than or equal to zero and refer to the order of the autoregressive, differencing, and moving-average components of the model, respectively. Autoregressive analyses first partial out the effects of autocorrelation on a data stream and then estimate the impact of the intervention over and above the effects of autoregressive parameters. These tests are probably the most appropriate and widely used tests for analysis of autocorrelated data streams, and they can reliably indicate whether an intervention is associated with change in a dependent variable over and above the effects of serial dependence. The user can indicate how many lags of autocorrelation to partial out and, with ARIMA, can indicate whether to difference the data to improve stationarity (i.e., subtracting successive pairs of data points to remove trend), whether to model moving-average components (i.e., smoothing out of short-term fluctuations by averaging groups of data points over time) of the data stream, or both. These tests can provide appropriately conservative estimates of intervention effects after controlling for unique properties of time-series data. The user can avoid ambiguity around model specification by using a technique called overfitting, in which one controls for numerous lags of autocorrelation rather than determining the exact number of lags that may pose inferential problems (e.g., fitting an ARIMA(15, 0, 0) model to all data).

These approaches typically require between 30 and 50 data points per phase to yield acceptable sensitivity and selectivity. As previously discussed, data sets of this length are rarely encountered in clinical single-case research, especially when data are collected on real patients seeking outpatient psychotherapy. If one happens to have a sizable time-series data stream (e.g., from a lab-based study or from a clinical study using a high-frequency sampling scheme), these methods should probably be the first-choice statistical analytic technique.
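Two of the building blocks named above, differencing (the d in ARIMA(p, d, q)) and the autocorrelation at successive lags, can be sketched in a few lines. This is only an illustration of those two operations under assumed names (`difference`, `autocorrelations`), not a fitted ARIMA model; estimating the autoregressive and moving-average parameters requires dedicated software.

```python
def difference(series, d=1):
    """Apply first differencing d times (each pass shortens the series by 1)."""
    out = list(series)
    for _ in range(d):
        out = [later - earlier for earlier, later in zip(out, out[1:])]
    return out


def autocorrelations(series, max_lag):
    """Sample autocorrelation at lags 1..max_lag (series must not be constant)."""
    n = len(series)
    m = sum(series) / n
    denom = sum((v - m) ** 2 for v in series)
    return [
        sum((series[t] - m) * (series[t - k] - m) for t in range(k, n)) / denom
        for k in range(1, max_lag + 1)
    ]


if __name__ == "__main__":
    # Hypothetical upward-trending stream with a short-term wiggle.
    stream = [1, 3, 2, 4, 3, 5, 4, 6, 5, 7]
    print(autocorrelations(stream, 3))              # before differencing
    print(autocorrelations(difference(stream), 3))  # after removing trend
```

Comparing the two printed lists shows how much of the apparent serial dependence in a stream can be due to trend alone, which is why the decision to difference matters before autoregressive lags are chosen.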

Hierarchical Linear Modeling

Hierarchical linear modeling (HLM), also known as multilevel modeling, is an extension of simple and multiple linear regression. Multilevel analysis allows variance in outcome variables to be analyzed at multiple hierarchical levels (e.g., within an individual, over time, and between groups of individuals), whereas in simple linear and multiple linear regression, all effects are modeled to occur at a single level. For repeated-measures data (such as time-series data encountered in clinical case research), time can be considered as a level that occurs within patients. Moreover, the covariance matrix structure (representing the interrelationships between all variables in a model) can be specified for each model as unstructured or even autoregressive, which can effectively relax assumptions of independence of observations. Although this approach is typically used to analyze data sets in which observations are nested within subjects who may be nested in any variety of settings (e.g., treatment groups, communities, schools), it can be used to analyze single-case data. It becomes even more appropriate when a few or even several participants in the study are providing time-series data with different subject-level characteristics of clinical interest (e.g., type of therapy used, sex, ethnicity, age) and time-based variables of interest (e.g., baseline, treatment onset, follow-up phase).

HLM is a powerful and flexible statistical tool for managing complex data sets with time-series elements as well as between-subject factors. HLM handles missing data points with minimal problems and thus tolerates inconsistent temporal resolution while allowing for sophisticated modeling of numerous potential explanatory factors. As with autoregression and ARIMA, HLM also permits evaluation of slope and intercept changes over time and between groups, with minimal negative bias from autoregressive parameters. However, HLM also typically requires between 30 and 50 data points per phase to yield acceptable sensitivity and selectivity.
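As an illustration of what "time as a level within patients" means, a two-level model for several cases might be written as follows. This is a generic sketch, not a specification taken from the chapter: the symbols ($\pi$, $\beta$, $r$, $e$, $\rho$), the single phase predictor, the single subject-level covariate $X_i$, and the first-order autoregressive error structure are all assumptions chosen for illustration.

```latex
% Level 1 (within patient i, across time t): level and phase effect
Y_{ti} = \pi_{0i} + \pi_{1i}\,\mathrm{Phase}_{ti} + e_{ti},
\qquad e_{ti} = \rho\, e_{t-1,i} + u_{ti}

% Level 2 (between patients): patient-level covariate X_i
\pi_{0i} = \beta_{00} + \beta_{01} X_i + r_{0i}
\pi_{1i} = \beta_{10} + \beta_{11} X_i + r_{1i}
```

Here $\beta_{10}$ would be read as the average change in level at treatment onset, $\beta_{11}$ as the extent to which that change depends on the subject-level characteristic, and $\rho$ as the serial dependence allowed for in the within-patient residuals.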

Interrupted Time-Series Experiments

The interrupted time-series experiment (ITSE) approach uses autocorrelation information along with phase slope and mean data to model the effects of interventions on a single case's time-series data stream. This approach provides information regarding the individual phase means and slopes and whether they differ from each other and from zero after controlling for autoregressive parameters. Although the number of data points required to permit valid estimation of autoregressive parameters has been suggested as limiting its use with the short time-series data streams often encountered in clinical practice, Crosbie (1993, 1994) proposed the use of a modified ITSE approach wherein an autocorrelation correction for short data streams is applied (ITSACORR). ITSACORR permits analysis of short autocorrelated data streams with phase lengths as short as 7 data points each. This approach appears to control for Type I error with short autocorrelated data sets because false-positive rates associated with this test are roughly equal to nominal alpha (.05). Evidence has even suggested good false-positive control with phase lengths as short as 5 data points each (Crosbie, 1993, 1994). ITSACORR offers familiar F and t statistics regarding the significance of process change with respect to both slope and level across treatment phases.

Although ITSACORR appears to offer good selectivity with short autocorrelated data streams, its power to detect effects may be a limiting factor to its widespread use. Crosbie (1993, 1994) showed adequate power of ITSACORR to detect large effects in short data streams. The definition of large warrants elaboration. Crosbie and others have suggested that a large effect in time series is represented by a 10–standard deviation change from baseline to the intervention phase, and a medium effect is represented by a 5–standard deviation change. Although some evidence has supported the presence of such large effects in the single-case research literature, time-series statistical tests should probably reliably detect much smaller effects than 10–standard deviation changes between phases.

The software to run ITSACORR has historically been offered for free; however, this software has become hard to find over the past several years, and it appears Crosbie has pulled it out of circulation. This is unfortunate because, despite its limitations with respect to power, it may still have a place in analysis of short, autocorrelated clinical time-series data, especially when one wants to ensure highly conservative inferences. Although the stand-alone software for ITSACORR appears to be unavailable at the present time, the mathematical models for it are provided by Crosbie (1993, 1994).

Robust Regression

Robust statistics tend to emulate traditional statistical methods but are designed to be robust to outliers or other small departures from traditional model assumptions. Thus, robust regression is a form of regression analysis designed to circumvent some limitations of traditional parametric and nonparametric methods. Traditional least-squares estimates are pulled toward outliers (abnormally high or low values of a few data points in a data set), thereby inflating error variance estimates, whereas robust statistical estimates are less influenced by violation of assumptions. Huber (1972) introduced maximum likelihood-type estimation (M-estimation) for regression. One criticism of the method is that it appears to be robust to outliers in the dependent measure but may not be resistant to outliers in predictor variables. In fact, when there are outliers in predictor variables, this method may offer little advantage over OLS. Fortunately, in case-based time-series research, the time-based phase variable rarely has outliers.

Robust statistics appear to be a viable alternative to traditional statistical analytic approaches, especially when assumption violations are expected to be common. Moreover, they appear to have grown slowly in popularity in recent years. Nonetheless, despite their superior performance over least-squares estimation in many situations in which assumption violations are present, robust methods for regression are still not widely used. Several reasons may explain their lack of popularity (Hampel et al., 1986). Several competing (traditional) methods are better known among researchers. Although computation of robust estimates is much more computer intensive than traditional least-squares estimation, this problem has become less relevant as computing power has increased greatly. Unfortunately, many popular statistical software packages have yet to implement robust methods (Stromberg, 2004). Although it is still unclear which robust methods would be best suited for handling short autocorrelated data streams of the kind encountered in clinical research, there is some hope that derivations of existing robust analytic techniques may become useful to clinical researchers in the near future if these tests become more accessible.

Control-Chart Techniques and Celeration Trend-Line Analysis

Visual analytic techniques offer a straightforward approach to evaluating the effects of various interventions on single-case outcome measures (see Chapter 9, this volume); however, some of these techniques have been criticized for being too subjective (e.g., Matyas & Greenwood, 1990). To enhance the objectivity of visual analysis procedures, it has been suggested that they be combined with some traditional statistical analytic approaches and probability theory. Numerous control-chart techniques exist that involve graphically representing a dependent variable over time (Pfadt & Wheeler, 1995; Stoumbos, Reynolds, Ryan, & Woodall, 2000). Users can then also visually represent some variability estimates and apply decision rules to determine the "unusualness" of patterns seen in the dependent measure. Figure 11.3 shows a hypothetical time-series data set representing behavioral ratings during a baseline phase and an intervention phase. The mean from the baseline phase is represented with a solid black line and is carried over into the intervention phase. Dotted lines representing 1, 2, and 3 standard deviations (sigma) around the mean of the baseline-phase data are also included and carried over into the intervention phase. As can be seen in Figure 11.3, 9 of the 11 data points in the intervention phase fall outside 3 standard deviations of the baseline mean. If there is reason to expect that the intervention will produce a gradual effect (e.g., the intervention involves the acquisition of complex skills), then one would be tempted to conclude that this 3–standard deviation difference is significant. Notice in Figure 11.3, however, that the baseline data have an upward slope, and most traditional control-chart techniques do little to account for such trends.
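The control-chart logic described above reduces to a few arithmetic steps, sketched here under stated assumptions: the sample standard deviation (n − 1 denominator) of the baseline is used for sigma, and the function names `control_limits` and `points_outside` are illustrative. Like the traditional techniques it mimics, this sketch does nothing to account for baseline trend.

```python
from statistics import mean, stdev


def control_limits(baseline, k):
    """Return (lower, upper) limits k baseline standard deviations from the mean."""
    center = mean(baseline)
    sigma = stdev(baseline)  # sample standard deviation of the baseline phase
    return center - k * sigma, center + k * sigma


def points_outside(baseline, intervention, k=3):
    """Intervention-phase values falling outside the k-sigma baseline limits."""
    lower, upper = control_limits(baseline, k)
    return [value for value in intervention if value < lower or value > upper]


if __name__ == "__main__":
    # Hypothetical ratings, not the data plotted in Figure 11.3.
    baseline = [4, 5, 6, 5, 4, 6]
    intervention = [8, 9, 5, 10]
    print(points_outside(baseline, intervention, k=3))
```

The count of flagged points is what a decision rule would then be applied to.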

[Figure 11.3: line graph of Behavioral Ratings (vertical axis, 0 to 10) against Time (horizontal axis, 1 to 23) across a BASELINE PHASE and an INTERVENTION PHASE, with horizontal lines marking the Baseline Mean and 1-Sigma, 2-Sigma, and 3-Sigma.]

Figure 11.3. Hypothetical example of behavioral ratings over time spanning a baseline phase and an intervention phase. As is common with many control-chart techniques, the mean and standard deviation estimates of the baseline phase are graphed and extended into the intervention phase. The data points in the intervention phase can then be evaluated relative to these extended mean and variability lines.

[Figure 11.4: line graph of Behavioral Ratings (vertical axis, 0 to 10) against Time (horizontal axis, 1 to 23) across a BASELINE PHASE and an INTERVENTION PHASE, with the baseline trend line extended into the intervention phase.]

Figure 11.4. Celeration trend-line analysis using the same hypothetical data as Figure 11.3. The data points are graphed along with an ordinary least-squares trend line representing the baseline phase, and this line is extended into the intervention phase. Data points in the intervention phase falling on either side of the extended trend line are counted, and a binomial test can be applied.

However, in celeration trend-line analysis, the user draws a trend line based on the baseline-phase data and extends it into the intervention phase (see Figure 11.4). From here, the user can examine the number of data points in the treatment phase falling on either side of the extended trend line. Simple binomial tests can be run on the number of data points falling on either side of the trend line, and the user can even calculate a p value representing the pattern (Franklin et al., 1996). If one assumes that each data point in the intervention phase has an equal probability of falling above or below the trend line (i.e., no phase effect), and only 1 of the 11 intervention-phase data points falls below the extended trend line (Figure 11.4), then a simple binomial test would indicate that the probability of this occurring by chance alone is quite low (p = .005). Thus, the null hypothesis is rejected, and one may regard the change in behavior across phases as significant.

Control-chart techniques are intuitive approaches that combine user-friendly and logical visual techniques with familiar parametric (or even nonparametric, in some cases) statistics. The statistical calculation portion of these approaches can be implemented with virtually any statistical software package. Unfortunately, these approaches (along with most visual and statistical approaches to short time-series data analysis) are negatively influenced by serial dependence (Matyas & Greenwood, 1990). When autocorrelation is present, Type I error rates become inflated and researchers are more likely to infer the presence of an effect when none is there.

To address this concern, Fisher, Kelley, and Lomas (2003) conducted Monte Carlo simulations to evaluate the Type I and Type II error rates of additional control-chart techniques. Their results supported the use of a conservative dual-criterion (CDC) method. The first criterion for determining whether a set of time-series intervention data points deviates significantly from baseline is evaluated by first fitting a least-squares regression line through the baseline data and extending the line through the intervention phase (as in Figure 11.4). If a binomial test indicates that behavior has changed significantly from what would be predicted from the baseline trend, then the first criterion has been satisfied. The second criterion is evaluated by drawing a horizontal line (slope = 0) through the baseline mean and extending it through the treatment data, and then conducting a binomial test as before. The conservative modifier of the CDC involves adjusting both lines by 0.25 standard deviation in the direction of the expected treatment effect. If both binomial tests reveal a significant deviation from baseline, then one may regard the difference as significant. Monte Carlo simulations indicated that the CDC method had acceptable rates of Type I error with data sets as small as five observations in baseline and treatment. The statistical power of the CDC was more acceptable than that of ITSACORR: with 10 baseline and 10 treatment data points, a difference of 2 standard deviations was detected with 80% probability. An added benefit of the CDC method is that Fisher et al. provided data suggesting that service delivery staff could be efficiently trained to use the CDC method with minimal error.
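The two criteria of the CDC, as described in the preceding paragraphs, can be sketched as follows. This is a hedged illustration of that description, not Fisher et al.'s published procedure or software: the function names, the use of the baseline sample standard deviation for the 0.25 adjustment, the one-sided binomial tests with a chance probability of .5, and the alpha of .05 are all assumptions.

```python
from math import comb
from statistics import mean, stdev


def binom_upper_tail(n, k, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


def baseline_trend(baseline):
    """Least-squares (intercept, slope) of baseline values on time 0..n-1."""
    n = len(baseline)
    t_mean = (n - 1) / 2
    y_mean = mean(baseline)
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(baseline))
    slope = sxy / sxx
    return y_mean - slope * t_mean, slope


def cdc(baseline, treatment, direction=1, shift_sd=0.25, alpha=0.05):
    """Dual-criterion check; direction is +1 (increase expected) or -1."""
    intercept, slope = baseline_trend(baseline)
    # Move both lines 0.25 baseline SD toward the expected treatment effect.
    shift = direction * shift_sd * stdev(baseline)
    mean_line = mean(baseline) + shift
    beyond_trend = 0
    beyond_mean = 0
    for i, y in enumerate(treatment):
        trend_line = intercept + slope * (len(baseline) + i) + shift
        if direction * (y - trend_line) > 0:
            beyond_trend += 1
        if direction * (y - mean_line) > 0:
            beyond_mean += 1
    p_trend = binom_upper_tail(len(treatment), beyond_trend)
    p_mean = binom_upper_tail(len(treatment), beyond_mean)
    return {
        "beyond_trend": beyond_trend,
        "beyond_mean": beyond_mean,
        "p_trend": p_trend,
        "p_mean": p_mean,
        "significant": p_trend < alpha and p_mean < alpha,
    }
```

For the celeration example in the text, `binom_upper_tail(11, 10)` gives the one-sided probability that at least 10 of 11 points fall on the same prespecified side of the line by chance, 12/2048 (about .006), which is in line with the value of roughly .005 reported above.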

Tryon's C

Tryon's (1982) C is a time-series–specific test based on the mean-squares successive difference test. It is primarily designed to indicate the presence of trends in a data stream. If there is a consistent trend in the data, the value of C increases. One can use C to test for trend in the baseline and treatment phases. If no trend is found in the baseline, but one is detected in the treatment phase, it might indicate a shift in trend, presumably associated with the onset of the intervention. Although not offered in many commercial statistical packages, the formula for calculating C is straightforward and not too labor intensive and thus can be hand calculated easily (Franklin et al., 1996):

$$C = 1 - \frac{\sum_{i=1}^{n-1} (Y_i - Y_{i+1})^2}{2 \sum_{i=1}^{n} (Y_i - \bar{Y})^2},$$

where $Y_i$ is the dependent variable at time $i$ within the time series (e.g., the baseline) and $\bar{Y}$ is the mean of the time series. This statistic seems well suited to handle the common types of questions that a single-case researcher might ask. However, Tryon's C does not appear to be a unique indicator of trends and shifts in data streams that are associated with process change alone. In other words, the presence of serial dependence in the data stream might contribute substantially to the appearance of a trend in a data stream, and C would not permit the user to differentiate the artifacts caused by autocorrelation from genuine trends. In fact, C has been suggested to be a better index of serial dependence than of the presence of trends in the data (Franklin et al., 1996; Robey et al., 1999). Additionally, C becomes much more powerful with longer data streams, and its significance test is highly influenced by series length.
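The formula for C given above translates directly into code. The sketch below computes only the C statistic itself; the function name `tryon_c` is an illustrative assumption, and the significance test for C is deliberately left out because its details are not given in this chapter.

```python
def tryon_c(series):
    """Tryon's C: 1 - (sum of squared successive differences) / (2 * sum of squares)."""
    n = len(series)
    series_mean = sum(series) / n
    sum_sq_dev = sum((y - series_mean) ** 2 for y in series)
    if sum_sq_dev == 0:
        raise ValueError("C is undefined for a constant series")
    sum_sq_succ = sum((series[i] - series[i + 1]) ** 2 for i in range(n - 1))
    return 1 - sum_sq_succ / (2 * sum_sq_dev)


if __name__ == "__main__":
    print(tryon_c([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]))  # steady trend
    print(tryon_c([1, 2, 1, 2, 1, 2]))               # alternating, no trend
```

A steadily trending series yields a C near 1, whereas a series that alternates from point to point yields a negative C, which is the same pattern one would expect from a lag-1 autocorrelation and is consistent with the observation that C behaves like an index of serial dependence.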

Simulation Modeling Analysis

Based loosely on bootstrapping methods (Wilcox, 2001), simulation modeling analysis (SMA) is specially designed for analyzing short, autocorrelated data streams typically found in clinical practice (Borckardt et al., 2008). SMA first estimates the autocorrelation coefficient for the data stream in question and corrects it for small-n bias using Crosbie's (1993, 1994) method. Separately, SMA quantifies the relation between the dependent measure and a dummy-coded intervention (e.g., zeros for baseline and ones for treatment phase) using either a parametric or a nonparametric correlation coefficient. Next, SMA uses the phase n-sizes and autocorrelation estimate to generate thousands of random-normal data streams programmed with the same autocorrelation and n-size information as the clinical data stream in question. For each of these short, autocorrelated, randomly generated data streams, a correlation coefficient is calculated between it and the intervention phase vector. Each simulated correlation coefficient that exceeds the value of the correlation between the original data and the phase vector is flagged and counted. The empirical p value provided by SMA is the proportion of flagged correlations among the total number of random simulation data streams generated (determined by the user or set to 5,000 by default). Thus, SMA answers the question, how likely is it for a completely random data stream of the same length as yours, with the same amount of autocorrelation as yours, to evidence a correlation with your phase vector as large as your data stream did?

This approach appears to offer good Type I error control with very short (approximately 5 data points per phase) autocorrelated data streams. It also appears to offer adequate power to detect much smaller effect sizes than ITSACORR (adequate power [>.80] with 5 data points per phase with a 5–standard deviation effect size). The user does not have to implement simple phase vectors (zeros and ones) against which to correlate the dependent measure; rather, the user can establish independent measures of any type or pattern to correspond with the research design used (e.g., slopes, A-B-A-B designs [see Chapter 5, this volume], custom patterns). Software to implement SMA is freely available for Windows and Macintosh operating systems (http://clinicalresearcher.org/software.htm). However, SMA does not provide any information regarding the fidelity of the simulation data to the actual properties of the original data, and problems with the reliability of parameter estimates from short time-series data streams may have an impact on test accuracy. SMA does perform well with longer time-series data streams (>30 points per phase). Its performance has not yet been evaluated with skewed distributions or extreme outliers.

Recommendations

To date, there is no clear, best statistical approach to analyze short streams of single-case time-series data. Although many approaches have been proposed, and some appear better than others, they all have some limitations. When data sets are large enough (approximately 30 data points per phase), the researcher has a reasonably rich palette of statistical tools from which to choose, all of which are flexible and perform quite well (e.g., ARIMA, autoregression, GLS regression, HLM). However, when the data set is sparse, the analytic options dwindle. At this point, it is probably too early to endorse any specific statistical approaches for short, autocorrelated time-series data streams. Although robust regression techniques, CDC, ITSACORR, and SMA show promise, more work is needed to verify that these methods are capable of reliable Type I and Type II error performance with short, serially dependent streams of data.
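Of the approaches listed, the simulation logic of SMA is the one most easily made concrete, so a minimal sketch of it follows. It is not the SMA software and departs from the description in labeled ways: the lag-1 autocorrelation is used without Crosbie's small-n correction, streams are generated from a first-order autoregressive process with standard-normal innovations, correlations are compared in absolute value (a two-tailed choice), and all function names and the fixed seed are assumptions.

```python
import random


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def lag1_autocorr(series):
    """Lag-1 autocorrelation (no small-n bias correction is applied here)."""
    n = len(series)
    m = sum(series) / n
    denom = sum((v - m) ** 2 for v in series)
    return sum((series[t] - m) * (series[t - 1] - m) for t in range(1, n)) / denom


def ar1_stream(n, rho, rng):
    """Random-normal stream of length n with lag-1 autocorrelation rho."""
    stream = [rng.gauss(0, 1) / (1 - rho * rho) ** 0.5]  # stationary start
    for _ in range(n - 1):
        stream.append(rho * stream[-1] + rng.gauss(0, 1))
    return stream


def sma_p_value(series, phase, n_sims=5000, seed=0):
    """Return (observed r, autocorrelation estimate, empirical p value)."""
    rng = random.Random(seed)
    r_obs = pearson_r(series, phase)
    rho = max(-0.99, min(0.99, lag1_autocorr(series)))  # keep AR(1) stationary
    flagged = 0
    for _ in range(n_sims):
        simulated = ar1_stream(len(series), rho, rng)
        if abs(pearson_r(simulated, phase)) >= abs(r_obs):
            flagged += 1
    return r_obs, rho, flagged / n_sims
```

A call such as `sma_p_value(ratings, [0] * 10 + [1] * 10)` returns the observed correlation with the phase vector, the autocorrelation estimate used for the simulations, and the proportion of simulated streams whose correlation with the phase vector was at least as large.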
Conclusions

The behavioral data yielded by single subjects in naturalistic and controlled settings likely contain information valuable to scientists and practitioners alike. Although some of the properties unique to these data complicate statistical analysis, progress has been made in developing specialized techniques for rigorous data evaluation. Certainly, we did not exhaustively cover all available time-series analysis techniques; rather, we highlighted a few time-series analytic genres. No perfect tests are currently available to analyze short autocorrelated data streams, but there are some promising approaches that warrant further development.

From a clinical perspective, case-based time-series designs will likely not dissolve the formidable epistemological gap between practice and research, but their use might help bring the two disciplines within shouting distance of each other on a more regular basis. Enhanced communication pivots on compromise by both parties. For their part, practitioners must concede that replicable systematic observation is a necessary requirement of evidence; in turn, researchers must concede (indeed, rediscover) that carefully conducted idiographic studies can yield empirically sound findings about change processes. For some, these concessions will not be forthcoming. Still, a robust clinical science requires an ongoing productive discourse between a critical mass of researchers and practitioners.

Herein is the twofold promise of case-based time-series designs. First, their careful use enables clinical researchers and practitioners to make contributions that are fully congruent with the evidence-driven ethos of scientific discourse. By rising above mere clinical anecdote, practitioners earn a more prominent and respected voice on matters of theory, research, policy, and training. The clinical setting can indeed become the natural laboratory envisioned by Westen and Bradley (2005) and Peterson (2004). Second, time-series designs yield findings especially pertinent to how therapeutic change unfolds, not in aggregate, but individually. Although almost entirely neglected by contemporary investigators, single-organism research of this kind has a luminous and storied lineage. By harnessing time-series designs and rigorous analytic techniques alongside group experimental methodologies, we might well accelerate the progress we are making in understanding the anatomy and mechanisms of behavioral change.

References

APA Presidential Task Force on Evidence-Based Practice. (2005). Report of the 2005 Presidential Task Force on Evidence-Based Practice. Washington, DC: American Psychological Association.
Arndt, S., Cizadlo, T., Andreasen, N. C., Heckel, D., Gold, S., & O'Leary, D. S. (1996). Tests for comparing images based on randomization and permutation methods. Journal of Cerebral Blood Flow and Metabolism, 16, 1271–1279. doi:10.1097/00004647-199611000-00001
Barlow, D. H., & Hersen, M. (1984). Single case experimental designs. New York, NY: Pergamon Press.
Bergin, A. E., & Strupp, H. H. (1970). The directions in psychotherapy research. Journal of Abnormal Psychology, 76, 13–26.
Borckardt, J. J., Nash, M. R., Murphy, M. D., Moore, M., Shaw, D., & O'Neil, P. (2008). Clinical practice as natural laboratory for psychotherapy research: A guide to case-based time-series analysis. American Psychologist, 63, 77–95. doi:10.1037/0003-066X.63.2.77
Busk, P. L., & Maracuilo, R. C. (1988). Autocorrelation in single-subject research: A counter-argument to the myth of no autocorrelation. Behavioral Assessment, 10, 229–242.
Center, B. A., Skiba, R. J., & Casey, A. (1985). A methodology for the quantitative synthesis of intra-subject design research. Journal of Special Education, 19, 387–400. doi:10.1177/002246698501900404
Chambless, D. L., & Hollon, D. D. (1998). Defining empirically supported therapies. Journal of Consulting and Clinical Psychology, 66, 7–18. doi:10.1037/0022-006X.66.1.7
Chambless, D. L., & Ollendick, T. H. (2001). Empirically supported psychological interventions: Controversies and evidence. Annual Review of Psychology, 52, 685–716. doi:10.1146/annurev.psych.52.1.685
Crosbie, J. (1993). Interrupted time-series analysis with brief single-subject data. Journal of Consulting and Clinical Psychology, 61, 966–974. doi:10.1037/0022-006X.61.6.966
Crosbie, J. (1994). Interrupted time-series analysis with short series: Why it is problematic; how it can be improved. In J. M. Gottman (Ed.), The analysis of change (pp. 361–395). Mahwah, NJ: Erlbaum.
Ebbinghaus, H. (1913). Memory (H. A. Ruger & C. E. Bussenius, Trans.). New York, NY: Teachers College.
Edgington, E. S., & Onghena, P. (2007). Randomization tests (4th ed.). Boca Raton, FL: Chapman & Hall/CRC.
Ferron, J. (2002). Reconsidering the use of the general linear model with single case data. Behavior Research Methods, Instruments and Computers, 34, 324–331. doi:10.3758/BF03195459
Ferron, J., Foster-Johnson, L., & Kromrey, J. D. (2003). The functioning of single-case randomization tests with and without random assignment. Journal of Experimental Education, 71, 267–288. doi:10.1080/00220970309602066
Ferron, J., & Onghena, P. (1996). The power of randomization tests for single-case phase designs. Journal of Experimental Education, 64, 231–239. doi:10.1080/00220973.1996.9943805
Ferron, J., & Sentovich, C. (2002). Statistical power of randomization tests used with multiple-baseline designs. Journal of Experimental Education, 70, 165–178.
Fisher, W. W., Kelley, M. E., & Lomas, J. E. (2003). Visual aids and structured criteria for improving visual inspection and interpretation of single-case designs. Journal of Applied Behavior Analysis, 36, 387–406. doi:10.1901/jaba.2003.36-387
Franklin, R. D., Allison, D. B., & Gorman, B. S. (1996). Design and analysis of single-case research. Mahwah, NJ: Erlbaum.
Good, P. (1994). Permutation tests: A practical guide to resampling methods for testing hypotheses. New York, NY: Springer-Verlag.
Hampel, F. R., Ronchetti, E. M., Rousseeuw, P. J., & Stahel, W. A. (1986). Robust statistics: The approach based on influence functions. New York, NY: Wiley.
Hayes, S. C., Barlow, D. H., & Nelson-Gray, R. O. (1999). The scientist practitioner: Research and accountability in the age of managed care (2nd ed.). Needham Heights, MA: Allyn & Bacon.
Hibbs, D. A. (1974). Problems of statistical estimation and causal inference in time-series regression models. In H. L. Costner (Ed.), Sociological methodology, 1973–1974 (Vol. 5, pp. 252–308). San Francisco, CA: Jossey-Bass.
Hooton, J. W. (1991). Randomization tests: Statistics for experimenters. Computer Methods and Programs in Biomedicine, 35, 43–51. doi:10.1016/0169-2607(91)90103-Z
Psychological Methods, 3, 104–116. doi:10.1037/1082-989X.3.1.104
Huitema, B. E., McKean, J. W., & McKnight, S. (1999). Autocorrelation effects on least-squares intervention analysis of short time series. Educational and Psychological Measurement, 59, 767–786. doi:10.1177/00131649921970134
Jacobson, N. S., & Christensen, A. (1996). Studying the effectiveness of psychotherapy: How well can clinical trials do the job? American Psychologist, 51, 1031–1039. doi:10.1037/0003-066X.51.10.1031
Jones, R. R., Vaught, R. S., & Weinrott, M. R. (1977). Time-series analysis in operant research. Journal of Applied Behavior Analysis, 10, 151–166. doi:10.1901/jaba.1977.10-151
Jones, R. R., Weinrott, M. R., & Vaught, R. S. (1978). Effects of serial dependency on the agreement between visual and statistical inference. Journal of Applied Behavior Analysis, 11, 277–283. doi:10.1901/jaba.1978.11-277
Kazdin, A. E. (1982). Single-case research designs: Methods for clinical and applied settings. New York, NY: Oxford University Press.
Kazdin, A. E. (1992). Research design in clinical psychology (2nd ed.). Boston, MA: Allyn & Bacon.
Kohler, W. (1925). The mentality of apes. New York, NY: Harcourt.
Kotkin, M., Daviet, C., & Gurin, J. (1996). The Consumer Reports mental health survey. American Psychologist, 51, 1080–1082. doi:10.1037/0003-066X.51.10.1080
Ludbrook, J. (1994). Advantages of permutation (randomization) tests in clinical and experimental pharmacology and physiology. Clinical and Experimental Pharmacology and Physiology, 21, 673–686. doi:10.1111/j.1440-1681.1994.tb02570.x
Manly, B. F. J. (1997). Randomization, bootstrap, and Monte Carlo methods in biology. Boca Raton, FL: Chapman & Hall/CRC.

Howard, K. I., Krause, M. S., Caburnay, C. A., Noel, S. B., & Saunders, S. M. (2001). Syzygy, science, and psychotherapy: The Consumer Reports study. Journal of Clinical Psychology, 57, 865–874. doi:10.1002/ jclp.1055 Huber, P. J. (1972). Robust statistics. Annals of Mathe matical Statistics, 43, 1041–1067. doi:10.1214/ aoms/1177692459

Maruish, M. E. (2004). The use of psychological testing for treatment planning and outcomes assessment. Mahwah, NJ: Erlbaum. Matyas, T. A., & Greenwood, K. M. (1990). Visual analysis of single-case time series: Effects of variability, serial dependence, and magnitude of intervention of effects. Journal of Applied Behavior Analysis, 23, 341–351. doi:10.1901/jaba.1990.23-341

Huitema, B. E. (1985). Autocorrelation in applied behavior analysis: A myth. Behavioral Assessment, 7, 107–118.

Michael, J. (1974). Statistical inference for individual organism research: Mixed blessing or curse? Journal of Applied Behavior Analysis, 7, 647–653. doi:10.1901/jaba.1974.7-647

Huitema, B. E., & McKean, J. W. (1998). Irrelevant autocorrelation in least-squares intervention models.

Mielke, P. W., & Berry, K. J. (1994). Permutation tests for common locations among samples with unequal
variances. Journal of Educational and Behavioral Statistics, 19, 217–236. Morgan, D. L., & Morgan, R. K. (2001). Single-participant research design: Bringing science to managed care. American Psychologist, 56, 119–127. doi:10.1037/ 0003-066X.56.2.119 Morrison, K. H., Bradley, R., & Westen, D. (2003). The external validity of controlled clinical trials of psychotherapy for depression and anxiety: A naturalistic study. Psychology and Psychotherapy: Theory, Research, and Practice, 76, 109–132. doi:10.1348/ 147608303765951168 Nathan, P. E., Stuart, S. P., & Dolan, S. L. (2000). Research on psychotherapy efficacy and effectiveness: Between Scylla and Charybdis? Psychological Bulletin, 126, 964–981. doi:10.1037/0033-2909. 126.6.964 Nielsen, S. L., Smart, D. W., Isakson, R. L., Worthen, V. E., Gregersen, A. T., & Lambert, M. J. (2004). The Consumer Reports effectiveness score: What did consumers report? Journal of Counseling Psychology, 51, 25–37. doi:10.1037/0022-0167.51.1.25 Pavlov, I. P. (1927). Conditioned reflexes. New York, NY: University Press. Peterson, D. R. (2004). Science, scientism, and professional responsibility. Clinical Psychology: Science and Practice, 11, 196–210. doi:10.1093/clipsy.bph072

Skinner, B. F. (1956). A case history in scientific method. American Psychologist, 11, 221–233. doi:10.1037/ h0047662 Stoumbos, Z. G., Reynolds, M. R., Ryan, T. P., & Woodall, W. H. (2000). The state of statistical process control as we proceed into the 21st century. Journal of the American Statistical Association, 95, 992–998. doi:10.2307/2669484 Stromberg, A. J. (2004). Why write statistical software? The case of robust statistical methods. Journal of Statistical Software, 10(5). Retrieved from http:// www.jstatsoft.org/v10/i05 Strupp, H. H. (1996). The tripartite model of the Consumer Reports Study. American Psychologist, 51, 1017–1024. doi:10.1037/0003-066X.51.10.1017 Suen, H. K. (1987). On the epistemology of autocorrelation in applied behavior analysis. Behavioral Assessment, 9, 113–124. Task Force for Promotion and Dissemination of Psychological Procedures. (1995). Training in and dissemination of empirically-validated psychological treatments: Report and recommendations. Clinical Psychologist, 48, 3–23. Tryon, W. W. (1982). A simplified time-series analysis for evaluating treatment interventions. Journal of Applied Behavior Analysis, 15, 423–429. doi:10.1901/ jaba.1982.15-423

Pfadt, A., & Wheeler, D. J. (1995). Using statistical process control to make data-based clinical decisions. Journal of Applied Behavior Analysis, 28, 349–370. doi:10.1901/jaba.1995.28-349

VandenBos, G. R. (1996). Outcome assessment of psychotherapy. American Psychologist, 51, 1005–1006. doi:10.1037/0003-066X.51.10.1005

Recchia, M., & Rocchetti, M. (1982). The simulated randomization test. Computer Programs in Biomedicine, 15, 111–116. doi:10.1016/0010-468X(82)90062-9

Westen, D., & Bradley, R. (2005). Empirically supported complexity. Current Directions in Psychological Science, 14, 266–271. doi:10.1111/j.0963-7214.2005.00378.x

Robey, R. R., Schultz, M. C., Crawford, A. B., & Sinner, C. A. (1999). Single-subject clinical-outcome research: Designs, data, effect sizes, and analyses. Aphasiology, 13, 445–473. doi:10.1080/026870399402028

Westen, D., & Morrison, K. (2001). A multidimensional meta-analysis of treatments for depression, panic, and generalized anxiety disorder: An empirical examination of the status of empirically supported therapies. Journal of Consulting and Clinical Psychology, 69, 875–899. doi:10.1037/0022-006X.69.6.875

Seligman, M. E. P. (1995). The effectiveness of psychotherapy: The Consumer Reports study. American Psychologist, 50, 965–974. doi:10.1037/0003-066X. 50.12.965 Sharpley, C. F. (1987). Time-series analysis of behavioural data: An update. Behaviour Change, 4, 40–45. Sharpley, C. F., & Alavosius, M. P. (1988). Autocorrelation in behavioral data: An alternative perspective. Behavioral Assessment, 10, 243–251. Sidman, M. (1960). Tactics of scientific research. New York, NY: Basic Books. Skinner, B. F. (1938). The behavior of organisms. New York, NY: Appleton-Century-Crofts.

Watson, J. B. (1925). Behaviorism. New York, NY: Norton.

Westen, D., Novotny, C. M., & Thompson-Brenner, H. K. (2004). The empirical status of empirically supported psychotherapies: Assumptions, findings, and reporting in controlled clinical trials. Psychological Bulletin, 130, 631–663. doi:10.1037/0033-2909. 130.4.631 Wilcox, R. R. (2001). Fundamentals of modern statistical methods: Substantially improving power and accuracy. New York, NY: Springer-Verlag. Wilson, J. B. (2007). Priorities in statistics, the sensitive feet of elephants, and don’t transform data. Folia Geobotanica, 42, 161–167. doi:10.1007/BF02893882

Chapter 12

New Methods for Sequential Behavior Analysis

Peter C. M. Molenaar and Tamara Goode

Sequential behavior analysis is firmly rooted in learning theory (Sharpe & Koperwas, 2003). The identification and analysis of antecedents and consequences of target behavior are basic to behavior theory; sequential behavior analysis arranges and assesses explicit temporal dependencies between present and past behavior, and present and future behavior, using time-series data. Behavior theory is currently expanding in exciting new directions such as artificial neural network simulation (Schmajuk, 1997) and neuroscience (Timberlake, Schaal, & Steinmetz, 2005). This expanded scope of behavior theory presents new challenges to sequential behavior analysis, as do recent developments in psychometrics and measurement. In this chapter, we review new methods for sequential behavior analysis that can be instrumental in meeting these challenges. The earliest methods for analyzing time-series data were simple curve-fitting procedures, in which slopes and intercepts were compared pre- and postintervention. These methods are greatly limited, owing to ubiquitous violations of the assumption that data collected in two different conditions are independent. A generating function procedure has also been used, in which signal (process) is separated from noise (error). Curve-fitting procedures and generating function procedures both quantify the dependence of present observations on those preceding (also known as lead–lag relations; Gottman, McFall, & Barnett, 1969). Newer methods for

sequential behavior analysis, such as those we describe here, no longer require the quantification of these dependencies in this manner. The inclusion of the state in the model, which we discuss in the next section, both simplifies these models and allows for the estimation of more complex systems and processes. In the following, our focus is new statistical models describing the sequential dependencies of behavior processes. The statistical models are derived as special instances of a general dynamic systems model—the so-called state space model (SSM). The SSM allows for the estimation of a complex system that includes inputs, outputs, and states. A simple but useful example of such a system is one that many students experience in an introductory psychology lab: that of training a rat to press a lever. An input could be a tone signaling that reinforcement is available, the state could be hunger, and the output could be the pressing of the lever. The general SSM not only provides for a transparent classification of a wide range of behavioral process models but also enables the specification of a general estimation scheme to fit these models to the data. Moreover, and more important, the general SSM constitutes an excellent paradigm to introduce optimal control of behavior processes—a powerful extension of sequential behavior analysis that, to the best of our knowledge, is considered here for the first time. In behavior analysis, optimal control is understood to mean the least variability possible in behavior as a

We gratefully acknowledge funding provided by National Science Foundation Grant 0852147, which made Molenaar’s work possible, and the University Graduate Fellowship provided by the Pennsylvania State University, which made Goode’s work possible. DOI: 10.1037/13937-012 APA Handbook of Behavior Analysis: Vol. 1. Methods and Principles, G. J. Madden (Editor-in-Chief) Copyright © 2013 by the American Psychological Association. All rights reserved.

function of maximally controlling the environmental variables. Here, we are using optimal control as engineers use the term, which is somewhat similar. As in behavior analysis, variability in the output, or behavior that is being modeled, is minimized. However, researchers are also able to specifically model the desired level of the behavior and estimate the level of those variables that influence the behavior. Additionally, researchers can set parameters to minimize the cost of any of those input variables, that is, determine the minimum level of input (e.g., medication, reinforcement) that is necessary to obtain the desired level of behavior.

Thus, in this chapter we discuss the SSM, which has been applied, albeit sparingly, to sequential behavior analysis. We conclude by introducing optimal control models, which have not to our knowledge been applied to sequential behavior analysis but have the potential to model the variables controlling behavior more precisely, as noted earlier. Such a model can then be applied to whatever process is under examination, and behavior can be controlled much more precisely. The presentation in this chapter is heuristic, emphasizing the interpretation of process models while providing published references for further details. Some formal notation is indispensable, and we define it in the course of the development of the different methods. We first discuss, however, individual versus group analyses, a distinction that has important implications for any sequential analysis of behavioral processes.

Individual Versus Group Analysis

The question of the relation between individual and aggregated models of behavior (the application of group findings to an individual) has a long history. Sidman (1960) reviewed the older literature, ranging from Merrell (1931) to Estes (1956), and built a case for single-case designs while criticizing group-data experiments (see also Chapters 5 and 7, this volume).
Hannan’s (1991) monograph is devoted to the issue of aggregation and disaggregation in the social sciences, that is, the relation (if any) between individual and group analyses. This is also a major theme in macroeconomics, in which similar negative conclusions about the relation between results

obtained in individual and group analyses have been noted (Aoki, 2002). This lack of lawful relations between individual and aggregated data pertains not only to mean trends but to all aspects of probability distributions (variances, correlations, etc.). From a theoretical point of view, a main emphasis of developmental systems theory (Ford & Lerner, 1992) is that developmental processes should be investigated at the individual level, not the group level. In this chapter, we deal with analysis at the individual level, on the basis of these and other findings that illustrate the problems of relations between aggregated and individual findings (for a mathematical discussion of this problem and the theory of ergodicity, see Molenaar, 2004, 2008).

Dynamic Systems Models and General State Space Models: A Scheme for Sequential Analysis

In what follows, we refer to the unit of analysis (individual subject, dyad, triad, etc.) as a system. The behavior of a system unfolds in time, constituting a behavior process. Hence, the appropriate class of models for behavior processes is the class of dynamic systems models, that is, models that can represent change over time. That change may be linear over the course of the observation, changing consistently from one unit of time to the next; however, nonlinear change is more common. For example, a complete learning process is typically represented by a curve with an initial asymptote at 0, during which the target behavior is not observed at all. During the learning process, the target behavior is observed more and more frequently; finally, the target behavior asymptotes again when fluency is achieved. This process is most often represented by the nonlinear logistic function (Howell, 2007). The class of dynamic systems models is large, encompassing linear and nonlinear time-series models, artificial neural network models, and many more.
A considerable variety of dynamic systems models, however, can be conceived of as special cases of the SSM. In particular, most dynamic systems models that are of interest for sequential analysis of behavioral processes are special cases of the SSM. In this section, the SSM serves as the organizing principle in presenting a range of dynamic models for the sequential analysis of behavioral processes.
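The logistic learning curve mentioned above can be sketched in a few lines. This is only an illustration: the function name and the parameter values (upper asymptote, rate, midpoint) are hypothetical and do not come from the chapter.

```python
import numpy as np

def logistic_curve(t, upper=1.0, rate=0.2, midpoint=50.0):
    """Logistic curve: close to 0 early on, close to `upper` once fluency is reached."""
    t = np.asarray(t, dtype=float)
    return upper / (1.0 + np.exp(-rate * (t - midpoint)))

sessions = np.arange(0, 101)          # hypothetical session numbers
fluency = logistic_curve(sessions)    # proportion of trials with the target behavior

# Change per session is small near both asymptotes and largest around the midpoint,
# which is what makes the process nonlinear.
change = np.diff(fluency)
```

Plotting `fluency` against `sessions` gives the S-shaped curve described in the text; a linear model would fit only the short middle stretch well.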

Orientation and Notational Conventions

In the material that follows, we denote matrices (i.e., two-dimensional data structures [arrays] composed of rows and columns) by capital boldface letters and vectors (one-dimensional arrays; i.e., a single column of entries) by lowercase boldface letters. The superscript T denotes transposition (interchanging rows and columns). Because vectors are column vectors, transposing a vector makes it a row vector. Random processes are not denoted in a special way; which processes are random is defined in the text. Manifest (observed) processes are denoted by Roman letters; latent processes are denoted by Greek letters. A manifest process is that which is measured (e.g., the number of problem behaviors a child with autism exhibits in a period of time). A latent process is a process, state, or class of behavior that one is measuring with manifest variables. For example, for these models one may label aggression a latent state and measure it by observing several behaviors, such as hair pulling, hitting, biting, and kicking. A more technical definition is that the latent state contains all available information about the history of the relevant behavior or class of behavior. For example, Kuhn, Hardesty, and Luczynski (2009) conducted a functional analysis of antecedent social events and their effect on motivation to access preferred edible items by an individual with severe problem behavior. If one were to analyze these data using an SSM, the manipulation of antecedent social events by having the therapist consume food items would be the inputs in the SSM and the change in the participant’s motivation, as reflected by decreased socially appropriate responding, would be the latent state in the SSM.

We also discuss parameter estimation, which is the quantification of the relation between two variables and the assessment of whether that relation is significant. In the example for the linear Gaussian SSM presented in the next section, the relation between the latent state at time t − 1 and the latent state at time t is 0.6 and is significant, which means that the relation between the two time points is different from zero, and specifically, for every one-unit change in the latent state at time t − 1, one would expect to see a change in the latent state at time t of 0.6 units.

Linear Gaussian SSM

We start with the linear Gaussian SSM. The term Gaussian refers to a normally distributed data set; a common example of a Gaussian distributed data set is the heights of adult men. An example of a simple linear Gaussian model is the relation of adult men’s height and weight; generally speaking, as height increases, weight increases proportionally at a constant rate. The linear Gaussian SSM consists of two submodels: the measurement submodel and the dynamic submodel. As we describe in subsequent sections of this chapter, this decomposition into measurement and dynamic submodels applies to each SSM, not just the Gaussian SSM. The measurement submodel of the Gaussian SSM is made up of a Gaussian (normally distributed) behavioral process y that has been observed at equidistant measurement occasions (e.g., 20-minute observation sessions conducted at the same time each day), a latent Gaussian state process (to be explained shortly) η, a manifest (observed) input series u, and a Gaussian error process ε. Thus, the measurement submodel specifies that the behavior process y at time t is a function of the individual mean (μ), the latent process, the input, and error:

y(t) = μ + Λη(t) + Πu(t) + ε(t),    (1a)

with Λ and Π containing regression coefficients (i.e., values allowed to vary so as to provide a better fit to the data; these values may be interpreted in ways discussed later). Equation 1a can be interpreted as a traditional factor model of behavior, one in which the latent state process η(t) contains all available information about the history of the behavior process y(t). The dynamic submodel of the linear Gaussian SSM is expressed as

η(t) = κ + βη(t − 1) + Γu(t) + ζ(t).    (1b)

The dynamic model links the previous time point t − 1 to the current time point t, which is to say that
the previous state affects the present state. We specify the latent Gaussian process η at time t (i.e., the left side of Equation 1b) as a function of the mean κ, η at the previous time point (t − 1), input u at time t, and error ζ. The matrix β contains regression coefficients linking the past state process η(t − 1) to the current state process η(t). The matrix Γ also contains regression coefficients, values that quantify the effect of the input u(t) on the state process η(t). The degree to which one cannot predict η(t) given η(t − 1) is represented by ζ(t). The vector κ contains mean levels (intercepts).

Examples. For the first example, we refer to Glass, Willson, and Gottman (1975) and their argument for interrupted time series as an alternative to experimental designs. Here we have a simulated single-case univariate time series y(t) obtained during a baseline condition lasting from t = 1 until t = T1 (the end of the baseline condition) and an experimental condition lasting from t = T1 + 1 until T2. We set T1 = 50 and T2 = 100. For example, this simulation could represent the number of appropriate responses emitted before and after the introduction of some behavioral intervention with a child diagnosed with autism. The mean μ is 1 for the baseline condition; our input u(t) = 0 for the baseline condition (i.e., the absence of the intervention manipulation) and 1 for the introduction of the intervention condition. These specifications yield the following special case of the linear Gaussian SSM, in which Λ = 1 and ε(t) = 0; the state process is being treated as observed (i.e., no free parameter is used in scaling the Gaussian state process and there is no measurement error); thus, we can drop them from the measurement submodel:

Measurement submodel: y(t) = η(t) + Πu(t).
Dynamic submodel: η(t) = βη(t − 1) + ζ(t).    (2)

Similarly, κ and Γu(t) are dropped from the dynamic submodel: the model is centered so that the mean is 0 (κ = 0), and the input enters only through the measurement submodel, where u(t) = 0 for the baseline condition and 1 for the experimental condition. Figure 12.1 illustrates these simulated data. The continuous line depicts y(t) (our behavioral outcome); the broken line, u(t) (our input, which, in our example, would correspond to pre- and postintervention). The obtained estimates of the parameters (designated with a caret above each parameter) in Equation 2 are Π̂ = 0.9 and β̂ = 0.6, and the estimated variance of the error ζ(t) is 0.9 (Π̂ is the estimate of the effect on the outcome of the experimental manipulation). All parameter estimates are significant at nominal α = .01 (details not presented),

Figure 12.1. Single-subject time series with experimental manipulation at time = 51. [Plot of y(t) and u(t); x-axis: Time (0 to 100); y-axis: Magnitude (−3 to 6).]

which implies that Π̂ = 0.9 is a significant effect of the experimental intervention condition and that β̂ = 0.6 is a significant effect of η at the previous time point t − 1; in other words, there is a significant relation between the value of the latent state η at time t − 1 and its value at time t. For example, if the state being targeted was aggressive behavior, as measured by instances of hitting, kicking, and biting, the significant result of η at the previous time point t − 1 would indicate that the level of aggressive behavior at time point t − 1 has a significant relation to the level of aggressive behavior at time point t.

An example of data from applied behavior analysis in which this model might be applied is in Kuhn et al. (2009). They conducted a functional analysis of antecedent social events and their effect on motivation to access preferred edible items by an individual with severe problem behavior. Manipulating antecedent social events by having the therapist consume food items (inputs, u[t]) changed the participant’s motivation (as measured by an increase or decrease in socially inappropriate responding), a latent state, η. Although Kuhn et al.’s analysis did not explore a sequential relation between motivation at time t − 1 and time t, it is possible that analysis of these data using the linear Gaussian SSM would identify any such dependencies, assuming sufficient occasions of data (a total of 100 occasions or more) were collected to perform the analysis.

This first example involves what is perhaps the simplest possible instance of a linear Gaussian SSM. The linear Gaussian SSM can be straightforwardly extended to continuous time (Simon, 2006).

Implementation. These analyses can be implemented in the Fortran computer program MKFM6, which uses an expectation-maximization (EM) algorithm to fit linear Gaussian SSMs. The EM algorithm is more efficient than the more commonly used maximum likelihood algorithm.
This computer program can also estimate multivariate Gaussian time series obtained with multiple subjects in a replicated time-series design.
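To make the special case in Equation 2 concrete, the following sketch simulates a series with the reported values (β = 0.6, Π = 0.9, noise variance 0.9, T1 = 50, T2 = 100) and then recovers the parameters. It is not MKFM6 and does not use the EM algorithm: because the state is treated as observed in Equation 2, a simple alternating least-squares scheme (in the spirit of Cochrane–Orcutt iteration) is substituted here. The function names, the random seed, and the starting value of the state are assumptions made for illustration.

```python
import numpy as np

def simulate_eq2(beta=0.6, pi=0.9, noise_var=0.9, t1=50, t2=100, seed=0):
    """Simulate Equation 2: eta(t) = beta*eta(t-1) + zeta(t), y(t) = eta(t) + pi*u(t)."""
    rng = np.random.default_rng(seed)
    u = np.concatenate([np.zeros(t1), np.ones(t2 - t1)])   # 0 = baseline, 1 = intervention
    zeta = rng.normal(0.0, np.sqrt(noise_var), size=t2)    # process noise
    eta = np.zeros(t2)
    eta[0] = zeta[0]                                       # arbitrary start (assumption)
    for t in range(1, t2):
        eta[t] = beta * eta[t - 1] + zeta[t]
    return eta + pi * u, u

def fit_eq2(y, u, n_iter=50):
    """Alternating least squares for beta and pi (a stand-in, not the EM fit of MKFM6)."""
    beta, pi = 0.0, 0.0
    for _ in range(n_iter):
        # Given beta, quasi-difference both series; the model is then linear in pi.
        y_star = y[1:] - beta * y[:-1]
        u_star = u[1:] - beta * u[:-1]
        pi = (u_star @ y_star) / (u_star @ u_star)
        # Given pi, the state is y - pi*u; regress it on its own lag to get beta.
        eta = y - pi * u
        beta = (eta[:-1] @ eta[1:]) / (eta[:-1] @ eta[:-1])
    resid = eta[1:] - beta * eta[:-1]
    return {"beta": beta, "pi": pi, "noise_var": resid.var()}

y, u = simulate_eq2()          # 100 occasions, as in the chapter's example
estimates = fit_eq2(y, u)      # dict with estimates of beta, pi, and the noise variance
```

With only 100 occasions the estimate of Π in particular can vary considerably from one simulated series to the next, which is consistent with the chapter's remark that roughly 100 or more occasions are needed; longer series give estimates closer to the generating values.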

Hidden Markov Model

A hidden Markov model (HMM) is similar in many ways to the linear Gaussian SSM. One similarity is that the HMM would be used when the outcome, y(t), is known but the state is unknown, or hidden (i.e., latent, as in the Gaussian linear SSM example); a Markov model is one in which the state can be determined simply by the value of y(t). Recalling that in a behavioral analysis, y(t) is the behavioral outcome at a given point in time (e.g., one of the data points in Figure 12.1), the experimenter is interested in the organism’s state so as to predict the next behavioral event y(t + 1), or the next state. The HMM is used to estimate the probability of transition from one state to another, given the observed data. An HMM could be used to analyze the transitions between different categories of behavior; for example, observational data could be gathered over time of problem behavior (e.g., aggression, self-injurious behavior, and disruption). These data could then be analyzed using an HMM, which would result in the probabilities of transition from one state (category of problem behavior) to another. The resulting probabilities could be used to more precisely tailor a behavioral intervention. For example, if the target behavior was self-injurious behavior, and an HMM revealed that the probability of transition from aggression to self-injurious behavior was much higher than the probability of transition from aggression to disruption, intervention could be targeted more precisely before the occurrence of self-injurious behavior.

The defining equations for the HMM are similar to Equations 1a and 1b defining the linear Gaussian SSM. For the HMM, y(t) is a categorical process with p categories (the possible observed outcomes). The latent state process η(t) is also a categorical process, with q categories (the number of states). The standard HMM can be defined as follows:

Measurement submodel: y(t) = Λη(t) + ε(t).    (3a)
Dynamic submodel: η(t) = βη(t − 1) + ζ(t).    (3b)

Λ contains the conditional probabilities that y(t) is a certain category, given a value of η(t).
β contains the conditional probabilities that η(t) is a certain category, given a value of η(t − 1). The measurement error process ε(t) and the process noise ζ(t) are categorical processes with a priori fixed properties (Elliott, Aggoun, & Moore, 1995).

Examples. Until now, applications of HMMs for the sequential analysis of behavior processes have been
rare. Notable exceptions are Courville and Touretzky (2002) and Visser, Raijmakers, and Molenaar (2007). We discuss Visser, Raijmakers, and Molenaar’s (2002) application of the HMM to implicit learning, or learning without awareness, of a rule-governed sequence, using simulated data to illustrate. The data were simulated according to a state process η(t) with two categories (q = 2) to represent two states, an unlearned (guessing) state and a learned state. The observed process y(t) was simulated with three categories (p = 3) to represent three possible responses; that is, three different buttons that participants could press in sequence during the task to represent the rule governing the observed sequence. In this instance, the probabilities for η(t) given the values of η(t − 1) contained in the matrix β are shown in Table 12.1. Thus, when the previous state is the unlearned state (η[t − 1] = 1), the probability that the current state is also the unlearned state is .9; when the previous state is the learned state (η[t − 1] = 2), the probability that the current state is the unlearned state is .3.

We can similarly understand the elements of the Λ matrix as shown in Table 12.2. When the present state is the guessing state (η[t] = 1), the probability that the outcome is 1 (continued guessing) is .7, the probability that the outcome is 2 is 0, and the probability that the outcome is 3 is .3; when the present state is the learned state (η[t] = 2), the probability that the outcome is 1 is 0. Figure 12.2 depicts the observed process y(t) and latent state process η(t). The category values of the latent state process have been increased by four units for clarity. The estimates of the conditional probabilities specified in Tables 12.1 and 12.2 are very close to their true values (see Visser et al., 2002, for further details). As can be observed from the plot of the data, when the state is in the learned category (a value of 6 on the y-axis), the response emitted is more likely to be in Categories 2 or 3.

Table 12.1
Probabilities of η(t) Given η(t − 1) = 1 and η(t − 1) = 2

η(t)    η(t − 1) = 1    η(t − 1) = 2
1       .9              .3
2       .1              .7

Table 12.2
Probabilities of y(t) Given η(t) = 1 and η(t) = 2

y(t)    η(t) = 1    η(t) = 2
1       .7          0
2       0           .4
3       .3          .6

Figure 12.2. Hidden Markov model process y and latent process η. [Plot of y and η; x-axis: Time (0 to 90); y-axis: Category Number (0 to 7).]
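The simulated example can be sketched directly from the probabilities in Tables 12.1 and 12.2. This is a minimal illustration, not the procedure used by Visser et al.: the function name, the series length, the starting state, and the seed are assumptions.

```python
import numpy as np

# Column j holds the probabilities conditional on state j (layout of Tables 12.1 and 12.2).
BETA = np.array([[0.9, 0.3],
                 [0.1, 0.7]])       # P(eta(t) = row | eta(t-1) = column)
LAMBDA = np.array([[0.7, 0.0],
                   [0.0, 0.4],
                   [0.3, 0.6]])     # P(y(t) = row | eta(t) = column)

def simulate_hmm(n=90, start_state=0, seed=0):
    """Draw a latent state path and a response path of length n (categories are 1-based)."""
    rng = np.random.default_rng(seed)
    states = np.empty(n, dtype=int)
    responses = np.empty(n, dtype=int)
    state = start_state                     # 0 = unlearned (guessing), 1 = learned
    for t in range(n):
        if t > 0:
            state = rng.choice(2, p=BETA[:, state])          # dynamic submodel
        states[t] = state
        responses[t] = rng.choice(3, p=LAMBDA[:, state])     # measurement submodel
    return states + 1, responses + 1

eta, y = simulate_hmm()
```

Because response 2 has probability 0 in the guessing state and response 1 has probability 0 in the learned state, some responses in the simulated series identify the hidden state with certainty; the remaining ambiguity comes from response 3, which can occur in either state.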

An HMM could be used to analyze the probability of transition among different categories of problem behavior in an individual with mental retardation, as in Kuhn et al. (2009). Observational data of stereotypic movements, self-injurious behavior, and disruptive behavior could be collected over time. If the target behavior is disruptive behavior, an HMM may reveal, for example, that the probability of transitioning from stereotypic movement to disruptive behavior is lower than the probability of transitioning from self-injurious behavior to disruptive behavior. An intervention could then be designed accordingly and targeted more precisely.

Parameter estimation. Parameter estimation in HMMs and hidden hierarchical Markov models can be accomplished by means of the EM algorithm mentioned in the Linear Gaussian SSM section. This algorithm, when used for HMMs and hidden hierarchical Markov models, is known as the Baum–Welch (forward–backward) algorithm. It has been implemented in the R software package depmixS4. Visser and Speekenbrink (2010) provided illustrative examples and instruction. Fraser (2008) presented a transparent derivation of this algorithm.
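The forward half of the forward–backward algorithm is short enough to sketch. The version below is a generic scaled forward recursion, not the depmixS4 implementation; it returns the filtered state probabilities and the log-likelihood of a response sequence for given matrices. The function name, the example response sequence, and the assumption that the process starts in the unlearned state are hypothetical; the matrices are those of Tables 12.1 and 12.2.

```python
import numpy as np

def forward_filter(responses, trans, emit, init):
    """Scaled forward recursion.

    responses: 0-based response categories
    trans[i, j] = P(state i now | state j before); emit[k, j] = P(response k | state j)
    init: probabilities of the states at the first occasion
    """
    n = len(responses)
    filtered = np.zeros((n, trans.shape[0]))
    log_lik = 0.0
    pred = np.asarray(init, dtype=float)
    for t, resp in enumerate(responses):
        if t > 0:
            pred = trans @ filtered[t - 1]     # one-step-ahead state prediction
        joint = emit[resp] * pred              # weight by how likely the response is
        norm = joint.sum()                     # P(response at t | earlier responses)
        filtered[t] = joint / norm
        log_lik += np.log(norm)
    return filtered, log_lik

TRANS = np.array([[0.9, 0.3], [0.1, 0.7]])
EMIT = np.array([[0.7, 0.0], [0.0, 0.4], [0.3, 0.6]])
INIT = np.array([1.0, 0.0])                    # assumption: start in the unlearned state

obs = np.array([0, 2, 1, 2])                   # hypothetical responses 1, 3, 2, 3 (0-based)
filtered, log_lik = forward_filter(obs, TRANS, EMIT, INIT)
```

In the Baum–Welch algorithm these forward quantities are combined with a matching backward pass to obtain expected state and transition counts, from which the probabilities in β and Λ are re-estimated until the log-likelihood stops improving.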

Generalized Linear Dynamic Model

Another instance of the SSM is the generalized linear dynamic model (GLDM). In the GLDM, the manifest process y(t) is categorical (as in the standard HMM) and the latent state process η(t) is linear (as in the linear Gaussian SSM). This model can be conceived of as a dynamic generalization of log-linear analysis and serves as an alternative to the HMM. The GLDM is defined as

y(t) = λ[η(t)] + ε(t)    (4a)

and

η(t) = κ + βη(t − 1) + Γu(t) + ζ(t).    (4b)

These formulas are similar to the formulas in the HMM and Linear Gaussian SSM sections. In the GLDM, λ[η(t)] is a nonlinear function of η(t), the socalled response function (Fahrmeir & Tutz, 2001). The matrix β again contains regression coefficients linking the prior state process η(t − 1) to the current state process η(t). The matrix Γagain also contains

regression coefficients. The process noise ζ(t) again represents the lack of predictability of η(t) given η(t − 1). The vector κ contains mean levels (intercepts). Example. An example of data from the behavior analysis literature that could be analyzed using this approach is found in Cunningham (1979). The prediction of extinction or conditioned suppression of a conditioned emotional response y(t) as a function of sobriety η(t) could be assessed. Cunningham trained a conditioned emotional response in sober rats. Extinction was conducted under conditions of high doses of alcohol. Conditioned suppression returned when those rats were in a sober condition. The GLDM could be used to determine at what level of sobriety (e.g., different doses of alcohol) conditioned suppression returns. Parameter estimation. Parameter estimation in the GLDM is accomplished by means of the EM algorithm described in the Linear Gaussian SSM section. The nonlinear response function in Equation 4a necessitates a special approach to carry out the estimation of η(t) (Fahrmeir & Tutz, 2001). A beta version of the computer program concerned can be obtained from Peter C. M. Molenaar.
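To make the structure of Equations 4a and 4b concrete, the sketch below simulates a univariate GLDM with a binary manifest process. It is an illustration under stated assumptions, not the authors' program: the logistic response function, all parameter values, and the function names are hypothetical choices. With a binary y(t), λ[η(t)] is the probability that y(t) = 1, and ε(t) is the deviation of the observed category from that probability.

```python
import math
import random


def logistic(x):
    """Assumed response function: maps the linear latent state to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def simulate_gldm(inputs, kappa=0.0, beta=0.8, gamma=1.0, noise_sd=0.3, seed=1):
    """Simulate a latent state (Equation 4b) and binary responses (Equation 4a).

    inputs   : sequence of external input values u(t)
    kappa    : intercept; beta: autoregressive weight; gamma: input weight
    noise_sd : standard deviation of the process noise zeta(t)
    All default values are hypothetical.
    """
    rng = random.Random(seed)
    eta = 0.0
    etas, probs, ys = [], [], []
    for u in inputs:
        eta = kappa + beta * eta + gamma * u + rng.gauss(0.0, noise_sd)
        p = logistic(eta)                  # lambda[eta(t)]
        y = 1 if rng.random() < p else 0   # categorical manifest response
        etas.append(eta)
        probs.append(p)
        ys.append(y)
    return etas, probs, ys


# Hypothetical input that switches on halfway through the series.
etas, probs, ys = simulate_gldm([0.0] * 50 + [1.0] * 50)
print(sum(ys[:50]) / 50, sum(ys[50:]) / 50)
```

The design point is that the dynamics are linear in η(t) while the link to the observed categories is nonlinear, which is why estimation of η(t) needs the special treatment described by Fahrmeir and Tutz (2001).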

Linear Gaussian SSM With Time-Varying Parameters

The SSMs considered in the previous sections all have parameters that are constant (consistent in both level and pattern) over time. Yet learning and developmental processes often have changing statistical characteristics that can be captured accurately only by allowing model parameters to be time varying. For example, erratic environments with inconsistent contingencies have been associated with the development of attachment problems in children (Belsky, Garduque, & Hrncir, 1984). Several model types exist for the sequential analysis of processes with changing statistical characteristics (so-called nonstationary processes), some of which are described by Priestley (1988) and Tong (1990). Here, we focus on a direct generalization of the linear Gaussian SSM, allowing its parameters to be time varying in arbitrary ways. This approach is based on related work by Young, McKenna, and Bruun (2001) and Priestley (1988).

Molenaar and Goode

For this generalization, we introduce a new vector to our formulas, θ_t, which contains all unknown parameters. Notice that θ_t is time varying, as indicated by the subscript t. The individual parameters in θ_t may or may not be time varying; one of the tasks of the estimation algorithm is to determine which individual parameters in θ_t are time varying and, if so, how they vary in time. With this specification, the linear Gaussian SSM with time-varying parameters is obtained as the following generalization of Equations 1a and 1b:

$$y(t) = \mu[\theta_t] + \Lambda[\theta_t]\eta(t) + \Pi[\theta_t]u(t) + \varepsilon(t), \tag{5a}$$

$$\eta(t) = \kappa[\theta_t] + \beta[\theta_t]\eta(t-1) + \Gamma[\theta_t]u(t) + \zeta(t), \tag{5b}$$

and

$$\theta_t = \theta_{t-1} + \xi(t). \tag{5c}$$

The interpretation of the parameter matrices in Equations 5a and 5b is the same as for Equations 1a and 1b, but now each parameter matrix or vector depends on elements of the vector θ_t in which all unknown parameters are collected. For instance, Λ[θ_t] specifies that the unknown parameters in the matrix Λ are elements of the vector θ_t of time-varying parameters. Equation 5c is a new addition, tracking the possibly time-varying behavior of the

unknown parameters. The process ξ(t) represents lack of predictability of θ_t. If an element of ξ(t) has zero (or small) variance, then the corresponding parameter in θ_t is (almost) constant in time, whereas this parameter is time varying if the variance of this element of ξ(t) differs significantly from zero.

Examples. One example of the application of this model is based on data gathered from a father and his biological son. Such an application could allow an intervention to be planned and implemented to reduce escalating negative interactions, which in turn might improve the long-term relationship between father and son. These data were presented by Molenaar and Campbell (2009) in what appears to be the first application of this model to psychological process data. The data concern a three-variate time series of repeated measures of emotional experiences of sons with their biological fathers during 80 consecutive interaction episodes over a period of about 2 months. The self-report measures collected at the conclusion of each episode were involvement, anger, and anxiety. Here we focus on a single biological son, whose data are depicted in Figure 12.3.

Figure 12.3. Observed series for biological son. From “Analyzing Developmental Processes on an Individual Level Using Nonstationary Time Series Modeling,” by P. C. M. Molenaar, K. O. Sinclair, M. J. Rovine, N. Ram, and S. E. Corneal, 2009, Developmental Psychology, 45, p. 263. Copyright 2009 by the American Psychological Association.

The following instance of Equations 5a through 5c was used to analyze the data in Figure 12.3:

$$y(t) = \eta(t), \tag{6a}$$

$$\eta(t) = \beta[\theta_{t-1}]\eta(t-1) + \zeta(t), \tag{6b}$$

and

$$\theta_t = \theta_{t-1} + \xi(t). \tag{6c}$$

The matrix β[θ_{t−1}] contains the possibly time-varying regression coefficients linking η(t) to η(t − 1). Here we consider only the part of the model explaining the involvement process. This part can be represented as

$$\mathrm{Inv}(t) = \beta_1(t-1)\,\mathrm{Inv}(t-1) + \beta_2(t-1)\,\mathrm{Ang}(t-1) + \beta_3(t-1)\,\mathrm{Anx}(t-1) + \zeta(t), \tag{7}$$

where Inv = involvement, Ang = anger, and Anx = anxiety. Thus, involvement at time t is a function of involvement, anger, and anxiety at the previous time point (t − 1). The time index on the beta coefficients in Equation 7 indicates that the effects of involvement, anger, and anxiety may vary with time; each coefficient quantifies the relation between a variable at time t − 1 and involvement at time t. Figure 12.4 shows the estimates of β1(t), β2(t), and β3(t) across the 80 interaction episodes and illustrates the intertwined nature of involvement, anger, and anxiety. β1(t), which quantifies the effect of involvement at time t − 1 on involvement at time t, decreases across the initial half of the interaction

episodes, after which it stabilizes during the final half of the interaction episodes. β3(t) in Equation 7, which quantifies the effect of anxiety at the previous interaction episode on involvement at the next interaction episode, increases across the initial half of the interaction episodes, after which it stabilizes during the final half of the interaction episodes. It is noteworthy that β3(t) is negative during the initial interaction episodes but positive during the later interaction episodes. When β3(t) is negative, there is a negative relation between anxiety at time t − 1 and involvement at time t; that is, increased anxiety predicts decreased involvement at the subsequent interaction, and decreased anxiety predicts increased involvement at the subsequent interaction. When β3(t) is positive, there is a positive relation between anxiety at time t − 1 and involvement at time t; that is, increased anxiety predicts increased involvement at the subsequent interaction, and decreased anxiety predicts decreased involvement at the subsequent interaction. When β3(t) is zero, anxiety at time t − 1 has no effect on involvement during the subsequent interaction. Additional procedural details and data can be found in Molenaar, Sinclair, Rovine, Ram, and Corneal (2009).

Figure 12.4. Involvement at t + 1. From “Analyzing Developmental Processes on an Individual Level Using Nonstationary Time Series Modeling,” by P. C. M. Molenaar, K. O. Sinclair, M. J. Rovine, N. Ram, and S. E. Corneal, 2009, Developmental Psychology, 45, p. 267. Copyright 2009 by the American Psychological Association.

An example of data from the behavior analysis literature that could be analyzed using this approach can be found in McSweeney, Murphy, and Kowal (2004). They examined habituation as a function of reinforcer duration. Rate of responding, y(t), could be assessed more precisely as a function of habituation, η(t), with the change in habituation over time captured by β[θ_{t−1}]η(t − 1), using a linear Gaussian SSM with time-varying parameters.

Parameter estimation. Parameter estimation in Equations 5a through 5c is accomplished by the EM algorithm. The estimation step requires considerable reformulation of the original model, resulting in a nonlinear analogue. A beta version of the Fortran program with which the linear Gaussian SSM with time-varying parameters is fitted to the data can be obtained from Peter C. M. Molenaar.
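A much simplified, univariate version of the time-varying model can illustrate how a drifting regression coefficient is tracked. The sketch below is not the EM procedure or the Fortran program mentioned above. It assumes that the latent state is observed directly (as in the instance y(t) = η(t)), that there is a single coefficient following a random walk, and that the variances of ζ(t) and ξ(t) are known rather than estimated. Under those assumptions the coefficient can be treated as the state of an ordinary Kalman filter whose loading at time t is the previous observation. Function names and parameter values are hypothetical.

```python
import random


def simulate_tv_ar(betas, noise_sd=1.0, y0=1.0, seed=2):
    """Simulate y(t) = beta(t - 1) * y(t - 1) + zeta(t) for a given coefficient path."""
    rng = random.Random(seed)
    y = [y0]
    for b in betas:
        y.append(b * y[-1] + rng.gauss(0.0, noise_sd))
    return y


def track_coefficient(y, q, r, b0=0.0, p0=1.0):
    """Kalman filter for a random-walk autoregressive coefficient.

    q : assumed variance of xi(t), the coefficient's random-walk noise
    r : assumed variance of zeta(t), the process noise
    Returns the filtered coefficient estimate after each new observation.
    """
    b, p = b0, p0
    estimates = []
    for t in range(1, len(y)):
        p = p + q                    # prediction step: random walk inflates variance
        h = y[t - 1]                 # time-varying loading on the coefficient
        innovation = y[t] - h * b    # one-step prediction error
        s = h * h * p + r            # innovation variance
        k = p * h / s                # Kalman gain
        b = b + k * innovation
        p = (1.0 - k * h) * p
        estimates.append(b)
    return estimates


# Hypothetical coefficient path: declines over the first 40 episodes, then stays constant.
true_betas = [0.8 - 0.015 * min(t, 40) for t in range(80)]
y = simulate_tv_ar(true_betas)
estimates = track_coefficient(y, q=0.01, r=1.0)
print(round(true_betas[-1], 2), round(estimates[-1], 2))
```

Setting q to zero turns the filter into recursive least squares for a constant coefficient, which mirrors the statement that an element of ξ(t) with zero variance corresponds to a parameter that is constant in time.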

SSMs in Continuous Time

The class of SSMs in continuous time is large, including (non)linear stochastic differential equations, Fokker-Planck equations, and so forth, and therefore cannot be reviewed within the confines of this chapter. One application of SSMs in continuous time, the point process, could be applied to data reported by Nevin and Baum (1980). Point processes can be used to model the timing of a repeated discrete response. Nevin and Baum discussed probabilities of the termination and initiation of a responding burst in free-operant behavior. One of the models they discussed assumes that, during a burst, interresponse times are approximately constant. A point process could test this assumption as well as whether the transition from responding to termination is a function of that timing. Rate of responding and rate of reinforcement data could be analyzed using the point process, and the results would indicate whether interresponse times are constant or varied during a burst and whether the transition from responding to termination is a function of constant or varied interresponse time. For excellent reviews of SSMs in continuous time, see, for instance, Gardiner (2004) and Risken (1984). The linear Gaussian SSMs considered here are special in that the analogues in continuous time share most of their statistical properties. Molenaar and Newell (2003) applied nonlinear stochastic differential equations to analyze bifurcations (sudden qualitative reorganizations) in human motion dynamics. Haccou and Meelis (1992) gave a comprehensive treatment of continuous time Markov chain models for the analysis of ethological processes.

Remarks on Optimal Control of Behavior Processes

The concepts of homeostasis and control have long played a role in behavior theory (Carver, 2004; Johnson, Chang, & Lord, 2006; McFarland, 1971). This role, however, has been confined to the theoretical level. If one of the SSMs considered earlier has been fitted to a behavioral process, then one can use this model to optimally control the process as it unfolds by means of powerful mathematical techniques. Here, we present optimal control as a computational technique to steer a behavioral process to desired levels. Fitting such a model to a complex behavioral process, such as the development of aggressive behavior, yields precise parameter estimates that can then be used to ameliorate the problem behavior efficiently. To introduce the concept of computational feedback, consider the following simple behavior process model in discrete time:

$$y(t+1) = \beta y(t) + \gamma u(t) + \varepsilon(t+1). \tag{8}$$

Here y(t) denotes a manifest univariate behavior process, u(t) is a univariate external input, and ε(t + 1) is process noise. From an intervention perspective, the desired level of the behavioral process is y*, and this level is pursued by experimentally manipulating u(t) (e.g., medication, reinforcement contingency). Given Equation 8 and y*, the goal of the computational feedback problem is to set the value of u(t) (i.e., the level of the independent variable) in a way that minimizes the difference between the expected value of y(t + 1) and y*. Doing so requires the expected value of the behavior of interest at time t + 1 given the extant record of that behavior up to time t: E[y(t + 1|t)]. The deviation between expected and desired behavior is quantified by a cost function. A simple cost function is C(t + 1) = {E[y(t + 1|t)] − y*}². The external input u(t) is chosen in a way that minimizes C(t + 1) for all times t. Because E[y(t + 1|t)] = βy(t) + γu(t), the cost is zero when βy(t) + γu(t) = y*, so the optimal level of the independent variable, u*(t), is given by

$$u^*(t) = \frac{y^* - \beta y(t)}{\gamma}. \tag{9}$$

Equation 9 is a feedback function in which the optimal input u*(t), which codetermines the value y(t + 1), depends on y(t). Thus, the optimal input at time t for steering the behavior y toward the desired value y* at time t + 1 depends on the value of y at time t.

Examples. Figure 12.5 depicts a manifest behavioral process y(t), t = 1, . . ., 50 (labeled Y). This y(t) is uncontrolled (i.e., not subject to any experimental manipulation) and was simulated according to Equation 8 with β = 0.7 and γ = 0.9. The goal of our intervention is to achieve a desired level of behavior, which we have set at zero, y* = 0. Application of Equation 9 yields the u*(t) values shown as the optimal external input (labeled UO) in Figure 12.5. Application of these optimal values of the independent variable, u*(t), in Equation 8 yields the optimally controlled behavioral process (labeled YO) in Figure 12.5. It is evident from Figure 12.5 that the deviation of the optimally controlled behavioral process from y* = 0 is much smaller than the analogous deviation of the behavioral process without control, Y.

This example of optimal feedback control is simple in several respects. Both the behavioral process to be controlled and the external input are univariate (a single dependent variable and a single independent variable); in actuality, both are often multivariate. For example, escalating aggression may be measured by the frequency and co-occurrence of several responses, such as hair pulling, slapping, hitting, kicking, and scratching. Also, Equation 8 assumes a known linear model. In reality, the model can be any one of the SSMs considered previously, with parameters that are unknown and have to be estimated. In our simple example, the cost function is a quadratic function that does not penalize the costs of exercising control actions. In general, both the deviations of the behavioral process from the desired level and the external input (e.g., medication, reinforcement contingency) are penalized according to separate cost functions that can have much more intricate forms. This example, however, serves to illustrate that control of behavior can be quantified and estimated, thus allowing for much more precise and efficient manipulations of experimental variables designed to modify behavior.
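The feedback rule of Equation 9 is easy to try out numerically. The sketch below simulates Equation 8 with β = 0.7, γ = 0.9, and y* = 0, once without control and once with the optimal input. It is an illustration, not a reproduction of Figure 12.5: the unit noise variance, the zero input in the uncontrolled run, the starting value, and the seed are assumptions, and the optional weight rho on the control cost is a hypothetical extension that is not part of the chapter's example. With rho = 0 the rule reduces exactly to Equation 9; with rho > 0 it minimizes the one-step cost [βy(t) + γu(t) − y*]² + rho·u(t)².

```python
import random


def optimal_input(y, beta, gamma, y_star=0.0, rho=0.0):
    """One-step optimal input; equals Equation 9 when rho = 0.

    rho is a hypothetical penalty weight on the squared input.
    """
    return gamma * (y_star - beta * y) / (gamma ** 2 + rho)


def simulate_process(n_steps=50, beta=0.7, gamma=0.9, y_star=0.0,
                     controlled=True, rho=0.0, noise_sd=1.0, seed=3):
    """Simulate Equation 8 and return (behavior series, input series)."""
    rng = random.Random(seed)
    y = 0.0  # assumed starting value
    ys, us = [], []
    for _ in range(n_steps):
        u = optimal_input(y, beta, gamma, y_star, rho) if controlled else 0.0
        y = beta * y + gamma * u + rng.gauss(0.0, noise_sd)
        ys.append(y)
        us.append(u)
    return ys, us


def mean_squared_deviation(series, target=0.0):
    return sum((v - target) ** 2 for v in series) / len(series)


# The same seed gives both runs the same noise sequence.
y_free, _ = simulate_process(controlled=False)
y_ctrl, u_opt = simulate_process(controlled=True)
print(mean_squared_deviation(y_free), mean_squared_deviation(y_ctrl))
```

With rho = 0 the controlled series equals y* plus the process noise, so its deviations cannot be reduced further by any one-step rule; a positive rho trades some of that accuracy for smaller inputs.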

Figure 12.5. Simple optimal feedback control. [Plot of levels of control and response against time.] UO = optimal external input; Y = behavioral process without control; YO = optimally controlled behavioral process.

An example from the behavior-analytic literature in which this analysis might be applied is an extension of the Kuhn et al. (2009) study mentioned earlier in this chapter. Once the evaluation of antecedent events to problem behavior had been completed, and the events that led to a reduction of the target behavior had been identified, implementation of the treatment plan could be assessed using an optimal control feedback model.

In general, well-developed mathematical theories exist for optimal feedback control in each of the types of SSMs considered here. Kwon and Han (2005) presented an in-depth description of optimal control in linear SSMs (see also Molenaar, 2010); Elliott et al. (1995) is the classic source for optimal control in HMMs; and Costa, Fragoso, and Marques (2005) discussed optimal control in SSMs with regime shifting (i.e., systems models for processes undergoing sudden changes in their dynamics).

Conclusion

Because the class of dynamic systems models is too large to cover comprehensively within the confines of this chapter, we have reviewed several simple examples of this class of models: the linear Gaussian SSM, the HMM, the hierarchical hidden Markov model, the GLDM, the linear Gaussian SSM with time-varying parameters, and SSMs in continuous time. Notable omissions from our review are regime shifts (Costa et al., 2005), processes having long-range sequential dependencies (Palma, 2007), and chaotic processes having dynamics evolving at multiple time scales (Gao, Cao, Tung, & Hu, 2007). These types of process models are of interest for sequential behavior analysis (Torre & Wagenmakers, 2009) and again can be formulated as special instances of SSMs; however, their discussion requires consideration of much more technical concepts, and for that reason, we have omitted them from this review. We refer readers to the references just cited for further details about these model types.

The new methods for the sequential analysis of behavior processes reviewed here can be conceived of as special cases of a general SSM, a model that allows for the quantification of complex systems that are made up of inputs, feedback, outputs, and states. Several instances of this general SSM were presented; however, the SSM covers a much broader range of dynamic models. For instance, a close correspondence exists between artificial neural network models and state space modeling (Haykin, 2001). This correspondence opens up the possibility of reformulating artificial neural network models as specific instances of nonlinear SSMs. For instance, the neural network models considered by Schmajuk (1997, 2002) are variants of adaptive resonance theory networks that, in their most general form, constitute coupled systems of nonlinear differential equations (Raijmakers & Molenaar, 1997) obeying the general nonlinear SSM format. Similar remarks apply to reinforcement networks (Wyatt, 2003). Our hope is that this presentation of the class of models known as SSMs has illustrated how they can be useful in behavior analysis, specifically in modeling behavior and behavior change as a process that includes all aspects of behavior change: the state of the organism, the inputs and outputs occurring during the process, and the feedback that occurs as the process continues. Use of these models can enhance the further development and growth of the field.

References

Aoki, M. (2002). Modeling aggregate behavior and fluctuations in economics: Stochastic views of interacting agents. Cambridge, England: Cambridge University Press.
Belsky, J., Garduque, L., & Hrncir, E. (1984). Assessing performance, competence, and executive capacity in infant play: Relations to home environment and security of attachment. Developmental Psychology, 20, 406–417. doi:10.1037/0012-1649.20.3.406
Carver, S. C. (2004). Self-regulation of action and affect. In R. F. Baumeister & K. D. Vohs (Eds.), Handbook of self-regulation (pp. 13–39). New York, NY: Guilford Press.
Costa, O. L. V., Fragoso, M. D., & Marques, R. P. (2005). Discrete-time Markov jump linear systems. London, England: Springer-Verlag.
Courville, A. C., & Touretzky, D. S. (2002). Modeling temporal structure in classical conditioning. In T. J. Dietterich, S. Becker, & Z. Ghahramani (Eds.), Advances in neural information processing systems (Vol. 14, pp. 3–10). Cambridge, MA: MIT Press.
Cunningham, C. L. (1979). Alcohol as a cue for extinction: State dependency produced by conditioned inhibition. Animal Learning and Behavior, 7, 45–52. doi:10.3758/BF03209656
Elliott, R. J., Aggoun, L., & Moore, J. B. (1995). Hidden Markov models: Estimation and control. New York, NY: Springer-Verlag.
Estes, W. K. (1956). The problem of inference from curves based on group data. Psychological Bulletin, 53, 134–140. doi:10.1037/h0045156
Fahrmeir, L., & Tutz, G. (2001). Multivariate statistical modeling based on generalized linear models (2nd ed.). Berlin, Germany: Springer-Verlag.
Ford, D. H., & Lerner, R. M. (1992). Developmental systems theory: An integrative approach. Newbury Park, CA: Sage.
Fraser, A. M. (2008). Hidden Markov models and dynamical systems. Philadelphia, PA: Society for Industrial and Applied Mathematics. doi:10.1137/1.9780898717747
Gao, J., Cao, Y., Tung, W. W., & Hu, J. (2007). Multiscale analysis of complex time series: Integration of chaos and random fractal theory, and beyond. Hoboken, NJ: Wiley.
Gardiner, C. W. (2004). Handbook of stochastic methods: For physics, chemistry and the natural sciences (3rd ed.). Berlin, Germany: Springer-Verlag.
Glass, G., Willson, V., & Gottman, J. M. (1975). Design and analysis of time-series experiments. Boulder: Colorado University Press.
Gottman, J. M., McFall, R. M., & Barnett, J. T. (1969). Design and analysis of research using time-series. Psychological Bulletin, 72, 299–306. doi:10.1037/h0028021
Haccou, P., & Meelis, E. (1992). Statistical analysis of behavioural data: An approach based on time-structured models. Oxford, England: Oxford University Press.
Hannan, M. T. (1991). Aggregation and disaggregation in the social sciences. Lexington, MA: Lexington Books.
Haykin, S. (Ed.). (2001). Kalman filtering and neural networks. New York, NY: Wiley. doi:10.1002/0471221546
Howell, D. C. (2007). Statistical methods for psychology. Belmont, CA: Thomson.
Johnson, R. E., Chang, C. H., & Lord, R. G. (2006). Moving from cognition to behavior: What the research says. Psychological Bulletin, 132, 381–415. doi:10.1037/0033-2909.132.3.381
Kuhn, D. E., Hardesty, S. L., & Luczynski, K. (2009). Further evaluation of antecedent social events during a functional analysis. Journal of Applied Behavior Analysis, 42, 349–353. doi:10.1901/jaba.2009.42-349
Kwon, W. H., & Han, S. (2005). Receding horizon control. London, England: Springer-Verlag.
McFarland, D. (1971). Feedback mechanisms in animal behavior. London, England: Academic Press.
McSweeney, F. K., Murphy, E. S., & Kowal, B. P. (2004). Varying reinforcer duration produces behavioral interactions during multiple schedules. Behavioural Processes, 66, 83–100. doi:10.1016/j.beproc.2004.01.004
Merrell, M. (1931). The relationship of individual growth to average growth. Human Biology, 3, 37–70.
Molenaar, P. C. M. (2004). A manifesto on psychology as idiographic science: Bringing the person back into scientific psychology, this time forever. Measurement: Interdisciplinary Research and Perspective, 2, 201–218. doi:10.1207/s15366359mea0204_1
Molenaar, P. C. M. (2008). On the implications of the classical ergodic theorems: Analysis of developmental processes has to focus on intra-individual variation. Developmental Psychobiology, 50, 60–69. doi:10.1002/dev.20262
Molenaar, P. C. M. (2010). Note on optimization of individual psychotherapeutic processes. Journal of Mathematical Psychology, 54, 208–213. doi:10.1016/j.jmp.2009.04.003
Molenaar, P. C. M., & Campbell, C. G. (2009). The new person-specific paradigm in psychology. Current Directions in Psychological Science, 18, 112–117. doi:10.1111/j.1467-8721.2009.01619.x
Molenaar, P. C. M., & Newell, K. M. (2003). Direct fit of a theoretical model of phase transition in oscillatory finger motions. British Journal of Mathematical and Statistical Psychology, 56, 199–214. doi:10.1348/000711003770480002
Molenaar, P. C. M., Sinclair, K. O., Rovine, M. J., Ram, N., & Corneal, S. E. (2009). Analyzing developmental processes on an individual level using nonstationary time series modeling. Developmental Psychology, 45, 260–271. doi:10.1037/a0014170
Nevin, J. A., & Baum, W. M. (1980). Feedback functions for variable-interval reinforcement. Journal of the Experimental Analysis of Behavior, 34, 207–217. doi:10.1901/jeab.1980.34-207
Palma, W. (2007). Long-memory time series: Theory and methods. Hoboken, NJ: Wiley. doi:10.1002/9780470131466
Priestley, M. B. (1988). Non-linear and non-stationary time series analysis. London, England: Academic Press.
Raijmakers, M. E. J., & Molenaar, P. C. M. (1997). Exact ART: A complete implementation of an ART network, including all regulatory and logical functions, as a system of differential equations capable of standalone running in real time. Neural Networks, 10, 649–669. doi:10.1016/S0893-6080(96)00111-6
Risken, H. (1984). The Fokker-Planck equation: Methods of solution and applications. Berlin, Germany: Springer-Verlag.
Schmajuk, N. A. (1997). Animal learning and cognition: A neural network approach. Cambridge, England: Cambridge University Press.
Schmajuk, N. (2002). Latent inhibition and its neural substrates. Norwell, MA: Kluwer Academic. doi:10.1007/978-1-4615-0841-0
Sharpe, T., & Koperwas, J. (2003). Behavior and sequential analyses: Principles and practice. Thousand Oaks, CA: Sage.
Sidman, M. (1960). Tactics of scientific research. Oxford, England: Basic Books.
Simon, D. (2006). Optimal state estimation: Kalman, H∞, and nonlinear approaches. Hoboken, NJ: Wiley. doi:10.1002/0470045345
Timberlake, W., Schaal, D. W., & Steinmetz, J. E. (Eds.). (2005). Special issue on the relation of behavior and neuroscience [Special issue]. Journal of the Experimental Analysis of Behavior, 84(3).
Tong, H. (1990). Non-linear time series: A dynamical system approach. Oxford, England: Clarendon Press.
Torre, K., & Wagenmakers, E. J. (2009). Theories and models for 1/f^β noise in human movement science. Human Movement Science, 28, 297–318. doi:10.1016/j.humov.2009.01.001
Visser, I., Raijmakers, M. E. J., & Molenaar, P. C. M. (2002). Fitting hidden Markov models to psychological data. Science Progress, 10, 185–199.
Visser, I., Raijmakers, M. E. J., & Molenaar, P. C. M. (2007). Characterizing sequence knowledge using on-line measures and hidden Markov models. Memory and Cognition, 35, 1502–1517. doi:10.3758/BF03193619
Visser, I., & Speekenbrink, M. (2010). depmixS4: An R-package for hidden Markov models. Journal of Statistical Software, 36(7), 1–21.
Wyatt, J. (2003). Reinforcement learning: A brief overview. In R. Kühn, R. Menzel, W. Menzel, U. Ratsch, M. M. Richter, & I. O. Stamatescu (Eds.), Adaptivity and learning: An interdisciplinary debate (pp. 243–264). Berlin, Germany: Springer-Verlag.
Young, P. C., McKenna, P., & Bruun, J. (2001). Identification of non-linear stochastic systems by state dependent parameter estimation. International Journal of Control, 74, 1837–1857. doi:10.1080/00207170110089824

Chapter 13

Pavlovian Conditioning

K. Matthew Lattal

The study of Pavlovian conditioning in the laboratory has been ongoing for more than 100 years, having documented roots in early demonstrations of learning mechanisms involved in stimulus-elicited reflexes (Twitmeyer, 1905). In 1927, the publication of Pavlov’s Conditioned Reflexes, summarizing years of research in his and his colleagues’ laboratories, laid the groundwork for research and theory on classical conditioning, which continues to be a major focus of behavioral and neurobiological analysis today. This book was more than a summary of Pavlov’s research; it was a blueprint for how a rigorous experimental analysis of behavior should be undertaken. Although the experimental preparations for studying Pavlovian conditioning have expanded over the years, Pavlov’s work with dogs describes many of the key empirical phenomena and theoretical processes that modern researchers continue to pursue. Today, research on Pavlovian (sometimes referred to as classical, or respondent) conditioning is motivated by a theoretical approach that attempts to determine the ways in which organisms represent their world and learn to respond accordingly (see Rescorla, 1988b). The study of basic associative learning processes underlying Pavlovian conditioning has led to numerous insights into experimental design, how learning occurs, and how basic processes may lay the foundation for many putative higher forms of learning. At an empirical level, the key findings that form the cornerstone of modern thinking about Pavlovian conditioning have been demonstrated in multiple

preparations. Table 13.1 lists some of the preparations that are frequently used in modern studies of Pavlovian conditioning. Common Pavlovian conditioning mechanisms occur with aversive stimuli (such as fear conditioning with a shock or flavor aversion learning induced by nausea) and appetitive stimuli (such as magazine-approach conditioning with food or context–drug associations with drugs of abuse). These common findings in many different preparations point to the generality of the conditioning principles that I describe in this chapter. At a theoretical level, the study of Pavlovian conditioning is an area of behavior analysis that has long been dominated by approaches that focus on internal theoretical mechanisms to explain the emergence and maintenance of learning (see Pearce & Bouton, 2001). This theoretical approach, in contrast to the Skinnerian approach that has long dominated the study of operant behavior, relies heavily on internal mechanisms and is explicit in making assumptions about how the organism encodes the stimulus relations that it encounters in its environment (e.g., Dickinson, 1989; Rescorla, 1988b). Indeed, it is impossible to characterize the nature of Pavlovian research over the past 50 years without describing the central importance of certain internal theoretical concepts, such as associative strength and stimulus representations. In many cases, differences in terms are a simple function of differences in perspective. Describing certain findings in terms of selective stimulus control or selective stimulus representations may, in many cases, have the same

Preparation of this chapter was supported by National Institutes of Health Grants DA025922 and MH077111. DOI: 10.1037/13937-013 APA Handbook of Behavior Analysis: Vol. 1. Methods and Principles, G. J. Madden (Editor-in-Chief) Copyright © 2013 by the American Psychological Association. All rights reserved.


Table 13.1 Common Preparations Used in the Study of Pavlovian Conditioning

Preparation | Conditioned stimulus | Unconditioned stimulus | Conditioned response | Unconditioned response | Representative reference
Autoshaping | Discrete keylight | Food or liquid | Keypeck | Eating or drinking | Brown & Jenkins (1968)
Eyeblink | Diffuse auditory–visual stimulus | Airpuff or shock | Eyeblink | Eyeblink | Hupka, Kwaterski, & Moore (1970)
Fear conditioning | Diffuse auditory–visual stimulus | Shock | Freezing | Activity | Westbrook, Iordanova, McNally, Richardson, & Harris (2002)
Flavor aversion | Flavored liquid or food | Lithium chloride | Drinking suppression | Nausea | Blair & Hall (2003)
Magazine approach | Diffuse auditory–visual stimulus | Food or liquid | Magazine entry | Eating or drinking | Holland (1977)
Place conditioning | Tactile–visual contextual cues | Drug | Place preference | Euphoria | Cunningham, Gremel, & Groblewski (2006)

functional consequences, but what is known about Pavlovian conditioning has clearly come from a specific theoretical perspective that makes certain assumptions about learning that are absent in most operant accounts. Rather than provide an epistemological defense of any one perspective, in this chapter I describe the behavioral phenomena and related theories as they are characterized in the literature, which is dominated by a theoretical approach that focuses on internal mechanisms.

Outside of the laboratory, clinicians have long recognized the importance of Pavlovian cues in contributing to maladaptive behavior. Understanding the interaction between Pavlovian cues and pharmacological interventions for disorders involving fear, anxiety, and substance abuse has been a major focus of recent behavioral neurobiological research (e.g., Davis, Myers, Chhatwal, & Ressler, 2006; Rothbaum & Davis, 2003; Rothbaum & Schwartz, 2002; Stafford & Lattal, 2011). These approaches continue to be shaped by the discoveries and sophisticated experimental approaches taken to the study of behavior by researchers interested in the basic processes involved in Pavlovian conditioning.

My goal in this chapter is to provide an overview of modern behavioral research and theory on Pavlovian conditioning. The chapter is structured

around three basic questions that encompass much of modern research on Pavlovian conditioning, as it has been characterized in several influential reviews (Rescorla, 1988a, 1988b; Rescorla & Holland, 1976): (a) What are the circumstances that produce learning, (b) what is the content of learning, and (c) how is learning expressed in performance?

What Are the Circumstances That Produce Pavlovian Conditioning?

On the surface, there seems to be little to understand about how to produce Pavlovian conditioning. Pavlov described many experiments in which a neutral stimulus, such as the ticking of a metronome (the conditioned stimulus, or CS), was followed by a salient, biologically relevant stimulus, such as meat powder or vinegar delivered to the mouth (the unconditioned stimulus, or US). By itself, the US evokes its own response (salivation; the unconditioned response, or UR). After enough pairings, responding emerges in the presence of the CS, even when the US is absent (the conditioned response, or CR). Thus, if a CS and a US are presented together enough times, an association between them will develop, resulting in the CS coming to elicit the response previously associated with the US. Indeed,

Pavlovian Conditioning

this is often the case, but there are many caveats to the idea that contiguity between stimuli will promote learning about those stimuli. In some cases, contiguity is completely ineffective (e.g., Rescorla, 1968); in other cases, it may actually weaken the association between stimuli (e.g., Kremer, 1978; Lattal & Nakajima, 1998); and in still other cases, it may be completely unnecessary for associative learning to occur (e.g., conditioned inhibition; Rescorla, 1969). Moreover, the CR that develops may have very little resemblance to the UR (e.g., Fanselow, 1980, 1982; Holland, 1977). Although most early theories of Pavlovian conditioning agreed that temporal contiguity between two events would produce an association between those events, it was clear from the beginning that any number of manipulations could affect the consequence of that contiguity for learning. For example, Pavlov (1927) documented the importance of considerations of CS and US salience, previous history with the CS or US, consistency of other cues in the environment, how CS–US contiguities were

arranged in time, and the animal’s motivational state, among other factors. It is therefore very clear that early in the history of Pavlovian conditioning, Pavlov and his students recognized that the same CS–US contiguity could produce differences in behavior, depending on any number of considerations.

Experimental Arrangements Between a Conditioned Stimulus and an Unconditioned Stimulus

Experimentally, CS–US contiguity (also known as a conditioning trial) can be arranged and manipulated in many ways. Four of the most commonly used procedures are shown in Figure 13.1. These procedures result in different levels of conditioned responding to the CS, but those differences do not always mean that there are differences in learning (as I review in a later section). In delayed conditioning, the onset of the CS precedes the onset of the US by some amount of time. Typically, the US is presented coincidentally either with the last part of the CS or with CS offset.

Figure 13.1. Four common procedural variations on the temporal relation between the conditioned stimulus (CS) and the unconditioned stimulus (US). The trial is defined as the time between CS onset and US offset, and the intertrial interval is defined as the time between CS offset and the next CS onset. In delayed conditioning, CS onset and US presentation have a forward relation. In simultaneous conditioning, the US is present throughout CS presentation. In trace conditioning, a delay occurs between CS offset and US onset (the trace interval). Finally, in backward conditioning, the US precedes CS onset.

K. Matthew Lattal

In simultaneous conditioning, the CS and US co-occur. In trace conditioning, a delay is inserted between the offset of the CS and the onset of the US, resulting in a post-CS interval (the trace interval) in which the CS is absent before the US presentation. In backward conditioning, the US precedes the CS. These procedural variations have been examined in all of the preparations listed in Table 13.1. Of the four arrangements shown, delayed conditioning results in the most robust conditioned responding to the CS when it is tested alone after conditioning. Trace conditioning results in weaker responding to the CS, and simultaneous and backward conditioning frequently result in the weakest levels of responding. The challenge with these variations on the CS–US contiguity is determining to what extent the response in the presence of the CS reflects what the organism has learned about the CS–US contiguity, as I describe later in the chapter.
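The four arrangements can be told apart from three time stamps per trial. The following is a minimal sketch, not part of the original chapter: the function names are hypothetical, US duration is ignored, and "simultaneous" is approximated as identical CS and US onsets, which is a simplifying assumption.

```python
def classify_arrangement(cs_on, cs_off, us_on):
    """Label a trial by the temporal relation between the CS and US onset.

    All arguments are times in the same (arbitrary) unit.
    Simplifying assumption: US duration is ignored.
    """
    if cs_off <= cs_on:
        raise ValueError("CS offset must follow CS onset")
    if us_on < cs_on:
        return "backward"        # US precedes the CS
    if us_on == cs_on:
        return "simultaneous"    # CS and US begin together
    if us_on <= cs_off:
        return "delayed"         # US arrives during the CS or at CS offset
    return "trace"               # gap between CS offset and US onset


def trace_interval(cs_off, us_on):
    """Time between CS offset and US onset; zero when there is no gap."""
    return max(0.0, us_on - cs_off)
```

For example, a 10-unit CS with the US delivered 5 units after CS offset would be labeled a trace trial with a trace interval of 5 units.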

What Defines an Optimal Conditioning Trial?

In delayed conditioning, the most commonly used procedural variation of the CS–US arrangement, the CS precedes the US. Although the CS must precede the US for conditioning to occur, the optimal interval for that delay differs among preparations. In a review of studies of the CS–US interval from several different Pavlovian preparations, Rescorla (1988a) demonstrated that within each preparation, the relation between the conditioning trial temporal parameters (CS–US interval) and conditioned responding was clear. When the delay between CS and US onset is zero (simultaneous presentation of the CS and US), there is little or no responding (e.g., Fanselow, 1990). As the delay is increased, responding emerges, but then falls again as the delay lengthens. Thus, a forward relation between the CS and the US, at some intermediate delay, appears to be optimal for generating conditioned responding. Defining the optimal forward CS–US interval for Pavlovian conditioning is difficult because the temporal characteristics vary from task to task. For example, in preparations such as eyeblink conditioning, in which a CS precedes the delivery of either a shock or an airpuff to the eye, the optimal delay between CS onset and US onset is on the order

of milliseconds (e.g., Kehoe, Olsen, Ludvig, & Sutton, 2009). In other preparations, such as magazine-approach conditioning, in which a CS precedes the delivery of a food pellet to a magazine in an operant chamber, the optimal delay is on the order of seconds (e.g., Holland, 2000). In still other preparations, such as flavor aversion learning, the optimal delay may be on the order of minutes to hours (e.g., Schafe, Sollars, & Bernstein, 1995). What is clear from the many studies of CS–US intervals in Pavlovian conditioning is that there is a functional relation between CS–US interval and conditioned responding, but that the absolute values making up this delay vary widely, from milliseconds to hours, depending on the preparation. These differences in absolute durations defining an optimal CS–US interval between preparations suggest that what is considered effective depends greatly on the Pavlovian preparation under consideration. In any given preparation, however, the general finding is that performance is greatest with a slightly asynchronous forward CS–US interval (Rescorla, 1988a).

Other studies have found that in addition to the CS–US interval (or the duration of the conditioning trial, T), the interval between trials (the intertrial interval, or ITI) is important for generating Pavlovian conditioning (e.g., Barela, 1999). The same trial can have very different effects on behavior, depending on how that trial is arranged in time. For example, with long trials, very little conditioning may occur when those trials are separated by short ITIs, but when those trials are separated by long ITIs, conditioning will emerge. These trial-spacing effects are robust across many different Pavlovian preparations, including preparations that require very short CS–US intervals, such as autoshaping (e.g., Perkins et al., 1975), and those that require longer CS–US intervals (on the order of minutes or hours), such as flavor aversion learning (e.g., Domjan, 1980).
Although the absolute values that make up these durations differ widely from task to task, conditioning is generally greater with shorter trial durations and longer intertrial durations.

Interactions Between Trial and Intertrial Intervals in Delayed Conditioning

A key caveat to trial and intertrial duration findings is that the predictive effects of a given trial


or intertrial duration depend on the overall temporal context in which they are placed. Not only does the effect of a trial duration depend on the ITI, it depends on the ratio of the ITI to the trial duration (the ITI:T ratio; Gibbon, Baldock, Locurto, Gold, & Terrace, 1977). Figure 13.2A illustrates experimental groups in which the duration of the ITI and the trial were manipulated. As described earlier, a general finding is that when the ITI is held constant (e.g., Groups 96:32 and 96:8 in Figure 13.2A), conditioned responding emerges more quickly with the shorter trial duration. When the trial duration is held constant (e.g., Groups 96:8 and 24:8 in Figure 13.2A), conditioned responding emerges more quickly with the longer ITI. A study by Gibbon et al. (1977) demonstrated that although trial and ITI durations are important, the key variable that determined the emergence of conditioned responding was the ratio between them. As the ITI:T ratio increases, the rate of acquisition increases (i.e., fewer conditioning trials are required to reach a predefined criterion), largely independent of the absolute durations that make up those ratios. An example of this ITI:T ratio acquisition effect from a magazine-approach experiment is shown in Figure 13.2B (Lattal, 1999). In this experiment, each trial consisted of a white-noise CS followed by the delivery of a food pellet into the magazine. The elevation in responding (food magazine entries) during the CS compared with the ITI was used as the dependent variable. As can be seen in Figure 13.2B, groups that shared the same ITI:T ratio acquired the response at approximately the same rate and to approximately the same terminal levels, regardless of the absolute durations of the ITI or T within that ratio. This ITI:T ratio effect discovered by Gibbon et al.
(1977) has since been replicated in many other studies (Burns & Domjan, 2001; Holland, 2000), and the acquisition effects such as those shown in Figure 13.2B for the most part tend to reflect differences in learning (Lattal, 1999). These findings have had tremendous theoretical importance, leading to the development of several influential theories suggesting that time and the calculation of rates of CS and US occurrence are at the root of learning (e.g., Gallistel, Fairhurst, & Balsam, 2004; Gallistel & Gibbon, 2000; Gibbon & Balsam, 1981). Although the

Figure 13.2. A: Trial (T) and intertrial interval (ITI) durations used in an experiment examining the ITI:T ratio effect. B: Effects of ITI:T ratio on acquisition of magazine-approach responding, defined as the ratio between responding during the T and ITI periods over four daily sessions (elevation ratio = conditioned stimulus response rate/[conditioned stimulus response rate + ITI response rate]). Adapted from “Trial and Intertrial Durations in Pavlovian Conditioning: Issues of Learning and Performance,” by K. M. Lattal, 1999, Journal of Experimental Psychology: Animal Behavior Processes, 25, p. 437. Copyright 1999 by the American Psychological Association.

ITI:T ratio may not capture the entire story of trial and intertrial duration effects (e.g., Holland, 2000; Lattal, 1999), research around this finding has demonstrated that there are no optimal intervals for producing conditioning. Instead, the intervals that will


produce conditioning will depend on the conditioning preparation being examined and the overall temporal context in which these intervals are placed.
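The two quantities used in this section, the ITI:T ratio and the elevation ratio from the Figure 13.2 caption, are simple to compute. The sketch below is illustrative only: the function names are hypothetical, the group labels are read as ITI:T (as in the text's description of Groups 96:32, 96:8, and 24:8), and both durations are assumed to be in the same time unit.

```python
def iti_to_trial_ratio(iti, trial):
    """ITI:T ratio for a given intertrial interval and trial duration."""
    if trial <= 0:
        raise ValueError("trial duration must be positive")
    return iti / trial


def elevation_ratio(cs_rate, iti_rate):
    """CS response rate / (CS response rate + ITI response rate).

    Returns None when no responses occurred in either period, because the
    ratio is undefined there (this convention is an assumption of the sketch).
    """
    total = cs_rate + iti_rate
    if total == 0:
        return None
    return cs_rate / total


# Group labels from the text, read as ITI:T.
GROUPS = {"96:32": (96, 32), "96:8": (96, 8), "24:8": (24, 8)}
ratios = {name: iti_to_trial_ratio(iti, t) for name, (iti, t) in GROUPS.items()}
```

Under this reading, Groups 96:32 and 24:8 share an ITI:T ratio of 3 despite a fourfold difference in absolute durations, whereas Group 96:8 has a ratio of 12. An elevation ratio of 0.5 indicates equal responding during the CS and the ITI; values above 0.5 indicate more responding during the CS.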

Does the Absence of Conditioned Stimulus–Induced Responding Reflect an Absence of Learning?

In the literature reviewed so far, I have not distinguished between conditioned responding, learning, and conditioning. The general assumption has been that more responding as a result of differential treatment reflects more learning. However, this assumption is not always valid, because in many cases the behavior measured during an assessment of learning does not reflect the learning that has occurred. In studies of delayed conditioning reviewed earlier, for example, the general finding is that a forward temporal relation between the CS and the US (i.e., CS precedes US) will result in conditioning, depending on the overall temporal context in which CS–US pairings occur. In the other CS–US variations illustrated in Figure 13.1 (simultaneous, trace, and backward conditioning), responding is generally much weaker than in delayed conditioning. The challenge in interpreting this weak responding is to determine the appropriate experimental conditions for assessing learning with these variations. Typically, learning is inferred from responding to a CS during a common test some time after conditioning. During these tests, the CS is presented without the US so that conditioned behavior in the presence of the CS is not confounded with unconditioned behavior evoked by the US (see Rescorla & Holland, 1976). For example, on Day 1 of a fear conditioning preparation, a rat may receive pairings of a tone with a shock. On Day 2, the tone is presented without the shock, and the conditioned freezing response (absence of body movement) in the presence of the tone is measured. Generally, the inference that one tends to make is that there is a direct relation between the levels of responding and the strength of the CS–US association: Higher levels of freezing reflect stronger learning of the tone–shock association in fear conditioning.
Inferring the state of the CS–US association from observation of the strength of conditioned responding

is difficult and becomes especially problematic when analyzing the results of trace, simultaneous, or backward conditioning. One of the challenges in assessing learning from responding to the CS is that the conditions present during initial learning and during testing will often be quite different depending on the procedural variation in the CS–US arrangement. In the case of delayed and trace conditioning, for example, the animal has a history of experiencing the CS before the US (and therefore in the absence of the US). Thus, when the CS is tested in the absence of the US, there will be at least one test trial in which the conditions of acquisition and testing are the same. This means that the first test trial will allow for an assessment of learning under identical conditions as occurred during acquisition. However, assessing the learning that occurs during backward US–CS pairings or simultaneous CS– US pairings is difficult. During acquisition, the presence of the US in each trial makes it difficult to disentangle the contributions of the CS and the US to conditioned responding (because any potential CR to the CS is confounded by the expression of a UR to the US). Testing the organism by presenting the CS in the absence of the US is also difficult, because the absence of the US on a test trial means that the conditions during acquisition and testing are different (during acquisition, the US is always present immediately before the CS or simultaneously with the CS, but during testing the US is absent, which may immediately signal different contingencies). Moreover, if the CS and US get encoded as a compound stimulus during acquisition, testing the CS in the absence of the US will mean that only part of the conditioning compound is assessed, which may be expected to weaken responding. Thus, analyses of performance during test trials after simultaneous or backward conditioning will be confounded by the difference in conditions from acquisition to testing. 
A similar challenge occurs when interpreting the relatively weak responding that occurs after trace conditioning. Typically, as the trace interval is increased between CS offset and US onset, responding to the CS decreases. Thus, the temptation is to conclude that associative learning between the CS


and the US is impaired with increasing trace intervals. However, it is also possible that instead of learning a simple association between the CS and the US that is strong or weak, the organism learns about the specific temporal relation between the CS and US. Studies have found that when the CS is tested in the absence of the US, conditioned responding in the presence of the CS is low but increases as the time of the US presentation approaches (e.g., McEchron, Tseng, & Disterhoft, 2003). Such increases in responding as the US approaches have been demonstrated in many conditioning preparations with a variety of intervals (see Balsam, Sanchez-Castillo, Taylor, Van Volkinburg, & Ward, 2009). The idea that animals may learn about the temporal relations between CSs and USs has also been helpful in documenting the kind of learning that may occur during simultaneous and backward conditioning. In a series of studies (e.g., Arcediano, Escobar, & Miller, 2003; Barnet, Arnold, & Miller, 1991; Chang, Stout, & Miller, 2004; Miller & Barnet, 1993; see also Heth, 1976), Miller and colleagues have demonstrated that organisms do learn about the CS–US relations in backward and simultaneous conditioning, but that learning is not expressed in behavior when it is examined in typical test situations. After a series of backward conditioning trials, Miller and colleagues demonstrated that replacing the US with a second CS results in conditioning of that CS. They have suggested that this occurs because the new CS is presented at the time the original US was expected, resulting in an association being established. Thus, even though behavior to the target CS is absent, there is evidence that a backward temporal association is formed between the US and the CS. The clear implication of these studies is that a wide range of conditions lead to learning. 
A single measure of behavior in the presence of a CS may lead one to conclude that nothing has been learned, but if experimental designs can include additional assessments, then evidence of learning may emerge in behavior. Together, these studies of delayed, simultaneous, backward, and trace conditioning reveal levels of complexity in the seemingly simple contiguous condition between a CS and a US that

promotes associative learning. The absolute durations that make for an effective forward relation between a CS and a US will be very different among preparations, and within a given preparation, the efficacy of a given CS–US interval will change depending on the overall temporal context in which that interval is arranged. Moreover, those CS–US relations (such as simultaneous, backward, and trace) that appear to lead to little conditioning can actually result in a great deal of learning that is not necessarily expressed in responding to the CS during acquisition or testing. Thus, in evaluating the circumstances that produce learning, it is critical to think about what the absence of responding reflects and whether alternative strategies for assessing learning exist.

Challenges to Contiguity

The findings reviewed in the preceding section suggest that CS–US contiguity can be arranged in many ways, and there is no straightforward answer to the question of what is an effective contiguous relation. These findings and older findings dating back to Pavlov all speak to complexities in contiguity—what may be regarded as an appropriate contiguous relation for promoting learning depends on many factors, including the conditioning preparation, the time between conditioning trials, and how learning is assessed in behavior. Although all of these findings clearly demonstrate that many factors contribute to Pavlovian conditioning, none reviewed so far challenges the idea that contiguity is critical for learning. A series of experiments in the 1960s (Table 13.2) changed this view, resulting in a paradigm shift in the way that Pavlovian conditioning was approached experimentally and theoretically.

Predictability and Expectancy in Pavlovian Conditioning

The idea of contiguity between a CS and a US being critical for learning was challenged by three findings published around the same time in the late 1960s (Kamin, 1968; Rescorla, 1968; Wagner, Logan, Haberlandt, & Price, 1968). These findings all demonstrated that CS–US contiguity that promotes learning in one situation may result in no learning in


Table 13.2
Experimental Designs and Results of Experiments by Kamin (1968, 1969); Rescorla (1966, 1967, 1968); and Wagner, Logan, Haberlandt, and Price (1968)

Kamin blocking experiment
Informative (I): Phase 1, —; Phase 2, AX+; Test, X–
Redundant (R): Phase 1, A+; Phase 2, AX+; Test, X–
Result: R < I
Explanation: In Group I, A and X are equal predictors of shock. In Group R, A is the better predictor of shock.

Rescorla correlational experiment
Informative (I): Training, p(US with X) > p(US without X); Test, X–
Random (Ran): Training, p(US with X) = p(US without X); Test, X–
Result: Ran < I
Explanation: In Group I, X is best signal of shock. In Group Ran, X and background context are equal predictors of shock.

Wagner et al. relative validity experiment
Informative (I): Training, AX+, AX–, BX+, BX–; Test, X–
Uninformative (U): Training, AX+, BX–; Test, X–
Result: U < I
Explanation: In Group I, all stimuli are equally informative. In Group U, A and B are better predictors of shock/no shock compared with X.

Note. A, B, and X are conditioned stimuli (CS). + = unconditioned stimulus (US; shock in each experiment); – = no US.

others, even when all other variables that may affect learning (such as number of CS and US presentations, CS durations, and ITIs) are held constant.

Blocking. One finding that was important for the development of Pavlov’s (1927) theories was overshadowing: If two CSs (Stimulus A and Stimulus X) are presented in compound (AX) and followed by a US (+), the learning that occurs to either of the elements is weaker than the learning that occurs if either element were conditioned alone. Thus, each stimulus overshadowed the other, resulting in shared conditioning of A and X after the AX+ treatment. Kamin (1968, 1969) extended this finding by conditioning Stimulus A (A+) before the compound phase, resulting in the treatment shown in Table 13.2 (redundant group). Note that in the second phase, the contiguity between X and the US is identical in the two groups; what differs is whether the other stimulus in the

compound was conditioned before the compound phase. Kamin found that conditioning of A before the compound phase resulted in much less conditioning to X than occurred in the group that did not have the A+ treatment before the compound phase. Thus, the previous conditioning of A blocked the conditioning of X when the two stimuli were reinforced in compound.

Rescorla’s correlational experiment. As in Kamin’s (1968, 1969) experiments, Rescorla’s (1966, 1967, 1968) experiments arranged the same CS–US contiguity in all experimental groups. These studies manipulated the probability of the US in the presence or absence of the CS. Rescorla found that the effects of a given CS–US probability on responding depended on the probability of the US in the absence of the CS; as the shock became more


likely in the presence of the CS than in its absence, conditioned responding to the CS was strengthened (informative group in Table 13.2). As the shock became no more likely in the presence of the CS than in its absence, responding was weakened (random group in Table 13.2). Thus, the same CS–US probability produced very different results on conditioned responding during the test, depending on the probability of the US in the absence of the CS.

Relative validity effect. A final demonstration that contiguity was not sufficient for learning comes from the experiments of Wagner et al. (1968). They adopted a strategy similar to Kamin’s (1968, 1969): Arrange a common CS–US contiguity but manipulate the identity of the other stimuli present during compound trials. The design of their experiments is shown in Table 13.2. Here, Stimulus X has the same contiguity with the US in the two groups (reinforced on 50% of the trials), but X provides more predictive information about the occurrence of the US in the informative group than in the uninformative group. In the uninformative group, the strongest predictor of the US is Stimulus A (always paired with the US), and the strongest predictor of the absence of the US is Stimulus B (never paired with the US). The information provided by X is redundant with that provided by A and B, which results in little to no associative strength accumulating to X. In the informative group, however, Stimuli A, B, and X are each equally predictive of the US, resulting in all stimuli, including X, gaining a moderate amount of associative strength. During the test, responding was greater to Stimulus X in the informative group than in the uninformative group.

The key implication of these findings is that organisms learn only when the CS provides nonredundant information about the US.
In Kamin’s (1968, 1969) experiments, previous conditioning of Stimulus A meant that when Stimulus X was introduced to the compound and followed by the same US, no learning occurred to X because it did not signal anything new. In the case of Rescorla’s (1966, 1967, 1968) random groups, the presentation of the CS similarly did not signal anything about the US that was not already signaled by the conditioning

chamber itself. In the case of Wagner et al.’s (1968) experiment, Stimulus X did not gain associative strength if it was presented in compound with a more predictive cue. These three findings have since been replicated in many Pavlovian (and even in operant) procedures, demonstrating the utility of the notion of surprise and expectancy in conditioning—for a US to effectively condition a CS, that US must not be predictable on the basis of the other cues available at the time of US presentation. These findings formed the basis for much of the current thinking about Pavlovian conditioning. Indeed, one of the tests of any new theory of Pavlovian conditioning is to provide a framework for viewing these three findings, and certainly many theoretical accounts have attempted to do this. Most of these theories share the common assumption that redundant CSs are poorly conditioned because they provide no new information to the organism (but see Stout & Miller, 2007, for a different perspective). This notion of information and expectancy has been quantified in the idea of predictive error; learning occurs as a function of the difference (or error) between the US that the organism expects and the US that is actually received. Large predictive errors result in large amounts of learning; small predictive errors result in little learning. As the predictive error approaches zero (i.e., the US is almost perfectly predicted by the CS), no further learning occurs.

Predictive error and the Rescorla–Wagner model. In 1972, Rescorla and Wagner collaborated on the development of a model designed to provide a quantitative theoretical basis for the notion of surprise and expectancy (Rescorla & Wagner, 1972; Wagner & Rescorla, 1972). The key concepts advanced by this Rescorla–Wagner model were that the organism gathers information from stimuli in an environment and uses that information to predict the US.
The goal of the model was to explain the changes in associative strength (V) that occur during a given conditioning trial. The predictive error component of the equation is represented by the difference between the asymptote of learning that a US could support (λ) and the associative strength of all stimuli present during a conditioning trial (ΣV).


Parameters associated with the CS (α) and US (β) control the rate of learning.

\[
\Delta V_{CS} = \alpha_{CS}\,\beta_{US}\,(\lambda_{US} - \Sigma V) \tag{1}
\]

The amount of learning (change in associative strength of the CS, ΔV_CS) was directly related to the size of the discrepancy (λ_US − ΣV) between the expected (on the basis of the available cues) and obtained US; more learning occurred when the discrepancy was large than when it was small. This model nicely captures the data from the Kamin (1968, 1969), Rescorla (1966, 1967, 1968), and Wagner et al. (1968) experiments. For example, in the case of blocking, if enough conditioning of A occurs in Phase 1, the associative strength of A (V_A) will approach asymptote. In Phase 2, when X is presented in compound with A (AX+ trials), the total associative strength of that compound (V_AX) is equal to the sum of the associative strengths of the elements (V_A and V_X), which is already near asymptote (owing to the large value of V_A). The newly introduced Stimulus X, which has no previous value, provides no predictive information above and beyond that provided by Stimulus A; that is, the discrepancy between the expected and obtained US does not change, so learning does not accrue to the novel Stimulus X (V_X remains near zero). Thus, when Stimulus X is tested alone, the failure to observe responding reflects impaired conditioning of Stimulus X. The key insights from the Rescorla–Wagner model were that (a) the predictive error term was calculated on the basis of the summation of all cues that were present on a conditioning trial and (b) the associative strength of a stimulus could have a negative value (which makes it inhibitory, signaling the absence of an otherwise expected US; see Pearce & Bouton, 2001). The Rescorla–Wagner model has faced many challenges by other models that also rely on predictive error (e.g., Mackintosh, 1975; Pearce, 1987) and by models that emphasize other features of the conditioning situation (e.g., Gallistel & Gibbon, 2000; Stout & Miller, 2007).
Nonetheless, the Rescorla–Wagner model continues to be the gold standard against which all models of Pavlovian conditioning are compared. As with other long-standing

models in learning (e.g., Herrnstein, 1970), there is elegance in its simplicity. Any student can quickly understand it and use it to make very clear predictions about the conditions that should promote or retard learning. These predictions are not always correct, but some surprising predictions about the nature of expectancy and operant reinforcement have resulted from this model (e.g., Lattal & Nakajima, 1998; Vaughan, 1982). Researchers continue to investigate the nature of predictive error learning, with new techniques being developed to examine some fundamental predictions about the negatively accelerated nature of learning curves (e.g., Rescorla, 2001a; Thein, Westbrook, & Harris, 2008) and about the underlying neurobiological circuits that mediate predictive error learning (e.g., Cole & McNally, 2007; Iordanova, McNally, & Westbrook, 2006; reviewed in McNally & Westbrook, 2006). This concept of predictive error has been extended beyond conditioning into the realm of substance abuse and reward systems (e.g., Schultz, 1998) and neural network modeling (e.g., Schmajuk, Gray, & Lam, 1996). The findings reviewed in this section demonstrate that there are many circumstances in which Pavlovian conditioning will and will not occur. A unifying theoretical theme from all of this work is that organisms code predictive relations, and responding occurs to the extent that conditioned stimuli provide useful information to the organism. Quantitative theories such as the Rescorla–Wagner model propose various mechanisms for this prediction encoding and make explicit predictions about those circumstances in which Pavlovian conditioning will occur. These models make some rudimentary assumptions about what organisms learn (in terms of associative strength), but they were designed to say very little about the nature of the associative relations that form during conditioning. 
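The blocking prediction of the Rescorla–Wagner model can be checked with a few lines of simulation. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, and the parameter values (α = 0.3 for both cues, β = 0.5, λ = 1.0, 50 trials per phase) are arbitrary choices for illustration. Each trial applies the update ΔV = αβ(λ − ΣV) to every cue present, with ΣV summed over all cues on that trial.

```python
def rw_trial(V, present, lam, alpha, beta):
    """Apply one Rescorla-Wagner update to every cue present on a trial.

    V       : dict mapping cue name -> current associative strength
    present : list of cue names presented on this trial
    lam     : asymptote supported by the US (use 0.0 for a no-US trial)
    alpha   : dict mapping cue name -> CS learning-rate parameter
    beta    : US learning-rate parameter
    """
    error = lam - sum(V[c] for c in present)   # predictive error (lambda - sum of V)
    for c in present:
        V[c] += alpha[c] * beta * error
    return error


def run_blocking(n_phase1=50, n_phase2=50, alpha=0.3, beta=0.5, lam=1.0):
    """Simulate the informative (I) and redundant (R) groups of Table 13.2.

    All parameter values are illustrative assumptions.
    """
    alphas = {"A": alpha, "X": alpha}
    results = {}
    for group, n_pretrain in (("I", 0), ("R", n_phase1)):
        V = {"A": 0.0, "X": 0.0}
        for _ in range(n_pretrain):            # Phase 1: A+ (Group R only)
            rw_trial(V, ["A"], lam, alphas, beta)
        for _ in range(n_phase2):              # Phase 2: AX+ (both groups)
            rw_trial(V, ["A", "X"], lam, alphas, beta)
        results[group] = dict(V)
    return results


results = run_blocking()
```

With these assumed values, the strength of X ends near half of λ in Group I, where A and X share the available strength, and near zero in Group R, where prior conditioning of A leaves almost no predictive error to drive learning about X.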
In the next section, I review some work illustrating the content of what the organism might learn as a result of Pavlovian conditioning.

What Is the Content of Pavlovian Learning?

A major question that has motivated research on Pavlovian conditioning focuses on understanding

Pavlovian Conditioning

what it is that the animal learns when the experimenter arranges a Pavlovian relation. Why does the animal show a CR in the presence of the CS? To answer this question, one needs to consider (a) how the animal might learn about the properties inherent in the CS and US, (b) how associations might be formed between those properties, and (c) how the CR becomes connected to those properties.

What Is the Nature of the Conditioned Stimulus Representation?

Theories of learning have long speculated about how the animal encodes the stimulus environment and how that stimulus representation enters into associations with the US. Broadly, theories can be classified into those that view learning as occurring between elements of the CS and the US (e.g., Estes, 1950; Pearce & Hall, 1980; Rescorla & Wagner, 1972; reviewed in Harris, 2006) and those that view learning as occurring between the CS as a configuration and the US (e.g., Pearce, 1987, 1994; Rudy & O’Reilly, 1999). The differences between these elemental and configural theories are apparent in the case of compound conditioning, in which two or more neutral stimuli are presented together, followed by a US. Figure 13.3 illustrates an example of compound

conditioning in pigeon autoshaping. Each trial consists of a 5-second presentation of a compound with three visually distinct components (three triangles that together form a trapezoid) followed immediately by response-independent food. After several pairings, keypecking begins to occur during the compound trial presentation. According to a purely elemental learning theory (e.g., Rescorla & Wagner, 1972), responding occurs to the compound because each stimulus element (the individual triangles) has its own associative strength (elemental learning in Figure 13.3), and these strengths sum to produce the response to the compound. According to a purely configural account (e.g., Pearce, 1987), responding to the compound occurs because the compound itself is its own configural stimulus (a trapezoid that is not really a compound at all, in fact) that gains strength as that configuration is reinforced (configural learning in Figure 13.3). Testing a triangle by itself consistently reveals less responding than the testing of the compound (overshadowing; see Good & Macphail, 1994). According to elemental theories, responding to the element is less than to the compound because each element shares the total strength of the compound. In the example in Figure 13.3, if all three triangles are equally salient, they

Figure 13.3. Elemental and configural learning approaches to conditioning. During a conditioning trial, a pigeon is presented with a visual stimulus (conditioned stimulus) that is followed by food presentation (unconditioned stimulus). After several trials, the pigeon pecks the key. According to an elemental account, the pigeon pecks because it learns that each component of the compound stimulus is associated with food. According to a configural account, the pigeon pecks because it learns that the stimulus as a single configuration is associated with food. CR = conditioned response.

K. Matthew Lattal

will each accumulate one third of the total associative strength that the US can support. According to configural accounts, responding to the element is less than to the compound because the element is a different stimulus that is only partially similar to the originally reinforced stimulus. In the example in Figure 13.3, responding to an individual element should be about one third as strong as that to the configuration because about one third of the strength of the configuration will generalize to that element. Because some of the more influential elemental (e.g., Rescorla & Wagner, 1972) and configural (e.g., Pearce, 1987, 1994) models make precise quantitative predictions, experiments can be devised to define the experimental conditions that distinguish the two approaches. The experimental question of whether elemental or configural learning occurs is therefore answered on the basis of which mathematical formula better predicts the results. Many experiments have been devised to distinguish between configural and elemental accounts of learning. It is clear that certain conditions may favor one account over another (reviewed in McLaren & Mackintosh, 2000; Rescorla, 2003; Rescorla & Coldwell, 1995; Wagner, 2003). For example, with stimuli from the same modality, as in the autoshaping example in Figure 13.3, a configural account may better capture the learning that occurs. With stimuli from different modalities (such as when a flashing light and a tone are paired with food in magazine-approach conditioning), there may be more learning about the individual elements. The major implication of an elemental approach to learning is that the component stimulus elements that are present on a conditioning trial may become associated with the US. Other experiments have demonstrated that these elements may also become associated with each other (known as within-compound learning).
For example, if a tone (T) and a light (L) are paired with shock (TL+), further conditioning of the tone (T+) will strengthen the fear response to the light, whereas extinction of the tone (T−) will decrease the fear response to the light (e.g., Cunningham, 1981; Rescorla & Cunningham, 1978). Together, experiments examining within-compound learning and those trying to distinguish

elemental and configural theories of learning have demonstrated the importance of examining the nature of the CS in understanding the content of associative learning. The CS representation can be relatively simple or complex, with multiple within-stimulus associations or with subtle differences in stimulus configurations driving learning and performance. The importance of characterizing elemental and configural perspectives in an introduction to Pavlovian conditioning is to emphasize that what one considers to be a CS is often derived from the perspective of the experimenter. How the animal codes the stimulus environment will depend on many factors that are not at present well understood. Multiple aspects of the stimulus situation, including stimuli present on the conditioning trial as well as background stimuli (such as the context in which conditioning occurs), will contribute to the learning that occurs.
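The elemental prediction described in this section, that three equally salient elements each end up with one third of the associative strength the US can support, can be illustrated with a minimal sketch. It assumes the standard Rescorla–Wagner update applied to every element of the compound with a shared prediction error; the function name, the parameter values, and the number of trials are hypothetical choices made for illustration, not values taken from any experiment discussed here.

```python
def train_compound(n_elements=3, n_trials=200, alpha=0.1, beta=1.0, lam=1.0):
    """Train a compound of equally salient elements with a shared error term.

    alpha, beta: illustrative salience and learning-rate parameters (assumed).
    lam: asymptote of associative strength supported by the US (assumed 1.0).
    Returns the associative strength of each element after training.
    """
    strengths = [0.0] * n_elements
    for _ in range(n_trials):
        # The prediction error is computed from the summed strength of
        # all elements present on the trial.
        error = lam - sum(strengths)
        for i in range(n_elements):
            strengths[i] += alpha * beta * error
    return strengths


strengths = train_compound()
compound_response = sum(strengths)   # responding predicted for the compound
element_response = strengths[0]      # responding predicted for one triangle alone
print(f"compound: {compound_response:.3f}, single element: {element_response:.3f}")
```

Under these assumptions the summed strength approaches the asymptote while each element holds roughly one third of it, which is the elemental account of why a single triangle supports less responding than the compound. A configural model would reach a similar number by a different route (generalization from the trained configuration), so this sketch does not discriminate between the two accounts.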

What Is the Nature of the Unconditioned Stimulus Representation?

As described in the previous section, some theories of learning have focused on the nature of the CS representation (elemental vs. configural) and the resulting impacts on learning that a particular approach may predict. What one thinks of as a CS may, from the animal’s perspective, be multiple CSs that are combined together or may only be a small part of a broader configuration that includes additional stimuli that are outside of the experimenter’s control. Other theories of learning have suggested that the US may also be a complex stimulus that has multiple features. The influential works of Konorski (1948, 1967) appealed to different aspects of the US in promoting associative learning. His ideas were central to the development of Wagner’s (1981) sometimes opponent process theory of learning; the idea that a US may have different components was nicely incorporated into the affective extension of that theory by Wagner and Brandon (1989). These theories have appealed to two components of the US: (a) the sensory component, which consists of the physical properties of the stimulus (i.e., how it feels, tastes, or smells) and (b) the affective component, which consists of the emotional


reactions triggered by that stimulus. These components may have different properties, such as the immediate salience and time course of decay from short-term memory. CSs may enter into associations with these different US components, depending on how the CS and US representations overlap in time (Wagner & Brandon, 1989; see also Delamater & Holland, 2008; Cunningham, Okorn, & Howard, 1997; Cunningham, Smith, & McMullin, 2003). The idea that a US may have multiple components is nicely illustrated in ethanol place-conditioning studies by Cunningham and colleagues (e.g., Cunningham et al., 1997, 2003). In their experiments, mice received an intraperitoneal injection of ethanol (the US) at different time intervals relative to placement into the place-conditioning apparatus (the CS). A strong place aversion developed when placement immediately followed an ethanol injection, but a strong place preference developed when placement was delayed for a few minutes after the injection. These findings suggest that ethanol has multiple unconditioned properties—a short-term aversive effect that is followed by a longer lasting positive effect. The direction of conditioning (place aversion or preference) in those experiments depended on which aspect of the US becomes associated with the CS. Together, these findings demonstrate that the general concept of associative strength supported by a US does not capture the nuances of the specific associations that may develop between components of the US and the CS. One of the key contributions of theories such as sometimes opponent process is that, as with CSs, USs that support conditioning also have multiple properties that may become incorporated into the learning that occurs during a trial.

What Is the Nature of the Pavlovian Association?

Clearly, the CS and the US have multiple aspects that need to be included in the associative representation of stimuli. Thus far, I have considered the different aspects of the CS and the US that may be important for associative learning. In this section, I consider how the CS and the US enter into associations with each other and whether a CS–US association is necessary for conditioned responding to occur. This basic issue of what gets associated as a

result of Pavlovian conditioning has been at the forefront of theoretical speculation since Pavlov, who wrote of CS and US centers in the brain and the various connections that they may have with each other and with response systems. Research on this topic has characterized two very general accounts of the learning that may result from Pavlovian conditioning: stimulus–stimulus (S-S) learning and stimulus–response (S-R) learning. The S-S approach to learning says that responding occurs because presentation of the CS activates a representation of the US. After several conditioning trials, the link to that US representation is strong. On subsequent trials, presentation of the CS triggers the memory of the US, which serves as an intermediary between the stimulus and the response. According to the S-R approach, responding occurs because the stimulus acquires the ability to elicit a response and the US is no longer involved once that S-R association is established. This account views the US as a catalyst that strengthens the S-R bond, but it is not part of the association that forms as a result of conditioning. One way to separate these two accounts is to change the value of the US after conditioning (e.g., Colwill & Motzkin, 1994; Holland & Straub, 1979; see Colwill, 1994, and Dickinson, 1989, for applications to operant behavior). If the S-S approach is correct, devaluing the US before a test with the CS should weaken responding to the CS. This weakening would occur because the presentation of the CS would activate a devalued representation of the US, thereby producing a weak response. If the S-R approach is correct, altering the value of the US should have no effect on behavior because once the CS–CR connection is established, the US is no longer needed to generate a response. 
A nice illustration of the role of the US representation in mediating conditioned responding comes from a magazine-approach experiment by Colwill and Motzkin (1994), the design of which is summarized in Figure 13.4A. In their experiment, rats learned that the presentation of one CS (CS1; a tone) would lead to a specific outcome in the food magazine (US1; food pellets). On other trials, a different CS (CS2; a light) was presented with a different outcome (US2; liquid sucrose) in the same magazine.


Figure 13.4. A: Design of an experiment examining the effects of outcome devaluation on Pavlovian responding. During Phase 1 (cue conditioning), one conditioned stimulus (CS; a tone) signaled pellet delivery, and another CS (a light) signaled sucrose delivery. During Phase 2 (outcome devaluation), a flavor aversion was conditioned to the pellets. Finally, during the test, magazine-approach behavior was assessed during nonreinforced presentations of each CS. The CS–unconditioned stimulus (US) pairings and the US that was devalued were counterbalanced in the experiment. B: Responding during the test, defined as the percentage of observations in which the rat directed behavior toward the magazine. LiCl = lithium chloride. Data are from Colwill and Motzkin (1994).

As a result of these trials, the rats enter the magazine during the presentation of either CS as they learn the CS–US relation. Thus, the magazine response is similar for the two CSs (which were counterbalanced, as were the outcomes). After conditioning, the animals received access to one of the outcomes (US1) followed by injection of lithium chloride, which induces nausea (the devaluation phase). By the end of the final session of devaluation, all subjects had stopped consuming the devalued outcome. During an extinction test, animals received each CS on alternating trials. According to an S-S account, there should be less response to CS1 (the tone) than to CS2 (the light) because presentation of the tone should activate a representation of the food pellets, which are now associated with nausea. According to an S-R account, there should be no difference in responding to the two CSs because responding is now under the control of the S-R association, which is independent of the status of the US.

The results of this experiment are shown in Figure 13.4B. Less responding occurred to the CS that had its outcome devalued than to the CS that did not. This difference is consistent with an S-S account of learning: Postconditioning devaluation of the outcome weakened conditioned responding to the CS that was initially paired with that outcome. These results, as well as many others, suggest that the CR adapted to a change in the value of the US with which it was associated, consistent with an S-S approach. Other data are consistent with an S-R account (e.g., Rizley & Rescorla, 1972), and Figure 13.4B shows that some responding continued in the presence of the CS associated with the devalued outcome, even after extensive outcome devaluation that caused all subjects to reject the outcome by the end of the devaluation phase. This result demonstrates that, although Colwill and Motzkin (1994) found strong evidence for S-S learning, they also found that some of the response was independent of the status of the outcome. Thus, the idea that a single S-S or S-R association controls behavior may be simplistic, especially when one considers the multiple aspects of CSs (such as elemental and configural learning) and USs (such as sensory and affective components) that may become associated (see Delamater & Holland, 2008).

Putting Associations in Context

When thinking about the mechanisms underlying Pavlovian conditioning, one often thinks in terms of CSs and USs and how they do or do not become associated. These issues are critical for understanding the nature of learning, but considering that conditioning trials always occur in a context is also important. Context is more than just the background against which these CS–US pairings occur; it is a key component of the development and expression of learning during and after conditioning (see Bouton, 2004; Hall & Honey, 1989; Westbrook, Iordanova, McNally, Richardson, & Harris, 2002). A nice demonstration of the importance of context in modulating CS–US associations comes from a study by Peck and Bouton (1990). They used a counterconditioning paradigm, in which a CS was paired with an appetitive US in Phase 1 and then an aversive US in Phase 2 (or vice versa, in separate


experiments). In this type of procedure, the CR normally changes after the US that is predicted by the CS changes. For example, a tone paired with food in Phase 1 will elicit appetitive CRs (head jerking or magazine entries), but when the tone is paired with shock in Phase 2, it will elicit aversive CRs (conditioned freezing). Peck and Bouton (1990) added a contextual manipulation to this paradigm, resulting in the aversive and appetitive associations forming in different contexts. Table 13.3 shows Peck and Bouton’s general design (the two groups were examined in different experiments). Rats were exposed to a distinct context (Context B) during Phase 1, in which some rats (aversive–appetitive [Av-Ap] group) received tone–shock pairings and others (appetitive–aversive [Ap-Av] group) received tone–food pairings. In Phase 2, which occurred in Context A, the contingencies were switched, resulting in rats in the Av-Ap group receiving tone–food pairings and the Ap-Av group receiving tone–shock pairings. When testing of the tone occurred in Context B, the Ap-Av group spent more time engaged in appetitive responding (head jerking) relative to control groups, and the Av-Ap group spent more time engaged in aversive responding (freezing behavior) relative to control groups. These results demonstrate that the CS–US association is at least somewhat under the control of contextual cues (see also Bouton & Peck, 1992).

Table 13.3. Experimental Design and Results of Peck and Bouton (1990)

Group | Phase 1 | Phase 2 | Test | Results (relative to control group)
Aversive–appetitive | Context B: tone–shock | Context A: tone–food | Context B: tone? | Aversive conditioned responding increased
Appetitive–aversive | Context B: tone–food | Context A: tone–shock | Context B: tone? | Appetitive conditioned responding increased

Many other studies have demonstrated that contexts can modulate conditioned responding, through direct associations with either the CS or the US or through a more hierarchical association in which the context sets the occasion for certain CS–US contingencies (e.g., Nelson, 2002). Together, the findings reviewed in this section show that pairing a CS

with a US can result in different associations that may be dependent on conditioning contexts for retrieval. These findings lay the foundation for the material in the next section, in which I consider the issue of how learning is expressed in behavior. Many kinds of associations can be formed as a result of conditioning, and the empirical challenge is to provide evidence of these associations in behavior.

How Is Learning Expressed in Performance?

Perhaps the major challenge confronting Pavlovian conditioning research is determining how to measure learning (a theoretical concept that is not directly observable) in behavior. It is clear that overt behavior often belies what the organism has learned about a particular contingency. For example, the presence of an overt behavioral response may be due not to learning but to some general internal aroused state; similarly, the absence of behavior may occur for many reasons other than the absence of learning. This distinction between learning and the expression of learning in performance is perhaps clearest in the study of extinction.

Extinction: Preserving the Association While Eliminating the Response

Although the mechanisms underlying initial associative learning remain to be fully understood, predictive information is clearly critical for new learning to occur. Research demonstrating the importance of predictive information has also focused on the conditions that promote response loss after conditioning. The most widely studied of these conditions is extinction, which occurs when a CS is presented


repeatedly in the absence of the US with which it was previously paired. After some number of such extinction trials, the CR is weakened and often entirely eliminated. Many studies have demonstrated that acquisition and extinction share common properties, including the importance of predictive error, the shape of the learning curve, and the generality between stimuli, among other properties (e.g., Rescorla, 2001a). Extinction clearly produces response loss without necessarily producing a loss of the original association. This point was emphasized by Pavlov (1927) and formed the basis for much of his theorizing, and most theories since Pavlov’s time have agreed that extinction generates new learning that does not permanently alter the original association (see Delamater, 2004; Lattal, 2007; Rescorla, 2001b). Several key findings have demonstrated that extinction suppresses behavior without significantly altering the content of the original learning.

Spontaneous recovery. Perhaps the best documented demonstration that extinction eliminates the response while preserving the original learning comes from studies showing that extinguished behavior returns after some time has passed since extinction. Figure 13.5 shows the results of two extinction experiments reported by Pavlov (1927). During conditioning, the CS was paired with delivery of vinegar (the US) directly into the dog’s mouth. After these pairings, tests of the CS revealed high levels of salivation (Extinction Trial 1 in Figure 13.5). Over the course of extinction trials, the continued absence of the US caused the CR to extinguish. Pavlov’s key experimental insight in these extinction experiments was simply to wait and test the animals again after some delay. When he did that (Test in Figure 13.5), he found that the CR that was partially (Figure 13.5A) or almost completely (Figure 13.5B) eliminated during extinction returned. This he termed spontaneous recovery, which subsequent research has shown to be influenced by several factors, including the strength of conditioning, amount of extinction, and length of the postextinction retention interval (see Rescorla, 2004).

Reinstatement. A second finding demonstrating the preservation of initial learning through extinction is reinstatement. After extinction, noncontingent exposure to the US alone can reinstate conditioned responding to the CS that had previously

Figure 13.5. Two examples of extinction and spontaneous recovery plotted from data reported by Pavlov (1927). During conditioning, dogs received pairings of an auditory conditioned stimulus (ticking of a metronome) with the delivery of vinegar (the unconditioned stimulus) to the mouth. During extinction, the conditioned stimulus was presented without the unconditioned stimulus. During the first extinction trial, conditioned responding (salivation) was high but decreased over the course of extinction sessions. A postextinction test conducted after a delay revealed small (A) or large (B) spontaneous recovery of the conditioned response. From “Extinction and the Erasure of Memories,” by K. M. Lattal, Psychological Science Agenda, 21, p. 3. Copyright 2007 by the American Psychological Association.


been extinguished. As with spontaneous recovery, reinstatement is specific to the CS that was extinguished and to the US that was delivered during conditioning (e.g., Delamater, 1997). Reinstatement as a procedural tool has been particularly effective in the study of substance abuse and the relapse of drug seeking after extinction (e.g., Malvaez, Sanchis-Segura, Vo, Lattal, & Wood, 2010).

Contextual renewal. As noted earlier, conditioned behavior is modulated by contexts. The clearest examples of contextual determinants of behavior come from the study of extinction. If conditioning of a CS occurs in one context (Context A) followed by extinction in a second context (Context B), testing of the CS in Context A results in a return (renewal) of conditioned responding. This contextual A-B-A renewal effect is robust, having been documented in fear conditioning (e.g., Bouton & Bolles, 1979), appetitive operant conditioning (e.g., Nakajima, Tanaka, Urushihara, & Imada, 2000), flavor aversion (e.g., Rosas & Bouton, 1997), and spatial learning (e.g., Lattal, Mullen, & Abel, 2003), among many other preparations. Moreover, this contextual renewal effect is not limited to a return to the conditioning context, as shown by other forms of renewal such as A-A-B (e.g., Bouton & Ricker, 1994; see also Pavlov, 1927, p. 99, for an early description of this phenomenon) and A-B-C (e.g., Denniston, Chang, & Miller, 2003) renewal. These findings demonstrate that a change of context between extinction and testing will result in a return of conditioned responding.

Pavlovian-to-instrumental transfer. Another finding that does not receive as much attention as spontaneous recovery, reinstatement, and renewal is the Pavlovian-to-instrumental transfer effect.
When a Pavlovian CS (such as a tone) and an instrumental (operant) response (such as a lever press) lead to the same outcome in different conditioning sessions, activating the tone with the lever present will produce an increase in instrumental responding. This Pavlovian-to-instrumental transfer procedure has been used to assess the status of CS–US associations in many studies (e.g., Holland, 2004), and studies of extinction have shown that even when conditioned responding has been eliminated in the presence of a CS, that stimulus shows remarkably little disruption in its

ability to augment an instrumental response that leads to a common outcome (e.g., Delamater, 1996). Together, these and several other findings (e.g., Leri & Rizos, 2005) all demonstrate that extinction is a response-weakening process that dampens but does not eliminate the original association. From a theoretical perspective, the challenge is to determine the mechanisms that allow this to occur. Formal theories of learning, such as the Rescorla–Wagner model, suggest that the key processes that produce new learning also operate during extinction. These theories suggest that during conditioning and extinction, the discrepancy between the US that is expected and the US that is obtained is what drives learning. Early in conditioning and early in extinction, this discrepancy is large: Before conditioning, the organism expects no US but then receives a US, and before extinction, the organism expects a US but then receives no US. Throughout the course of conditioning or extinction, learning continues until the expected outcome matches the obtained outcome (a US in the case of conditioning; no US in the case of extinction). Thus, the amount of learning that occurs on a given extinction trial is subject to the same predictive error mechanisms that operate during conditioning. Although error-correction models appear to capture the negatively accelerated learning curves that underlie extinction, such models do not have an obvious mechanism that allows the original learning to be preserved throughout extinction. Indeed, in its simplest form, associative strength to a stimulus increases and decreases depending on predictive error, but it occurs via a single path that does not allow the history of the stimulus to be retained. Consequently, there is not a clear mechanism that would allow phenomena such as spontaneous recovery after extinction to occur. 
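The error-correction account above can be made concrete with a minimal sketch, assuming the standard Rescorla–Wagner update in which the change in associative strength on each trial is proportional to the difference between the obtained and the expected US. The function name, the combined learning-rate value of 0.3, and the trial counts are hypothetical choices for illustration only.

```python
def rescorla_wagner(trials, learning_rate=0.3, lam_reinforced=1.0, v_start=0.0):
    """Return associative strength V after each trial.

    trials: sequence of booleans, True = CS followed by the US,
            False = CS presented alone (an extinction trial).
    learning_rate: combined salience/learning-rate parameter (assumed value).
    lam_reinforced: asymptote supported by the US (assumed 1.0; 0 when no US).
    """
    v = v_start
    history = []
    for reinforced in trials:
        lam = lam_reinforced if reinforced else 0.0
        # Prediction error: obtained outcome minus expected outcome.
        v += learning_rate * (lam - v)
        history.append(v)
    return history


acquisition = [True] * 20    # CS-US pairings
extinction = [False] * 20    # CS-alone presentations
curve = rescorla_wagner(acquisition + extinction)
print(f"after acquisition: {curve[19]:.3f}, after extinction: {curve[39]:.3f}")
```

In this sketch both phases produce negatively accelerated curves, because the error term shrinks as the expected outcome approaches the obtained one. The sketch also shows the limitation noted above: the model tracks a single value, so a stimulus that has been conditioned and then fully extinguished ends up with nearly the same strength as one that was never conditioned, and nothing in that value could support spontaneous recovery.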
Most theories developed to provide specific accounts of extinction appeal to some inhibitory process that operates during extinction (see Delamater, 2004). There is good evidence that the error correction mechanisms described in the Rescorla–Wagner model operate during extinction (e.g., Leung & Westbrook, 2008), but the open question is, what is being inhibited? Formal theories have proposed inhibition within several pathways,


including in the representation of the CS or US itself (e.g., Rescorla & Cunningham, 1977; Robbins, 1990) and inhibition in the associations between the CS and US or CS and CR (e.g., Rescorla, 1993). One influential theory of extinction has suggested that a key component in the development and expression of extinction is the context in which it occurs (e.g., Bouton, 2004). Bouton and colleagues have appealed to memory retrieval processes in accounting for phenomena such as contextual renewal and spontaneous recovery (e.g., Bouton, 1991, 1993; Bouton, Westbrook, Corcoran, & Maren, 2006). Central to Bouton’s (2004) theory is the idea that initial conditioning and extinction result in the formation of memories that differ in their dependence on the context for retrieval. Memories of initial conditioning are independent of the context in which that learning occurred, as revealed in the transfer of conditioned responding across testing contexts. Memories for extinction, however, rely heavily on contextual cues for retrieval; when testing occurs outside of the extinction context, those cues are absent, resulting in renewal of responding. According to Bouton’s theory, spontaneous recovery can also be thought of as a form of contextual renewal, because the passage of time itself creates a context that differs from the extinction context (see Bouton, Nelson, & Rosas, 1999, for further discussion). Applying Bouton’s ideas about memory retrieval in extinction has led to many novel findings that have been applied with success to clinical treatments (e.g., Collins & Brandon, 2002; Dibbets, Havermans, & Arntz, 2008; Vansteenwegen et al., 2006). The study of extinction has been one of the great success stories of basic research in terms of extrapolation of findings with animals to clinical settings (see Chapter 4, this volume). Extinction is a widely used behavioral intervention for many psychiatric disorders, including posttraumatic stress disorder, phobias, and substance abuse. 
The repeated demonstrations from the laboratory that extinction does not eliminate the original learning challenge clinicians to design interventions that make extinction learning persistent, weakening contextual renewal and spontaneous recovery with time, which is currently the focus of a great deal of neurobiological work on extinction (see Davis et al., 2006).

Do Impairments in Pavlovian Conditioning Reflect Impairments in Learning or Performance?

Extinction is probably the most widely known and studied example of a learning process in which the behavior that is expressed in the presence of a stimulus clearly does not reflect the entirety of what the animal knows about the stimulus. As noted in several places in this chapter, it is clear that in many other cases, examination of performance in the presence of a CS after conditioning will often not reflect the status of the learning. A common example of this is that the CR that occurs after CS–US pairings may differ depending on the physical properties of the CS, so limiting the behavioral analysis to a single response system may lead to a very different interpretation than an examination of a different response system. This is illustrated nicely in an experiment by Holland (1977), who found that auditory and visual CSs evoke different responses when they are paired with a food pellet in a magazine-approach procedure. Auditory stimuli evoked a head-jerking response, whereas visual stimuli evoked an orienting–rearing response. Thus, simply examining a single performance measure (anticipatory entries into the food magazine) would result in a mischaracterization of the learning that had occurred. As noted earlier, certain procedural variations of the CS–US relation will result in low levels of responding to the CS that do not necessarily reflect the status of the associative learning about that CS. In the case of trace conditioning, for example, weak responding in the presence of the CS belies the often precise temporal information that the organism has encoded. Moreover, some theories of Pavlovian conditioning argue that many of the core phenomena that have been used to generate theories of learning (such as the trial-spacing effect and the Kamin [1968, 1969] blocking effect) do not reflect deficits in learning.
Instead, these deficits in conditioned behavior reflect deficits in the expression of learned associations in behavior. One performance-based theory developed by Miller and colleagues (e.g., Stout & Miller, 2007) suggests that CS–US contiguity results in learning, but performance rules determine whether that

Pavlovian Conditioning

learning is expressed in behavior. Conditioned responding is evident as a function of the associative strength of the other cues that were present during conditioning (or testing; see Gibbon & Balsam, 1981). As the strength of the target CS is increased relative to those other cues (comparator stimuli), conditioned responding emerges; as it is decreased, conditioned responding is attenuated. Thus, in the case of blocking, the blocked stimulus acquires an association with the US that is not expressed in performance owing to the established association between the blocking stimulus and the US. Evidence for this idea comes from many studies by Miller and colleagues demonstrating that postconditioning manipulations of the associative strength of comparator stimuli can strengthen or weaken responding to a target CS (e.g., Stout & Miller, 2007). Other studies have demonstrated that the status of comparator stimuli has little to no impact on target CS behavior (e.g., Holland, 1999; Lattal, 1999; Robbins, 1988). Thus, whether this specific comparator mechanism always controls performance is not clear. What is clear is that this alternative way of thinking about learning and performance has revealed novel insights about learning and has offered the very important point that the absence of behavior should not be taken as evidence for the absence of learning. This is a critical point that is often unappreciated in neurobiological analyses of learning (see Lattal & Stafford, 2008).

Future of Pavlovian Conditioning Research

Remarkable progress has been made in understanding the mechanisms that underlie Pavlovian conditioning. There is now broad appreciation that Pavlovian conditioning reflects sophisticated learning processes that underlie many aspects of animal and human behavior. The development of advanced neurobiological techniques has resulted in a recent emphasis on the cellular and molecular mechanisms of conditioning.
The behavioral analysis of Pavlovian conditioning will continue to inform the direction of these neurobiological studies, because the key questions that have motivated Pavlovian research are now shifting into the neurobiological domain (see Chapter 15, this volume). The sophisticated approach to Pavlovian conditioning that incorporates many theoretical perspectives about the nature of stimulus representation and response expression will surely play an important role in the future refinement of these neurobiological techniques. Certainly, there are many unanswered questions about the basic nature of Pavlovian conditioning, meaning that the field’s emphasis on rigorous behavioral analyses will continue to reveal insights about basic learning processes.

References

Arcediano, F., Escobar, M., & Miller, R. R. (2003). Temporal integration and temporal backward associations in human and nonhuman subjects. Learning and Behavior, 31, 242–256. doi:10.3758/BF03195986

Balsam, P., Sanchez-Castillo, H., Taylor, K., Van Volkinburg, H., & Ward, R. D. (2009). Timing and anticipation: Conceptual and methodological approaches. European Journal of Neuroscience, 30, 1749–1755. doi:10.1111/j.1460-9568.2009.06967.x

Barela, P. B. (1999). Theoretical mechanisms underlying the trial-spacing effect in Pavlovian fear conditioning. Journal of Experimental Psychology: Animal Behavior Processes, 25, 177–193. doi:10.1037/0097-7403.25.2.177

Barnet, R. C., Arnold, H. M., & Miller, R. R. (1991). Simultaneous conditioning demonstrated in second-order conditioning: Evidence for similar associative structure in forward and simultaneous conditioning. Learning and Motivation, 22, 253–268. doi:10.1016/0023-9690(91)90008-V

Blair, C. A., & Hall, G. (2003). Perceptual learning in flavor aversion: Evidence for learned changes in stimulus effectiveness. Journal of Experimental Psychology: Animal Behavior Processes, 29, 39–48. doi:10.1037/0097-7403.29.1.39

Bouton, M. E. (1991). Context and retrieval in extinction and in other examples of interference in simple associative learning. In L. Dachowski & C. F. Flaherty (Eds.), Current topics in animal learning: Brain, emotion, and cognition (pp. 25–53). Hillsdale, NJ: Erlbaum.

Bouton, M. E. (1993). Context, time, and memory retrieval in the interference paradigms of Pavlovian learning. Psychological Bulletin, 114, 80–99. doi:10.1037/0033-2909.114.1.80

Bouton, M. E. (2004). Context and behavioral processes in extinction. Learning and Memory, 11, 485–494. doi:10.1101/lm.78804

Bouton, M. E., & Bolles, R. C. (1979). Contextual control of the extinction of conditioned fear. Learning

K. Matthew Lattal

and Motivation, 10, 445–466. doi:10.1016/0023-9690(79)90057-2

Bouton, M. E., Nelson, J. B., & Rosas, J. M. (1999). Stimulus generalization, context change, and forgetting. Psychological Bulletin, 125, 171–186. doi:10.1037/0033-2909.125.2.171

Bouton, M. E., & Peck, C. A. (1992). Spontaneous recovery in cross-motivational transfer (counterconditioning). Animal Learning and Behavior, 20, 313–321. doi:10.3758/BF03197954

Bouton, M. E., & Ricker, S. T. (1994). Renewal of extinguished responding in a second context. Animal Learning and Behavior, 22, 317–324. doi:10.3758/BF03209840

Bouton, M. E., Westbrook, R. F., Corcoran, K. A., & Maren, S. (2006). Contextual and temporal modulation of extinction: Behavioral and biological mechanisms. Biological Psychiatry, 60, 352–360. doi:10.1016/j.biopsych.2005.12.015

Brown, P. L., & Jenkins, H. M. (1968). Auto-shaping of the pigeon’s key-peck. Journal of the Experimental Analysis of Behavior, 11, 1–8. doi:10.1901/jeab.1968.11-1

Burns, M., & Domjan, M. (2001). Topography of spatially directed conditioned responding: Effects of context and trial duration. Journal of Experimental Psychology: Animal Behavior Processes, 27, 269–278. doi:10.1037/0097-7403.27.3.269

Chang, R. C., Stout, S., & Miller, R. R. (2004). Comparing excitatory backward and forward conditioning. Quarterly Journal of Experimental Psychology B: Comparative and Physiological Psychology, 57, 1–23. doi:10.1080/02724990344000015

Cole, S., & McNally, G. P. (2007). Temporal-difference prediction errors and Pavlovian fear conditioning: Role of NMDA and opioid receptors. Behavioral Neuroscience, 121, 1043–1052. doi:10.1037/0735-7044.121.5.1043

Collins, B. N., & Brandon, T. H. (2002). Effects of extinction context and retrieval cues on alcohol cue reactivity among nonalcoholic drinkers. Journal of Consulting and Clinical Psychology, 70, 390–397. doi:10.1037/0022-006X.70.2.390

Colwill, R. M. (1994). Associative representations of instrumental contingencies. Psychology of Learning and Motivation, 31, 1–72. doi:10.1016/S0079-7421(08)60408-9

Colwill, R. M., & Motzkin, D. K. (1994). Encoding of the unconditioned stimulus in Pavlovian conditioning. Animal Learning and Behavior, 22, 384–394. doi:10.3758/BF03209158

Cunningham, C. L. (1981). Association between the elements of a bivalent compound stimulus. Journal of Experimental Psychology: Animal Behavior Processes, 7, 425–436. doi:10.1037/0097-7403.7.4.425

Cunningham, C. L., Gremel, C. M., & Groblewski, P. A. (2006). Drug-induced conditioned place preference and aversion in mice. Nature Protocols, 1, 1662–1670. doi:10.1038/nprot.2006.279

Cunningham, C. L., Okorn, D. M., & Howard, C. E. (1997). Interstimulus interval determines whether ethanol produces conditioned place preference or aversion in mice. Animal Learning and Behavior, 25, 31–42. doi:10.3758/BF03199022

Cunningham, C. L., Smith, R., & McMullin, C. (2003). Competition between ethanol-induced reward and aversion in place conditioning. Learning and Behavior, 31, 273–280. doi:10.3758/BF03195988

Davis, M., Myers, K. M., Chhatwal, J., & Ressler, K. J. (2006). Pharmacological treatments that facilitate extinction of fear: Relevance to psychotherapy. Neurotherapeutics, 3, 82–96. doi:10.1016/j.nurx.2005.12.008

Delamater, A. R. (1996). Effects of several extinction treatments upon the integrity of Pavlovian stimulus-outcome associations. Animal Learning and Behavior, 24, 437–449. doi:10.3758/BF03199015

Delamater, A. R. (1997). Selective reinstatement of stimulus-outcome associations. Animal Learning and Behavior, 25, 400–412. doi:10.3758/BF03209847

Delamater, A. R. (2004). Experimental extinction in Pavlovian conditioning: Behavioural and neuroscience perspectives. Quarterly Journal of Experimental Psychology B: Comparative and Physiological Psychology, 57, 97–132. doi:10.1080/02724990344000097

Delamater, A. R., & Holland, P. C. (2008). The influence of CS–US interval on several different indices of learning in appetitive conditioning. Journal of Experimental Psychology: Animal Behavior Processes, 34, 202–222. doi:10.1037/0097-7403.34.2.202

Denniston, J. C., Chang, R. C., & Miller, R. R. (2003). Massive extinction treatment attenuates the renewal effect. Learning and Motivation, 34, 68–86. doi:10.1016/S0023-9690(02)00508-8

Dibbets, P., Havermans, R., & Arntz, A. (2008). All we need is a cue to remember: The effect of an extinction cue on renewal. Behaviour Research and Therapy, 46, 1070–1077. doi:10.1016/j.brat.2008.05.007

Dickinson, A. (1989). Expectancy theory in animal conditioning. In S. B. Klein & R. R. Mowrer (Eds.), Contemporary learning theories: Pavlovian conditioning and the status of traditional learning theory (pp. 279–308). Hillsdale, NJ: Erlbaum.

Domjan, M. (1980). Effects of the intertrial interval on taste-aversion learning in rats. Physiology and Behavior, 25, 117–125. doi:10.1016/0031-9384(80)90191-2

Estes, W. K. (1950). Toward a statistical theory of learning. Psychological Review, 57, 94–107. doi:10.1037/h0058559

Fanselow, M. S. (1980). Conditioned and unconditional components of post-shock freezing. Pavlovian Journal of Biological Science, 15, 177–182.

Fanselow, M. S. (1982). The postshock activity burst. Animal Learning and Behavior, 10, 448–454. doi:10.3758/BF03212284

Fanselow, M. S. (1990). Factors governing one-trial contextual conditioning. Animal Learning and Behavior, 18, 264–270. doi:10.3758/BF03205285

Gallistel, C. R., Fairhurst, S., & Balsam, P. (2004). The learning curve: Implications of a quantitative analysis. Proceedings of the National Academy of Sciences of the United States of America, 101, 13124–13131. doi:10.1073/pnas.0404965101

Gallistel, C. R., & Gibbon, J. (2000). Time, rate, and conditioning. Psychological Review, 107, 289–344. doi:10.1037/0033-295X.107.2.289

Gibbon, J., Baldock, M. D., Locurto, C., Gold, L., & Terrace, H. S. (1977). Trial and intertrial durations in autoshaping. Journal of Experimental Psychology: Animal Behavior Processes, 3, 264–284. doi:10.1037/0097-7403.3.3.264

Gibbon, J., & Balsam, P. (1981). Spreading association in time. In C. M. Locurto, H. S. Terrace, & J. Gibbon (Eds.), Autoshaping and conditioning theory (pp. 219–253). New York, NY: Academic Press.

Good, M., & Macphail, E. M. (1994). Hippocampal lesions in pigeons (Columba livia) disrupt reinforced preexposure but not overshadowing or blocking. Quarterly Journal of Experimental Psychology B: Comparative and Physiological Psychology, 47, 263–291.

Hall, G., & Honey, R. C. (1989). Contextual effects in conditioning, latent inhibition, and habituation: Associative and retrieval functions of contextual cues. Journal of Experimental Psychology: Animal Behavior Processes, 15, 232–241. doi:10.1037/0097-7403.15.3.232

Harris, J. A. (2006). Elemental representations of stimuli in associative learning. Psychological Review, 113, 584–605. doi:10.1037/0033-295X.113.3.584

Herrnstein, R. J. (1970). On the law of effect. Journal of the Experimental Analysis of Behavior, 13, 243–266. doi:10.1901/jeab.1970.13-243

Heth, C. D. (1976). Simultaneous and backward fear conditioning as a function of number of CS–UCS pairings. Journal of Experimental Psychology: Animal Behavior Processes, 2, 117–129. doi:10.1037/0097-7403.2.2.117

Holland, P. C. (1977). Conditioned stimulus as a determinant of the form of the Pavlovian conditioned response. Journal of Experimental Psychology: Animal Behavior Processes, 3, 77–104. doi:10.1037/0097-7403.3.1.77

Holland, P. C. (1999). Overshadowing and blocking as acquisition deficits: No recovery after extinction of overshadowing or blocking cues. Quarterly Journal of Experimental Psychology B: Comparative and Physiological Psychology, 52, 307–333. doi:10.1080/027249999393022

Holland, P. C. (2000). Trial and intertrial durations in appetitive conditioning in rats. Animal Learning and Behavior, 28, 121–135. doi:10.3758/BF03200248

Holland, P. C. (2004). Relations between Pavlovian-instrumental transfer and reinforcer devaluation. Journal of Experimental Psychology: Animal Behavior Processes, 30, 104–117. doi:10.1037/0097-7403.30.2.104

Holland, P. C., & Straub, J. J. (1979). Differential effects of two ways of devaluing the unconditioned stimulus after Pavlovian appetitive conditioning. Journal of Experimental Psychology: Animal Behavior Processes, 5, 65–78. doi:10.1037/0097-7403.5.1.65

Hupka, R. B., Kwaterski, S. E., & Moore, J. W. (1970). Conditioned diminution of the UCR: Differences between the human eyeblink and the rabbit nictitating membrane response. Journal of Experimental Psychology, 83, 45–51. doi:10.1037/h0028584

Iordanova, M. D., McNally, G. P., & Westbrook, R. F. (2006). Opioid receptors in the nucleus accumbens regulate attentional learning in the blocking paradigm. Journal of Neuroscience, 26, 4036–4045. doi:10.1523/JNEUROSCI.4679-05.2006

Kamin, L. J. (1968). “Attention-like” processes in classical conditioning. In M. R. Jones (Ed.), Miami Symposium on the Prediction of Behavior, 1967: Aversive stimulation (pp. 9–31). Coral Gables, FL: University of Miami Press.

Kamin, L. J. (1969). Predictability, surprise, attention, and conditioning. In B. A. Campbell & R. M. Church (Eds.), Punishment and aversive behavior (pp. 279–296). New York, NY: Meredith.

Kehoe, E. J., Olsen, K. N., Ludvig, E. A., & Sutton, R. S. (2009). Scalar timing varies with response magnitude in classical conditioning of the nictitating membrane response of the rabbit (Oryctolagus cuniculus). Behavioral Neuroscience, 123, 212–217. doi:10.1037/a0014122

Konorski, J. (1948). Conditioned reflexes and neuron organization. Cambridge, England: Cambridge University Press.

Konorski, J. (1967). Integrative activity of the brain: An interdisciplinary approach. Chicago, IL: University of Chicago Press.

Kremer, E. F. (1978). The Rescorla–Wagner model: Losses in associative strength in compound conditioned stimuli. Journal of Experimental Psychology: Animal Behavior Processes, 4, 22–36. doi:10.1037/0097-7403.4.1.22

Lattal, K. M. (1999). Trial and intertrial durations in Pavlovian conditioning: Issues of learning and performance. Journal of Experimental Psychology: Animal Behavior Processes, 25, 433–450. doi:10.1037/0097-7403.25.4.433

Lattal, K. M. (2007). Extinction and the erasure of memories. Psychological Science Agenda, 21, 3–4, 16–18.

Lattal, K. M., Mullen, M. T., & Abel, T. (2003). Extinction, renewal, and spontaneous recovery of a spatial preference. Behavioral Neuroscience, 117, 1017–1028. doi:10.1037/0735-7044.117.5.1017

Lattal, K. M., & Nakajima, S. (1998). Overexpectation in appetitive Pavlovian and instrumental conditioning. Animal Learning and Behavior, 26, 351–360. doi:10.3758/BF03199227

Lattal, K. M., & Stafford, J. M. (2008). What does it take to demonstrate memory erasure? Theoretical comment on Norrholm et al. Behavioral Neuroscience, 122, 1186–1190, 2008. doi:10.1037/a0012993

Leri, F., & Rizos, Z. (2005). Reconditioning of drug-related cues: A potential contributor to relapse after drug reexposure. Pharmacology, Biochemistry and Behavior, 80, 621–630. doi:10.1016/j.pbb.2005.01.013

Leung, H. T., & Westbrook, R. F. (2008). Spontaneous recovery of extinguished fear responses deepens their extinction: A role for error-correction mechanisms. Journal of Experimental Psychology: Animal Behavior Processes, 34, 461–474. doi:10.1037/0097-7403.34.4.461

Mackintosh, N. J. (1975). A theory of attention: Variations in the associability of stimuli with reinforcement. Psychological Review, 82, 276–298. doi:10.1037/h0076778

Malvaez, M., Sanchis-Segura, C., Vo, D., Lattal, K. M., & Wood, M. A. (2010). Modulation of chromatin modification facilitates extinction of cocaine-induced conditioned place preference. Biological Psychiatry, 67, 36–43. doi:10.1016/j.biopsych.2009.07.032

McEchron, M. D., Tseng, W., & Disterhoft, J. F. (2003). Single neurons in CA1 hippocampus encode trace interval duration during trace heart rate (fear) conditioning in rabbit. Journal of Neuroscience, 23, 1535–1547.

McLaren, I. P. L., & Mackintosh, N. J. (2000). An elemental model of associative learning: I. Latent inhibition and perceptual learning. Animal Learning and Behavior, 28, 211–246. doi:10.3758/BF03200258

McNally, G. P., & Westbrook, R. F. (2006). Predicting danger: The nature, consequences, and neural mechanisms of predictive fear learning. Learning and Memory, 13, 245–253. doi:10.1101/lm.196606

Miller, R. R., & Barnet, R. C. (1993). The role of time in elementary associations. Current Directions in Psychological Science, 2, 106–111. doi:10.1111/1467-8721.ep10772577

Nakajima, S., Tanaka, S., Urushihara, K., & Imada, H. (2000). Renewal of extinguished lever-press responses upon return to the training context. Learning and Motivation, 31, 416–431. doi:10.1006/lmot.2000.1064

Nelson, J. B. (2002). Context specificity of excitation and inhibition in ambiguous stimuli. Learning and Motivation, 33, 284–310. doi:10.1006/lmot.2001.1112

Pavlov, I. P. (1927). Conditioned reflexes: An investigation of the physiological activity of the cerebral cortex. London, England: Oxford University Press.

Pearce, J. M. (1987). A model for stimulus generalization in Pavlovian conditioning. Psychological Review, 94, 61–73. doi:10.1037/0033-295X.94.1.61

Pearce, J. M. (1994). Similarity and discrimination: A selective review and a connectionist model. Psychological Review, 101, 587–607. doi:10.1037/0033-295X.101.4.587

Pearce, J. M., & Bouton, M. E. (2001). Theories of associative learning in animals. Annual Review of Psychology, 52, 111–139. doi:10.1146/annurev.psych.52.1.111

Pearce, J. M., & Hall, G. (1980). A model for Pavlovian learning: Variations in the effectiveness of conditioned but not of unconditioned stimuli. Psychological Review, 87, 532–552. doi:10.1037/0033-295X.87.6.532

Peck, C. A., & Bouton, M. E. (1990). Context and performance in aversive-to-appetitive and appetitive-to-aversive transfer. Learning and Motivation, 21, 1–31. doi:10.1016/0023-9690(90)90002-6

Perkins, C. C., Jr., Beavers, W. O., Hancock, R. A., Jr., Hemmendinger, P. C., Hemmendinger, D., & Ricci, J. A. (1975). Some variables affecting rate of key pecking during response-independent procedures (autoshaping). Journal of the Experimental Analysis of Behavior, 24, 59–72. doi:10.1901/jeab.1975.24-59

Rescorla, R. A. (1966). Predictability and number of pairings in Pavlovian fear conditioning. Psychonomic Science, 4, 383–384.

Rescorla, R. A. (1967). Pavlovian conditioning and its proper control procedures. Psychological Review, 74, 71–80. doi:10.1037/h0024109

Rescorla, R. A. (1968). Probability of shock in the presence and absence of CS in fear conditioning. Journal of Comparative and Physiological Psychology, 66, 1–5. doi:10.1037/h0025984

Rescorla, R. A. (1969). Conditioned inhibition of fear resulting from negative CS-US contingencies. Journal of Comparative and Physiological Psychology, 67, 504–509. doi:10.1037/h0027313

Rescorla, R. A. (1988a). Behavioral studies of Pavlovian conditioning. Annual Review of Neuroscience, 11, 329–352. doi:10.1146/annurev.ne.11.030188.001553

Rescorla, R. A. (1988b). Pavlovian conditioning: It’s not what you think it is. American Psychologist, 43, 151–160. doi:10.1037/0003-066X.43.3.151

Rescorla, R. A. (1993). Inhibitory associations between S and R in extinction. Animal Learning and Behavior, 21, 327–336. doi:10.3758/BF03197998

Rescorla, R. A. (2001a). Are associative changes in acquisition and extinction negatively accelerated? Journal of Experimental Psychology: Animal Behavior Processes, 27, 307–315. doi:10.1037/0097-7403.27.4.307

Rescorla, R. A. (2001b). Experimental extinction. In R. R. Mowrer & S. Klein (Eds.), Handbook of contemporary learning theories (pp. 119–154). Mahwah, NJ: Erlbaum.

Rescorla, R. A. (2003). Elemental and configural encoding of the conditioned stimulus. Quarterly Journal of Experimental Psychology B: Comparative and Physiological Psychology, 56, 161–176. doi:10.1080/02724990244000089

Rescorla, R. A. (2004). Spontaneous recovery. Learning and Memory, 11, 501–509. doi:10.1101/lm.77504

Rescorla, R. A., & Coldwell, S. E. (1995). Summation in autoshaping. Animal Learning and Behavior, 23, 314–326. doi:10.3758/BF03198928

Rescorla, R. A., & Cunningham, C. L. (1977). The erasure of reinstated fear. Animal Learning and Behavior, 5, 386–394. doi:10.3758/BF03209584

Rescorla, R. A., & Cunningham, C. L. (1978). Within-compound flavor associations. Journal of Experimental Psychology: Animal Behavior Processes, 4, 267–275. doi:10.1037/0097-7403.4.3.267

Rescorla, R. A., & Holland, P. C. (1976). Some behavioral approaches to the study of learning. In M. R. Rosenzweig & E. L. Bennett (Eds.), Neural mechanisms of learning and memory (pp. 165–192). Cambridge, MA: MIT Press.

Rescorla, R. A., & Wagner, A. R. (1972). A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement. In A. H. Black & W. F. Prokasy (Eds.), Classical conditioning II: Current research and theory (pp. 64–99). New York, NY: Appleton-Century-Crofts.

Rizley, R. C., & Rescorla, R. A. (1972). Associations in second-order conditioning and sensory preconditioning. Journal of Comparative and Physiological Psychology, 81, 1–11. doi:10.1037/h0033333

Robbins, S. J. (1988). Role of context in performance on a random schedule in autoshaping. Journal of Experimental Psychology: Animal Behavior Processes, 14, 413–424. doi:10.1037/0097-7403.14.4.413

Robbins, S. J. (1990). Mechanisms underlying spontaneous recovery in autoshaping. Journal of Experimental Psychology: Animal Behavior Processes, 16, 235–249. doi:10.1037/0097-7403.16.3.235

Rosas, J. M., & Bouton, M. E. (1997). Renewal of a conditioned taste aversion upon return to the conditioning context after extinction in another one. Learning and Motivation, 28, 216–229. doi:10.1006/lmot.1996.0960

Rothbaum, B. O., & Davis, M. (2003). Applying learning principles to the treatment of post-trauma reactions. Annals of the New York Academy of Sciences, 1008, 112–121. doi:10.1196/annals.1301.012

Rothbaum, B. O., & Schwartz, A. C. (2002). Exposure therapy for posttraumatic stress disorder. American Journal of Psychotherapy, 56, 59–75.

Rudy, J. W., & O’Reilly, R. C. (1999). Contextual fear conditioning, conjunctive representations, pattern completion, and the hippocampus. Behavioral Neuroscience, 113, 867–880. doi:10.1037/0735-7044.113.5.867

Schafe, G. E., Sollars, S. I., & Bernstein, I. L. (1995). The CS–US interval and taste aversion learning: A brief look. Behavioral Neuroscience, 109, 799–802. doi:10.1037/0735-7044.109.4.799

Schmajuk, N. A., Gray, J. A., & Lam, Y. W. (1996). Latent inhibition: A neural network approach. Journal of Experimental Psychology: Animal Behavior Processes, 22, 321–349. doi:10.1037/0097-7403.22.3.321

Schultz, W. (1998). Predictive reward signal of dopamine neurons. Journal of Neurophysiology, 80, 1–27.

Stafford, J. M., & Lattal, K. M. (2011). Is an epigenetic switch the key to persistent extinction? Neurobiology of Learning and Memory, 96, 35–40. doi:10.1016/j.nlm.2011.04.012

Stout, S. C., & Miller, R. R. (2007). Sometimes-competing retrieval (SOCR): A formalization of the comparator hypothesis. Psychological Review, 114, 759–783. doi:10.1037/0033-295X.114.3.759

Thein, T., Westbrook, R. F., & Harris, J. A. (2008). How the associative strengths of stimuli combine in compound: Summation and overshadowing. Journal of Experimental Psychology: Animal Behavior Processes, 34, 155–166. doi:10.1037/0097-7403.34.1.155

Twitmeyer, E. B. (1905). Knee-jerks without stimulation of the patellar tendon. Psychological Bulletin, 2, 43–44.

Vansteenwegen, D., Vervliet, B., Hermans, D., Beckers, T., Baeyens, F., & Eelen, P. (2006). Stronger renewal in human fear conditioning when tested with an acquisition retrieval cue than with an extinction retrieval cue. Behaviour Research and Therapy, 44, 1717–1725. doi:10.1016/j.brat.2005.10.014

Vaughan, W., Jr. (1982). Choice and the Rescorla-Wagner model. In M. L. Commons, R. J. Herrnstein, & H. Rachlin (Eds.), Quantitative analyses of behavior: Vol. 2. Matching and maximizing accounts (pp. 263–279). Cambridge, MA: Ballinger.

Wagner, A. R. (1981). SOP: A model of automatic memory processing in animal behavior. In N. E. Spear & R. R. Miller (Eds.), Information processing in animals: Memory mechanisms (pp. 5–47). Hillsdale, NJ: Erlbaum.

Wagner, A. R. (2003). Context-sensitive elemental theory. Quarterly Journal of Experimental Psychology B: Comparative and Physiological Psychology, 56, 7–29. doi:10.1080/02724990244000133

Wagner, A. R., & Brandon, S. E. (1989). Evolution of a structured connectionist model of Pavlovian conditioning (AESOP). In S. B. Klein & R. R. Mowrer (Eds.), Contemporary learning theories: Pavlovian conditioning and the status of traditional learning theory (pp. 149–189). Hillsdale, NJ: Erlbaum.

Wagner, A. R., Logan, F. A., Haberlandt, K., & Price, T. (1968). Stimulus selection in animal discrimination learning. Journal of Experimental Psychology, 76(2, Pt. 1), 171–180. doi:10.1037/h0025414

Wagner, A. R., & Rescorla, R. A. (1972). Inhibition in Pavlovian conditioning: Application of a theory. In R. A. Boakes & M. S. Halliday (Eds.), Inhibition and learning (pp. 301–336). New York, NY: Academic Press.

Westbrook, R. F., Iordanova, M., McNally, G., Richardson, R., & Harris, J. A. (2002). Reinstatement of fear to an extinguished conditioned stimulus: Two roles for context. Journal of Experimental Psychology: Animal Behavior Processes, 28, 97–110. doi:10.1037/0097-7403.28.1.97

Chapter 14

The Allocation of Operant Behavior

Randolph C. Grace and Andrew D. Hucks

Choice has long been a major topic of research in the experimental analysis of behavior. Although quantifying just how major is difficult, a search of the PsycInfo database found that of the 3,576 articles published in the Journal of the Experimental Analysis of Behavior from 1958 through 2008, 452 (12.6%) included the word choice in the abstract. However, this figure likely underestimates the extent of research on response allocation in the operant literature. Figure 14.1 displays results of the database search at 5-year intervals and shows that the incidence of choice has increased from less than 3% of the abstracts in the journal’s first 10 years to an average of approximately 20% of the abstracts in the past 25 years. Mazur (2001) found that the proportion of Journal of the Experimental Analysis of Behavior articles reporting experiments in which “subjects could choose among two or more operant responses” (p. 96) was 47% in 1997 through 1998. There is no doubt that researchers in the operant tradition have allocated a considerable proportion of their behavior to the study of choice. The article usually cited as the starting point for research on operant choice is Herrnstein (1961). This classic experiment was one of the first published reports (although not the first; see Findley, 1958) to use the concurrent-schedules procedure. In Herrnstein’s version of this procedure, hungry pigeons could respond on two lighted keys in an experimental chamber. Each key was associated with an independent variable-interval (VI) schedule, which reinforced responding with access to food

after an unpredictable delay had elapsed according to the schedule value. Pigeons were trained with a given pair of VI schedules in daily sessions until response rates had stabilized, after which the schedules were changed and a new condition was begun. Herrnstein found that the proportion of responses to each key was approximately equal to the proportion of reinforcers obtained from that key:

\[
\frac{B_L}{B_L + B_R} = \frac{R_L}{R_L + R_R} \quad \text{or} \quad \frac{B_L}{B_R} = \frac{R_L}{R_R}. \tag{1}
\]
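Equation 1 can be made concrete with a short numerical sketch, in which B stands for response rate and R for reinforcement rate on the left (L) and right (R) keys. The function name and the example rates below are illustrative assumptions, not values reported by Herrnstein (1961).

```python
def matching_proportion(r_left: float, r_right: float) -> float:
    """Proportion of responses that strict matching predicts for the left key.

    Implements B_L / (B_L + B_R) = R_L / (R_L + R_R) from Equation 1.
    """
    if r_left < 0 or r_right < 0:
        raise ValueError("reinforcement rates cannot be negative")
    total = r_left + r_right
    if total == 0:
        raise ValueError("at least one reinforcement rate must be positive")
    return r_left / total


# Hypothetical condition: the left key yields 40 reinforcers per hour,
# the right key yields 20 reinforcers per hour.
p_left = matching_proportion(40.0, 20.0)
print(f"Predicted proportion of left-key responses: {p_left:.3f}")
```

With these assumed rates, strict matching predicts that two thirds of the responses go to the left key, which is the same as a 2:1 response ratio in the second form of Equation 1.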

In Equation 1, known as the matching law, B is the response rate and R is the reinforcement rate, with the subscripts L and R representing the left and right alternative schedules, respectively. Herrnstein’s (1961) report of matching of relative response and reinforcement rates in concurrent schedules stimulated a great deal of subsequent research on choice. For previous reviews of research on matching and choice behavior, see Davison and McCarthy (1988), Fisher and Mazur (1997), R. R. Miller and Grace (2003), Nevin (1998), Staddon and Cerutti (2003), and Williams (1988, 1994). Most research on choice after Herrnstein (1961) can be described in terms of two major categories: (a) studies that have explored the origin of matching behavior by testing different explanatory theories and models and (b) studies that have used the

DOI: 10.1037/13937-014 APA Handbook of Behavior Analysis: Vol. 1. Methods and Principles, G. J. Madden (Editor-in-Chief) Copyright © 2013 by the American Psychological Association. All rights reserved.

[Figure 14.1 plot. Vertical axis: Relative Frequency (0 to 0.25); horizontal axis: Year.]

Figure 14.1. The relative frequency of articles published in the Journal of the Experimental Analysis of Behavior that included choice in the keywords or abstract.

empirical generalization of matching as a starting point for developing quantitative models for choice that are applicable to a wide range of situations. Many different theories for matching have been proposed, including momentary maximizing (Hinson & Staddon, 1983a, 1983b; Shimp, 1966, 1969), melioration (Herrnstein & Vaughan, 1980), molar maximizing (Baum, 1981; Rachlin, Green, Kagel, & Battalio, 1976; Staddon & Motheral, 1978), the kinetic model (Myerson & Miezin, 1980), the cumulative effects model (Davis, Staddon, Machado, & Palmer, 1993), fix and sample (Baum, Schwendiman, & Bell, 1999), and the stay–switch model (MacDonall, 2009). Research to test these and other theories has yielded many interesting results and increased researchers’ understanding of behavior, but it is fair to say that no consensus has emerged in favor of a generally accepted explanation for matching. Because this issue has no clear resolution and space is limited, we focus on research in the second category. This chapter is organized as follows. In the first section, we review the evolution of the matching law as a model for choice under concurrent schedules, culminating in the generalized matching law (Baum, 1974a). Next, we consider research on choice between delayed reinforcers in the concurrent-chains and adjusting-delay procedures, which has led to several models for choice based on the matching law, including delay-reduction theory

(Fantino, 1969), the contextual choice model (Grace, 1994), and the hyperbolic value-added model (Mazur, 2001). In the third section, we review research on the acquisition of choice behavior. In contrast with previous research that has typically used steady-state methodologies, research on acquisition has explored how response allocation adapts when the conditions of reinforcement are changed.

Matching as a Descriptive Framework for Choice

A major line of research following Herrnstein (1961) has been the evolution of the matching law into a descriptive framework for choice behavior. The first significant proposal was Baum and Rachlin’s (1969) suggestion that the independent variable was the relative value obtained from the alternative schedules. Noting that studies on concurrent schedules after Herrnstein had found matching of response allocation to relative reinforcement magnitude (i.e., seconds of access to grain; Catania, 1963) and to relative immediacy of reinforcement (i.e., the reciprocal of delay to reinforcement; Chung & Herrnstein, 1967), Baum and Rachlin (1969) suggested that these results, as well as Herrnstein’s, could be understood as matching to the relative value of the reinforcement obtained from the alternatives, with value defined as a

The Allocation of Operant Behavior

combination of reinforcement rate, immediacy, and magnitude:

$$\frac{B_L}{B_R} = \frac{R_L}{R_R} \cdot \frac{1/D_L}{1/D_R} \cdot \frac{M_L}{M_R} \cdot \frac{X_L}{X_R} = \frac{V_L}{V_R}. \tag{2}$$

In Equation 2, D is the delay to reinforcement, M is the magnitude of reinforcement, X is an unspecified aspect of reinforcement (e.g., hedonic quality), and V is the reinforcement value, with subscripts L and R again indicating the left and right alternatives. According to Equation 2, known as the concatenated matching law, value is determined multiplicatively by the reinforcer dimensions on which the alternatives differ. Equation 2 is a significant conceptual advance because it proposes that matching is a general principle of choice that is potentially applicable to any reinforcement variable. The next major step in the evolution of matching was taken by Baum (1974a), who noted that some empirical deviations from matching could be understood in terms of a general version of Equation 1:

$$\frac{B_L}{B_R} = b\left(\frac{R_L}{R_R}\right)^{a}, \tag{3a}$$

or, in logarithmic form,

$$\log\frac{B_L}{B_R} = \log b + a \log\frac{R_L}{R_R}. \tag{3b}$$

[Figure 14.2 plot, titled "Generalized Matching": Log Response Ratio (y-axis) as a function of Log Reinforcement Ratio (x-axis); legend: Data, Matching, Overmatching; regression line y = 0.81x + 0.04, R² = 0.99.]

Figure 14.2. Illustration of generalized matching law. The dotted line indicates perfect matching, the dashed line indicates overmatching (i.e., a > 1). The regression line (solid line) indicates the fit of the generalized matching law to the data. The slope (0.81) is an estimate of sensitivity (a) and exemplifies undermatching. The intercept (0.04) is an estimate of log b. Data from Grace (1993).

Equation 3 is known as the generalized matching law. It has two parameters: bias (b), which represents a constant proportionality in responding unrelated to reinforcement rate (e.g., position preference), and an exponent (a), which represents sensitivity to reinforcement rate. Typically, the logarithmic form of the generalized matching law is used (Equation 3b), in which response allocation (measured as the log response ratio) is a linear function of the log reinforcement ratio. Sensitivity and log bias correspond to the slope and intercept, respectively, of this function. When a = b = 1, the original matching law (Equation 1) results. Values of a that are less than 1 are termed undermatching because response allocation is less extreme than reinforcer allocation, whereas values of a that are more than 1 are termed overmatching (i.e., response allocation more extreme than reinforcer allocation). It is important to note that sensitivity and log bias are empirically derived parameters and are not explicitly linked with an underlying process. Estimates of sensitivity and bias are typically obtained by regressing the log response ratio on the log reinforcement ratio (Figure 14.2). See Davison and Elliffe (2009) for a recent discussion of statistical issues related to regression and parameter estimation for behavioral choice data.
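The regression just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the procedure used in any of the cited studies: it assumes NumPy is available, the function name `fit_generalized_matching` is hypothetical, and the response and reinforcer values are made up (generated from a = 0.8 and log b = 0.04) purely to show that ordinary least squares on Equation 3b recovers the slope and intercept.

```python
import numpy as np

def fit_generalized_matching(b_left, b_right, r_left, r_right):
    """Fit Equation 3b by ordinary least squares.

    Returns (a, log_b, vac): sensitivity, log bias, and the
    proportion of variance accounted for by the fitted line.
    """
    log_response_ratio = np.log10(np.asarray(b_left, float) / np.asarray(b_right, float))
    log_reinforcer_ratio = np.log10(np.asarray(r_left, float) / np.asarray(r_right, float))
    # The slope estimates sensitivity (a); the intercept estimates log b.
    a, log_b = np.polyfit(log_reinforcer_ratio, log_response_ratio, 1)
    predicted = log_b + a * log_reinforcer_ratio
    ss_res = np.sum((log_response_ratio - predicted) ** 2)
    ss_tot = np.sum((log_response_ratio - log_response_ratio.mean()) ** 2)
    return a, log_b, 1.0 - ss_res / ss_tot

# Hypothetical, noise-free data generated from a = 0.8 and log b = 0.04.
reinforcer_ratios = np.array([1 / 8, 1 / 4, 1 / 2, 1, 2, 4, 8])
response_ratios = 10 ** 0.04 * reinforcer_ratios ** 0.8
ones = np.ones_like(reinforcer_ratios)

a, log_b, vac = fit_generalized_matching(response_ratios, ones, reinforcer_ratios, ones)
print(f"sensitivity a = {a:.2f}, log b = {log_b:.2f}, VAC = {vac:.3f}")
```

With real data the points scatter around the line, and the statistical issues discussed by Davison and Elliffe (2009) apply; the sketch ignores them.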

Sources of Undermatching

Debate about the value of the sensitivity parameter in the generalized matching law has been considerable. The critical question has been whether the normative or typical value of a equals 1.0, consistent with matching, or is less than 1.0, indicating undermatching. Baum (1979) reported an analysis in which Equation 3b was fitted to all the archival data available at the time (103 data sets from 23 studies). He found that the modal value of a was 1.0 when choice was measured as time allocation, but approximately 0.8 when response allocation was the dependent variable. Baum examined variation in values of a for response allocation across studies and found that results from the laboratories of the two investigators who had contributed the

Grace and Hucks

largest number of data sets (Davison and Baum) differed systematically. For data sets reported by Davison, modal values were near 0.8, whereas for those reported by Baum they were approximately 1.0. Baum suggested that procedural variables (e.g., type of VI schedule, changeover delay duration, level of deprivation) might have contributed to the different results and proposed several factors that could yield undermatching. These factors included asymmetrical pausing (which would affect response allocation but not time allocation as typically measured), heterogeneity of preference, and patterns of changeover responding. Other possible causes of undermatching include reduced discriminability of the schedules (Davison & Jenkins, 1985) or type of VI schedule used (e.g., exponential vs. arithmetic progressions; Elliffe & Alsop, 1996). Baum concluded that matching should be considered normative, with deviations to be attributed to procedural or other factors.

Baum et al. (1999) offered an alternative explanation for undermatching in terms of a specific pattern of choice responding. They studied the relationship between average number of responses per visit to an alternative and choice in concurrent VI–VI schedules with pigeons. They found that average responses per visit for the richer alternative increased directly with response allocation but remained low and approximately constant for the leaner alternative, regardless of preference (see Baum et al., 1999, Figure 5). Baum et al. described this pattern as “fix and sample” because it implied a tendency for subjects to remain at the richer alternative while occasionally sampling the leaner alternative, and they suggested that it was consistent with expectations based on foraging theory (Houston & McNamara, 1981). Baum et al. showed that when their results were expressed as preference for the richer over the leaner alternative, undermatching could be produced by a bias toward the leaner alternative, that is, by a tendency for subjects to make more responses to the lean alternative per visit than predicted by matching.

Whether fix and sample can provide a convincing account of undermatching remains unclear. Baum and Davison (2004) and Aparicio and Baum (2006) have found evidence for the development of the fix-and-sample pattern in experiments in which pigeons and rats have responded under dynamic conditions (see the section Acquisition in Concurrent Schedules later in this chapter). However, a recent meta-analysis of concurrent-schedules studies failed to find systematic deviations in the fits of the generalized matching law that would be expected with a fix-and-sample pattern (Sutton, Grace, McLean, & Baum, 2008).

Generalized Matching Law as Descriptive Model

Baum (1979) found that Equation 3b accounted for 90.3% of the variance in response allocation, averaged across the archival data sets. Thus, in the generalized form represented in Equation 3, the matching law provides an excellent description of choice in concurrent schedules. The power function generalization in Equation 3 may be joined with Baum and Rachlin’s (1969) suggestion that different dimensions of reinforcement are combined multiplicatively to yield a concatenated generalized matching law:

$$\frac{B_L}{B_R} = b\left(\frac{R_L}{R_R}\right)^{a_r}\left(\frac{1/D_L}{1/D_R}\right)^{a_d}\left(\frac{M_L}{M_R}\right)^{a_m} \tag{4a}$$

or

$$\log\frac{B_L}{B_R} = \log b + a_r \log\frac{R_L}{R_R} + a_d \log\frac{1/D_L}{1/D_R} + a_m \log\frac{M_L}{M_R}. \tag{4b}$$

According to Equation 4b, response allocation in concurrent schedules is determined by the additive combination of different dimensions of reinforcement, each with its own sensitivity parameter (a_r, a_d, and a_m, for rate, delay, and magnitude, respectively). Equation 4 is a significant extension of the matching law, enabling it to apply to a broad range of choice situations. One of the most important is self-control, which has been a major focus of research because of its obvious relevance for human behavior (Logue, Rodriguez, Pena-Correal, & Mauro, 1984). In a self-control situation, subjects face a choice between a small reinforcer available immediately (or after a short delay) and a larger reinforcer available after a longer delay. Typically, overall reinforcement is maximized by choosing the delayed, larger reinforcer, which is termed self-control (Rachlin & Green, 1972; see Rachlin, 1995, for review). By contrast, choosing the smaller, less delayed reinforcer is termed impulsivity. For example, consider that pigeons are given a choice between a small reinforcer (e.g., 2-second access to grain) delayed by 1 second or a large reinforcer (e.g., 6-second access to grain) delayed by 6 seconds. If for simplicity one assumes that a_d = a_m = 1, then Equation 4 predicts a 2:1 ratio of responses for the small reinforcer (i.e., the 6:1 delay ratio is greater than the 2:6 magnitude ratio). However, if the delays to both the small and the large reinforcers are increased by the same amount, then Equation 4 predicts a reversal of preference. For example, if the delays are both increased by 10 seconds, then predicted preference for the small reinforcer is only 33% (the 16:11 delay ratio is no longer enough to compensate for the 2:6 magnitude ratio). Empirical support for such preference reversals has been obtained in both human and nonhuman choice (Green & Snyderman, 1980; Kirby & Herrnstein, 1995; see Green & Myerson, 2004, for review).

Equation 4 assumes that for each reinforcer variable, response allocation is a power function of relative reinforcement, and the effects of different variables are additive and independent. However, it may be viewed more broadly as a framework for the description of behavioral choice. From this perspective, a matching relationship is assumed (response allocation in a two-alternative choice situation is taken to be a function of the reinforcement obtained from the alternatives), and the goal of research is to identify the functional laws relating them (Killeen, 1972; Mazur, 2006). We consider in the following few sections evidence regarding the functional relation between response allocation and reinforcement rate, magnitude, and delay.
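The arithmetic behind the self-control example above can be checked with a short Python sketch of Equation 4a. It assumes, as in the text, equal reinforcement rates, no bias (b = 1), and a_d = a_m = 1; the function names `concatenated_gml_ratio` and `preference` are hypothetical and introduced only for illustration.

```python
def concatenated_gml_ratio(delay_a, delay_b, magnitude_a, magnitude_b,
                           a_d=1.0, a_m=1.0, b=1.0):
    """Response ratio B_a / B_b from Equation 4a with equal reinforcement rates."""
    immediacy_ratio = (1.0 / delay_a) / (1.0 / delay_b)
    magnitude_ratio = magnitude_a / magnitude_b
    return b * immediacy_ratio ** a_d * magnitude_ratio ** a_m

def preference(ratio):
    """Convert a response ratio into a choice proportion for the first alternative."""
    return ratio / (1.0 + ratio)

# Small reinforcer: 2 s of grain after 1 s. Large reinforcer: 6 s of grain after 6 s.
near = concatenated_gml_ratio(delay_a=1, delay_b=6, magnitude_a=2, magnitude_b=6)

# The same choice with 10 s added to both delays (11 s vs. 16 s).
far = concatenated_gml_ratio(delay_a=11, delay_b=16, magnitude_a=2, magnitude_b=6)

print(f"near: ratio {near:.2f}, preference for small {preference(near):.2f}")
print(f"far:  ratio {far:.2f}, preference for small {preference(far):.2f}")
```

The first call gives the 2:1 ratio favoring the small reinforcer, and the second gives a preference of 16/49, or about 33%, which is the reversal described in the text.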

Reinforcement Rate

Here we consider whether the generalized matching law’s assumption of a power function relationship between response and reinforcer allocation is valid. To determine empirically whether variables are related by a power function, or some other function, requires a comparison of different models. Model comparison is a rich and rapidly developing branch of statistical theory (for accessible reviews, see Burnham & Anderson, 2002; Cutting, 2000; Pitt, Myung, & Zhang, 2002). As noted earlier, the generalized matching law accounts for a high proportion of the variance in concurrent-schedules data (Baum, 1979), but goodness of fit can provide only weak evidence in favor of a model (Roberts & Pashler, 2000). An alternative model might provide a better account of the data, or the predictions of the generalized matching law might systematically deviate from the data.

Contingency discriminability model. Davison and Jenkins (1985) proposed an alternative model for concurrent schedules based on the assumption that subjects might not discriminate perfectly between the alternatives. This idea was supported by an earlier study by J. T. Miller, Saunders, and Bourland (1980), who arranged a procedure in which the alternative in effect was signaled by one of two line orientations projected on a key and pigeons could switch between alternatives by pecking a second, changeover key (Findley, 1958). J. T. Miller et al. found that as the difference between the line orientations decreased, sensitivity to relative reinforcement rate also decreased. On the basis of this result, Davison and Jenkins proposed a model for choice in concurrent schedules that assumed subjects might misallocate reinforcers obtained from the alternatives:

$$\frac{B_L}{B_R} = b\,\frac{R_L - pR_L + pR_R}{R_R - pR_R + pR_L}. \tag{5}$$

Equation 5 is known as the contingency discriminability model. According to Equation 5, subjects match their response allocation to the proportion of reinforcers allocated to each alternative (with bias, b). The parameter p is a “proportional confusion” parameter that varies between .0 and .5 and determines the likelihood that reinforcers are misallocated. When p = .0, discrimination is perfect and Equation 5 predicts matching. When p = .5, the alternatives are not discriminated at all, and subjects are indifferent. For values of p between .0 and .5, response allocation undermatches relative reinforcement rate, with the degree of undermatching determined by p. Davison and Jenkins (1985) showed that Equation 5 accounted for more than 99% of the variance in predictions of the generalized matching law with reinforcer ratios ranging from 1:10 to 10:1 and a values ranging from .1 to .9. They argued that the contingency discriminability model could provide an empirical account of the data equivalent to that of the generalized matching law but was conceptually superior because p was linked to a behavioral process (i.e., response–reinforcer discriminability), unlike sensitivity in the generalized matching law. Davison and colleagues (e.g., Alsop & Davison, 1991; Davison & Nevin, 1999; Nevin, Davison, & Shahan, 2005) have developed the contingency discriminability concept into a range of models for conditional discrimination and choice, including signal detection and matching-to-sample procedures.

Comparing the generalized matching law and contingency discriminability model. Because the contingency discriminability model and generalized matching law make very similar predictions within the usual range of reinforcer ratios studied (e.g., 1:10–10:1), the question of how response allocation behaves at extreme reinforcer ratios has assumed considerable importance. Davison and Jones (1995) noted that whereas the generalized matching law predicted that the relationship between log response and reinforcer ratios was linear over the full range of reinforcer ratios, the contingency discriminability model predicted that the relationship was nonlinear, with a flattening of preference observed at extreme reinforcer ratios. To test this prediction, they conducted an experiment in which pigeons were trained on a switching-key concurrent schedule in which the alternatives differed in terms of luminance on the main key. When an interval sampled from a VI schedule had elapsed, the reinforcer was assigned probabilistically to one alternative or the other.
This arrangement is known as dependent scheduling (Stubbs & Pliskoff, 1969) and ensures that the obtained relative rate of reinforcement is equal to the programmed value. The reinforcer ratio was varied from 256:1 to 1:256 across conditions. Davison and Jones showed that the contingency discriminability model predicted that preference should deviate systematically from the predictions of the generalized matching law at extreme reinforcer ratios, with response allocation closer to indifference than predicted by the generalized matching law. Their results supported the predictions of the contingency discriminability model.

Baum et al. (1999) noted that some aspects of the procedure used by Davison and Jones (1995) were different from most previous studies using concurrent schedules, particularly the use of two different light intensities to signal the alternatives. In addition, Baum et al. suggested that other features of Davison and Jones’s procedure, such as the stability criterion or changeover delay, might also have contributed to their results. They conducted an experiment that examined pigeons’ preference at extreme reinforcement ratios using a two-key procedure. Baum et al. provided additional sessions of training in each condition after Davison and Jones’s stability criterion had been met and arranged conditions both with and without a changeover delay. They found, contrary to Davison and Jones’s results, that the data showed no significant deviations from linearity and were consistent with the generalized matching law, even at extreme reinforcer ratios. Baum et al. also found that their results did not change when different stability criteria were used or with the presence or absence of the changeover delay. Thus, Davison and Jones’s and Baum et al.’s results conflict and do not resolve the question of whether the generalized matching law or the contingency discriminability model provides a better account of concurrent-schedule performance.

Sutton et al. (2008) pursued an alternative strategy for comparing these models, which they called residual meta-analysis. The upper left panel of Figure 14.3 shows data generated by the generalized matching law for reinforcer ratios between 1:10 and 10:1 and the predictions of the contingency discriminability model when fitted to these data. Conversely, the lower left panel shows data generated by the contingency discriminability model and the predictions of the generalized matching law when fitted to these data. Although the fits of both models are excellent, with more than 99.9% of the variance accounted for, systematic deviations are apparent. The right-hand panels of Figure 14.3 show the residuals of the fits for each model (i.e., obtained minus predicted) as a function of the

[Figure 14.3 plots. Left panels: Log B1 / B2 as a function of log R1 / R2 (legend: GML data, CDM pred). Right panels: CDM Residual as a function of Predicted CDM (upper) and GML Residual as a function of Predicted GML (lower).]

Figure 14.3. The upper left panel shows log response allocation predicted by the generalized matching law (GML) for a range of reinforcer ratios from 1:10 to 10:1, assuming a = 0.80 and log b = 0 in Equation 14.3b. The line shows the fit of the contingency discriminability model (CDM), for which a logarithmic transformation of Equation 14.5 with pr = .067 and log b = 0 accounted for more than 99.9% of the variance. The residuals from the fits of the CDM are shown in the upper right panel. The bottom panels show the case in which log response allocation data were generated by the CDM with pr = .067 and fitted by the GML, which with a = 0.80 accounted for more than 99.9% of the variance. The residuals from the fit of the GML are shown in the lower right panel. From “Comparing the Generalized Matching Law and Contingency Discriminability Model as Accounts of Concurrent Schedule Performance Using a Residual Meta-Analysis,” by N. P. Sutton, R. C. Grace, A. P. McLean, and W. M. Baum, 2008, Behavioral Processes, 78, p. 225. Copyright 2008 by Elsevier Ltd. Adapted with permission.

predicted values. The pattern in the residuals can be described as a third-order polynomial whose cubic coefficient differs in sign between the two cases. If the generalized matching law is the true model, then the residuals of the contingency discriminability model should have a positive cubic coefficient, whereas if the latter is the true model, then the residuals of the generalized matching law should have a negative cubic coefficient. Sutton et al. (2008) noted that although the residuals were very small (less than 0.1% of the variance in the data), it might be possible to test for

systematic patterns such as those shown in the right-hand panels of Figure 14.3 if results were pooled across studies. Aggregating residuals across studies should produce an increase in statistical power, similar to meta-analysis based on pooling effect sizes from independent studies, which is commonly used in the social sciences (Rosenthal & DiMatteo, 2001). Sutton et al. obtained concurrent-schedules response allocation data from 17 different studies (total n = 771 data points). Both the generalized matching law and the contingency discriminability model provided an excellent description of

the data in terms of goodness of fit, with average variance accounted for equal to 94.7% and 94.3%, respectively. Residuals were obtained from the fits of each model and pooled across data sets. Polynomial regressions in which the residuals were regressed on the predicted values showed that a significant polynomial with a positive cubic coefficient was obtained for the residuals from fits of the contingency discriminability model, whereas the regression for the residuals from fits of the generalized matching law was not significant. These findings support the generalized matching law, because the obtained pattern in the residuals of the contingency discriminability model is just what would be expected if the relationship between log response allocation and log reinforcement rate was strictly linear over the range of reinforcer ratios in the sample (approximately 10:1–1:10). Thus, Sutton et al.’s results provide strong support for the generalized matching law and its assertion that a power function provides an accurate description of the relationship between response allocation and relative reinforcement rate.
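The logic of the residual analysis can be illustrated with a small simulation in Python, in the spirit of the upper panels of Figure 14.3. This is a sketch under stated assumptions, not a reconstruction of Sutton et al.'s (2008) method: it assumes NumPy is available, uses 21 equally spaced log reinforcer ratios between 1:10 and 10:1, sets log b = 0, and fits p in Equation 5 by a simple grid search. All function names are hypothetical.

```python
import numpy as np

def gml_log_ratio(log_r, a, log_b=0.0):
    """Equation 3b: log response ratio under the generalized matching law."""
    return log_b + a * log_r

def cdm_log_ratio(log_r, p, log_b=0.0):
    """Logarithm of Equation 5, written in terms of r = R_L / R_R."""
    r = 10.0 ** log_r
    return log_b + np.log10(((1.0 - p) * r + p) / ((1.0 - p) + p * r))

def fit_cdm_p(log_r, observed, grid=np.linspace(0.0, 0.5, 5001)):
    """Least-squares estimate of p by grid search (an assumed, simple method)."""
    sse = [np.sum((observed - cdm_log_ratio(log_r, p)) ** 2) for p in grid]
    return grid[int(np.argmin(sse))]

log_r = np.linspace(-1.0, 1.0, 21)      # reinforcer ratios from 1:10 to 10:1
data = gml_log_ratio(log_r, a=0.8)      # the GML is the "true" model here

p_hat = fit_cdm_p(log_r, data)
predicted = cdm_log_ratio(log_r, p_hat)
residuals = data - predicted            # obtained minus predicted

vac = 1.0 - np.sum(residuals ** 2) / np.sum((data - data.mean()) ** 2)
# Regress residuals on predicted values with a third-order polynomial;
# np.polyfit returns the highest-order (cubic) coefficient first.
cubic_coefficient = np.polyfit(predicted, residuals, 3)[0]

print(f"p = {p_hat:.3f}, VAC = {vac:.4f}, cubic coefficient = {cubic_coefficient:.3f}")
```

Because the fitted contingency discriminability curve is steeper than the straight line near indifference and flatter at the extremes, the residuals change sign with distance from zero, which yields the positive cubic coefficient described in the text when the generalized matching law generated the data.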

Reinforcement Magnitude

In nonhuman research, reinforcement magnitude is typically defined as the quantity of a biologically significant stimulus per reinforcer, for example, seconds of access to grain (Catania, 1963), number of food pellets (Mazur, 1988), or rate of electrical brain stimulation (Leon & Gallistel, 1998). In contrast to reinforcement rate, comparatively few studies have varied reinforcer magnitude parametrically while holding other variables constant. In the most extensive study, Landon, Davison, and Elliffe (2003; see also Elliffe, Davison, & Landon, 2008) varied relative magnitude across five levels from 7:1 to 1:7 in a switching-key concurrent schedule with pigeons. Magnitude was defined as the number of brief (1.2-second) food presentations per reinforcement. Landon et al. found that the generalized matching law provided an excellent description of their data: Averaged across subjects, the generalized matching law accounted for 99.6% of the variance, and sensitivity to relative magnitude was 0.76. Thus, the available evidence, although limited, suggests that the relationship between response allocation and relative magnitude can be described as a power function.

Choice Between Delayed Reinforcers

The first investigations of the effects of reinforcement delay on concurrent schedules were reported by Chung (1965) and Chung and Herrnstein (1967). In the latter study, pigeons responded on concurrent VI 1-minute–VI 1-minute schedules with a 1-second changeover delay. Responses on one key produced food after a standard delay (either 8 or 16 seconds), and responses on the other key produced food after a delay that varied between 1 and 30 seconds across conditions. Chung and Herrnstein reported that response allocation matched the relative immediacy (i.e., reciprocal of delay) of reinforcement, consistent with Equation 2 (cf. Williams & Fantino, 1978).

In Chung and Herrnstein’s (1967) study, reinforced responses produced a darkening of both keys and the houselight and were followed by 3-second access to grain after the delay. Because the concurrent schedules were equal, their procedure may be viewed as a choice between two outcome schedules (e.g., 8-second delay to food vs. 16-second delay to food). Note that these outcomes are mutually exclusive in that once a left-key response initiates an 8-second delay to food, that delay and food delivery must be completed before the keys are illuminated again, and the pigeon can respond to either key. This procedure, in which subjects respond during a choice phase to produce one of two mutually exclusive outcome schedules, is known as concurrent chains. Although the outcome schedules were not differentially signaled in Chung and Herrnstein’s (1967) study, in most cases they have been.

Concurrent-Chains Procedure

Figure 14.4 shows a diagram of a typical concurrent-chains procedure, with pigeons as subjects. Responses during a choice phase, or initial links, in which two keys are illuminated white, are reinforced by concurrent VI schedules with access to one of two mutually exclusive alternatives, known as terminal links. The name of the procedure reflects the fact that for each key, the initial and terminal links constitute a chain schedule. The onset of a terminal link is signaled by a change in color on the key (e.g., left key to red, right key to green) coupled with darkening

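[Figure 14.4 diagram: in the initial links, two white keys (W) each operate on a VI 30 s schedule; these lead to the terminal links, a red key (R) with a VI 10 s schedule or a green key (G) with a VI 20 s schedule, each ending in food and followed by a return to the initial links.]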

Figure 14.4. The concurrent-chains procedure. During the initial links two keys are lighted white, and responses are reinforced by concurrent VI 30 s schedules with access to one of two mutually exclusive terminal-link schedules. Entry into a terminal link is signaled by a change in color on the respective key (left key to red, right key to green) coupled with the other key being darkened. Responses during the terminal links are reinforced with access to food according to a VI 10 s (left key) or VI 20 s (right key) schedule. After reinforcement in a terminal link, the initial links are reinstated and the next cycle begins. G = green key; VI = variable interval; R = red key; W = white key.

of the other key. After reinforcement is delivered in a terminal link, the initial link is reinstated. The usual result is that subjects respond relatively more to the initial link leading to the terminal link associated with the shorter delay to reinforcement. For example, in Figure 14.4, if the terminal links associated with the left and right keys were VI 10-second and VI 20-second schedules, respectively, then the pigeons would respond relatively more to the left initial-link key.

The concurrent-chains procedure has traditionally been viewed as a way of studying conditioned reinforcement. A conditioned reinforcer is an initially neutral stimulus that has acquired the ability to act as a reinforcer through a history of pairing with a primary reinforcer such as food (for review, see Williams, 1994). Conditioned reinforcement (sometimes called secondary reinforcement) has historically been an important topic for learning theory because much human behavior is maintained by stimuli that are not biologically significant (e.g., money). In concurrent chains, production of the terminal-link stimuli acts to reinforce initial-link responding, and so response allocation in the initial links may be interpreted as a measure of the relative strength or value of the terminal-link stimuli as conditioned reinforcers. Although the theoretical status of conditioned reinforcement has been controversial (e.g., Baum, 1974b; Schuster, 1969; Shahan, 2010), a concurrent-chains procedure provides a behavioral measure of the relative attractiveness of the terminal-link outcome schedules. In addition, a converging measure of relative value is provided by the resistance to change of responding during the terminal link: Nevin and Grace (2000; see also Grace & Nevin, 1997) have shown that resistance to change and preference in concurrent chains are correlated and argued that these measures are independent assays of a single construct that represents the history of reinforcement correlated with a particular stimulus. As such, results from experiments using concurrent chains and related procedures are important for understanding how the value of a stimulus is determined by the conditions of reinforcement that it signals (for an alternative discussion, see Moore, 2008).

Early studies (Autor, 1969; Herrnstein, 1964b) reported that initial-link response allocation matched the relative rate of terminal-link reinforcement and concluded that the conditioned reinforcement value of a terminal-link stimulus was determined by the rate of reinforcement in its presence. Because concurrent chains may be viewed as a concurrent schedule of conditioned reinforcement, this result suggested that the matching law might apply to conditioned as well as primary reinforcement.

Delay Reduction Theory

Fantino (1969) arranged a concurrent-chains procedure with VI 30-second–VI 90-second terminal links and varied the duration of the equal initial-link schedules from VI 40 seconds to VI 600 seconds. He found that response allocation for the VI 30-second terminal link became less extreme as the initial-link duration was increased. For example, when the initial links were VI 40 seconds–VI 40 seconds, the average preference for the initial link leading to VI

30 seconds was 95%, but with VI 600-second–VI 600-second initial links, it was 60%. Fantino’s (1969) results were contrary to predictions based on the matching law (Herrnstein, 1964b) and the assumption that conditioned reinforcement value was determined by reinforcement rate, which predicted that preference should not have changed as initial-link duration increased. Fantino proposed an alternative model for conditioned reinforcement to explain his results. He suggested that the strength of a terminal-link stimulus as a conditioned reinforcer was determined by the reduction in delay to reinforcement that it signaled. Initial-link response allocation is then predicted by the relative delay reduction:

$$\frac{B_L}{B_R} = \frac{T - D_L}{T - D_R}. \tag{6}$$

In Equation 6, B is the initial-link response rate and D is the average reinforcement delay from terminal-link onset, with the subscripts L and R representing the left and right alternative schedules, respectively. T is the overall average delay to reinforcement computed from the onset of the initial links. The decrease in preference as initial-link duration increases is known as the initial-link effect. Equation 6 predicts the initial-link effect because the conditioned reinforcement value of the terminal links depends on the temporal context of reinforcement; specifically, a terminal link that signals a given delay to reinforcement becomes a more effective conditioned reinforcer when the initial-link duration is increased and reinforcers occur less frequently overall. Fantino’s (1969) model, known as delay reduction theory (DRT), has been the most influential account of conditioned reinforcement over the past several decades (for reviews, see Fantino, 1977, 2001; Fantino, Preston, & Dunn, 1993; Fantino & Romanowich, 2007). DRT has been applied to a wide range of situations, including foraging (Fantino & Abarca, 1985), delayed matching to sample (Wixted, 1989), and self-control (Navarick & Fantino, 1976), underscoring its generality.

DRT also predicts an effect of overall terminal-link duration on preference in concurrent chains that was later reported by MacEwen (1972). In that study, the initial-link schedules were constant as the duration of terminal-link schedules was increased, with the ratio

held constant. For example, in one condition the terminal links were fixed interval (FI) 5 seconds–FI 10 seconds, and in another condition the terminal links were FI 40 seconds–FI 80 seconds. MacEwen found that the preference for the richer terminal link increased as the overall duration increased, a result known as the terminal-link effect. This result is important because it violates expectations based on Weber’s law (Gibbon, 1977). If choice between delayed reinforcers is similar to temporal discrimination, one would expect that a constant ratio of terminal-link delays would produce a constant preference. Yet the terminal-link effect is a robust phenomenon, having been reported for both FI and VI terminal links (Grace & Bragason, 2004; MacEwen, 1972; Williams & Fantino, 1978) and terminal links that differ in reinforcer magnitude (Navarick & Fantino, 1976) and probability (Spetch & Dunn, 1987).

Although DRT has been an influential theory of conditioned reinforcement and correctly predicts the initial- and terminal-link effects, it has several limitations. Because delay reduction depends on the average terminal-link delay to reinforcement, it does not predict any effect of differences in the delay distributions. However, one of the most well-established results in the concurrent-chains literature is preference for variability: Nonhuman subjects prefer a VI (or variable delay) over an FI (or fixed delay) terminal link, with the average delays equated (Herrnstein, 1964a; Killeen, 1968; Moore, 1984). Also, if the average delays are the same, preference for the richer alternative is greater when the terminal links are both FI than when they are both VI (e.g., Grace, 2002). Finally, because it lacks free parameters, DRT is unable to account for variability in individual-subject data and thus could not provide the kind of quantitative integration for concurrent chains that the generalized matching law accomplished for concurrent schedules. For these reasons, researchers have continued to pursue alternative models for choice in concurrent chains.

Concurrent Chains and Generalized Matching: The Contextual Choice Model

Davison (1987) suggested that the generalized matching law (Equation 4) could serve as a basis for modeling concurrent-chains performance because concurrent schedules may be viewed as a concurrent chain with 0-second terminal links. The implication is that a model that reduced to the generalized matching law in the limit as terminal-link duration approached zero could potentially apply to both procedures. In the concatenated generalized matching law (Equation 4) as applied to concurrent chains, R refers to rate of conditioned reinforcement (i.e., rate of terminal-link entry) and D and M refer to the delays and magnitudes of terminal-link reinforcement. However, Davison noted that the concatenated generalized matching law was not a satisfactory model for concurrent chains because it did not predict the initial- or terminal-link effects.

Grace (1994) proposed a model for concurrent chains based on the generalized matching law that addressed the shortcomings noted by Davison (1987). In his contextual choice model (CCM), sensitivity to terminal-link reinforcer variables depends on the temporal context of reinforcement:

$$\frac{B_L}{B_R} = b \left(\frac{R_L}{R_R}\right)^{a_r} \left[\left(\frac{1/D_L}{1/D_R}\right)^{a_d} \left(\frac{M_L}{M_R}\right)^{a_m}\right]^{T_t/T_i}. \tag{7}$$

Equation 7 is similar to the generalized matching law for concurrent chains suggested by Davison (1987) but includes an additional exponent, $T_t/T_i$, which is the ratio of the average terminal-link to initial-link durations. Thus, the effective sensitivities to terminal-link reinforcement variables are $a_d (T_t/T_i)$ and $a_m (T_t/T_i)$, which allows CCM to predict the initial-link effect ($T_t/T_i$ decreases with increases in initial-link duration) and the terminal-link effect ($T_t/T_i$ increases with increases in terminal-link duration). Equation 7 also reduces to the generalized matching law in the limit as $T_t$ approaches zero, so it may be viewed as an extension of the generalized matching law that can accommodate both concurrent chains and concurrent schedules. Grace (1994) applied CCM to archival data from concurrent-chains studies in which the terminal links were both VI or both FI schedules (92 individual-subject data sets from 19 studies). Averaged across studies, the model accounted for more than 90% of the variance in relative initial-link responding. He concluded that the generalized matching law and CCM made up a single framework for modeling choice in concurrent chains and concurrent schedules.
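As a rough numerical illustration of Equation 7, the following Python sketch computes the predicted initial-link response ratio. The function name `ccm_ratio`, the default parameter values (bias and all sensitivities set to 1), and the example durations are assumptions chosen for illustration; they are not values fitted by Grace (1994).

```python
def ccm_ratio(r_l, r_r, d_l, d_r, t_t, t_i,
              m_l=1.0, m_r=1.0, b=1.0, a_r=1.0, a_d=1.0, a_m=1.0):
    """Predicted B_L / B_R under the contextual choice model (Equation 7).

    r_*: terminal-link entry rates; d_*: terminal-link delays (s);
    m_*: reinforcer magnitudes; t_t, t_i: average terminal- and
    initial-link durations (s). Defaults are illustrative assumptions.
    """
    entry = (r_l / r_r) ** a_r
    immediacy = ((1.0 / d_l) / (1.0 / d_r)) ** a_d
    magnitude = (m_l / m_r) ** a_m
    return b * entry * (immediacy * magnitude) ** (t_t / t_i)


# Terminal-link effect: same 1:2 delay ratio, longer absolute delays.
short = ccm_ratio(1, 1, d_l=5, d_r=10, t_t=7.5, t_i=30)
long_ = ccm_ratio(1, 1, d_l=40, d_r=80, t_t=60, t_i=30)

# Initial-link effect: same terminal links, longer (hypothetical) initial links.
long_initial = ccm_ratio(1, 1, d_l=5, d_r=10, t_t=7.5, t_i=120)

# Limit as terminal-link duration approaches zero: the bracketed term is
# raised to the power 0, so only the entry-rate ratio remains.
limit = ccm_ratio(2, 1, d_l=5, d_r=10, t_t=0.0, t_i=30)

print(short, long_, long_initial, limit)
```

Under these assumed values, the predicted ratio rises from about 1.19 to 4 as the terminal links lengthen, and it falls toward 1 as the initial links lengthen.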

Although Equation 7 is an important step in extending the scope of the matching law to accommodate results from concurrent chains, it is limited: because it calculates terminal-link value as a function of the average delay to reinforcement, it fails to account for preference for variability. The search for a method of computing the value of a distribution of reinforcer delays that would apply to fixed as well as variable schedules has been an important goal for research (see Grace, 1996).

Preference for Variability: The Hyperbolic Decay Model

As noted earlier, one of the most well-known results in the concurrent-chains literature is preference for variability, that is, response allocation that favors variable over fixed delays with the overall mean delay equated. Preference for variability, which has also been obtained with humans (e.g., Lagorio & Hackenberg, 2010; Locey, Pietras, & Hackenberg, 2009), indicates that the arithmetic mean is inadequate for computing conditioned value. Beginning with Killeen (1968), several studies attempted to find a fixed-variable equivalence rule, that is, a method for computing the value of a distribution of reinforcer delays that would apply to both fixed and variable schedules (Davison, 1969, 1972). However, these studies yielded inconsistent results, casting doubt on the possibility of a general rule for fixed-variable equivalence (Navarick & Fantino, 1974).

Mazur (1984) noted that studies on preference for variability might have provided mixed data because of complexities inherent in concurrent chains, particularly the extended duration of the choice phase (initial links), and suggested that more consistent results might be obtained if subjects were able to register their preference with a single response. He introduced a new procedure for studying preference between fixed and variable delays. The adjusting-delay procedure (see Figure 14.5) uses titration to determine an indifference point for a schedule of variable delays, that is, the fixed delay that is chosen equally as often as a variable-delay schedule. In this procedure, after an intertrial interval two response keys are illuminated, for example, one red and the other green. A response to the red key initiates a delay to reinforcement sampled from a variable schedule (standard), and a response to the green key initiates a fixed delay. Colored houselights are illuminated during the delays. The key feature of the procedure is that the fixed delay is adjusted over trials to yield an indifference point, the fixed delay that is chosen about equally as often as the variable delay.

[Figure 14.5 flowchart: 15-s ITI; white (W) center key at start of trial (1 peck); choice trial with red (R) and green (G) side keys (1 peck each); standard delay with blue houselight or adjusting delay with orange houselight; 3-s food; next ITI.]

Figure 14.5. The adjusting-delay procedure. After an intertrial interval (ITI) during which the keys are dark, the center key is lighted white. A single response to the center key illuminates the side keys, red and green. A response to the red key extinguishes all keys and illuminates a blue houselight, and reinforcement (3 s access to food) is provided after a delay selected from a variable distribution (standard delay). If the subject responds to the green key, an orange houselight is illuminated and reinforcement is provided after an adjusting delay that is changed gradually over trials until the subject is about equally likely to respond to the red and green keys. This figure illustrates a free choice trial. Forced choice trials in which only one key (red or green) is illuminated are also included to ensure that the subject continues to experience both the standard and adjusting delays. G = green key; ITI = intertrial interval; R = red key; W = white key.

Mazur (1984) determined fixed-delay indifference points for a range of variable-delay schedules and found that they were well described by the following equation:

$$V = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{1 + K d_i}. \tag{8}$$

Equation 8 is known as the hyperbolic decay model, and it assumes that the value of a delayed reinforcer is determined as a hyperbolic function of its delay. It calculates the value (V) of a distribution of delays to reinforcement, $d_1, \ldots, d_n$, as the average of the delays after they have been rescaled by a hyperbolic function. K is a parameter that determines how rapidly value decreases with delay (see Figure 14.6). Mazur (1984) showed that the hyperbolic decay model provided an excellent description of his data, accounting for more than 90% of the average data with K = 1. Thus, his results suggest that the adjusting-delay procedure is an effective method for studying the effects of delayed reinforcement and that the function relating the value of a reward to its delay may be described by a simple hyperbola (Figure 14.6). Over the past 25 years, Mazur and colleagues have reported an impressive series of studies in which the adjusting-delay procedure has been used to address specific issues relating to the general question of how the value of a stimulus is determined by the conditions of reinforcement that it signals (for reviews, see Mazur, 1993, 1997, 1998).
These issues include choice between fixed- and variable-ratio schedules (Mazur, 1986b); probability of reinforcement (Mazur, 1985, 1989, 1998; Mazur & Romano, 1992); signaling effects in probabilistic reinforcement (Mazur, 1995, 1996); choice between alternatives differing in delay and magnitude of reinforcement (Mazur, 1987); choice between single and multiple reinforcers (Mazur, 1986a, 2007a); transitivity in choice between fixed and variable delays (Mazur & Coe, 1987); choice between alternatives differing in probability and amount of reinforcement (Mazur, 1988); effects of required forceful responding (Mazur & Kralik, 1990); effects of reinforcers during the intertrial interval (Mazur, 1994); procrastination in preference for larger, more delayed work requirements (Mazur, 1996, 1998); and species differences between pigeons and rats in choice between alternatives differing in delay, amount, and probability of reinforcement (Mazur, 2005, 2007b; Mazur & Biondi, 2009). Results of these studies have provided support for Equation 8 as a model for temporal discounting,

[Figure 14.6 plot: value V (vertical axis, 0 to 1) as a function of delay in seconds (horizontal axis, 0 to 100).]

Figure 14.6. Gradient of value (V) predicted by the hyperbolic decay model (Equation 14.8, with K = 0.2) as a function of delay.

that is, how the value of a reward changes as a function of its delay. More important, the hyperbolic decay model has been shown to accurately describe choices not only of pigeons and rats but also of humans (Lagorio & Hackenberg, 2010; Rodriguez & Logue, 1988). The hyperbolic decay model has been very influential in the study of intertemporal choice, that is, decision making by humans when one or more of the outcomes is delayed (for review, see Green & Myerson, 2004). The distinguishing feature of the hyperbola as a model for intertemporal choice is that it predicts that the rate of temporal discounting decreases with delay; that is, delayed rewards are discounted more steeply over delays in the short term than delays in the long term. In this respect, the hyperbola contrasts with exponential discounting, the normative model according to economic theory, which predicts that the discounting rate is constant. Studies have consistently found that the hyperbolic model performs better than the exponential model for describing intertemporal choice by humans (e.g., Myerson & Green, 1995).
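The behavior of Equation 8 can be checked with a short Python sketch. The function names, the two-valued variable schedule (2 s and 18 s), and the exponential rate of 0.2 are hypothetical choices made for illustration; only the hyperbolic form and K = 1 come from the text.

```python
import math


def hyperbolic(d, k=1.0):
    """Value of a single reinforcer delayed by d seconds: 1 / (1 + K*d)."""
    return 1.0 / (1.0 + k * d)


def hyperbolic_value(delays, k=1.0):
    """Equation 8: mean hyperbolic value over a distribution of delays."""
    return sum(hyperbolic(d, k) for d in delays) / len(delays)


def equivalent_fixed_delay(delays, k=1.0):
    """Fixed delay with the same value as `delays` (solves V = 1/(1 + K*d))."""
    v = hyperbolic_value(delays, k)
    return (1.0 / v - 1.0) / k


def exponential(d, k=0.2):
    """Exponential discounting, exp(-k*d), included only for comparison."""
    return math.exp(-k * d)


def retention(value_fn, d):
    """Fraction of value kept when the delay grows from d to d + 1 seconds."""
    return value_fn(d + 1.0) / value_fn(d)


fixed = hyperbolic_value([10])        # fixed 10-s delay
variable = hyperbolic_value([2, 18])  # variable delay with the same 10-s mean
print(fixed, variable)
print(equivalent_fixed_delay([2, 18]))

# Hyperbolic discounting is steep at short delays and shallow at long ones;
# exponential discounting loses the same fraction of value every second.
print(retention(hyperbolic, 1), retention(hyperbolic, 20))
print(retention(exponential, 1), retention(exponential, 20))
```

With these assumed delays, the variable schedule is worth about 0.19 against about 0.09 for the fixed delay, and its equivalent fixed delay is roughly 4.2 s, well short of the 10-s arithmetic mean. This is the pattern described above as preference for variability.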

Hyperbolic Value-Added Model

Mazur (2001) proposed the hyperbolic value-added model (HVA) as an alternative matching-based model for choice. Mazur’s model is based on the hyperbolic function as a model for discounting and assumes that the values of both initial- and terminal-link stimuli in concurrent chains are determined via hyperbolic discounting (Equation 8). Because the terminal-link stimuli are associated with shorter delays to reinforcement, they have greater value than the initial links. The model predicts that initial-link response allocation depends on the relative value added by the terminal-link stimuli. Formally,

$$\frac{B_L}{B_R} = b \left(\frac{R_L}{R_R}\right)^{a_r} \left(\frac{V_L - a_d V_i}{V_R - a_d V_i}\right), \tag{9}$$

where $V_L$ and $V_R$ are the values of the left and right terminal-link stimuli, $V_i$ is the value of the initial links, and $a_d$ is a sensitivity parameter. Note that HVA retains the same generalized-matching kernel as the CCM; the difference is the right-hand expression in brackets, which gives the relative value added by the terminal-link stimuli.

Mazur (2001) compared the ability of the HVA and CCM to describe archival data from concurrent-chains studies. He fitted both models to the same data sets used by Grace (1994) and found that both models accounted for a similar percentage of the variance in initial-link response allocation (CCM = 90.8%; HVA = 89.6%). For the sake of comparison, Mazur also examined predictions of a version of DRT with added sensitivity parameters and found that it also provided a reasonably good description of the data, although somewhat poorer than the other models, accounting for an average of 83.0% of the variance. He concluded that both HVA and CCM described the data well and could not be distinguished on the basis of overall goodness of fit.
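A minimal Python sketch of Equation 9 shows how the value-added term behaves. The function names and the initial-link values (0.02 and 0.04) are hypothetical; how $V_i$ is derived from the schedules is not modeled here, and bias and sensitivities are simply set to 1.

```python
def fixed_delay_value(d, k=1.0):
    """Equation 8 for a single fixed delay d (seconds)."""
    return 1.0 / (1.0 + k * d)


def hva_ratio(v_l, v_r, v_i, r_l=1.0, r_r=1.0, b=1.0, a_r=1.0, a_d=1.0):
    """Predicted B_L / B_R under the hyperbolic value-added model (Equation 9).

    Assumes both terminal-link values exceed a_d * v_i, so that each
    terminal link adds value relative to the initial links.
    """
    entry = (r_l / r_r) ** a_r
    return b * entry * (v_l - a_d * v_i) / (v_r - a_d * v_i)


v_l = fixed_delay_value(3)    # FT 3-s terminal link
v_r = fixed_delay_value(12)   # FT 12-s terminal link

# Hypothetical initial-link values for a leaner and a richer context.
lean_context = hva_ratio(v_l, v_r, v_i=0.02)
rich_context = hva_ratio(v_l, v_r, v_i=0.04)
print(lean_context, rich_context)
```

In this sketch, raising the value of the initial links increases the predicted preference for the more valuable terminal link, because subtracting a larger $a_d V_i$ shrinks the denominator proportionally more than the numerator.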

Testing Models for Choice

Identifying the optimal model for choice in concurrent chains is important because it could provide a definitive answer to the question of how conditioned value is determined by delay and other reinforcer variables. Thus, Mazur (2001) suggested that the models be evaluated by exploring situations in which they made different predictions. Such situations that have been studied to date include the constant-difference effect, three-alternative choice, and added initial-link reinforcers.

Constant-difference effect. Savastano and Fantino (1996) noted a counterintuitive prediction of DRT: If the difference between a pair of VI schedules was kept constant while their absolute duration increased, preference for the shorter VI schedule should not change. For example, in one condition (short), the terminal links were VI 5 seconds and VI 25 seconds, and in a second condition (long), they were VI 100 seconds and VI 120 seconds. This prediction is counterintuitive because the ratio between the terminal-link schedules decreases with increases in overall duration. However, DRT predicts that preference should not change because when a constant duration is added to each schedule, the average time to reinforcement from the onset of the initial links (T) increases by the same amount, and so the delay reductions for the terminal links do not change. Savastano and Fantino (1996) noted that CCM also predicted the constant-difference effect, because the decrease in preference as the terminal-link ratio was decreased was offset by the effect of increasing overall terminal-link duration. They reported two experiments that confirmed the constant-difference effect predicted by DRT. Mazur (2002b) acknowledged that the constant-difference effect was problematic for HVA and reported two experiments with response-independent schedules (fixed time [FT] and variable time [VT]) that tested it.
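The arithmetic behind DRT's constant-difference prediction can be sketched in Python. Equation 6 is not reproduced in this passage, so the sketch assumes the basic delay-reduction ratio, (T − D_L)/(T − D_R), with equal terminal-link entry rates; the function name and the 30-s mean initial-link time are hypothetical.

```python
def delay_reduction_ratio(d_l, d_r, mean_initial_link):
    """Ratio of delay reductions, (T - D_L) / (T - D_R).

    Assumes equal terminal-link entry rates, so T is the mean initial-link
    time plus the mean of the two terminal-link delays.
    """
    t = mean_initial_link + (d_l + d_r) / 2.0
    return (t - d_l) / (t - d_r)


# Short condition: VI 5 s versus VI 25 s.
short = delay_reduction_ratio(5, 25, mean_initial_link=30)

# Long condition: 95 s added to each schedule (VI 100 s versus VI 120 s).
long_ = delay_reduction_ratio(100, 120, mean_initial_link=30)

print(short, long_)
```

Adding the same constant to both terminal links raises T by that constant, so each delay reduction (here 40 s and 20 s) is unchanged and the assumed ratio is identical in the two conditions, even though the ratio of the delays falls from 1:5 to 5:6.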
(FT and VT schedules deliver reinforcers after fixed and variable time periods, respectively, without regard to a response.) He found that preference for 2-second versus 12-second delay was stronger

than that for 40-second versus 50-second delay, contrary to Savastano and Fantino (1996). Although the reason for the discrepancy across studies is unclear, Mazur suggested that procedural differences—such as response-dependent versus response-independent terminal links—may have contributed to the difference in results. Resolution of this issue awaits future research, although we should note the methodological difficulty of distinguishing between a null prediction (i.e., the constant-difference effect) and a small decrease in preference. Three-alternative choice. Mazur (2000) reported two experiments that tested predictions of DRT, HVA, and CCM for choice involving two versus three alternatives. The basic idea was to test whether preference between a constant pair of terminal links would change depending on the addition of a third alternative. In one comparison, the initial links were independent VI 60-second schedules, and the terminal links were FT 3 seconds and FT 12 seconds in one condition; in another condition, a third alternative was added in which the initial link was also VI 60 seconds, and food was delivered immediately on terminal-link entry (i.e., FT 0 second). Mazur showed that CCM predicted no change in the preference between FT 3 seconds and FT 12 seconds when the third alternative was added because the ratio of times spent in the terminal and initial links did not change (for two alternatives, Tt/Ti = 7.5 seconds/30 seconds; for three alternatives, Tt/Ti = 5 seconds/20 seconds). In contrast, HVA predicts that the proportion of responses for FT 3 seconds should increase in the three-alternative condition because the value of the initial links should increase when the third schedule is added, thus increasing the ratio in the right-hand side of Equation 9. DRT makes a similar prediction because the average time to reinforcement from the initial-link onset decreases in the three-alternative condition. 
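The $T_t/T_i$ values quoted above for the two- and three-alternative conditions can be reproduced with a few lines of Python. The function name is hypothetical, and the sketch assumes that terminal links are entered equally often and that n independent VI x-second initial links yield a mean initial-link time of x/n seconds.

```python
def ccm_context_ratio(terminal_delays, vi_initial_link):
    """Return (Tt, Ti, Tt/Ti) for n alternatives with equal VI initial links.

    Tt is taken as the mean terminal-link delay; Ti is approximated as the
    VI value divided by the number of concurrently running schedules.
    """
    n = len(terminal_delays)
    t_t = sum(terminal_delays) / n
    t_i = vi_initial_link / n
    return t_t, t_i, t_t / t_i


two = ccm_context_ratio([3, 12], vi_initial_link=60)
three = ccm_context_ratio([3, 12, 0], vi_initial_link=60)
print(two)    # (7.5, 30.0, 0.25)
print(three)  # (5.0, 20.0, 0.25)
```

Because the exponent in Equation 7 is the same (0.25) in both conditions, CCM predicts no change in preference between FT 3 seconds and FT 12 seconds when the third alternative is added.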
Results of Mazur’s experiment showed that preference for the FT 3 seconds versus the FT 12 seconds increased when the third alternative was added, consistent with HVA and DRT but contrary to CCM. These findings are similar to results from previous studies that have examined the effect of adding a third alternative (Fantino & Dunn, 1983). The findings suggest that

the average time to reinforcement from the onset of the initial links, not the ratio of terminal-link and initial-link duration, is a critical factor in understanding how temporal context affects choice in concurrent chains, at least for the regrettably understudied situation of three-alternative choice.

Added initial-link reinforcers. Mazur (2003) noted that HVA and CCM made different predictions for a situation in which reinforcers were added independently of responding during the initial links. Because the added reinforcers do not change the duration of the initial links, CCM predicts that preference for a constant pair of terminal-link schedules should not change. However, because the added reinforcers increase the value of the initial-link stimuli (and decrease the overall time to reinforcement, T), both HVA and DRT predict that preference should increase when the extra reinforcers are provided. Mazur (2003) tested these predictions in an experiment in which the initial link was a single VI 30-second schedule that ensured the terminal-link schedules, FT 3 seconds and FT 12 seconds, were entered equally often. Test sessions were run in some conditions in which free food deliveries were provided during the initial links according to a VT 30-second schedule. He found that preference for the FT 3-second alternative increased when free reinforcers were added, consistent with HVA and DRT but not CCM.

Scoones, Hucks, McLean, and Grace (2009) suggested an alternative explanation for Mazur’s (2003) results, namely that the effect of free food deliveries on responding to the preferred key might have been mitigated by adventitious reinforcement. Free food would be expected to reduce responding to both initial links because it weakens the contingency between responding and reinforcement (Rachlin & Baum, 1972).
However, because pigeons were already responding more in the initial link leading to the FT 3-second alternative, when free food was delivered it would be more likely to be temporally contiguous, by accident, with a response to that alternative. Scoones et al. tested this explanation by replicating Mazur’s experiment but including

an additional condition in which a differential reinforcement of other (DRO) contingency was added to the VT schedule during the initial links so that free food deliveries could not occur within 2 seconds of an initial-link response. Results were consistent with the adventitious reinforcement hypothesis: The increase in preference for the richer terminal link that Mazur found was replicated when VT food was added during the initial link, but when the DRO contingency was also included, preference did not change systematically. Analyses of absolute initial-link response rates were also consistent with adventitious reinforcement: Response rates decreased proportionally for both alternatives when the DRO contingency accompanied the free food deliveries but decreased less for the richer schedule in the VT-only condition. Scoones et al.’s (2009) results showed that when adventitious reinforcement was minimized, provision of free food during the initial links did not change preference. These results pose a challenge for both HVA and DRT because the rates of VT reinforcement were similar in their VT-only and VT + DRO conditions. Thus, the value of the initial-link stimuli (or average time to reinforcement from initial-link onset) should have changed to the same extent in both conditions. However, the lack of a change in preference in the VT + DRO condition is consistent with predictions of CCM. Reinforcement context and conditioned value. As noted previously, the core principle of DRT is that stimulus value varies inversely with reinforcement context—a terminal link that signals a given delay will have greater value in a context in which reinforcers are infrequent than in a context in which they are frequent (Fantino, 2001). 
This principle accords with intuition and is consistent with well-known phenomena such as behavioral contrast (Williams, 2002) and evidence that the effectiveness of operant and Pavlovian contingencies varies inversely with reinforcement context (Herrnstein, 1970; Rescorla, 1968). However, Grace (1994) noted that the terminal-link effect (that preference between a pair of terminal links in constant ratio increases as overall duration increases) violated Weber’s law, which is a typical result in temporal discrimination (e.g., Gibbon, 1977; Grondin, 2001). He suggested that stimulus value, defined as what subjects learn about the terminal-link stimulus–reinforcer relation, was independent of context, but that the behavioral expression of difference in value as preference depended on the relative duration of the terminal and initial links, which is similar to the distinction between learning and performance that is often made in theories of associative learning (e.g., Stout & Miller, 2007).

Several studies have attempted to test whether stimulus value in concurrent chains depends on reinforcement context (Grace & Savastano, 1997, 2000). In general, these studies have used transfer designs in which after exposure to terminal-link contingencies in baseline, subjects are given a probe test in which terminal-link stimuli that differed in baseline reinforcement context are compared. Choice in the probes is assumed to give an independent measure of stimulus value. For example, Grace and Savastano (2000) trained pigeons on a two-component procedure in which in one component (short), the initial link was a single VI 15-second schedule that ensured equal access to the terminal links, which were VI 10 seconds and VI 20 seconds. The schedules in the other component (long) were obtained by multiplying these by two (i.e., VI 30-second initial link, VI 20-second and VI 40-second terminal links). After baseline training, occasional probe tests were conducted in which terminal-link stimuli from the different components were presented together. Grace and Savastano (1997, 2000) showed that according to DRT, pigeons should prefer the VI 20 second from the short component over the VI 20 second from the long component and be indifferent between the VI 10 second and VI 40 second. However, if terminal-link value was independent of baseline context, then the pigeons should be indifferent between the two VI 20-second stimuli and prefer VI 10 second over VI 40 second.
Results showed that probe preference was well predicted by the baseline schedule values independent of context, consistent with CCM but not with DRT.

Although Grace and Savastano’s (1997, 2000) data suggested that terminal-link value does not depend on reinforcement context, other studies have obtained conflicting results. O’Daly, Meyer, and Fantino (2005) trained pigeons on a multiple-chains schedule in which the terminal link was always an FT 30-second schedule and the initial link was either VI 10 seconds or VI 100 seconds in different components. O’Daly et al. found that when the terminal-link stimuli were later compared in probe tests, pigeons responded more to the terminal-link stimulus associated with the VI 100-second initial link in baseline, consistent with predictions of DRT. Similar results were obtained in a multiple-chains procedure by O’Daly, Angulo, Gipson, and Fantino (2006), who found that pigeons preferred an FR 30-second terminal-link stimulus that was preceded by a VI 100-second initial link over an FR 30-second stimulus that was preceded by a VI 10-second initial link. In addition, O’Daly et al. found that pigeons preferred a nonreinforced stimulus that accompanied the FR 30 second and was preceded by the VI 100-second initial link, thereby demonstrating value transfer (Clement, Feltus, Kaiser, & Zentall, 2000). However, O’Daly et al. found that results from several experiments using the successive-encounters procedure, an operant analogue of a foraging task, were inconsistent, and only stimuli associated with rich handling schedules (i.e., terminal links) showed effects of reinforcement context.

Overall, studies that have used transfer tests to assess the effect of reinforcement context on stimulus value have yielded mixed results. Apparently, whether the terminal-link stimuli appear as part of a concurrent or single chain in baseline has some impact on whether training in a rich versus lean reinforcement context affects probe choice.
The issue is not experience with concurrent responding per se, because pigeons in O’Daly et al.’s (2005, 2006) studies that were experimentally naive at the start of the research were given pretraining in a choice situation before the final baseline procedure. A resolution to this discrepancy awaits future research. Summary: Testing models for choice. Overall, results of experiments that have compared predictions of competing models for concurrent chains have yielded mixed results. Some have favored predictions of HVA and DRT over CCM (e.g., Mazur,

2000), and others have supported CCM (e.g., Scoones et al., 2009). Evidence from studies that have used transfer methodologies to assess the influence of reinforcement context on stimulus value has also been inconclusive, with some suggesting that reinforcement value is independent of context (Grace & Savastano, 2000) and others suggesting that value depends on context (O’Daly et al., 2005, 2006). Although both HVA and CCM provide a strong quantitative account of results from concurrent-chains studies and the effects of reinforcement context predicted by DRT have substantial generality, at this point it is premature to conclude that any of the extant steady-state models for choice based on the matching law are definitive.

Reinforcement Probability

Another variable that researchers have studied is the probability of reinforcement for a terminal link. In these studies, terminal links end in either reinforcement or extinction (i.e., blackout), with a specified probability. Kendall (1974) reported that pigeons responded more in the initial link for a terminal link that ended in reinforcement 50% of the time than in one that provided reinforcement 100% of the time when different stimuli were correlated with the reinforcement and extinction outcomes on the 50% alternative. However, when differential stimuli were not correlated with outcomes on the probabilistic terminal link, pigeons strongly preferred the 100% schedule.

Kendall’s (1974) results were provocative because they revealed a suboptimal preference for the alternative providing the lower overall rate of reinforcement. Subsequent studies that attempted to replicate Kendall’s findings under different conditions showed that although an effect of signaling was reliable (that is, pigeons showed a stronger preference for the 100% alternative when differential stimuli were not correlated with the outcomes on the probabilistic terminal link), pigeons did not show a consistent preference for the 50% alternative under well-controlled conditions (e.g., Dunn & Spetch, 1990; Fantino, Dunn, & Meck, 1979; Mazur, 1996; Spetch, Belke, Barnet, Dunn, & Pierce, 1990; Spetch, Mondloch, Belke, & Dunn, 1994). However, recently Gipson, Alessandri, Miller, and Zentall (2009) have shown that when

the reinforcement probability for the more reliable alternative is reduced to 75%, pigeons show a statistically reliable preference for the 50% alternative provided that the outcomes are differentially signaled (see also Stagner & Zentall, 2010). The stronger preference for the 50% alternative when outcomes are signaled has been attributed to the conditioned reinforcement value of the stimulus correlated with reinforcement. The idea is that this stimulus signals a relatively greater local reduction in delay to reinforcement than the 100% stimulus (Dunn & Spetch, 1990) or that its value is enhanced because it is paired with a signal for extinction (Gipson et al., 2009). Evidence in favor of conditioned reinforcement was reported by McDevitt, Spetch, and Dunn (1997). They found that interpolation of a 5-second gap between the stimulus correlated with reinforcement on the 50% terminal link decreased preference for that alternative. However, when a similar gap was imposed on the 100% alternative, there was no systematic effect on preference. These results are consistent with Dunn and Spetch’s (1990) explanation of the signaling effect in terms of delay reduction. However, Mazur (2005, 2007b) found that signaling reinforcement and extinction outcomes did not influence rats’ indifference points for a probabilistic schedule in the adjusting-delay procedure. The implication is that the signaling effect may be limited in terms of generality across species. Most of the research on reinforcer probability and choice has concentrated on the effects of signaling, or whether preference for risky outcomes depends on deprivation or energy budget (Caraco, Martindale, & Whitam, 1980; Pietras, Locey, & Hackenberg, 2003), and relatively few studies have manipulated probability parametrically with other aspects of reinforcement (e.g., immediacy, magnitude) held constant. 
Recently, Mattson, Hucks, Grace, and McLean (2010) studied four different probability ratios (83%–17%, 17%–83%, 67%–33%, 33%–67%) and three immediacy ratios (FT 10 seconds–FT 20 seconds, FI 20 seconds–FT 10 seconds, FT 15 seconds–FT 15 seconds) in a factorial design, under conditions in which reinforcement and extinction outcomes were either signaled or unsignaled. They found that the concatenated generalized matching

law including a term for the probability ratio provided an excellent account of the data. Sensitivity to reinforcer probability was greater in unsignaled than in signaled conditions, as expected on the basis of previous studies. However, sensitivity to immediacy was also greater in unsignaled conditions, suggesting that the effect of signaling reinforcement and extinction outcomes applies not only to reinforcer probability but to the effects of terminal-link reinforcer variables in general.

Summary

Studies that have examined the effects of different reinforcer variables (such as delay, rate, magnitude, and probability of reinforcement) on choice have yielded a rich corpus of data that has supported the development of a variety of quantitative models that describe choice at the molar level with a high degree of accuracy. Although these models differ in terms of specific details, such differences should not obscure their common roots in the concatenated matching law. Overall, considerable support exists for the fundamental assumptions of matching: that response allocation in a concurrent-choice situation depends on the relative reinforcement value of the choice alternatives, with value determined as an additive function of different reinforcement variables (Berg & Grace, 2004, but cf. Elliffe et al., 2008). This support testifies to the robustness and adaptability of the concatenated matching law as a framework for understanding the allocation of operant behavior.

Acquisition of Choice Behavior

Because the matching law describes a molar relationship between response allocation and relative reinforcement, that is, responding and reinforcement measured over an extended temporal epoch (Baum, 2003), most studies on behavioral choice since Herrnstein (1961) have used steady-state designs in which schedules are maintained over many sessions until responding has stabilized. The success of the generalized matching law and other models for choice provides good evidence of the utility of this approach. However, in the past two decades, researchers have paid increasing attention to the acquisition of behavioral choice: how response allocation adapts when the reinforcement contingencies are changed. The promise of understanding acquisition is that it could give key insights into the processes underlying steady-state choice and eventually provide an explanatory theory that could predict matching and other results in molar behavior allocation. We consider first research that has examined acquisition in simple concurrent schedules and then studies of acquisition in concurrent chains.

Acquisition in Concurrent Schedules

Mazur and colleagues reported a series of experiments in the 1990s that established some basic acquisition phenomena in concurrent schedules. These studies used designs in which for each condition the reinforcement probabilities or rates for a concurrent-choice situation with two alternatives were equal for several sessions and then unequal for several sessions. Bailey and Mazur (1990; see also Mazur & Ratti, 1991) tested whether acquisition of preference between two alternatives reinforced with a specified probability (a concurrent random ratio–random ratio schedule) depended on the ratio or difference of the reinforcement probabilities. They found that acquisition was faster with greater ratios, for example, when reinforcement probabilities were .12 and .02 than when they were .40 and .30. Mazur (1992) reported that with their ratio held constant, acquisition was faster with overall higher probabilities and described a linear-operator model that could account for his results. According to this model, the strength of a response increased with reinforcement and decreased with nonreinforcement. Despite the simplicity of this model, Mazur (1992) showed that it predicted his results more accurately than several previous models for acquisition in concurrent schedules, including the kinetic model (Myerson & Miezin, 1980), melioration (Herrnstein & Vaughan, 1980), and the ratio-invariance model (Horner & Staddon, 1987; Staddon & Horner, 1989). Later studies provided evidence of spontaneous recovery in choice behavior (Mazur, 1995, 1996) and showed that for concurrent VI–VI schedules, acquisition was faster when the overall reinforcement rate was greater and when schedules were changed more frequently (Mazur, 1997).
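The linear-operator idea attributed to Mazur (1992) above, in which response strength rises after reinforcement and falls after nonreinforcement, can be sketched as follows. This is a minimal illustration and not Mazur's fitted model: the function names, the rate parameters `up` and `down`, the starting strengths, and the rule that choice probability equals relative strength are all assumptions made for the example.

```python
import random


def update_strength(v, reinforced, up=0.05, down=0.01):
    """Linear-operator step: move v toward 1 after reinforcement, toward 0 otherwise.

    `up` and `down` are hypothetical rate parameters, not published estimates.
    """
    if reinforced:
        return v + up * (1.0 - v)
    return v - down * v


def simulate_choice(p_left, p_right, n_trials=2000, seed=0):
    """Simulate choice between two random-ratio alternatives.

    Assumption: the probability of choosing an alternative equals its
    strength divided by the summed strengths of both alternatives.
    """
    rng = random.Random(seed)
    strength = {"left": 0.5, "right": 0.5}
    p_reinforce = {"left": p_left, "right": p_right}
    choices = []
    for _ in range(n_trials):
        p_choose_left = strength["left"] / (strength["left"] + strength["right"])
        side = "left" if rng.random() < p_choose_left else "right"
        reinforced = rng.random() < p_reinforce[side]
        # Only the strength of the response just emitted is updated.
        strength[side] = update_strength(strength[side], reinforced)
        choices.append(side)
    return strength, choices


if __name__ == "__main__":
    final_strength, choices = simulate_choice(0.12, 0.02)
    print(final_strength, choices.count("left") / len(choices))
```

Running the simulation with different pairs of probabilities (for example .12 versus .02 and .40 versus .30) gives a rough feel for how a rule of this kind makes the speed of acquisition depend on the reinforcement probabilities.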

The Allocation of Operant Behavior

A second major line of investigation has been a series of studies by Davison, Baum, and colleagues using a within-session procedure to illuminate the dynamics of choice under concurrent schedules. In their procedure (Davison & Baum, 2000), pigeons are exposed to seven different pairs of concurrent VI–VI schedules per session in which the left:right reinforcer ratios are 1:27, 1:9, 1:3, 1:1, 3:1, 9:1, and 27:1 and the overall reinforcer rate is kept constant. After an intercomponent interval during which the chamber is dark, the side keys are lighted to signal the start of a component, which ends after a fixed number of reinforcers have been earned. Sessions consist of seven components, with the reinforcer ratio for each determined pseudorandomly such that all seven ratios are used. Pigeons typically receive 80 to 100 sessions of training in each condition, with the data from all but the first 15 sessions aggregated for analysis. The Davison–Baum procedure has yielded a rich and varied spectrum of results, including “every reinforcer counts”—that is, preference shifts toward the alternative that produced the last reinforcement (Davison & Baum, 2000), specifically, an increase in preference for the just-reinforced alternative that decreased toward indifference after 20 seconds to 25 seconds (preference pulse; Davison & Baum, 2002). Similar preference pulses were observed in a traditional steady-state design by Landon, Davison, and Elliffe (2002). Within components in the Davison–Baum procedure, sensitivity to the reinforcer ratio increases as more reinforcers are obtained and increases more rapidly when the overall reinforcement rate is relatively high (Davison & Baum, 2000). In addition, successive reinforcers from the same alternative have a smaller effect than the one before (confirmations), whereas a reinforcer that follows a series of reinforcers from the opposite alternative has a relatively large effect (disconfirmations; Davison & Baum, 2000). 
Other studies have shown that within-session variation in reinforcer magnitude ratios yields similar effects as variation in reinforcer rate ratios (Davison & Baum, 2003), that sensitivity is greater when the range of reinforcer ratios within the session is greater (Landon & Davison, 2001), and that carryover effects from one component to the next decrease as the intercomponent

interval is increased (Davison & Baum, 2002). Landon, Davison, and Elliffe (2003) found that unequal within-session reinforcer distributions gave results similar to those of equal reinforcer distributions but were biased toward an average sessionwide preference. Krägeloh and Davison (2003) showed that sensitivity is greater when the reinforcer ratios are signaled and when a changeover delay is used. There is also evidence for “fix and sample” in the local pattern of visits to each alternative (Baum & Davison, 2004) and for peak shift with signaled component reinforcer ratios in preference prior to the first reinforcer in a component (Krägeloh, Elliffe, & Davison, 2006). Increasing reinforcer delay has an effect on preference pulses similar to that of decreasing reinforcer magnitude (Davison & Baum, 2007). Some of these results are similar to those of previous studies; for example, Buckner, Green, and Myerson (1993) showed that reinforcer delivery increased the length of a visit to the alternative, similar to a preference pulse. However, the Davison–Baum procedure is unique in terms of the detailed analyses it supports of response allocation at multiple time scales. Baum and Davison (2009) proposed a linear-operator model to account for results from their procedure. In their model, the change in the log response ratio after reinforcement is determined by the difference between the current log response ratio and an asymptote, multiplied by a parameter that indicates how much weight is given to the current reinforcement compared with previous reinforcements. Baum and Davison (2009) fitted their model to log response ratios associated with all four-reinforcer sequences from several conditions in Baum and Davison’s (2004) study and then used the parameter estimates to predict log response ratios for sequences of one to three and five to seven reinforcers.
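The update rule described above for Baum and Davison's (2009) model, in which the log response ratio moves a fixed fraction of the way toward an asymptote after each reinforcer, can be sketched as follows. This is an illustrative reading of the verbal description, not their fitted model: the function names, the parameter values, and the convention that the asymptote is positive after a left reinforcer and negative after a right reinforcer are assumptions.

```python
def update_log_ratio(log_ratio, reinforced_left, asymptote=1.0, weight=0.3):
    """One linear-operator step applied directly to the log (left/right) response ratio.

    Assumption: the asymptote is +asymptote after a left reinforcer and
    -asymptote after a right reinforcer. `weight` is the share given to the
    current reinforcer relative to earlier ones.
    """
    target = asymptote if reinforced_left else -asymptote
    return log_ratio + weight * (target - log_ratio)


def run_sequence(sequence, start=0.0, asymptote=1.0, weight=0.3):
    """Return the log response ratio after each reinforcer in a string such as 'LLLR'."""
    log_ratio = start
    trajectory = []
    for source in sequence:
        log_ratio = update_log_ratio(log_ratio, source == "L", asymptote, weight)
        trajectory.append(log_ratio)
    return trajectory


if __name__ == "__main__":
    print(run_sequence("LLLL"))  # successive same-side reinforcers: shrinking steps
    print(run_sequence("LLLR"))  # a reinforcer from the other side: a larger step
```

Under these assumptions, each further reinforcer from the same alternative changes the log ratio by less than the one before, whereas a reinforcer from the other alternative after a run produces a larger change. This is at least qualitatively in line with the pattern of confirmations and disconfirmations reported for the Davison–Baum procedure.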
They showed that the model made accurate predictions and that presence or absence of a changeover delay was reflected by the asymptote parameter and change in overall reinforcement rate by the weighting parameter. Baum and Davison (2009) acknowledged that their model was unusual in that it predicted log response allocation directly, rather than response strengths or values for individual alternatives that were then used to predict response allocation, as is typical for linear-operator

models for choice (Mazur, 1992). Baum and Davison (2009) remarked that they had thoroughly explored such approaches but found that they did not perform as well as one that predicted response allocation directly. Davison, Baum, and colleagues have used similar procedures to demonstrate that stimuli produced by responding, both primary reinforcers (e.g., food) and conditioned reinforcers (e.g., presentation of a magazine light that accompanies food presentations), can have a discriminative function that may act independently of any response-strengthening function that the stimulus might have. For example, Krägeloh, Davison, and Elliffe (2005) used a procedure in which the arranged reinforcer ratio in each session was always 1:1 but the probability of the next reinforcer being obtained from the same or the other alternative was varied across conditions. As a result, the length of sequences of same-key reinforcers was varied, and the relative reinforcement rate was constant. They found that reinforcers increased the likelihood of responding to the same key, but to a lesser extent when the length of reinforcer sequences was short. The implication is that reinforcers also had a discriminative function—that is, they signaled the location of the upcoming reinforcer—in addition to strengthening responding on the key that produced the reinforcer. Davison and Baum (2006) studied the effects of conditioned reinforcers—presentations of the food magazine light (which accompanied primary reinforcers) by itself and a 2.5-second green keylight. They manipulated the correlation between the magazine light and green keylight presentations and the location of the upcoming primary reinforcer and found that preference pulses after the conditioned reinforcers depended on the correlation with food: When the correlation was 1.0, positive preference pulses resulted, but when the correlation was −1.0, negative preference pulses were obtained. 
Davison and Baum suggested that conditioned reinforcers did not serve a strengthening function; rather, they signaled the location of upcoming food. On the basis of these results, Davison and Baum argued that a more general, evolutionary conception of reinforcement in terms of a discriminative rather than strengthening function may be appropriate. According to their

view, stimuli that predict phylogenetically important events, that is, fitness-enhancing or fitness-reducing events such as food or shock, will “guide behavior into activities that produce fitness-enhancing events and into activities that prevent the fitness-reducing events” (Davison & Baum, 2006, p. 281). At this point, it is unclear whether Davison and Baum’s (2006) proposal will lead to a more comprehensive and satisfactory understanding of reinforcement processes. As they acknowledged, the observation that biologically relevant stimuli such as food can have discriminative as well as reinforcing properties is not new (e.g., Dufort, Gutman, & Kimble, 1954). Moreover, research that has contrasted predictions of matching and reinforcement maximization has found that pigeons’ and rats’ response allocation will deviate from matching if that results in an increase in overall reinforcement rate, provided that the subject is able to discriminate the increase (e.g., Heyman & Tanz, 1995; Williams, 1992). These results led Williams (1992) to remark that, rather than being contradictory accounts of choice, matching and maximizing were competitive processes: Organisms maximize when they can, but match when they cannot. It is also important to note that even if Davison and Baum’s (2006) conception of reinforcement as discrimination is correct, it does not challenge the conclusions of the research, previously discussed, on the matching law and conditioned value. The contribution of matching-based models for choice has been to provide a quantitative measure of the relative value of the reinforcing outcomes signaled by the terminal-link stimuli. Assuming that there are no sequential dependencies between the location of the terminal link associated with reinforcement on one cycle and the next, response allocation in the initial links should provide a measure of relative terminal-link value that is not confounded by effects of reinforcers as discriminative stimuli.

Acquisition in Concurrent Chains

Research in the past decade has also examined how initial-link response allocation in concurrent chains adapts when the reinforcement contingencies are altered. Mazur, Blake, and McManus (2001) compared rate of change in pigeons’ preference in

concurrent chains when either the initial- or the terminal-link schedules were changed. In some conditions, both terminal-link delays were 12.5 seconds and the initial link that produced the greater proportion of terminal-link entries (80% vs. 20%) was altered. In other conditions, both initial links arranged 50% of the entries, and the terminal-link delays were altered (5 seconds vs. 20 seconds or 2 seconds vs. 18 seconds). Mazur et al. found that preference adapted more rapidly when the proportion of entries arranged by the initial links was changed than when the terminal-link delays were changed. Mazur (2002a) studied whether acquisition of initial-link preference depended on overall terminal-link duration. In his experiment, terminal-link delays were equal (1 second or 20 seconds), and the proportion of terminal-link entries arranged by the initial links was changed. Overall, he found no effect of terminal-link duration on acquisition. Analyses of the effects of individual reinforcers showed evidence for preference pulses similar to those reported by Davison and Baum (2002): After a response on one key had been reinforced, initial-link choice for that alternative was higher for approximately the next 100 responses. An alternative approach for studying acquisition in concurrent chains was pursued by Grace, Bragason, and McLean (2003). In their Experiment 1, the terminal link for the left alternative was always FI 8 seconds and the right terminal link was either FI 4 seconds or FI 16 seconds across sessions, as determined by a 31-step pseudorandom binary series (Hunter & Davison, 1985; Schofield & Davison, 1997). Each session consisted of 72 initial- and terminal-link cycles. Grace et al. showed that after sufficient training (approximately two pseudorandom binary series sequences), pigeons’ response allocation was determined by the ratio of terminal-link immediacies in the current session, with no significant control by the immediacy ratios from prior sessions.
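One common way to produce a 31-step pseudorandom binary series is a five-bit linear feedback shift register, which cycles through all 31 nonzero states before repeating. The sketch below shows that construction and maps each bit to a right terminal-link schedule. It is an assumption that a register of this kind matches the series used in the studies cited above; the feedback taps, the seed, the function names, and the mapping of bits to FI 4 seconds versus FI 16 seconds are all illustrative choices.

```python
def prbs(n_steps=31, seed=(1, 0, 0, 0, 0)):
    """Generate bits from a five-bit linear feedback shift register.

    The feedback bit is the XOR of the third and fifth register cells, which
    gives a maximal-length series that repeats every 31 steps.
    """
    state = list(seed)
    if len(state) != 5 or not any(state):
        raise ValueError("seed must be five bits and not all zero")
    bits = []
    for _ in range(n_steps):
        bits.append(state[-1])            # output the oldest bit
        new_bit = state[4] ^ state[2]     # feedback from cells 5 and 3
        state = [new_bit] + state[:-1]    # shift the register by one cell
    return bits


def right_terminal_links(bits, short_fi=4, long_fi=16):
    """Map each bit to the right terminal-link FI value (seconds) for one session.

    Hypothetical mapping: 1 means the shorter FI, 0 means the longer FI.
    """
    return [short_fi if bit == 1 else long_fi for bit in bits]


if __name__ == "__main__":
    series = prbs()
    print(series)
    print(right_terminal_links(series))
```

A series built this way contains 16 ones and 15 zeros per cycle, so the two schedule values occur almost equally often while the order of richer and leaner sessions is hard to anticipate from session to session.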
They also found that sensitivity to relative immediacy increased rapidly and reached an asymptote about midway through individual sessions. Grace et al. (2003) termed the control over choice by the current-session immediacy ratio that developed under the pseudorandom binary series design rapid acquisition, because initial-link choice

reached asymptote within individual sessions. Subsequent research by Grace et al. has attempted to model the process by which pigeons’ response allocation adapts in the rapid-acquisition procedure, with the goal of developing a model for acquisition that could eventually account for steady-state performance in concurrent chains as well. Grace et al. (2003) reported a second experiment in which the same pigeons as in Experiment 1 served, with one difference: Instead of two values, a different FI schedule was used for the right terminal link in each session by sampling from an infinite population of values. These values were arranged so that the location of the richer alternative still changed across sessions according to the pseudorandom binary series. Surprisingly, Grace et al. found that there was no disruption in performance: Sensitivity to immediacy did not decrease compared with levels achieved at the end of Experiment 1, and it did not change over 60 sessions of training in Experiment 2. Moreover, generalized-matching scatterplots showed that for some pigeons, response allocation was nonlinear and fell into two clusters (see Figure 14.7). Grace et al. suggested that choice was determined by a process akin to categorical discrimination, in which response allocation favored whichever terminal link had the shorter delay in an all-or-none fashion. Grace and McLean (2006) compared performance in a condition identical to Grace et al.’s (2003) Experiment 1 with one in which a different pair of FI schedule values was used in each session. They reasoned that if choice was determined by the conditioned reinforcement value of the terminal-link stimuli, then acquisition should be faster in the former condition (minimal variation, because only one terminal link changed between two values) than in the latter condition (maximal variation). However, there was no systematic difference in sensitivity to immediacy; response allocation adapted similarly in the minimal- and maximal-variation conditions.
Results from the latter condition showed individual differences in response allocation; for some pigeons, response allocation was approximately linearly related to the log immediacy ratio, whereas for others the relationship was nonlinear. Grace and McLean proposed a model to account for their results. Their model assumed that after reinforcement

[Figure 14.7 plot: two panels, "Beginning of Session" (left) and "End of Session" (right). x-axis: Delay (logarithmic scale, 1 to 100). y-axis: Log Response Ratio (−2 to 1.5).]

Figure 14.7. Log initial-link response ratio as a function of the fixed-interval schedule associated with the right terminal link (the left schedule was always fixed-interval 8 seconds), separately for the beginning (left panel) and end (right panel) of sessions, for Pigeon 221 from Grace, Bragason, and McLean’s (2003) Experiment 2. Each data point represents performance from one session. From “Rapid Acquisition of Preference in Concurrent Chains,” by R. C. Grace, O. Bragason, and A. P. McLean, 2003, Journal of the Experimental Analysis of Behavior, 80, p. 248. Copyright 2003 by the Society for the Experimental Analysis of Behavior, Inc. Adapted with permission.

in a terminal link, subjects made a decision about whether the preceding delay was relatively short or long. If the delay was judged to be short, response strength for the corresponding initial link increased, whereas if it was judged to be long, response strength decreased. Changes in response strength were made according to a linear-operator rule. Grace and McLean showed that the model could account for the individual differences in their data depending on a parameter that determined how accurate the decisions were. When decisions were relatively inaccurate, response allocation was approximately linearly related to the log immediacy ratio, but when decisions were relatively accurate, response allocation was a nonlinear (sigmoidal) function of the log immediacy ratio (see Figure 14.8). Subsequently, Christensen and Grace extended the decision model to predict acquisition of preference between FI terminal links when overall initial-link (Christensen & Grace, 2008) and terminal-link (Christensen & Grace, 2009a) duration varied and when terminal-link schedules changed systematically in an ascending and descending series (Christensen & Grace, 2009b). Notably, Christensen and Grace (2008) showed that the decision model predicted a bitonic effect of initial-link duration, with a decrease in preference at very short initial-link

durations as well as with longer durations. They confirmed this prediction, which is contrary to models for steady-state choice such as CCM, HVA, and DRT, in their experiment.

[Figure 14.8 plot: four panels. Upper panels: Predicted Log Response Ratio as a function of Log Immediacy Ratio, with regression lines labeled y = 1.60x − 0.00 (R² = 0.90) and y = 0.72x − 0.00 (R² = 1.00). Lower panels: Log IL Response Ratio as a function of Log Immediacy Ratio, with regression lines labeled y = 1.31x − 0.09 (R² = 0.85) and y = 1.37x − 0.02 (R² = 0.87).]

Figure 14.8. The upper panels show predictions of the decision model when the value of σ is relatively large (σ = 0.30; left panel) or small (σ = 0.075; right panel). When σ is large, the predicted response allocation is approximately linearly related to the log immediacy ratio, but when σ is small, the relation is nonlinear. The lower panels show results from two pigeons in Grace and McLean’s (2006) experiment, in which pigeons chose between two different fixed-interval terminal links each session, which illustrate both linear and nonlinear relations. From “Rapid Acquisition of Preference in Concurrent Chains,” by R. C. Grace, O. Bragason, and A. P. McLean, 2003, Journal of the Experimental Analysis of Behavior, 80, pp. 242 and 245. Copyright 2003 by the Society for the Experimental Analysis of Behavior, Inc. Adapted with permission.

Recently, Christensen and Grace (2010) proposed a model for steady-state choice based on the decision model. They derived an expression for relative initial-link response strength after sustained training with a pair of terminal-link schedules. The major factor determining response strength was the probability that a terminal-link delay was judged to be short relative to a criterion. According to their model, the criterion is calculated as the average interval between all stimuli correlated with reinforcement in the situation (including both intervals between initial-link onset and terminal-link entry and terminal-link entry and food). The probability that a delay is judged to be short is then calculated from the normal distribution with the mean equal to the criterion and with the standard deviation (σ) as a parameter. The standard deviation σ plays a role similar to sensitivity to delay in the generalized matching law (Equation 4): Small values of σ mean that decisions are relatively accurate and produce stronger levels of preference; large values of σ result in less accurate decisions and lower levels of preference. Christensen and Grace showed that relative response strength could be substituted for the terminal-link value ratio in the concatenated generalized matching law, yielding a model for steady-state choice. The steady-state decision model was fitted to the same archival data sets used by Grace (1994) and Mazur (2001). The model performed very well in terms of goodness of fit, with an average percentage of variance accounted for across the studies that was slightly higher than for both CCM and HVA. In addition, the decision model was arguably more parsimonious because it required fewer estimated parameters for

some data sets, and its sensitivity values were more consistent between studies with FI and VI schedules than those of the other models. Christensen and Grace’s (2010) results show that the decision model, originally developed to explain individual differences in responding under rapid-acquisition conditions, yields an account of molar, steady-state choice that is at least as good as existing models. Notably, the model does not rely on the construct of conditioned reinforcement; rather than the terminal-link stimuli acquiring value, the model assumes that discriminating a terminal-link delay as short (or long) strengthens (or weakens) responding in the associated initial link. Thus, the decision model is able to accommodate results of studies showing that temporal control of terminal-link responding and initial-link preference can be dissociated (Grace & Nevin, 1999; Kyonka & Grace, 2007), which is problematic for accounts based on conditioned reinforcement. However, we should note that the decision model, as currently formulated, does not give a comprehensive explanation of concurrent-chains performance. For example, the model does not accommodate effects of reinforcer magnitude and probability and terminal-link signaling conditions (Williams & Fantino, 1978) and has not been applied to three-alternative choice (Mazur, 2000).

Finally, it is important to note that most models for acquisition of concurrent choice, including Mazur (1992), Grace (2002), Grace and McLean (2006), and Baum and Davison (2009), among others (e.g., Lau & Glimcher, 2005), are based on a linear-operator approach. Linear-operator models have been extremely popular in research on conditioning and behavior (Bush & Mosteller, 1955; Rescorla & Wagner, 1972) because they are simple; flexible; well-suited for modeling discrete events, such as sequences of responses or reinforcers; and predict smooth, negatively accelerated acquisition curves. Although smooth curves are typical for data that have been averaged over many subjects or trials, Gallistel, Fairhurst, and Balsam (2004) have argued persuasively that for individual subjects and trials, acquisition is better described as an all-or-none process. Indeed, there is evidence that shifts in response allocation under both concurrent schedules and concurrent chains can be abrupt rather than gradual, provided that subjects have sufficient experience with changing reinforcement contingencies (e.g., Gallistel, Mark, King, & Latham, 2001; Kyonka & Grace, 2007).
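As a rough illustration of how a linear-operator rule produces a smooth, negatively accelerated curve, the sketch below combines a noisy short-versus-long judgment with a linear-operator update, loosely in the spirit of the decision model discussed above. It is not the published model: the use of a logarithmic time scale, the function names, the learning rate, and the example delays are assumptions made for the example.

```python
import math


def p_judged_short(delay, criterion, sigma):
    """Probability that `delay` is judged short relative to `criterion`.

    Assumption: judgments are made on a log10 time scale with normally
    distributed noise of standard deviation `sigma`.
    """
    z = (math.log10(criterion) - math.log10(delay)) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_strength(delay, criterion, sigma, rate=0.1, n_updates=50, start=0.5):
    """Expected initial-link response strength over successive reinforcers.

    With probability p the delay is judged short and strength moves toward 1;
    otherwise it moves toward 0. Averaging the two linear-operator steps
    gives an expected change of rate * (p - v) per update.
    """
    p = p_judged_short(delay, criterion, sigma)
    v = start
    curve = []
    for _ in range(n_updates):
        v = v + rate * (p - v)
        curve.append(v)
    return curve


if __name__ == "__main__":
    for sigma in (0.30, 0.075):
        curve = expected_strength(delay=4.0, criterion=8.0, sigma=sigma)
        print(sigma, round(p_judged_short(4.0, 8.0, sigma), 3), round(curve[-1], 3))
```

In this sketch a smaller σ pushes the judgment probability toward 0 or 1, so expected strength settles at a more extreme level, and the averaged curve always approaches its asymptote in ever smaller steps. Individual simulated subjects following the same rule would show noisier trajectories, which is where the contrast with all-or-none accounts becomes relevant.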
Whether the current generation of linear-operator models can be extended to explain such abrupt shifts in response allocation is a matter for future research.

Conclusion

Research on behavioral choice has made considerable progress since Herrnstein’s (1961) report of matching 50 years ago. The matching law has evolved into a framework that can accurately model how response allocation in concurrent choice situations depends on various aspects of reinforcement, constituting a quantitative law of effect. Basic research on behavioral choice has led to developments such as hyperbolic discounting and self-control, which have enhanced researchers’ understanding of human behavior in complex and realistic situations outside the laboratory. Although a definitive explanation of the origins of matching has yet to be achieved, research on acquisition has begun to identify the processes underlying choice. As models for acquisition develop, they may ultimately be able to provide a comprehensive account of steady-state behavior as well, and molar phenomena such as matching and hyperbolic discounting may be explainable in terms of a more fundamental process. In that case, research will have identified the variables controlling the allocation of operant behavior and answered some of the most important and enduring problems in psychology.

References

Alsop, B., & Davison, M. (1991). Effects of varying stimulus disparity and the reinforcer ratio in concurrent-schedule and signal-detection procedures. Journal of the Experimental Analysis of Behavior, 56, 67–80. doi:10.1901/jeab.1991.56-67
Aparicio, C. F., & Baum, W. M. (2006). Fix and sample with rats in the dynamics of choice. Journal of the Experimental Analysis of Behavior, 86, 43–63. doi:10.1901/jeab.2006.57-05
Autor, S. M. (1969). The strength of conditioned reinforcers as a function of frequency and probability of reinforcement. In D. Hendry (Ed.), Conditioned reinforcement (pp. 127–162). Homewood, IL: Dorsey Press.
Bailey, J. T., & Mazur, J. E. (1990). Choice behavior in transition: Development of preference for the higher probability of reinforcement. Journal of the Experimental Analysis of Behavior, 53, 409–422. doi:10.1901/jeab.1990.53-409
Baum, W. M. (1974a). Chained concurrent schedules: Reinforcement as situation transition. Journal of the Experimental Analysis of Behavior, 22, 91–101. doi:10.1901/jeab.1974.22-91
Baum, W. M. (1974b). On two types of deviation from the matching law: Bias and undermatching. Journal of the Experimental Analysis of Behavior, 22, 231–242. doi:10.1901/jeab.1974.22-231
Baum, W. M. (1979). Matching, undermatching, and overmatching in studies of choice. Journal of the Experimental Analysis of Behavior, 32, 269–281. doi:10.1901/jeab.1979.32-269
Baum, W. M. (1981). Optimization and the matching law as accounts of instrumental behavior. Journal of the Experimental Analysis of Behavior, 36, 387–403. doi:10.1901/jeab.1981.36-387
Baum, W. M. (2003). The molar view of behavior and its usefulness in behavior analysis. Behavior Analyst Today, 4, 78–81.
Baum, W. M., & Davison, M. (2004). Choice in a variable environment: Visit patterns in the dynamics of choice. Journal of the Experimental Analysis of Behavior, 81, 85–127. doi:10.1901/jeab.2004.81-85
Baum, W. M., & Davison, M. (2009). Modeling the dynamic of choice. Behavioural Processes, 81, 189–194. doi:10.1016/j.beproc.2009.01.005
Baum, W. M., & Rachlin, H. C. (1969). Choice as time allocation. Journal of the Experimental Analysis of Behavior, 12, 861–874. doi:10.1901/jeab.1969.12-861
Baum, W., Schwendiman, J., & Bell, K. (1999). Choice, contingency discriminability, and foraging theory. Journal of the Experimental Analysis of Behavior, 71, 355–373. doi:10.1901/jeab.1999.71-355
Berg, M. E., & Grace, R. C. (2004). Independence of terminal-link entry rate and immediacy in concurrent chains. Journal of the Experimental Analysis of Behavior, 82, 235–251. doi:10.1901/jeab.2004.82-235
Buckner, R. L., Green, L., & Myerson, J. (1993). Short-term and long-term effects of reinforcers on choice. Journal of the Experimental Analysis of Behavior, 59, 293–307. doi:10.1901/jeab.1993.59-293
Burnham, K. P., & Anderson, D. R. (2002). Model selection and multimodel inference: A practical information-theoretic approach (2nd ed.). New York, NY: Springer-Verlag.
Bush, R. R., & Mosteller, F. (1955). Stochastic models for learning. New York, NY: Wiley.
Caraco, T., Martindale, S., & Whitham, T. S. (1980). An empirical demonstration of risk-sensitive foraging preferences. Animal Behaviour, 28, 820–830. doi:10.1016/S0003-3472(80)80142-4
Catania, A. C. (1963). Concurrent performances: A baseline for the study of reinforcement magnitude. Journal of the Experimental Analysis of Behavior, 6, 299–300. doi:10.1901/jeab.1963.6-299
Christensen, D. R., & Grace, R. C. (2008). Rapid acquisition in concurrent chains: Effects of initial-link duration. Behavioural Processes, 78, 217–223. doi:10.1016/j.beproc.2008.01.006
Christensen, D. R., & Grace, R. C. (2009a). Response allocation in a rapid-acquisition concurrent-chains procedure: Effects of overall terminal-link duration. Behavioural Processes, 81, 233–237. doi:10.1016/j.beproc.2009.01.006
Christensen, D. R., & Grace, R. C. (2009b). Response allocation in concurrent chains when terminal-link delays follow an ascending and descending series. Journal of the Experimental Analysis of Behavior, 91, 1–20. doi:10.1901/jeab.2009.91-1
Christensen, D. R., & Grace, R. C. (2010). A decision model for steady-state choice in concurrent chains. Journal of the Experimental Analysis of Behavior, 94, 227–240. doi:10.1901/jeab.2010.94-227
Chung, S.-H. (1965). Effects of delayed reinforcement in a concurrent situation. Journal of the Experimental Analysis of Behavior, 8, 439–444. doi:10.1901/jeab.1965.8-439
Chung, S.-H., & Herrnstein, R. J. (1967). Choice and delay of reinforcement. Journal of the Experimental Analysis of Behavior, 10, 67–74. doi:10.1901/jeab.1967.10-67
Clement, T. S., Feltus, J., Kaiser, D. H., & Zentall, T. R. (2000). “Work ethic” in pigeons: Reward value is directly related to the effort or time required to obtain the reward. Psychonomic Bulletin and Review, 7, 100–106. doi:10.3758/BF03210727
Cutting, J. E. (2000). Accuracy, scope, and flexibility of models. Journal of Mathematical Psychology, 44, 3–19. doi:10.1006/jmps.1999.1274
Davis, D. G. S., Staddon, J. E. R., Machado, A., & Palmer, R. G. (1993). The process of recurrent choice. Psychological Review, 100, 320–341. doi:10.1037/0033-295X.100.2.320
Davison, M. C. (1969). Preference for mixed-interval versus fixed-interval schedules. Journal of the Experimental Analysis of Behavior, 12, 247–252. doi:10.1901/jeab.1969.12-247
Davison, M. C. (1972). Preference for mixed-interval versus fixed-interval schedules: Number of component intervals. Journal of the Experimental Analysis of Behavior, 17, 169–176. doi:10.1901/jeab.1972.17-169
Davison, M. C. (1987). The analysis of concurrent-chain performance. In M. L. Commons, J. E. Mazur, J. A. Nevin, & H. Rachlin (Eds.), Quantitative analyses of behavior: Vol. 5. The effect of delay and of intervening events on reinforcer value (pp. 225–241). Hillsdale, NJ: Erlbaum.
Davison, M., & Baum, W. M. (2000). Choice in a variable environment: Every reinforcer counts. Journal of the Experimental Analysis of Behavior, 74, 1–24. doi:10.1901/jeab.2000.74-1
Davison, M., & Baum, W. M. (2002). Choice in a variable environment: Effects of blackout duration and extinction between components. Journal of the Experimental Analysis of Behavior, 77, 65–89. doi:10.1901/jeab.2002.77-65
Davison, M., & Baum, W. M. (2003). Every reinforcer counts: Reinforcer magnitude and local preference. Journal of the Experimental Analysis of Behavior, 80, 95–129. doi:10.1901/jeab.2003.80-95
Davison, M., & Baum, W. M. (2006). Do conditional reinforcers count? Journal of the Experimental Analysis of Behavior, 86, 269–283. doi:10.1901/jeab.2006.56-05
Davison, M., & Baum, W. M. (2007). Local effects of delayed food. Journal of the Experimental Analysis of Behavior, 87, 241–260. doi:10.1901/jeab.2007.13-06
Davison, M., & Elliffe, D. (2009). Variance matters: The shape of a datum. Behavioural Processes, 81, 216–222. doi:10.1016/j.beproc.2009.01.004
Davison, M., & Jenkins, P. E. (1985). Stimulus discriminability, contingency discriminability, and schedule performance. Animal Learning and Behavior, 13, 77–84. doi:10.3758/BF03213368
Davison, M., & Jones, B. M. (1995). A quantitative analysis of extreme choice. Journal of the Experimental Analysis of Behavior, 64, 147–162. doi:10.1901/jeab.1995.64-147
Davison, M., & McCarthy, D. (1988). The matching law: A research review. Hillsdale, NJ: Erlbaum.
Davison, M., & Nevin, J. A. (1999). Stimuli, reinforcers, and behaviour: An integration. Journal of the Experimental Analysis of Behavior, 71, 439–482. doi:10.1901/jeab.1999.71-439
Dufort, R. H., Gutman, N., & Kimble, G. A. (1954). One-trial discrimination reversal in the white rat. Journal of Comparative and Physiological Psychology, 47, 248–249. doi:10.1037/h0057856
Dunn, R., & Spetch, M. L. (1990). Choice with uncertain outcomes. Journal of the Experimental Analysis of Behavior, 53, 201–218. doi:10.1901/jeab.1990.53-201
Elliffe, D., & Alsop, B. (1996). Concurrent choice: Effects of overall reinforcer rate and the temporal distribution of reinforcers. Journal of the Experimental Analysis of Behavior, 65, 445–463. doi:10.1901/jeab.1996.65-445
Elliffe, D., Davison, M., & Landon, J. (2008). Relative reinforcer rates and magnitudes do not control concurrent choice independently.
Journal of the Experimental Analysis of Behavior, 90, 169–185. doi:10.1901/jeab.2008.90-169 Fantino, E. (1969). Choice and rate of reinforcement. Journal of the Experimental Analysis of Behavior, 12, 723–730. doi:10.1901/jeab.1969.12-723 Fantino, E. (1977). Conditioned reinforcement: Choice and information. In W. Honig & J. E. R. Staddon (Eds.), Handbook of operant behavior (pp. 313–339). New York, NY: Prentice Hall. Fantino, E. (2001). Context: A central concept. Behavioural Processes, 54, 95–110. doi:10.1016/ S0376-6357(01)00152-8 332

Fantino, E., & Abarca, N. (1985). Choice, optimal foraging, and the delay-reduction hypothesis. Behavioral and Brain Sciences, 8, 315–362. doi:10.1017/ S0140525X00020847 Fantino, E., & Dunn, R. (1983). The delay-reduction hypothesis: Extension to three-alternative choice. Journal of Experimental Psychology: Animal Behavior Processes, 9, 132–146. doi:10.1037/00977403.9.2.132 Fantino, E., Dunn, R., & Meck, W. (1979). Percentage reinforcement and choice. Journal of the Experimental Analysis of Behavior, 32, 335–340. doi:10.1901/ jeab.1979.32-335 Fantino, E., Preston, R. A., & Dunn, R. (1993). Delay reduction: Current status. Journal of the Experimental Analysis of Behavior, 60, 159–169. doi:10.1901/ jeab.1993.60-159 Fantino, E., & Romanowich, P. (2007). The effect of conditioned reinforcement rate on choice: A review. Journal of the Experimental Analysis of Behavior, 87, 409–421. doi:10.1901/jeab.2007.44-06 Findley, J. D. (1958). Preference and switching under concurrent scheduling. Journal of the Experimental Analysis of Behavior, 1, 123–144. doi:10.1901/ jeab.1958.1-123 Fisher, W. W., & Mazur, J. E. (1997). Basic and applied research on choice responding. Journal of Applied Behavior Analysis, 30, 387–410. doi:10.1901/ jaba.1997.30-387 Gallistel, C. R., Fairhurst, S., & Balsam, P. (2004). The learning curve: Implications of a quantitative analysis. Proceedings of the National Academy of Sciences of the United States of America, 101, 13124–13131. doi:10.1073/pnas.0404965101 Gallistel, C. R., Mark, T. A., King, A., & Latham, P. E. (2001). The rat approximates an ideal detector of changes in rates of reward: Implications for the law of effect. Journal of Experimental Psychology: Animal Behavior Processes, 27, 354–372. doi:10.1037/00977403.27.4.354 Gibbon, J. (1977). Scalar expectancy theory and Weber’s law in animal timing. Psychological Review, 84, 279–325. doi:10.1037/0033-295X.84.3.279 Gipson, C. D., Alessandri, J. J. D., Miller, H. C., & Zentall, T. R. (2009). 
Preference for 50% reinforcement over 75% reinforcement by pigeons. Learning and Behavior, 37, 289–298. doi:10.3758/ LB.37.4.289 Grace, R. C. (1993). [Concurrent schedules: Effects of inter-reinforcer interval distributions]. Unpublished experiment. Grace, R. C. (1994). A contextual model of concurrentchains choice. Journal of the Experimental Analysis of Behavior, 61, 113–129. doi:10.1901/jeab.1994.61-113

The Allocation of Operant Behavior

Grace, R. C. (1996). Choice between fixed and variable delays to reinforcement in the adjusting-delay procedure and concurrent chains. Journal of Experimental Psychology: Animal Behavior Processes, 22, 362–383. doi:10.1037/0097-7403.22.3.362 Grace, R. C. (2002). Acquisition of preference in concurrent chains: Comparing linear-operator and memoryrepresentational models. Journal of Experimental Psychology: Animal Behavior Processes, 28, 257–276. doi:10.1037/0097-7403.28.3.257

Herrnstein, R. J. (1964a). Aperiodicity as a factor in choice. Journal of the Experimental Analysis of Behavior, 7, 179–182. doi:10.1901/jeab.1964.7-179 Herrnstein, R. J. (1964b). Secondary reinforcement and rate of primary reinforcement. Journal of the Experimental Analysis of Behavior, 7, 27–36. doi:10.1901/jeab.1964.7-27 Herrnstein, R. J. (1970). On the law of effect. Journal of the Experimental Analysis of Behavior, 13, 243–266. doi:10.1901/jeab.1970.13-243

Grace, R. C., & Bragason, O. (2004). Does the terminallink effect depend on duration or reinforcement rate? Behavioural Processes, 67, 67–79. doi:10.1016/j. beproc.2004.02.006

Herrnstein, R. J., & Vaughan, W. (1980). Melioration and behavioural allocation. In J. E. R. Staddon (Ed.), Limits to action (pp. 143–176). New York, NY: Academic Press.

Grace, R. C., Bragason, O., & McLean, A. P. (2003). Rapid acquisition of preference in concurrent chains. Journal of the Experimental Analysis of Behavior, 80, 235–252. doi:10.1901/jeab.2003.80-235

Heyman, G. M., & Tanz, L. (1995). How to teach a pigeon to maximize overall reinforcement rate. Journal of the Experimental Analysis of Behavior, 64, 277–297. doi:10.1901/jeab.1995.64-277

Grace, R. C., & McLean, A. P. (2006). Rapid acquisition in concurrent chains: Evidence for a decision model. Journal of the Experimental Analysis of Behavior, 85, 181–202. doi:10.1901/jeab.2006.72-04

Hinson, J. M., & Staddon, J. E. R. (1983a). Hill-climbing by pigeons. Journal of the Experimental Analysis of Behavior, 39, 25–47. doi:10.1901/jeab.1983.39-25

Grace, R., & Nevin, J. A. (1997). On the relation between preference and resistance to change. Journal of the Experimental Analysis of Behavior, 67, 43–65. doi:10.1901/jeab.1997.67-43 Grace, R. C., & Nevin, J. A. (1999). Timing and choice in concurrent chains. Behavioural Processes, 45, 115–127. doi:10.1016/S0376-6357(99)00013-3 Grace, R., & Savastano, H. I. (1997). Transfer tests of stimulus value in concurrent chains. Journal of the Experimental Analysis of Behavior, 68, 93–115. doi:10.1901/jeab.1997.68-93 Grace, R. C., & Savastano, H. I. (2000). Temporal context and conditioned reinforcement value. Journal of Experimental Psychology: General, 129, 427–443. doi:10.1037/0096-3445.129.4.427 Green, L., & Myerson, J. (2004). A discounting framework for choice with delayed and probabilistic rewards. Psychological Bulletin, 130, 769–792. doi:10.1037/0033-2909.130.5.769 Green, L., & Snyderman, M. (1980). Choice between rewards differing in amount and delay: Toward a choice model of self control. Journal of the Experimental Analysis of Behavior, 34, 135–147. doi:10.1901/jeab.1980.34-135 Grondin, S. (2001). From physical time to the first and second moments of psychological time. Psychological Bulletin, 127, 22–44. doi:10.1037/00332909.127.1.22 Herrnstein, R. J. (1961). Relative and absolute strength of response as a function of frequency of reinforcement. Journal of the Experimental Analysis of Behavior, 4, 267–272. doi:10.1901/jeab.1961.4-267

Hinson, J. M., & Staddon, J. E. R. (1983b). Matching, maximizing, and hill-climbing. Journal of the Experimental Analysis of Behavior, 40, 321–331. doi:10.1901/jeab.1983.40-321 Horner, J. M., & Staddon, J. E. R. (1987). Probabilistic choice: A simple invariance. Behavioural Processes, 15, 59–92. doi:10.1016/0376-6357(87)90034-9 Houston, A. I., & McNamara, J. (1981). How to maximize reward rate on two variable-interval paradigms. Journal of the Experimental Analysis of Behavior, 35, 367–396. doi:10.1901/jeab.1981.35-367 Hunter, I., & Davison, M. (1985). Determination of a behavioral transfer function: White-noise analysis of session-to-session response-ratio dynamics on concurrent VI VI schedules. Journal of the Experimental Analysis of Behavior, 43, 43–59. doi:10.1901/jeab. 1985.43-43 Kendall, S. B. (1974). Preference for intermittent reinforcement. Journal of the Experimental Analysis of Behavior, 21, 463–473. doi:10.1901/jeab.1974.21-463 Killeen, P. (1968). On the measurement of reinforcement frequency in the study of preference. Journal of the Experimental Analysis of Behavior, 11, 263–269. doi:10.1901/jeab.1968.11-263 Killeen, P. (1972). The matching law. Journal of the Experimental Analysis of Behavior, 17, 489–495. doi:10.1901/jeab.1972.17-489 Kirby, K. N., & Herrnstein, R. J. (1995). Preference reversals due to myopic discounting of delayed reward. Psychological Science, 6, 83–89. doi:10.1111/ j.1467-9280.1995.tb00311.x Krägeloh, C. U., & Davison, M. (2003). Concurrentschedule performance in transition: Changeover 333

Grace and Hucks

delays and signaled reinforcer ratios. Journal of the Experimental Analysis of Behavior, 79, 87–109. doi:10.1901/jeab.2003.79-87 Krägeloh, C. U., Davison, M., & Elliffe, D. M. (2005). Local preference in concurrent schedules: The effects of reinforcer sequences. Journal of the Experimental Analysis of Behavior, 84, 37–64. doi:10.1901/ jeab.2005.114-04 Krägeloh, C. U., Elliffe, D. M., & Davison, M. (2006). Contingency discriminability and peak shift in concurrent schedules. Journal of the Experimental Analysis of Behavior, 86, 11–30. doi:10.1901/jeab. 2006.11-05 Kyonka, E. G. E., & Grace, R. C. (2007). Rapid acquisition of choice and timing in pigeons. Journal of Experimental Psychology: Animal Behavior Processes, 33, 392–408. doi:10.1037/0097-7403.33.4.392 Lagorio, C. H., & Hackenberg, T. D. (2010). Risky choice in pigeons and humans: A cross-species comparison. Journal of the Experimental Analysis of Behavior, 93, 27–44. doi:10.1901/jeab.2010.93-27 Landon, J., & Davison, M. (2001). Reinforcer-ratio variation and its effects on rate of adaptation. Journal of the Experimental Analysis of Behavior, 75, 207–234. doi:10.1901/jeab.2001.75-207 Landon, J., Davison, M., & Elliffe, D. (2002). Concurrent schedules: Short- and log-term effects of reinforcers. Journal of the Experimental Analysis of Behavior, 77, 257–271. doi:10.1901/jeab.2002.77-257 Landon, J., Davison, M., & Elliffe, D. (2003). Concurrent schedules: Reinforcer magnitude effects. Journal of the Experimental Analysis of Behavior, 79, 351–365. doi:10.1901/jeab.2003.79-351 Lau, B., & Glimcher, P. W. (2005). Dynamic responseby-response models of matching behavior in rhesus monkeys. Journal of the Experimental Analysis of Behavior, 84, 555–579. doi:10.1901/jeab.2005.110-04 Leon, M. I., & Gallistel, C. R. (1998). Self-stimulating rats combine subjective reward magnitude and subjective reward rate multiplicatively. Journal of Experimental Psychology: Animal Behavior Processes, 24, 265–277. 
doi:10.1037/0097-7403.24.3.265 Locey, M. L., Pietras, C. J., & Hackenberg, T. D. (2009). Human risky choice: Delay sensitivity depends on reinforcer type. Journal of Experimental Psychology: Animal Behavior Processes, 35, 15–22. doi:10.1037/ a0012378 Logue, A. W., Rodriguez, M. L., Pena-Correal, T. E., & Mauro, B. C. (1984). Choice in a self-control paradigm: Quantification of experience-based differences. Journal of the Experimental Analysis of Behavior, 41, 53–67. doi:10.1901/jeab.1984.41-53 MacDonall, J. S. (2009). The stay/switch model of concurrent choice. Journal of the Experimental Analysis of Behavior, 91, 21–39. doi:10.1901/jeab.2009.91-21 334

MacEwen, D. (1972). The effects of terminal-link fixedinterval and variable-interval schedules on responding under concurrent chained schedules. Journal of the Experimental Analysis of Behavior, 18, 253–261. doi:10.1901/jeab.1972.18-253 Mattson, K. M., Hucks, A. D., Grace, R. C., & McLean, A. P. (2010). Signaled and unsignaled terminal links in concurrent chains: I. Effects of reinforcer probability and immediacy. Journal of the Experimental Analysis of Behavior, 94, 327–352. Mazur, J. E. (1984). Tests of an equivalence rule for fixed and variable reinforcer delays. Journal of Experimental Psychology: Animal Behavior Processes, 10, 426–436. doi:10.1037/0097-7403.10.4.426 Mazur, J. E. (1985). Probability and delay of reinforcement as factors in a discrete-trial choice. Journal of the Experimental Analysis of Behavior, 43, 341–351. doi:10.1901/jeab.1985.43-341 Mazur, J. E. (1986a). Choice between single and multiple delayed reinforcers. Journal of the Experimental Analysis of Behavior, 46, 67–77. doi:10.1901/ jeab.1986.46-67 Mazur, J. E. (1986b). Fixed and variable ratios and delays: Further tests of an equivalence rule. Journal of Experimental Psychology: Animal Behavior Processes, 12, 116–124. doi:10.1037/0097-7403.12.2.116 Mazur, J. E. (1987). An adjusting procedure for studying delayed reinforcement. In M. L. Commons, J. E. Mazur, J. A. Nevin, & H. Rachlin (Eds.), Quantitative analyses of behavior: Vol. 5. The effect of delay and of intervening events on reinforcement value (pp. 55–73). Hillsdale, NJ: Erlbaum. Mazur, J. E. (1988). Choice between small certain and large uncertain reinforcers. Animal Learning and Behavior, 16, 199–205. doi:10.3758/BF03209066 Mazur, J. E. (1989). Theories of probabilistic reinforcement. Journal of the Experimental Analysis of Behavior, 51, 87–99. doi:10.1901/jeab.1989.51-87 Mazur, J. E. (1992). Choice behavior in transition: Development of preference with ratio and interval schedules. 
Journal of Experimental Psychology: Animal Behavior Processes, 18, 364–378. doi:10.1037/00977403.18.4.364 Mazur, J. E. (1993). Predicting the strength of a conditioned reinforcer: Effects of delay and uncertainty. Current Directions in Psychological Science, 2, 70–74. doi:10.1111/1467-8721.ep10770907 Mazur, J. E. (1994). Effects of intertribal reinforcers on self-control choice. Journal of the Experimental Analysis of Behavior, 61, 83–96. doi:10.1901/jeab. 1994.61-83 Mazur, J. E. (1995). Conditioned reinforcement and choice with delayed and uncertain primary reinforcers. Journal of the Experimental Analysis of Behavior, 63, 139–150. doi:10.1901/jeab.1995.63-139

The Allocation of Operant Behavior

Mazur, J. E. (1996). Past experience, recency, and spontaneous recovery in choice behavior. Learning and Behavior, 24, 1–10. doi:10.3758/BF03198948 Mazur, J. E. (1997). Choice, delay, probability, and conditioned reinforcement. Learning and Behavior, 25, 131–147. doi:10.3758/BF03199051 Mazur, J. E. (1998). Choice with delayed and probabilistic reinforcers: Effects of prereinforcer and postreinforcer stimuli. Journal of the Experimental Analysis of Behavior, 70, 253–265. doi:10.1901/jeab.1998.70-253 Mazur, J. E. (2000). Two- versus three-alternative concurrent-chain schedules: A test of three models. Journal of Experimental Psychology: Animal Behavior Processes, 26, 286–293. doi:10.1037/00977403.26.3.286 Mazur, J. E. (2001). Hyperbolic value addition and general models of animal choice. Psychological Review, 108, 96–112. doi:10.1037/0033-295X.108.1.96 Mazur, J. E. (2002a). Concurrent-chain performance in transition: Effects of terminal-link duration and individual reinforcers. Learning and Behavior, 30, 249–260. doi:10.3758/BF03192834 Mazur, J. E. (2002b). Evidence against a constantdifference effect in concurrent-chain schedules. Journal of the Experimental Analysis of Behavior, 77, 147–155. doi:10.1901/jeab.2002.77-147 Mazur, J. E. (2003). Effects of free-food deliveries and delays on choice under concurrent-chain schedules. Behavioural Processes, 64, 251–260. doi:10.1016/ S0376-6357(03)00140-2 Mazur, J. E. (2005). Effects of reinforcer probability, delay, and response requirements on the choices of rats and pigeons: Possible species differences. Journal of the Experimental Analysis of Behavior, 83, 263–279. doi:10.1901/jeab.2005.69-04 Mazur, J. E. (2006). Mathematical models and the experimental analysis of behaviour. Journal of the Experimental Analysis of Behavior, 85, 275–291. doi:10.1901/jeab.2006.65-05

schedules. Behavioural Processes, 53, 171–180. doi:10.1016/S0376-6357(01)00137-1 Mazur, J. E., & Coe, D. (1987). Tests of transitivity in choices between fixed and variable reinforcer delays. Journal of the Experimental Analysis of Behavior, 47, 287–297. doi:10.1901/jeab.1987.47-287 Mazur, J. E., & Kralik, J. D. (1990). Choice between delayed reinforcers and fixed-ratio schedules requiring forceful responding. Journal of the Experimental Analysis of Behavior, 53, 175–187. doi:10.1901/ jeab.1990.53-175 Mazur, J. E., & Ratti, T. A. (1991). Choice behavior in transition: Development of preference in a freeoperant procedure. Animal Learning and Behavior, 19, 241–248. doi:10.3758/BF03197882 Mazur, J. E., & Romano, A. (1992). Choice with delayed and probabilistic reinforcers: Effects of variability, time between trials, and conditioned reinforcers. Journal of the Experimental Analysis of Behavior, 58, 513–525. doi:10.1901/jeab.1992.58-513 McDevitt, M., Spetch, M., & Dunn, R. (1997). Contiguity and conditioned reinforcement in probabilistic choice. Journal of the Experimental Analysis of Behavior, 68, 317–327. doi:10.1901/jeab.1997.68-317 Miller, J. T., Saunders, S. S., & Bourland, G. (1980). The role of stimulus disparity in concurrently available reinforcement schedules. Animal Learning and Behavior, 8, 635–641. doi:10.3758/BF03197780 Miller, R. R., & Grace, R. C. (2003). Conditioning and learning. In I. B. Weiner, A. F. Healy, D. K. Freedheim, & R. W. Proctor (Eds.), Handbook of psychology: Experimental psychology (pp. 355–397). New York, NY: Wiley. doi:10.1002/0471264385.wei0413 Moore, J. (1984). Choice and transformed interreinforcement intervals. Journal of the Experimental Analysis of Behavior, 42, 321–335. doi:10.1901/jeab.1984.42-321 Moore, J. (2008). A critical appraisal of contemporary approaches in the quantitative analysis of behavior. Psychological Record, 58, 641–664.

Mazur, J. E. (2007a). Rats’ choices between one and two delayed reinforcers. Learning and Behavior, 35, 169–176. doi:10.3758/BF03193052

Myerson, J., & Green, L. (1995). Discounting of delayed rewards: Models of individual choice. Journal of the Experimental Analysis of Behavior, 64, 263–276. doi:10.1901/jeab.1995.64-263

Mazur, J. E. (2007b). Species differences between rats and pigeons in choices with probabilistic and delayed reinforcers. Behavioural Processes, 75, 220–224. doi:10.1016/j.beproc.2007.02.004

Myerson, J., & Miezin, F. M. (1980). The kinetics of choice: An operant systems analysis. Psychological Review, 87, 160–174. doi:10.1037/0033-295X.87.2.160

Mazur, J. E., & Biondi, D. R. (2009). Delay-amount tradeoffs in choices by pigeons and rats: Hyperbolic versus exponential discounting. Journal of the Experimental Analysis of Behavior, 91, 197–211. doi:10.1901/jeab.2009.91-197 Mazur, J. E., Blake, N., & McManus, C. (2001). Transitional choice behavior in concurrent-chain

Navarick, D. J., & Fantino, E. (1974). Stochastic transitivity and unidimensional behavior theories. Psychological Review, 81, 426–441. doi:10.1037/ h0036953 Navarick, D. J., & Fantino, E. (1976). Self-control and general models of choice. Journal of Experimental Psychology: Animal Behavior Processes, 2, 75–87. doi:10.1037/0097-7403.2.1.75 335

Grace and Hucks

Nevin, J. A. (1998). Choice and momentum. In W. O’Donohue (Ed.), Learning and behavior therapy (pp. 230–251). Boston, MA: Allyn & Bacon. Nevin, J. A., Davison, M., & Shahan, T. A. (2005). A theory of attending and reinforcement in conditional discriminations. Journal of the Experimental Analysis of Behavior, 84, 281–303. doi:10.1901/jeab.2005.97-04 Nevin, J. A., & Grace, R. C. (2000). Behavioral momentum and the law of effect. Behavioral and Brain Sciences, 23, 73–90. doi:10.1017/S0140525 X00002405 O’Daly, M., Angulo, S., Gipson, C., & Fantino, E. (2006). Influence of temporal context on value in the multiple-chains and successive-encounters procedures. Journal of the Experimental Analysis of Behavior, 85, 309–328. doi:10.1901/jeab.2006.68-05 O’Daly, M., Meyer, S., & Fantino, E. (2005). Value of conditioned reinforcers as a function of temporal context. Learning and Motivation, 36, 42–59. doi:10.1016/j.lmot.2004.08.001 Pietras, C. J., Locey, M. L., & Hackenberg, T. D. (2003). Human risky choice under temporal constraints: Tests of an energy-budget model. Journal of the Experimental Analysis of Behavior, 80, 59–75. doi:10.1901/jeab.2003.80-59 Pitt, M. A., Myung, I. J., & Zhang, S. (2002). Toward a method of selecting among computational models of cognition. Psychological Review, 109, 472–491. doi:10.1037/0033-295X.109.3.472 Rachlin, H. (1995). Self-control: Beyond commitment. Behavioral and Brain Sciences, 18, 109–121. doi:10.1017/ S0140525X00037602 Rachlin, H., & Baum, W. M. (1972). Effects of alternative reinforcement: Does the source matter? Journal of the Experimental Analysis of Behavior, 18, 231–241. doi:10.1901/jeab.1972.18-231 Rachlin, H., & Green, L. (1972). Commitment, choice, and self-control. Journal of the Experimental Analysis of Behavior, 17, 15–22. doi:10.1901/jeab.1972.17-15 Rachlin, H., Green, L., Kagel, J. H., & Battalio, R. C. (1976). Economic demand theory and psychological studies of choice. In G. H. 
Bower (Ed.), The psychology of learning and motivation (Vol. 10, pp. 129–154). New York, NY: Academy Press. Rescorla, R. A. (1968). Probability of shock in the presence and absence of CS in fear conditioning. Journal of Comparative and Physiological Psychology, 66, 1–5. doi:10.1037/h0025984 Rescorla, R. A., & Wagner, A. R. (1972). A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement. In A. H. Black & W. F. Prokasy (Eds.), Classical conditioning II: Current research and theory (pp. 64–99). New York, NY: Appleton-Century-Crofts. 336

Roberts, S., & Pashler, H. (2000). How persuasive is a good fit? A comment on theory-testing. Psychological Review, 107, 358–367. doi:10.1037/0033295X.107.2.358 Rodriguez, M. L., & Logue, A. W. (1988). Adjusting delay to reinforcement: Comparing choice in pigeons and humans. Journal of Experimental Psychology: Animal Behavior Processes, 14, 105–117. doi:10.1037/00977403.14.1.105 Rosenthal, R., & DiMatteo, M. R. (2001). Meta-analysis: Recent developments in quantitative methods for literature reviews. Annual Review of Psychology, 52, 59–82. doi:10.1146/annurev.psych.52.1.59 Savastano, H., & Fantino, E. (1996). Differences in delay, not ratios, control choice in concurrent chains. Journal of the Experimental Analysis of Behavior, 66, 97–116. doi:10.1901/jeab.1996.66-97 Schofield, G., & Davison, M. (1997). Nonstable concurrent choice in pigeons. Journal of the Experimental Analysis of Behavior, 68, 219–232. doi:10.1901/ jeab.1997.68-219 Schuster, R. H. (1969). A functional analysis of conditioned reinforcement. In D. P. Hendry (Ed.), Conditioned reinforcement (pp. 192–234). Homewood, IL: Dorsey Press. Scoones, C., Hucks, A., McLean, A. P., & Grace, R. C. (2009). Effects of free food deliveries and temporal contiguity on choice under concurrent-chain schedules. Psychonomic Bulletin and Review, 16, 736–741. doi:10.3758/PBR.16.4.736 Shahan, T. A. (2010). Conditioned reinforcement and response strength. Journal of the Experimental Analysis of Behavior, 93, 269–289. doi:10.1901/ jeab.2010.93-269 Shimp, C. P. (1966). Probabilistically reinforced two-choice behaviour in pigeons. Journal of the Experimental Analysis of Behavior, 9, 443–455. doi:10.1901/jeab. 1966.9-443 Shimp, C. P. (1969). The concurrent reinforcement of two interresponse times: The relative frequency of an interresponse time equals its relative harmonic length. Journal of the Experimental Analysis of Behavior, 12, 403–411. doi:10.1901/jeab.1969.12-403 Spetch, M. L., Belke, T. W., Barnet, R. 
C., Dunn, R., & Pierce, W. D. (1990). Suboptimal choice in a percentage-reinforcement procedure: Effects of signal condition and terminal-link length. Journal of the Experimental Analysis of Behavior, 53, 219–234. doi:10.1901/jeab.1990.53-219 Spetch, M. L., & Dunn, R. (1987). Choice between reliable and unreliable outcomes: Mixed percentage reinforcement in concurrent chains. Journal of the Experimental Analysis of Behavior, 47, 57–72. doi:10.1901/jeab.1987.47-57

The Allocation of Operant Behavior

Spetch, M. L., Mondloch, M. V., Belke, T. W., & Dunn, R. (1994). Determinants of pigeons’ choice between certain and probabilistic outcomes. Animal Learning and Behavior, 22, 239–251. doi:10.3758/BF03209832 Staddon, J. E. R., & Cerutti, D. T. (2003). Operant conditioning. Annual Review of Psychology, 54, 115–144. doi:10.1146/annurev.psych.54.101601.145124 Staddon, J. E. R., & Horner, J. M. (1989). Stochastic choice models: A comparison between BushMosteller and a source-independent reward-following model. Journal of the Experimental Analysis of Behavior, 52, 57–64. doi:10.1901/jeab.1989.52-57 Staddon, J. E. R., & Motheral, S. (1978). On matching and maximizing in operant choice experiments. Psychological Review, 85, 436–444. doi:10.1037/ 0033-295X.85.5.436 Stagner, J. P., & Zentall, T. R. (2010). Suboptimal choice behavior by pigeons. Psychonomic Bulletin and Review, 17, 412–416. doi:10.3758/PBR.17.3.412 Stout, S. C., & Miller, R. R. (2007). Sometimescompeting retrieval (SOCR): A formalization of the comparator hypothesis. Psychological Review, 114, 759–783. doi:10.1037/0033-295X.114.3.759 Stubbs, D. A., & Pliskoff, S. S. (1969). Concurrent responding with fixed relative rate of reinforcement. Journal of the Experimental Analysis of Behavior, 12, 887–895. doi:10.1901/jeab.1969.12-887 Sutton, N. P., Grace, R. C., McLean, A. P., & Baum, W. M. (2008). Comparing the generalized matching law

and contingency discriminability model as accounts of concurrent schedule performance using a residual meta-analysis. Behavioural Processes, 78, 224–230. doi:10.1016/j.beproc.2008.02.012 Williams, B. A. (1988). Reinforcement, choice, and response strength. In R. C. Atkinson, R. J. Herrnstein, G. Lindzey, & R. D. Luce (Eds.), Stevens’ handbook of experimental psychology: Vol. 2. Learning and cognition (2nd ed., pp. 167–244). New York, NY: Wiley. Williams, B. A. (1992). Dissociation of theories of choice by temporal spacing of choice opportunities. Journal of Experimental Psychology: Animal Behavior Processes, 18, 287–297. doi:10.1037/00977403.18.3.287 Williams, B. A. (1994). The role of probability of reinforcement in models of choice. Psychological Review, 101, 704–707. doi:10.1037/0033-295X.101.4.704 Williams, B. A. (2002). Behavioral contrast redux. Learning and Behavior, 30, 1–20. doi:10.3758/ BF03192905 Williams, B. A., & Fantino, E. (1978). Effects on choice of reinforcement delay and conditioned reinforcement. Journal of the Experimental Analysis of Behavior, 29, 77–86. doi:10.1901/jeab.1978.29-77 Wixted, J. T. (1989). Nonhuman short-term memory: A quantitative reanalysis of selected findings. Journal of the Experimental Analysis of Behavior, 52, 409–426. doi:10.1901/jeab.1989.52-409

337

Chapter 15

Behavioral Neuroscience

David W. Schaal

I thank Jane E. Bailly for helpful discussions during the writing of this chapter and for critical input on early versions. DOI: 10.1037/13937-015 APA Handbook of Behavior Analysis: Vol. 1. Methods and Principles, G. J. Madden (Editor-in-Chief) Copyright © 2013 by the American Psychological Association. All rights reserved.

The physiologist of the future will tell us all that can be known about what is happening inside the behaving organism. His account will be an important advance over a behavioral analysis, because the latter is necessarily “historical”—that is to say it is confined to functional relations showing temporal gaps. Something is done today which affects the behavior of an organism tomorrow. No matter how clearly that fact can be established, a step is missing, and we must wait for a physiologist to supply it. He will be able to show how an organism is changed when exposed to contingencies of reinforcement and why the changed organism then behaves in a different way, possibly at a much later date. What he discovers cannot invalidate the laws of a science of behavior, but it will make the picture of human action more nearly complete. (Skinner, 1974, pp. 236–237)

Functional relations between an animal’s environment and its behavior are established, maintained, altered, and elaborated by experience because the animal has a nervous system. Behavioral neuroscience is the investigation of how the nervous system participates in and accounts for functional relations between environment and behavior. It is important for a chapter on behavioral neuroscience to appear in a handbook for behavior analysts because neuroscience is becoming an increasingly behavioral enterprise, and behavior analysts of the near future will find themselves naturally becoming part of it. This assertion is supported by recent neuroscience research on the process of reinforcement, which I briefly review first. I then present a selective survey of how the neural mechanisms of reinforcement participate in complex operant behavior, followed by examples of the application of behavioral neuroscience to human problems. I conclude with a discussion of the unique conceptual difficulties behavior analysts face when they try to integrate behavioral theory with behavioral neuroscience.

A note before I begin: Behavioral neuroscience, as currently practiced, is a vast and extremely varied enterprise. This fact is illustrated by a survey of the 200 most recent (as of this writing) articles published in the Journal of Neuroscience (the official journal of the Society for Neuroscience). Of these articles, 38% included behavioral data of some sort; most of the remainder focused on neurons and glial cells. Humans served as subjects in 35 of these studies, almost half of them using functional magnetic resonance imaging (MRI). Seventeen studies included recording from single neurons in a real-time, behavior-correlated fashion in rats or monkeys. An almost bewildering variety of behaviors was studied: details of hearing or seeing (e.g., sound localization, discrimination of complex moving stimuli), movement and muscle control (swimming in tadpoles, zebrafish, and lamprey; rotarod performance in mice; recovery of grip in monkeys), behaviors related to pain (pain threshold using the cold pressor test in humans, tail-flick latencies in mice and rats, tumor-induced pain in mice, reactions to someone else’s pain in humans), cognitive behaviors (working memory tasks in rats, monkeys, and humans; naming and semantic decision making in humans; spatial memory tasks in mice; attention in a dichotic listening task in humans), and behavior related to reinforcement value (latencies and choice of saccadic eye movements in monkeys, choice in rats, drug-reinforced behavior in rats, estimates of value in humans). A chapter for behavior analysts cannot review all of what is commonly considered behavioral neuroscience, so I focus on research related to the central concept of behavior analysis: reinforcement.

Finally, almost two thirds of the articles reporting on behavioral studies included measurements of behavioral and brain variables so that the two could be precisely related to each other in time (e.g., functional MRI, recording from single neurons). Although these time-locked observations are of correlated events (i.e., the dependence of one on the other is often not established), they provide an empirical rationale for selecting brain regions and neurons to manipulate to determine their causal contributions to behavior. In addition, far more sophisticated and restricted approaches to altering the activity of the structures so identified are now available (e.g., RNA interference to inhibit protein synthesis, gene therapy approaches to modulate expression of neuronal and glial proteins, focused electrolytic lesions guided by functional data, cooling probes and specific chemical ligands to temporarily lesion neurons). Advances such as these are generating outcomes that have a high probability of being incorporated into durable, comprehensive theories that integrate neural and behavioral findings.
Reinforcement Mechanisms

Neural plasticity is at the heart of neural theories of reinforcement-based learning. Neural plasticity is a general term that refers to adaptations in neural structure and activity during development, in response to injury, or, in the context of this chapter, as a consequence of an animal’s experience with its environment. The next several sections briefly review the mechanisms most likely to play a role in the process of reinforcement.

Long-Term Potentiation and Depression

At the neuronal level, the plasticity that is related to behavior change is frequently synaptic plasticity, in which the ability of neurons to alter the firing of neurons they form synapses with is enhanced (or potentiated) or decreased (or depressed). Several mechanisms for long-term changes in neuronal interaction have been proposed; in the sea slug Aplysia californica, for example, classical, or Pavlovian, conditioning of siphon withdrawal has been attributed to facilitation of the sensory neurons that stimulate the motor neurons that cause the conditioned response, a type of facilitation referred to as presynaptic (Byrne & Kandel, 1996). More recent evidence with Aplysia, however, has shown the contribution of a more mammalian, postsynaptic mechanism (Glanzman, 2006). In mammals, the mechanisms of synaptic facilitation have primarily been studied using models of long-term potentiation (LTP) and long-term depression (LTD). LTP was first demonstrated in hippocampal neurons by Bliss and Lomo (1973). In their experiments, a high-frequency train of impulses was delivered to pyramidal cell fibers of the hippocampus. These cell fibers synapse on granule cells of the dentate gyrus of the hippocampus. The result of this repeated, high-frequency stimulation of the pyramidal cells was an increase in the magnitude of the electrical response of the granule cells to pyramidal cell firing, that is, the firing of the granule cells was potentiated. This long-lasting change in neuron excitability reflected an increase in synaptic strength, a change in excitability at the synapse between pyramidal cells and granule cells. LTP was thought to be a potential mechanism for learning and memory, and hundreds of studies have tended to support the claim.
In associative LTP, weak and intense electrical stimulation of different neurons that synapse onto the same neuronal group are paired, which causes potentiation of the response to the weak electrical stimulus. This associative phenomenon enhances the face validity of LTP as a model of Pavlovian conditioning, in which the weak stimulus is thought to

Behavioral Neuroscience

model the activity stimulated by a conditioned stimulus (CS), the intense stimulus models the activity stimulated by an unconditioned stimulus, and the potentiated response to the weak stimulus is the surrogate for the conditioned response. Demonstrations that LTP occurs during learning (Roman, Staubli, & Lynch, 1987), that agents that block LTP also block memory formation (Morris, Anderson, Lynch, & Baudry, 1986), and that LTP and LTD are distributed throughout the brain suggest that these forms of plasticity may underlie many learning-related changes in synaptic activity. LTP and LTD likely play a role in several forms of learning. Perhaps the most well-characterized mammalian form of learning, from a neural perspective, is delay conditioning of eyeblinks in mice and rats; LTD at parallel fiber–Purkinje cell synapses of the cerebellum is thought to be important for delay conditioning (Kakegawa et al., 2008), although the extent and nature of its involvement is uncertain (Christian, Poulos, Lavond, & Thompson, 2004; Welsh et al., 2005). Several studies have implicated LTP in the amygdala in the establishment of conditioned emotional responses (e.g., a freezing response to an auditory CS paired repeatedly with shock; Fanselow & Kim, 1994). Injection directly into the lateral amygdala of a drug that blocks a subunit of the N-Methyl-D-aspartate receptor, to which the excitatory neurotransmitter glutamate binds and which is critical to many forms of LTP and LTD, blocked the acquisition of the conditioned emotional response (Rodrigues, Schafe, & LeDoux, 2001). Potentiation of excitatory synapses of the ventral tegmental area accompanied cocaine-reinforced and food-reinforced behavior in rats (Chen et al., 2008); potentiation did not occur in rats that received response-independent cocaine. Moreover, although the potentiation persisted for 3 months in rats that received cocaine reinforcers, it had faded by 21 days in rats receiving food or sucrose reinforcers. 
The point of this brief survey is that these mechanisms of neural plasticity are ubiquitous and are likely involved in most lasting behavior change wrought by experience.

Reinforcement Signals

Mechanisms underlying neural plasticity may be engaged when reinforcing events occur. A consensus has emerged that the delivery of an unpredicted positive reinforcer causes firing of dopamine-containing cells in the ventral tegmental area, which leads to dopamine release in synapses distributed throughout the brain. Mirenowicz and Schultz (1994) showed reliable phasic firing (i.e., a short-duration pattern of firing that occurs just after an event, as distinct from tonic firing, which is long-duration firing not in phase with an event) of ventral tegmental area dopamine neurons in monkeys that received unpredictable, response-independent juice. The same neurons responded to the delivery of the juice during classical conditioning training in which a CS preceded it. However, after conditioning was established, the neurons responded to the CS but not to the juice, which served as the unconditioned stimulus (US). This basic finding (illustrated in Figure 15.1), that neurons are more likely to fire in response to a reinforcer when its delivery is unpredicted, gave rise to the concept of the reinforcement prediction error. Consistent with the Rescorla–Wagner (1972) model, the prediction error hypothesis holds that neurons that detect the delivery of reinforcers do so relative to the predictive value of the prereinforcer context; if a stimulus reliably predicts upcoming reinforcement, that stimulus may increase neuron firing, but delivery of the reinforcer will not. Neural plasticity in this context is evident in the response of ventral tegmental area neurons to CS delivery after conditioning and in the changes in responsiveness to the predictable and unpredictable juice reward. The prediction error notion is important enough to examine in more detail. A study by Fiorillo, Tobler, and Schultz (2003) illustrates the issues nicely. Two monkeys were trained in a Pavlovian conditioning procedure in which different stimuli were followed by juice after a 2-second delay according to different probabilities (ranging from p = 0 to p = 1.0).
When training was complete, licking during the 2-second delay (the conditioned response to the CS) was an increasing function of the reward probability programmed for each stimulus. Fiorillo et al. simultaneously recorded the activity of dopamine neurons in areas A8 through A10 of the ventral midbrain. As shown in Figure 15.2c, the firing of these neurons after reward delivery was a continuous, decreasing function of reinforcer probability. When the reward


[Figure 15.1 panel labels: Task acquisition; Established task; Free liquid. Event markers: Sound, Liquid reward, Liquid. Time axis: –0.5 to 0.8 s.]

Figure 15.1. Responses of dopamine neurons in monkeys to liquid delivery at different periods of task performance. Top: Note the strong responses after liquid delivery during learning of the reaction time task (left) and after free liquid (right). Bottom: Bursts of responses of a neuron after acquisition occur predominantly after the conditioned stimulus that precedes liquid delivery, but not to liquid delivery itself (left). When free liquid, unpredicted by a prior conditioned stimulus, is given again, a burst of responding is observed (right). Both recorded neurons were in group A9 of the ventroanterior midbrain. From “Importance of Unpredictability for Reward Responses in Primate Dopamine Neurons,” by J. Mirenowicz and W. Schultz, 1994, Journal of Neurophysiology, 72, p. 1025. Copyright 1994 by the American Physiological Society. Reprinted with permission.

was delivered after the stimulus that predicted the reward with a probability of 1.0, neuronal firing remained at baseline. However, when a reward was delivered after the stimulus signaling a reward probability of 0.0, intense neuronal activation ensued. As shown in Figure 15.2d, when juice was predicted but omitted after the stimulus, dopamine neuron firing was suppressed to a greater extent when the probability of reward was higher. Finally, dopamine neuron firing in the 2 seconds after CS presentation was an increasing function of reward probability. Fiorillo et al. concluded, “By always coding prediction error over the full range of probabilities, dopamine neurons could provide a teaching signal in accord with the principles of learning originally described by Rescorla and Wagner” (p. 1901). In a set of experiments using the same basic methodology, Tobler, Fiorillo, and Schultz (2005) showed in monkeys that licking and dopamine neuron activity during the 2-second delay between CS and juice were both an increasing function of reward magnitude. When different magnitudes were presented alone (i.e., not preceded by a CS), dopamine activity was also an increasing function of magnitude. Neuron firing in response to rewards of different magnitudes depended on the magnitude of the predicted reward; in the presence of a stimulus predicting a medium-sized (0.15-milliliter) reward, a larger reward (0.5 milliliter) increased dopamine neuron firing and a smaller (0.05 milliliter) reward decreased it. A strong context effect was also demonstrated; in an experiment in which one CS was presented before either the small- or the medium-magnitude juice reward and a different CS was presented before either the medium or the large reward, the medium reward elicited high rates of firing in the first context and suppressed firing in the second. Finally, the same relative difference in dopamine firing occurred when


Figure 15.2. Dependence of phasic neuron responses on reward probability. A: Raster plots and histograms of single-cell firing to conditioned stimuli and rewards as a function of reward probabilities. C: Neuron firing rate after reward presentation as a function of reward probability. D: Suppression of firing rate owing to reward omission. E: Neuron firing rates during the prereward CS. From “Discrete Coding of Reward Probability and Uncertainty by Dopamine Neurons,” by C. D. Fiorillo, P. N. Tobler, and W. Schultz, 2003, Science, 299, p. 1899. Copyright 2003 by the American Association for the Advancement of Science. Reprinted with permission.
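The pattern summarized in Figure 15.2 is what a simple Rescorla–Wagner learner would produce. The following is a minimal sketch, not the analysis used by Fiorillo et al.: the function names, the learning rate, the number of trials, and the coding of juice as 1.0 are all assumptions made for illustration. The learned value of a CS settles near its programmed reward probability, so the error at reward delivery shrinks as probability rises, the error at reward omission becomes more negative, and the response to the CS itself grows.

```python
import random


def learn_cs_value(p_reward, alpha=0.005, trials=10000, seed=0):
    """Rescorla-Wagner update for one CS followed by juice with probability p_reward.

    alpha (learning rate), trials, and seed are illustrative assumptions.
    """
    rng = random.Random(seed)
    value = 0.0
    for _ in range(trials):
        reward = 1.0 if rng.random() < p_reward else 0.0
        value += alpha * (reward - value)  # prediction error drives learning
    return value


def prediction_errors(value):
    """Return (CS response, error if juice is delivered, error if juice is omitted)."""
    return value, 1.0 - value, 0.0 - value


if __name__ == "__main__":
    for p in (0.0, 0.25, 0.5, 0.75, 1.0):
        cs, delivered, omitted = prediction_errors(learn_cs_value(p))
        print(f"p={p:.2f}  CS={cs:.2f}  delivered={delivered:.2f}  omitted={omitted:.2f}")
```

Because the same seed is used for every probability, the simulated reward sequences are nested, which makes the learned values increase with probability without trial-to-trial noise reversing the order.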


absolute reinforcer magnitudes were varied over a 10-fold range, again showing that neuronal responses were correlated with relative differences in reinforcer magnitude in a given context. Reward- and CS-correlated activity that varies as a function of the preference between reinforcers, as indicated in an animal’s choice, has been detected in the orbitofrontal cortex (Tremblay & Schultz, 1999). Several other brain areas, such as the amygdala and orbitofrontal cortex (Schultz, Tremblay, & Hollerman, 2000), include neurons that have specific responses to reward and reward-predictive events. Thus, in a wide array of tasks involving different stimuli and stimulus modalities and reinforcer qualities, magnitudes, and probabilities, large and diffusely distributed dopamine teaching signals (the magnitude of which is prediction-error dependent) are available to update and augment the configurations of neuronal groups that participate in learning and behavior. Although the prediction error concept is a powerful one, its generality has limits. For example, the magnitude of the dopamine response in the nucleus accumbens to drug administration may be larger when it is expected (i.e., given in a drug-associated context or when it is self-administered rather than given response independently). Volkow et al. (2003) used functional scanning with 18F-deoxyglucose positron emission tomography to study the psychomotor stimulant drug methylphenidate’s effects on active cocaine users. Subjects were scanned on 4 separate days. Before each scan, subjects were told that they would receive either active drug or saline, and then they received either drug or saline. Thus, drug or saline expectation and drug or saline receipt were tested in a 2 × 2 design. Subjects completed visual analog scales to rate levels of high, drug effects, alertness, and restlessness before and at regular time points after injection.
On days on which subjects expected and received drugs, self-reports of drug effects were increased (about 50% more high) relative to days when drugs were received after saline instructions. Likewise, activation was greater in the thalamus and cerebellar vermis when drug was expected (methylphenidate–methylphenidate condition in Figure 15.3) than when it was not (saline–methylphenidate condition). Unexpected methylphenidate produced larger metabolic effects

Figure 15.3. Enhancement of volume of activation (number of voxels) by methylphenidate in functional magnetic resonance imaging in humans, both when methylphenidate was expected (methylphenidate–methylphenidate [MP/MP] condition) and when it was not (saline–methylphenidate [PL/MP] condition). From “Expectation Enhances the Regional Brain Metabolic and the Reinforcing Effects of Stimulants in Cocaine Abusers,” by N. D. Volkow, G. J. Wang, Y. Ma, J. S. Fowler, W. Zhu, L. Maynard, . . . J. M. Swanson, 2003, Journal of Neuroscience, 23, p. 11465. Copyright 2003 by the Society for Neuroscience. Reprinted with permission.


in the orbitofrontal cortex (bottom panel of Figure 15.3), however, which suggests that prediction errors for drug reinforcers stimulate activity in areas distinct from those stimulated by other reinforcers. Responsiveness to reinforcers is not a fixed property of neurons. Studies have shown that neurons activated by reinforcers can adjust their sensitivity to reinforcing events as environments and contingencies change (Paton, Belova, Morrison, & Salzman, 2006; Salzman, Paton, Belova, & Morrison, 2007). Paton et al. (2006) trained monkeys to lick in anticipation of juice presented 1.5 seconds after one visual stimulus and to blink in anticipation of an airpuff presented after a second, distinct visual stimulus; after a third visual stimulus, nothing happened. The monkeys learned these responses easily. Neurons in the amygdala were recorded during these tasks. Several patterns of neuronal responses were obtained; some neurons responded whenever a particular stimulus was presented, regardless of its relation to upcoming events, and some responded differently depending on which event was delivered (juice vs. airpuff). Paton et al. then reversed the relations so that the stimulus that had signaled upcoming juice now signaled upcoming airpuff and vice versa. The responses of many neurons “switched”; that is, after responding differentially to the juice-correlated stimulus, they responded, after a transition, to the airpuff-correlated stimulus. This shift in neuronal responding preceded changes in behavior (licking and blinking) caused by the new stimulus–outcome relations. Thus, flexibility in neuronal responding when contingencies change has been demonstrated, a fact that is relevant to the evolution of larger neuronal systems that participate in behavior in a changing setting (Edelman, 2003).
In summary, basic synaptic plasticity mechanisms working in a context in which widely distributed reinforcer-correlated neurotransmitter release occurs allow multiple aspects of the stimulus setting to be connected with multiple dimensions of ongoing behavior, permitting the acquisition, maintenance, and evolution of useful behavioral repertoires. Not reviewed in this chapter are the volumes of research on the cellular and subcellular mechanisms that allow synaptic plasticity and event-correlated neural activity (e.g., protein synthesis involved in the long-term maintenance of new learning). The behavioral consequences of such molecular events are vital to the investigation of these cellular processes, and so the interested behavior analyst is certainly in a position to contribute to such cellular-level research. In addition, an enormous amount of empirical and conceptual work is still required to arrive at a conception of these interactive neural units that makes unambiguous contact with the evolving observable activities of intact animals; loosely stated, researchers still do not know how brain events and behavioral events are connected. This fact, however, does not preclude the investigation of neural participants in larger, more complex behavioral contexts that have been of interest to behavior analysts.

Neural Participants in Operant Behavior

The basic neural model of learning is one in which reinforcement strengthens distributed synaptic connections provided certain criteria are met (e.g., there is a prediction error). Even if that model and its cellular mechanisms were completely elucidated, research would be required to discover how the model helps researchers account for the ongoing behavior of intact animals in diverse contexts. Some of this research is translational, that is, it is stimulated by the desire to discover why people have psychological or behavioral problems and how to overcome them. Other research addresses issues that seem to be pulled straight from the pages of the Journal of the Experimental Analysis of Behavior. I briefly review examples of this research here.

Response Rates on Reinforcement Schedules

From a behavior analyst’s perspective, there are two broad approaches to accounting for differential rates of responding on different reinforcement schedules. These approaches may be characterized as molecular, focusing on the likely characteristics of the instances of behavior that produce the reinforcer (Peele, Casey, & Silberberg, 1984), and molar, focusing on the overall correlation between rates of


behavior and rates of reinforcement (Herrnstein, 1970). The molecular view, it would seem, has not yet made an appearance in behavioral neuroscience (although see Corrado, Sugrue, Seung, & Newsome, 2005, for molecular contributions to shifts in choice behavior when relative reinforcement rates change frequently). Instead, attempts to account for response rates on reinforcement schedules are framed in terms of choice of a response rate on the basis of its expected value, a decidedly molar approach. Two extant models begin with learning based on prediction errors. One is called the actor–critic model. The critic component of the model, thought to be located in the ventral striatum, uses phasic dopamine release caused by prediction errors to update the reward-predictive value of environmental stimuli. The critic is hypothesized to be active in both Pavlovian and operant procedures because both involve valuation. The actor is thought to be localized in the dorsal striatum and also uses an error-prediction signal to modify stimulus–response or stimulus–response–reward associations that guide behavioral choices (O’Doherty et al., 2004). The difference between the two was demonstrated in humans by arranging a Pavlovian procedure that engages the ventral striatal critic component without engaging the dorsal striatal actor, alternating with an operant task (which requires both critic and actor; O’Doherty et al., 2004). On operant trials, humans chose between stimuli correlated with low (0.3) or high (0.6) probability of juice; on other trials, they chose between two stimuli that predicted different probabilities of a tasteless solution, referred to as neutral on the basis of ratings of pleasantness near zero. On Pavlovian trials, there was no choice: Stimulus–juice (or stimulus–tasteless solution) pairings were presented, and subjects simply indicated which stimulus had been presented. 
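The division of labor between critic and actor described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the model fitted by O’Doherty et al. (2004): the function name, the softmax choice rule, the learning rates, and the single shared prediction error are hypothetical choices, and only the two juice probabilities (0.3 and 0.6) come from the text. The critic tracks the overall value of the situation, and the actor uses the same prediction error to adjust its preference for whichever stimulus was just chosen.

```python
import math
import random


def run_actor_critic(p_reward=(0.3, 0.6), trials=5000,
                     alpha_critic=0.05, alpha_actor=0.05, seed=1):
    """Two-choice operant task with a critic (value) and an actor (preferences).

    All parameter values except p_reward are illustrative assumptions.
    """
    rng = random.Random(seed)
    value = 0.0            # critic: predicted reward in this context
    prefs = [0.0, 0.0]     # actor: preference for each stimulus
    choices = []
    for _ in range(trials):
        # Softmax choice between the two stimuli.
        top = max(prefs)
        weights = [math.exp(p - top) for p in prefs]
        p_first = weights[0] / (weights[0] + weights[1])
        action = 0 if rng.random() < p_first else 1
        reward = 1.0 if rng.random() < p_reward[action] else 0.0
        delta = reward - value                 # shared prediction error
        value += alpha_critic * delta          # critic update
        prefs[action] += alpha_actor * delta   # actor update
        choices.append(action)
    return value, prefs, choices


if __name__ == "__main__":
    value, prefs, choices = run_actor_critic()
    print("critic value:", round(value, 2))
    print("actor preferences:", [round(p, 2) for p in prefs])
    print("share of high-probability choices, last 1000 trials:",
          sum(choices[-1000:]) / 1000)
```

On Pavlovian trials only the critic update would apply, because there is no choice for the actor to make; that asymmetry is what the task in the text exploits.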
Response latencies on the Pavlovian trials were shorter for stimuli that predicted juice than for stimuli that predicted the tasteless solution. On the operant trials, subjects more frequently chose the stimuli that predicted the higher reinforcer probability. As predicted, ventral striatum MRI activity was engaged by both tasks, but only the operant one increased activity in the dorsal striatum. According to the actor–critic model, response rate on a reinforcement schedule would reflect the value of responding represented by the critic and rules for responding based on that value represented by the actor.

Niv (2007) proposed a distinct model of response rates on reinforcement schedules. According to this model, an animal responding on a free-operant reinforcement schedule faces two decisions: which response to make and how fast to make it. Both depend on the expected probability and quality of the reinforcer, or its benefit, and the cost of making the response quickly versus slowly, where a slow response is less effortful but more costly in terms of delaying the next and all future reinforcers (the opportunity cost of time). As in the actor–critic model, phasic inputs from ventral and dorsal striatum update current estimates of reinforcer probability. However, Niv’s model also requires neural activity that is correlated with the opportunity cost of time (which is directly correlated with overall reinforcer rate). Niv proposed that the tonic level of dopamine in basal ganglia and prefrontal areas, which reflects accumulated phasic dopamine inputs from striatal areas, would serve as such a signal. Increases in tonic dopamine level increase the opportunity cost of time and thus promote faster responding; the opposite effect follows reductions in the tonic dopamine level. Tonic dopamine levels may be increased by increasing reinforcement rates, because more rapid phasic firing of striatal dopamine cells leads to accumulations of tonic dopamine in basal ganglia and prefrontal areas, or by artificial means, as when dopamine-enhancing drugs (e.g., amphetamine) are administered. Increasing food deprivation will drive up the net benefit, and hence the opportunity cost of time, and thereby promote more rapid responding.
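The trade-off between effort and the opportunity cost of time can be made concrete with a small worked example. The functional form used here is an assumption for illustration and is not given in this chapter: effort is taken to fall off as vigor_cost / latency, and the opportunity cost is taken to grow as avg_reward_rate × latency. Under those assumptions the cost-minimizing latency is the square root of vigor_cost / avg_reward_rate, so a richer schedule (or, on Niv’s account, a higher tonic dopamine level) shortens the preferred latency.

```python
import math


def total_cost(latency, vigor_cost, avg_reward_rate):
    """Effort cost of responding at this latency plus the reward forgone while waiting.

    The hyperbolic effort term and linear opportunity term are assumed forms.
    """
    return vigor_cost / latency + avg_reward_rate * latency


def optimal_latency(vigor_cost, avg_reward_rate):
    """Latency that minimizes total_cost (set the derivative to zero and solve)."""
    return math.sqrt(vigor_cost / avg_reward_rate)


if __name__ == "__main__":
    for rate in (0.25, 1.0, 4.0):
        best = optimal_latency(1.0, rate)
        print(f"average reward rate {rate}: preferred latency {best}")
```

Quadrupling the average reward rate halves the preferred latency in this sketch, which is the qualitative relation between reinforcement rate and response speed that the model is meant to capture.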
Niv’s model reproduced several aspects of free-operant responding, including response rate differences on variable-ratio and yoked variable-interval schedules and matching on concurrently available schedules. A behavior analyst might object to models of behavior that postulate an inner actor and critic, but these words are entirely superfluous to the explanations of which they are a part. Discarding these terms, the model is just a two-factor theory of response rates on reinforcement schedules (or three factors in Niv’s 2007 model); each factor corresponds to different aspects (or time frames) of the animal’s experimental experience, and each is tentatively associated with the activities of neurons in different regions of the brain. So, as seems to be the case with all such cognitive terms, once the behavioral relations and brain participants are known, the cognitive terms become mere labels (Schaal, 2003). From a behavior-analytic perspective, what may be of more concern than the cognitive labels is that studies such as these tend to involve a very small range of reinforcement parameters and thereby risk building an elaborate explanatory model of phenomena that may occur under limited circumstances. Behavioral neuroscientists have also differentiated two putatively distinct types of operant (instrumental) learning that lead to reinforcement schedule–maintained behavior, referred to as goal-directed versus stimulus-bound learning (Balleine, Delgado, & Hikosaka, 2007). Goal-directed behavior is sensitive to manipulations of reinforcer value or motivation, whereas stimulus-bound behavior is relatively insensitive to these and is often referred to as habitual. Goal-directed behavior is controlled in the rodent by the dorsomedial striatum and follows prediction–error–correction learning rules; stimulus-bound behavior is controlled by the dorsolateral striatum and results from training well beyond the point at which responding is stable. Stimulus-bound behavior is exemplified by schedule-maintained behavior that is insensitive to changes in reinforcer value or contingency manipulations. If activity of the dorsolateral striatum controls the stimulus-bound behavior, reducing its influence (e.g., by a reversible chemical lesion) should render the behavior more susceptible to changes in reinforcer value or contingency manipulations, that is, make the behavior less stimulus bound and more goal directed.
These ideas are illustrated in a series of studies in which groups of rats work for sucrose according to variable-interval schedules, then experience a manipulation intended to reduce the value of the reinforcer (e.g., taste aversion) or degrade the response–reinforcer contingency (e.g., omission training, in which rats must withhold responding to receive a reinforcer; Yin, Knowlton, & Balleine, 2004, 2006). Control rats responded at the same rate the day after taste aversion or omission training as they did the day before; this insensitivity is the hallmark of stimulus-bound behavior under control of the dorsolateral striatum. Rats with chemical lesions of the dorsolateral striatum, however, responded at low rates the next day, thereby demonstrating enhanced control by the dorsomedial striatum. Although the model may account for differential alterations in response rates under some conditions, this model does not make specific predictions about response rates on reinforcement schedules.

Matching Law

Groundbreaking work on neural mechanisms underlying matching (Herrnstein, 1961) has been conducted by William Newsome and his graduate students, in which rhesus monkeys match relative rates of saccadic eye movements to relative rate of juice delivery. In one study (Sugrue, Corrado, & Newsome, 2004), monkeys initially fixated their gaze on a cross in the center of the screen (see Figure 15.4A). Color stimuli were then presented to the left and right of the cross (color positions were randomized across trials) while the monkey continued to fixate on the cross. When the cross dimmed, a saccadic response occurred and was reinforced at a maximum overall rate of 0.15 reinforcers per second. Relative reinforcement rates changed unpredictably during each session (ratios from 8:1 to 1:1 were tested). Relative response rate tracked the changed reinforcer ratios very rapidly (see Figure 15.4), suggesting to Sugrue et al. (2004) a local matching rule in which a leaky integrator allows temporally distant reinforcer distributions to drop out and current behavior to reflect the most recent reinforcer distribution. They recorded from neurons in the lateral intraparietal area that were active during movements to one or the other side of the screen. Large lateral intraparietal area responses before choice were correlated with a high probability of reinforcement for that choice, and small lateral intraparietal area responses were correlated with a low probability of reinforcement for that choice. Although Sugrue et al. noted that lateral intraparietal area neuron


Figure 15.4. A: The sequence of events of an oculomotor matching task. Reward is delivered at the time of the go response if it is scheduled. Overall maximum reinforcer rate was 0.15 per second, and relative reinforcer rate varied unpredictably from the set {8:1, 6:1, 3:1, 1:1}. B: Dynamic, within-session matching behavior. The gray curve shows cumulative choices of the red and green targets; black lines show the average ratio of reinforcers (red:green) within each block. Approximate matching was obtained. From “Matching Behavior and the Representation of Value in the Parietal Cortex,” by L. P. Sugrue, G. S. Corrado, and W. T. Newsome, 2004, Science, 304, p. 1783. Copyright 2004 by the American Association for the Advancement of Science. Adapted with permission.
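The local matching rule with a leaky integrator lends itself to a short sketch. What follows is an assumption-laden illustration rather than the model as fitted by Sugrue et al. (2004): the function name, the exponential decay, and the time constant of 10 trials are hypothetical. Each target’s recent reinforcer income decays on every trial, and the predicted probability of choosing red is red’s share of the decayed income, so old reinforcers drop out and choice follows the most recent distribution.

```python
import math


def local_matching(rewards_red, rewards_green, tau=10.0):
    """Predicted probability of choosing red on each trial.

    rewards_red and rewards_green are per-trial reinforcer counts (0 or 1).
    tau is the leaky integrator's time constant in trials (an assumed value).
    """
    decay = math.exp(-1.0 / tau)
    income_red = 0.0
    income_green = 0.0
    p_red = []
    for red, green in zip(rewards_red, rewards_green):
        income_red = decay * income_red + red        # leak, then add new income
        income_green = decay * income_green + green
        total = income_red + income_green
        p_red.append(0.5 if total == 0 else income_red / total)
    return p_red


if __name__ == "__main__":
    # Fifty trials in which only red pays off, then fifty in which only green does.
    red = [1] * 50 + [0] * 50
    green = [0] * 50 + [1] * 50
    p_red = local_matching(red, green)
    for trial in (49, 55, 60, 99):
        print(f"trial {trial}: P(red) = {p_red[trial]:.3f}")
```

A shorter time constant makes predicted choice follow a change in the reinforcer ratio faster, at the price of noisier estimates within a block.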

activity is more likely responsive to reinforcer rate–correlated activity initiated elsewhere (as opposed to being the neural signal of relative reinforcer value), these findings clearly show brain activity in real time correlated with choice on concurrent reinforcement schedules. Using similar methodology, Lau and Glimcher (2005, 2008) recorded from phasically active neurons in the oculomotor caudate of monkeys. They found neurons whose firing was correlated with upcoming outcomes (action values) and those whose firing was correlated with the outcome just received (chosen values). In fact, 62% of task-related phasically active neurons covaried significantly with action-value or chosen-value outcomes. Phasically active neurons’ firing also correlated with more local, trial-by-trial estimates of value. Other researchers (Kalenscher, Diekamp, & Gunturkun, 2003; Padoa-Schioppa & Assad, 2006) have similarly


shown that choice is amenable to neurobiological investigation. An interesting benefit of these studies is that when researchers from outside experimental psychology confront issues as familiar to behavior analysts as matching in concurrent schedules, methodological and theoretical issues seem to combine to generate interesting and novel analyses and interpretations of behavior. For example, the linear–nonlinear–probabilistic model of Corrado et al. (2005) fits moment-to-moment choice behavior well and yields model parameters that may be more clearly related to both the behavioral and the neural mechanisms of matching than the molar response and reinforcer rates used to frame the original matching law (Herrnstein, 1961). Similarly, studies in which dopamine is blocked in rats responding for concurrently available reinforcers that differ both in value and in the effort required to earn them have repeatedly shown that effects are more clearly related to response effort than to reinforcer value (Salamone, Correa, Farrar, Nunes, & Pardo, 2009). For example, Salamone et al. (1991) studied rats whose lever pressing was reinforced on a fixed-ratio-5 schedule, in which every five responses produced preferred pellet reinforcers in a setting in which nonpreferred chow was freely available. Low doses of the dopamine antagonist haloperidol decreased lever pressing and increased feeding on lab chow, without altering overall food intake or preference for the reinforcer pellets in a free-access situation. These and many complementary findings led Salamone et al. (2009) to suggest that dopamine interacts with the response bias aspect of the matching law rather than with reinforcer value. It remains to be seen whether continued investigation of the neuroscience of choice behavior will, as Glimcher and Rustichini (2004) suggested, result in an understanding of value “as a concrete object, a neural signal in the human and animal brain, rather than as a purely theoretical construction” (p. 452). It is exciting, though, to see choice, the matching law, and behavioral economics under active investigation by behavioral neuroscientists.

Delay Discounting The concept of delay discounting has contributed greatly to the understanding of human behavioral disorders such as drug abuse (see reviews by Dalley, Mar, Economidou, & Robbins, 2008; de Wit, 2009; Perry & Carroll, 2008) and gambling (Petry & Madden, 2010), and it has facilitated a quantitative understanding of the value of delayed reinforcers (Mazur, 1987; Mazur & Biondi, 2009). The concept seems to have resonated with behavioral neuroscientists as well. For example, McClure, Ericson, Laibson, Loewenstein, and Cohen (2007) tested a two-part discounting model that consists of an exponential discount function (associated with activity in the prefrontal and parietal cortex, regions implicated in planning and deliberation) and a function that is hypersensitive to immediate reinforcement (associated with activity in the mesolimbic dopamine system, or limbic reward areas). They used

functional MRI to investigate whether different brain areas are active in a choice procedure using juice or water rewards that varied in magnitude (the availability of which was signaled during choice periods) among 34 humans. Choices were observed in three separate delay contexts. In one, immediate reinforcement was tested against 1- or 5-minute delayed reinforcement. This context was compared with two contexts in which the relatively short delay was much longer (10 minutes vs. 11 or 15 minutes in one context and 20 minutes vs. 21 or 25 minutes in another context). Subjects’ preference for the immediate reinforcer over the 1- and 5-minute delayed reinforcer was much greater than their preference for the shorter delayed reinforcer in the other two contexts. Activity in the limbic reward system was greatest with all immediate reinforcers; it decayed rapidly as both rewards were delayed. The lateral prefrontal cortex and posterior parietal cortex responded similarly regardless of the duration of the delays. This disjunction between what McClure et al. (2007) called an impatient limbic reward system and the more deliberate, longer term prefrontal and parietal system was revealed in an earlier study by this group using choices involving money (McClure, Laibson, Loewenstein, & Cohen, 2004). This and similar studies have shown that immediate reinforcers might be viewed as qualitatively different from delayed ones, at least in the sense that the limbic system is engaged by the former, a finding that may alter behavioral models of delay discounting. In the absence of neurobiological data suggesting a qualitatively distinct function of immediate reinforcers, a single model in which reinforcers are discounted very steeply in the short-delay range may appear adequate, which raises the question of whether and to what extent behavioral theorizing will need to be updated to accommodate neurobiological findings.
McClure et al.’s (2004, 2007) two-system approach, however, may not account for more of the variance at the behavioral level than a more conventional hyperbolic delay-discounting function. The question of whether behavioral models need to be revised in the light of neurobiological data will, no


doubt, be faced repeatedly as behavioral neuroscience advances.
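The contrast between a single steeply discounting function and a two-part model can be sketched numerically. The example below is a minimal illustration, assuming a hyperbolic function of the form V = A/(1 + kD), a plain exponential function, and a quasi-hyperbolic form in which an immediate reward keeps its full value and any delayed reward is scaled by an extra factor (here called `beta`) on top of exponential decay. The function names, reward magnitudes, and parameter values are all hypothetical; only the three delay contexts (immediate vs. 5 minutes, 10 vs. 15 minutes, 20 vs. 25 minutes) are taken from the procedure described above.

```python
def hyperbolic(amount, delay, k=0.2):
    """Single-function model: V = A / (1 + k * D), delay in minutes."""
    return amount / (1.0 + k * delay)


def exponential(amount, delay, delta=0.9):
    """Constant-rate model: V = A * delta ** D."""
    return amount * delta ** delay


def quasi_hyperbolic(amount, delay, beta=0.6, delta=0.99):
    """Two-part model: full value if immediate, else beta * A * delta ** D."""
    if delay == 0:
        return amount
    return beta * amount * delta ** delay


# (sooner delay, later delay) in minutes for the three contexts.
CONTEXTS = [(0, 5), (10, 15), (20, 25)]


def prefers_sooner(value, d_soon, d_late, small=1.0, large=1.5):
    """True if the smaller-sooner reward has the higher discounted value."""
    return value(small, d_soon) > value(large, d_late)


def choice_pattern(value):
    """Predicted preference for the sooner option in each delay context."""
    return [prefers_sooner(value, s, l) for s, l in CONTEXTS]


if __name__ == "__main__":
    for model in (exponential, hyperbolic, quasi_hyperbolic):
        print(model.__name__, choice_pattern(model))
```

Under these assumed parameters, the exponential function predicts the same choice in every context, whereas both the hyperbolic and the quasi-hyperbolic functions predict a preference for the sooner reward only when it is immediate. This is the sense in which the two kinds of model can be hard to separate on behavioral data alone.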

Reinforcement of Observing Behavior by Informative Stimuli

Observing is another interesting and long-studied behavioral phenomenon recently shown to have clear neurobiological correlates. Bromberg-Martin and Hikosaka (2009) trained two macaques on an eye movement procedure similar to that of Sugrue et al. (2004), but in the context of an observing response procedure (Dinsmoor, Browne, & Lawrence, 1972; Wyckoff, 1952). Monkeys chose between two options at trial onset and after a few seconds received either a large or a small water reinforcer. Choice of one option produced stimuli that were predictive of upcoming reinforcer magnitude, and choice of the other option produced stimuli that were uncorrelated with the upcoming reinforcer; reinforcer probability at the end of the trials was the same whether monkeys chose the informative or the uninformative option. Both monkeys strongly preferred the option that led to informative stimuli. In a second task, monkeys chose between two informative options, but one presented the reinforcer-magnitude signal immediately, whereas the other presented it nearer in time to water delivery. Both monkeys preferred the immediate informative stimulus. Preference also tracked the immediate informative stimulus after a color reversal. Real-time recording from midbrain dopamine neurons showed phasic excitation by the stimulus that preceded a large reward and inhibition by the stimulus that preceded a small reward. When monkeys were, in forced trials, required to view uninformative random stimuli, differential neuronal activity occurred only when the water reinforcers were presented. Moreover, neurons were excited when a stimulus was presented indicating the information would be available on a trial, but not when a stimulus was presented indicating that random stimuli would be available.
Thus, the behavioral preference for informative stimuli was correlated with the neural response to opportunities to receive information about upcoming reinforcers. “These observations suggest that the act of prediction has a special

status, an intrinsic motivational or rewarding value of its own,” suggested Bromberg-Martin and Hikosaka (p. 122). Given the findings of Dinsmoor et al. (1972), an interesting experiment would examine the behavioral and neural outcomes of contingencies in which only the negative informative stimulus was presented.

Translational Behavioral Neuroscience

So far I have examined the neural participants in several specific environment–behavior relations that have in common the central role of reinforcement. The reinforcement relation is so ubiquitous in behavior that this collection of behavioral and neural variables is likely to appear in a wide variety of neurobiological investigations. Nevertheless, behavioral neuroscience is a large field with very fuzzy boundaries and with ample opportunities for behavior analysts to make contributions. In this section, I survey just a few such opportunities. In no sense are these brief reviews intended to be exhaustive; I describe them here only to give the reader an idea of what roles behavior analysts may play, in the laboratory and the clinic, in the future development of what may be considered translational behavioral neuroscience.

Hippocampal Neurogenesis

New neurons proliferate and differentiate (i.e., neurogenesis occurs) in the adult hippocampus of rats and humans, a fact that distinguishes the hippocampus from most brain areas, in which neurogenesis typically does not occur (see review by Ehninger & Kempermann, 2008). Excitatory granule cells proliferate from the subgranular zone of the hippocampus and populate the dentate gyrus, form synapses, and participate in new learning. A fascinating aspect of hippocampal neurogenesis is its capacity to be regulated by experience; it is enhanced by environmental enrichment (Brown et al., 2003; Kempermann, Gast, & Gage, 2002), physical exercise (van Praag, 2008; van Praag, Kempermann, & Gage, 1999), and certain kinds of learning (Gould, Beylin, Tanapat, Reeves, & Shors, 1999) but is suppressed by stress (Gould & Tanapat, 1999).


In a study by Gould et al. (1999), for example, rats were trained in either delay or trace eyeblink conditioning. Trace, but not delay, conditioning is typically disrupted by hippocampal lesions and causes timed neuron firing and structural changes in dendritic spines in neurons of the hippocampus; thus, trace conditioning is hippocampal dependent and delay conditioning is not. Bromodeoxyuridine (which is taken up by cells undergoing division and can be stained later so that newborn cells can be counted) was administered at the onset of training. A huge increase in the number of new neurons was observed in the granule cell layer of the hippocampus in rats that were trained on the trace procedure but not in rats trained on the hippocampal-independent tasks. Of course, in this case the neurogenesis could not really be implicated in the actual learning: it was caused by the learning, but the neurons were too new, too immature, to be integrated into functioning neural groups that participated in this learning. It is possible, however, that experiences such as trace conditioning, exercise, environmental enrichment, and so forth all enhance hippocampal neurogenesis to prepare animals, in some sense, for future learning. As Kempermann (2002) put it, “Hippocampal information-processing capacity is optimized for future memory storage” (p. 247). Regardless of the specific function of hippocampal neurogenesis, it represents a more global sense in which brain structure and function are altered by experience. The relevance of findings such as these to issues faced by aging people, who show decreases in hippocampal neurogenesis (Jessberger & Gage, 2008), and people with progressive diseases of memory such as Alzheimer’s disease (Steiner, Wolf, & Kempermann, 2006) is under investigation.

Drug Abuse

Behavior analysis has already proved important to the neuroscience of drug abuse, in which the identification of abused drugs as unconditioned stimuli, reinforcers, and discriminative stimuli has set the stage for neurobiological investigation (Lejuez, Schaal, & O’Donnell, 1998). For example, behavior analysts provided some of the first unambiguous evidence of the reinforcing effects of abused drugs, such as cocaine (Pickens & Thompson, 1968). This

evidence has allowed investigation of processes in the brain that participate in these effects (e.g., dopamine release in the nucleus accumbens coincident with drug-reinforced behavior; Di Chiara, 1995). In an example of a behavior-analytic contribution to research on the neurobiology of drug abuse, a counterintuitive finding helped elucidate the critical role of contingencies of reinforcement in determining a drug’s effect. Hemby, Martin, Co, Dworkin, and Smith (1995) found that extracellular dopamine concentration was increased in the nucleus accumbens of rats when heroin was administered response independently. When rats pressed a lever for response-dependent infusions of heroin, however, nucleus accumbens concentrations of dopamine did not rise, which suggests limits on nucleus accumbens involvement in sustained heroin self-administration. Subsequent research has enhanced the understanding of the functional role of this region in drug taking and relapse. For example, Alvarez-Jaimes, Polis, and Parsons (2008) trained rats to self-administer heroin on a multiple schedule; in one context, lever-pressing led to heroin infusion, and in a distinct context it led to saline infusion (extinction). After discriminative performance was established, heroin-reinforced responding was extinguished in the absence of the multiple-schedule stimuli. Reinstatement (de Wit & Stewart, 1981; Epstein, Preston, Stewart, & Shaham, 2006) was indicated by an increase in lever pressing when the drug context was reintroduced during sessions in which presses did not produce drug. Pretest infusion of a cannabinoid receptor antagonist into the nucleus accumbens (and the prefrontal cortex) dose dependently suppressed reinstatement (cannabinoid receptor antagonists had been shown to interact with multiple drugs of abuse).
Current understanding of drug-maintained behavior and drug relapse continues to have central roles for the nucleus accumbens and prefrontal cortex as well as for differential functions of dopamine and the excitatory neurotransmitter glutamate in all phases of drug reinforcement and relapse (Kalivas & Volkow, 2005). Such models are ultimately examined relative to the ability of drugs of abuse to participate in operant or Pavlovian relations and thus support a critical role for behavior analysis in the neuroscience of drug abuse.

Stroke

Stroke is the leading cause of long-term disability and the third leading cause of death in the United States (American Heart Association, 2010). It is caused in most cases when an embolism lodges in a blood vessel of the brain (i.e., most strokes are ischemic strokes). Patients who survive a stroke often experience several neurological disorders the nature and severity of which depend on the extent and location of the stroke damage (Dirnagl, Iadecola, & Moskowitz, 1999). Although volumes of research have been published on the cascade of cellular events initiated by ischemia and leading to stroke damage, and manipulations of some of these events can reduce the size of experimentally induced strokes in the laboratory (e.g., see H. Zhao et al., 2005), none of this work has resulted in therapies to limit stroke damage if the embolism is not quickly eliminated. Thus, both laboratory and clinical research aimed at recovery from stroke are important. Behavioral research on stroke in the laboratory has primarily involved untrained reflexive or exploratory behavior and behavior in simple training conditions. This research can result in interesting insights; for example, C. Zhao, Wang, Zhao, and Nie (2009) studied rats with experimental lesions caused by middle cerebral artery occlusion, which causes a unilateral cortical lesion (and deficits in forelimb function on the contralateral side). Some of the rats were fitted with a plaster cast that prevented use of the ipsilateral limb; thus, these rats received a kind of constraint-induced movement therapy (CIMT; Taub et al., 1993). After 2 weeks of training, rats were tested on a beam-walking task and a water maze task; they were then anesthetized and their brains were removed to assess changes in neurogenesis and levels of stromal cell–derived factor, a protein involved in signaling migration of cells toward stroke damage.
Stroke caused slipping off the beam in untreated rats relative to rats without stroke; slipping was significantly reduced in limb-restrained rats. Water maze performance also improved in limb-restrained rats, and both stromal cell–derived

factor levels and neurogenesis were significantly enhanced relative to nonrestrained rats. Thus, in this animal model an approach to poststroke therapy was found to be effective and to enhance stroke-induced plasticity. CIMT was recently subjected to a randomized controlled trial (Wolf et al., 2006). Patients who had experienced stroke 3 to 9 months earlier received either customary care or a 2-week course of CIMT (which involved wearing a restraining mitt on the less-affected hand and repetitive task practice and behavioral shaping with the hemiplegic hand for as many as 6 hours per day). Structured motor tasks and interviews showed that CIMT significantly improved function, and most of these improvements were maintained at the 12-month follow-up. Thus, stroke is currently viewed as creating both frank neural damage and a neuroplastic environment around the ischemic damage that can be exploited by training experiences. The role of behavioral contingencies in the neurobehavioral interaction arranged by CIMT is widely appreciated. As stated by Taub, Uswatte, and Morris (2003),

First . . . CI [constraint-induced] therapy changes the contingencies of reinforcement (provides opportunities for reinforcement of use of the more-affected arm and aversive consequences for its nonuse by training procedures and by constraining the less-affected UE [upper extremity]), so that the learned nonuse of the more-affected UE learned in the acute and early subacute periods is counter-conditioned or lifted. Second, the consequent increase in use, involving sustained and repeated practice of functional arm movements induces expansion of the contralateral cortical area controlling movement of the more-affected UE and recruitment of new ipsilateral areas. This use-dependent cortical reorganization may serve as the neural basis for the permanent increase in use of the affected arm as well as other beneficial effects of CI therapy. (p. S82)


Autism

The contribution of behavior analysis to the understanding and treatment of people with autism has, by any measure, been substantial (Lovaas, Koegel, & Schreibman, 1979; McEachin, Smith, & Lovaas, 1993). The connection between autism as a neural disease and autism as a behavioral disorder has been much more difficult to establish. Recently, Thompson (2007) described the neurobiological mechanisms by which intensive early behavior therapy may overcome the behavioral deficits of children with autism (Eldevik et al., 2009; Howlin, Magiati, & Charman, 2009; Schreibman, 2000). Some children exposed to intensive early behavior therapy are unresponsive, and others show significant gains in function (Howlin et al., 2009). Thompson suggested that the relative failure of intensive early behavior therapy in some cases may reflect different degrees or natures of neuronal pathology in these children. Many neuroanatomical differences between brains of people with autism and those of typically developing people have been discovered (for a review, see Bauman & Kemper, 2005). Hutsler and Zhang (2010), for example, recently examined dendritic spine densities on cortical pyramidal cells in brains of deceased people with autism and age-matched control subjects. Elevated spine densities were found in all subjects with autism relative to their age-matched controls and were greatest in layer II of the frontal, parietal, and temporal cortical lobes, a layer in which most connectivity is established postnatally. Because experience-dependent culling of cortical dendritic spines occurs during normal development, increased spine densities in people with autism may suggest a failure of this developmental mechanism. These data do not allow one to determine when in the life of the person with autism such structural differences arose or whether and to what extent they played a role in the person’s disability.
Nevertheless, they do point to a structural alteration in a key component of cortical neurons that could be involved in social and learning disabilities of people with autism.

What Does the Brain Do?

The articles reviewed in this chapter mostly concerned correlations between behavior and brain

activity. I described these findings without questioning, in any rigorous way, how the brain events explain the observed behavior. I avoided this because my goal was to describe behavioral neuroscience as it is practiced, so readers could get an idea of the kinds of questions behavioral neuroscientists ask and how they answer them. For a behavior analyst, however, each finding raises the question, “How does this brain event produce this behavior?” More broadly, one may ask, “What does the brain do?” The theoretical language of prominent neuroscientists suggests that they believe that the brain does most of the things that whole animals do. Consider Francis Crick’s (1995) views on the neuroscience of vision:

What you see is not what is really there; it is what your brain believes is there. . . . Your brain makes the best interpretation it can according to its previous experience and the limited and ambiguous information provided by your eyes . . . the brain combines the information provided by the many distinct features of the visual scene . . . and settles on the most plausible interpretation of all these various clues taken together. (p. 30)

In this quote, the brain believes, combines information, and makes interpretations. In cognitive neuroscience, it is not difficult to find references to brains believing, interpreting, constructing symbolic descriptions, thinking, responding to the content of thought, acquiring and encoding information, storing and retrieving memories, and so on. Animals are thought to learn because the brain learns and to choose because the brain chooses. Bennett and Hacker (2003) referred to this practice of “ascribing psychological attributes to the brain and its parts in order to explain the possession of psychological attributes . . . by human beings” (p. 3) as committing the mereological fallacy. They argued that psychological terms, which make sense when applied to whole animals, usually result in nonsense when applied to the brain.
Bechtel (2005) argued that a science of a complex phenomenon is successful when scientists are able to name the component parts and describe the operations of the mechanisms of the phenomenon.


Until then, they are apt to refer to these mechanistic components using terms that apply to the phenomenon as a whole. He used as an example the history of the science of fermentation. Before the discovery of the operations of the mechanisms of fermentation, that is, additions or deletions of groups of molecules to the carbon backbone (deamination, carboxylation, phosphorylation, etc.), scientists “invoked the vocabulary designed to explain the overall behavior to describe the operation of its components” (Bechtel, 2005, p. 318). For example, they said that a given sugar ferments. When the biochemical compounds and their reactions that make up the mechanisms and operations of fermentation became understood, ferment no longer applied to the operations of the mechanisms that explained fermentation but only to the overall process. Behavioral neuroscientists and behavior analysts are in a similar place. Lacking sufficient knowledge of the brain mechanisms and operations that account for behavior, they extend psychological concepts to the brain level, to which they typically do not apply. It is important for behavior analysts to recognize that this is an error, lest they make the mistake of believing that reinforcement is a process of the nervous system. Processes in the brain make reinforcement possible (distributed release of dopamine from the ventral tegmental area, alteration of synaptic efficacy in coactive neurons, etc.), but it is unlikely that the term reinforcement will apply to any of them. These brain events allow reinforcement to occur (and thereby, in some sense, explain it; Schaal, 2003), but reinforcement will remain a relation between an animal’s behavior and its environment (see Gold, 2009, for a complementary perspective in psychiatry, and Thompson, 2007, for an opposing one in behavior analysis). So what does the brain do? 
At the beginning of the chapter, I suggested an answer to this question when I defined behavioral neuroscience as “the investigation of the neural processes that participate in and account for functional relations between environment and behavior.” My definition takes as axiomatic the centrality of the functional relation in explanations of behavior and asserts that the job of behavioral neuroscience is to elucidate the


mechanisms and processes in the nervous system that allow these functional relations to be realized. This is a sort of mediationistic approach, so at least tentatively I will assert that brains mediate environment–behavior relations. Furthermore, they do so over multiple time scales, from moment to moment to cumulatively over an animal’s lifetime. But mediate is a weak, nonspecific term, and to be satisfied with it as a description of what the brain does is to ignore what I have called the fundamental nestedness of the brain, the rest of the body, and the person in the world, each entity executing processes that overlap and turn back on themselves and each other in time and space.

The firing of a neuron in the lateral intraparietal area may be critical to the execution of a choice response that is reflective of recent relative reinforcement rates (see, e.g., Corrado et al., 2005; Lau & Glimcher, 2005), but the individual neuron’s firing only has meaning when it is part of an integrated neuronal circuit . . . the activity of which only has meaning relative to the current environmental-behavioral context . . . which itself only has meaning relative to previously experienced environmental-behavioral contexts . . . a sufficient understanding of how the brain participates in behavior will depend on an ability to refer simultaneously to events at multiple levels of integration and at multiple time frames, including—most importantly from the perspective of behavior analysts—the animal’s history. (Schaal, 2005, pp. 690–691)

Behavioral neuroscientists will be closer to succeeding in describing the participation of brain events in behavior when they reject a belief that all behavioral and psychological events emerge from the activity of the nervous system. Psychological events do not happen inside of us; they are emergent in the interaction of intact organisms with their environments. What Alva Noë (2009) recently said


about consciousness applies as well to all psychological phenomena:

The brain is not the locus of consciousness inside us because consciousness has no locus inside us. Consciousness isn’t something that happens inside us: it is something that we do, actively, in our dynamic interaction with the world around us. . . . If we want to understand how the brain contributes to consciousness, we need to look at the brain’s job in relation to the larger nonbrain body and the environment in which we find ourselves. (p. 24)

Conclusions

In this chapter, I have reviewed a few ways in which “an organism is changed when exposed to contingencies of reinforcement and why the changed organism then behaves in a different way” (Skinner, 1974, p. 237), focusing primarily on experimental studies of reinforcement (or reinforcer presentation) in which behavior and neuronal activity are observed and correlated in real time. The observation of prediction error–bound, phasic neuronal activity elicited by reward has led to compelling ideas about the role of widespread dopamine release providing teaching signals that may alter the responsiveness of neurons in many regions of the brain to the events associated with the reward (Schultz, 2010). Behavior analysts interested in neuroscience may find the focus on the real-time behavior of individual neurons in these studies to be analogous to the focus on the behavior of individual organisms by which our field is defined and may thus be drawn to this research, perhaps even to the point of engaging in it. However, as I have also indicated in this chapter, the field of behavioral neuroscience is broad and deep; behavior analysts may participate by applying their unique practical and philosophical expertise to research on processes ranging from the subcellular to whole-organism levels and in the study of normal functioning as well as injury and disease. I encourage teachers of behavior analysis to

begin introducing students to neuroscience research in their classes; a broader view of behavior analysis will expand their students’ career opportunities as it enhances the depth of their worldviews.

References

Alvarez-Jaimes, L., Polis, I., & Parsons, L. H. (2008). Attenuation of cue-induced heroin-seeking behavior by cannabinoid CB1 antagonist infusions into the nucleus accumbens core and prefrontal cortex, but not basolateral amygdala. Neuropsychopharmacology, 33, 2483–2493. doi:10.1038/sj.npp.1301630

American Heart Association. (2010). About stroke. Retrieved from http://www.strokeassociation.org/STROKEORG/AboutStroke/About-Stroke_UCM_308529_SubHomePage.jsp

Balleine, B. W., Delgado, M. R., & Hikosaka, O. (2007). The role of the dorsal striatum in reward and decision-making. Journal of Neuroscience, 27, 8161–8165. doi:10.1523/JNEUROSCI.1554-07.2007

Bauman, M. L., & Kemper, T. L. (2005). Neuroanatomic observations of the brain in autism: A review and future directions. International Journal of Developmental Neuroscience, 23, 183–187. doi:10.1016/j.ijdevneu.2004.09.006

Bechtel, W. (2005). The challenge of characterizing operations in the mechanisms underlying behavior. Journal of the Experimental Analysis of Behavior, 84, 313–325. doi:10.1901/jeab.2005.103-04

Bennett, M. R., & Hacker, P. M. S. (2003). Philosophical foundations of neuroscience. Malden, MA: Blackwell.

Bliss, T. V., & Lomo, T. (1973). Long-lasting potentiation of synaptic transmission in the dentate area of the anaesthetized rabbit following stimulation of the perforant path. Journal of Physiology, 232, 331–356.

Bromberg-Martin, E. S., & Hikosaka, O. (2009). Midbrain dopamine neurons signal preference for advance information about upcoming rewards. Neuron, 63, 119–126. doi:10.1016/j.neuron.2009.06.009

Brown, J., Cooper-Kuhn, C. M., Kempermann, G., Van Praag, H., Winkler, J., Gage, F. H., & Kuhn, H. G. (2003). Enriched environment and physical activity stimulate hippocampal but not olfactory bulb neurogenesis. European Journal of Neuroscience, 17, 2042–2046. doi:10.1046/j.1460-9568.2003.02647.x

Byrne, J. H., & Kandel, E. R. (1996). Presynaptic facilitation revisited: State and time dependence. Journal of Neuroscience, 16, 425–435.

Chen, B. T., Bowers, M. S., Martin, M., Hopf, F. W., Guillory, A. M., Carelli, R. M., . . . Bonci, A. (2008). Cocaine but not natural reward self-administration nor passive cocaine infusion produces persistent LTP in the VTA. Neuron, 59, 288–297. doi:10.1016/j.neuron.2008.05.024

Christian, K. M., Poulos, A. M., Lavond, D. G., & Thompson, R. F. (2004). Comment on “Cerebellar LTD and learning-dependent timing of conditioned eyelid responses.” Science, 304, 211. doi:10.1126/science.1093706

Corrado, G. S., Sugrue, L. P., Seung, H. S., & Newsome, W. T. (2005). Linear-nonlinear-Poisson models of primate choice dynamics. Journal of the Experimental Analysis of Behavior, 84, 581–617. doi:10.1901/jeab.2005.23-05

Crick, F. (1995). The astonishing hypothesis. London, England: Touchstone.

Dalley, J. W., Mar, A. C., Economidou, D., & Robbins, T. W. (2008). Neurobehavioral mechanisms of impulsivity: Fronto-striatal systems and functional neurochemistry. Pharmacology, Biochemistry and Behavior, 90, 250–260. doi:10.1016/j.pbb.2007.12.021

de Wit, H. (2009). Impulsivity as a determinant and consequence of drug use: A review of underlying processes. Addiction Biology, 14, 22–31. doi:10.1111/j.1369-1600.2008.00129.x

de Wit, H., & Stewart, J. (1981). Reinstatement of cocaine-reinforced responding in the rat. Psychopharmacology, 75, 134–143. doi:10.1007/BF00432175

Di Chiara, G. (1995). The role of dopamine in drug abuse viewed from the perspective of its role in motivation. Drug and Alcohol Dependence, 38, 95–137. doi:10.1016/0376-8716(95)01118-I

Dinsmoor, J. A., Browne, M. P., & Lawrence, C. E. (1972). A test of the negative discriminative stimulus as a reinforcer of observing. Journal of the Experimental Analysis of Behavior, 18, 79–85. doi:10.1901/jeab.1972.18-79

Dirnagl, U., Iadecola, C., & Moskowitz, M. A. (1999). Pathobiology of ischaemic stroke: An integrated view. Trends in Neurosciences, 22, 391–397. doi:10.1016/S0166-2236(99)01401-0

Edelman, G. M. (2003). Naturalizing consciousness: A theoretical framework. Proceedings of the National Academy of Sciences of the United States of America, 100, 5520–5524. doi:10.1073/pnas.0931349100

Ehninger, D., & Kempermann, G. (2008). Neurogenesis in the adult hippocampus. Cell and Tissue Research, 331, 243–250. doi:10.1007/s00441-007-0478-3

Eldevik, S., Hastings, R. P., Hughes, J. C., Jahr, E., Eikeseth, S., & Cross, S. (2009). Meta-analysis of early intensive behavioral intervention for children with autism. Journal of Clinical Child and Adolescent Psychology, 38, 439–450. doi:10.1080/15374410902851739

Epstein, D. H., Preston, K. L., Stewart, J., & Shaham, Y. (2006). Toward a model of drug relapse: An assessment of the validity of the reinstatement procedure. Psychopharmacology, 189, 1–16. doi:10.1007/s00213-006-0529-6

Fanselow, M. S., & Kim, J. J. (1994). Acquisition of contextual Pavlovian fear conditioning is blocked by application of an NMDA receptor antagonist D,L-2-amino-5-phosphonovaleric acid to the basolateral amygdala. Behavioral Neuroscience, 108, 210–212. doi:10.1037/0735-7044.108.1.210

Fiorillo, C. D., Tobler, P. N., & Schultz, W. (2003). Discrete coding of reward probability and uncertainty by dopamine neurons. Science, 299, 1898–1902. doi:10.1126/science.1077349

Glanzman, D. L. (2006). The cellular mechanisms of learning in Aplysia: Of blind men and elephants. Biological Bulletin, 210, 271–279. doi:10.2307/4134563

Glimcher, P. W., & Rustichini, A. (2004). Neuroeconomics: The consilience of brain and decision. Science, 306, 447–452. doi:10.1126/science.1102566

Gold, I. (2009). Reduction in psychiatry. Canadian Journal of Psychiatry/Revue Canadienne de Psychiatrie, 54, 506–512.

Gould, E., Beylin, A., Tanapat, P., Reeves, A., & Shors, T. J. (1999). Learning enhances adult neurogenesis in the hippocampal formation. Nature Neuroscience, 2, 260–265. doi:10.1038/6365

Gould, E., & Tanapat, P. (1999). Stress and hippocampal neurogenesis. Biological Psychiatry, 46, 1472–1479. doi:10.1016/S0006-3223(99)00247-4

Hemby, S. E., Martin, T. J., Co, C., Dworkin, S. I., & Smith, J. E. (1995). The effects of intravenous heroin administration on extracellular nucleus accumbens dopamine concentrations as determined by in vivo microdialysis. Journal of Pharmacology and Experimental Therapeutics, 273, 591–598.

Herrnstein, R. J. (1961). Relative and absolute strength of response as a function of frequency of reinforcement. Journal of the Experimental Analysis of Behavior, 4, 267–272. doi:10.1901/jeab.1961.4-267

Herrnstein, R. J. (1970). On the law of effect. Journal of the Experimental Analysis of Behavior, 13, 243–266. doi:10.1901/jeab.1970.13-243

Howlin, P., Magiati, I., & Charman, T. (2009). Systematic review of early intensive behavioral interventions for children with autism. American Journal on Intellectual and Developmental Disabilities, 114, 23–41.

Hutsler, J. J., & Zhang, H. (2010). Increased dendritic spine densities on cortical projection neurons in autism spectrum disorders. Brain Research, 1309, 83–94.

Behavioral Neuroscience

Jessberger, S., & Gage, F. H. (2008). Stem-cell-associated structural and functional plasticity in the aging hippocampus. Psychology and Aging, 23, 684–691. doi:10.1037/a0014188 Kakegawa, W., Miyazaki, T., Emi, K., Matsuda, K., Kohda, K., Motohashi, J., . . . Yuzaki, M. (2008). Differential regulation of synaptic plasticity and cerebellar motor learning by the C-terminal PDZ-binding motif of GluRdelta2. Journal of Neuroscience, 28, 1460–1468. doi:10.1523/JNEUROSCI.2553-07.2008 Kalenscher, T., Diekamp, B., & Gunturkun, O. (2003). Neural architecture of choice behaviour in a concurrent interval schedule. European Journal of Neuroscience, 18, 2627–2637. doi:10.1046/j.14609568.2003.03006.x Kalivas, P. W., & Volkow, N. D. (2005). The neural basis of addiction: A pathology of motivation and choice. American Journal of Psychiatry, 162, 1403–1413. doi:10.1176/appi.ajp.162.8.1403 Kempermann, G. (2002). Why new neurons? Possible functions for adult hippocampal neurogenesis. Journal of Neuroscience, 22, 635–638. Kempermann, G., Gast, D., & Gage, F. H. (2002). Neuroplasticity in old age: Sustained fivefold induction of hippocampal neurogenesis by long-term environmental enrichment. Annals of Neurology, 52, 135–143. doi:10.1002/ana.10262 Lau, B., & Glimcher, P. W. (2005). Dynamic responseby-response models of matching behavior in rhesus monkeys. Journal of the Experimental Analysis of Behavior, 84, 555–579. doi:10.1901/jeab.2005.110-04 Lau, B., & Glimcher, P. W. (2008). Value representations in the primate striatum during matching behavior. Neuron, 58, 451–463. doi:10.1016/ j.neuron.2008.02.021 Lejuez, C. W., Schaal, D. W., & O’Donnell, J. (1998). Behavioral pharmacology and the treatment of substance abuse. In J. J. Plaud & G. H. Eifert (Eds.), From behavior theory to behavior therapy (pp. 116–135). Needham Heights, MA: Allyn & Bacon.

Experimental Analysis of Behavior, 91, 197–211. doi:10.1901/jeab.2009.91-197 McClure, S. M., Ericson, K. M., Laibson, D. I., Loewenstein, G., & Cohen, J. D. (2007). Time discounting for primary rewards. Journal of Neuroscience, 27, 5796–5804. doi:10.1523/ JNEUROSCI.4246-06.2007 McClure, S. M., Laibson, D. I., Loewenstein, G., & Cohen, J. D. (2004). Separate neural systems value immediate and delayed monetary rewards. Science, 306, 503–507. doi:10.1126/science.1100907 McEachin, J. J., Smith, T., & Lovaas, O. I. (1993). Longterm outcome for children with autism who received early intensive behavioral treatment. American Journal on Mental Retardation, 97, 359–372. Mirenowicz, J., & Schultz, W. (1994). Importance of unpredictability for reward responses in primate dopamine neurons. Journal of Neurophysiology, 72, 1024–1027. Morris, R. G., Anderson, E., Lynch, G. S., & Baudry, M. (1986). Selective impairment of learning and blockade of long-term potentiation by an N-methylD-aspartate receptor antagonist, AP5. Nature, 319, 774–776. doi:10.1038/319774a0 Niv, Y. (2007). Cost, benefit, tonic, phasic: What do response rates tell us about dopamine and motivation? Annals of the New York Academy of Sciences, 1104, 357–376. doi:10.1196/annals.1390.018 Noë, A. (2009). Out of our heads: Why you are not your brain, and other lessons from the biology of consciousness. New York, NY: Hill & Wang. O’Doherty, J., Dayan, P., Schultz, J., Deichmann, R., Friston, K., & Dolan, R. J. (2004). Dissociable roles of ventral and dorsal striatum in instrumental conditioning. Science, 304, 452–454. doi:10.1126/science.1094285 Padoa-Schioppa, C., & Assad, J. A. (2006). Neurons in the orbitofrontal cortex encode economic value. Nature, 441, 223–226. doi:10.1038/nature04676

Lovaas, O. I., Koegel, R. L., & Schreibman, L. (1979). Stimulus overselectivity in autism: A review of research. Psychological Bulletin, 86, 1236–1254. doi:10.1037/0033-2909.86.6.1236

Paton, J. J., Belova, M. A., Morrison, S. E., & Salzman, C. D. (2006). The primate amygdala represents the positive and negative value of visual stimuli during learning. Nature, 439, 865–870. doi:10.1038/ nature04490

Mazur, J. E. (1987). An adjusting procedure for studying delayed reinforcement. In M. L. Commons, J. E. Mazur, J. A. Nevin, & H. Rachlin (Eds.), Quantitative analyses of behavior: Vol. 5. The effect of delay and of intervening events on reinforcement value (pp. 55–73). Hillsdale, NJ: Erlbaum.

Peele, D. B., Casey, J., & Silberberg, A. (1984). Primacy of interresponse-time reinforcement in accounting for rate differences under variable-ratio and variable-interval schedules. Journal of Experimental Psychology: Animal Behavior Processes, 10, 149–167. doi:10.1037/0097-7403.10.2.149

Mazur, J. E., & Biondi, D. R. (2009). Delay-amount tradeoffs in choices by pigeons and rats: Hyperbolic versus exponential discounting. Journal of the

Perry, J. L., & Carroll, M. E. (2008). The role of impulsive behavior in drug abuse. Psychopharmacology, 200, 1–26. doi:10.1007/s00213-008-1173-0 357

David W. Schaal

Petry, N. M., & Madden, G. J. (2010). Discounting and pathological gambling. In G. J. Madden & W. K. Bickel (Eds.), Impulsivity: The behavioral and neurological science of discounting (pp. 273– 294). Washington, DC: American Psychological Association. doi:10.1037/12069-010 Pickens, R., & Thompson, T. (1968). Cocaine-reinforced behavior in rats: Effects of reinforcement magnitude and fixed-ratio size. Journal of Pharmacology and Experimental Therapeutics, 161, 122–129. Rescorla, R. A., & Wagner, A. R. (1972). A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement. In A. H. Black & W. F. Prokasy (Eds.), Classical conditioning: Vol. 2. Current research and theory (pp. 64–99). New York, NY: Appleton-Century-Crofts. Rodrigues, S. M., Schafe, G. E., & LeDoux, J. E. (2001). Intra-amygdala blockade of the NR2B subunit of the NMDA receptor disrupts the acquisition but not the expression of fear conditioning. Journal of Neuroscience, 21, 6889–6896. Roman, F., Staubli, U., & Lynch, G. (1987). Evidence for synaptic potentiation in a cortical network during learning. Brain Research, 418, 221–226. doi:10.1016/0006-8993(87)90089-8 Salamone, J. D., Correa, M., Farrar, A. M., Nunes, E. J., & Pardo, M. (2009). Dopamine, behavioral economics, and effort. Frontiers in Behavioral Neuroscience, 3, 13. doi:10.3389/neuro.08.013.2009 Salamone, J. D., Steinpreis, R. E., McCullough, L. D., Smith, P., Grebel, D., & Mahan, K. (1991). Haloperidol and nucleus accumbens dopamine depletion suppress lever pressing for food but increase free food consumption in a novel food choice procedure. Psychopharmacology, 104, 515–521. doi:10.1007/BF02245659 Salzman, C. D., Paton, J. J., Belova, M. A., & Morrison, S. E. (2007). Flexible neural representations of value in the primate brain. Annals of the New York Academy of Sciences, 1121, 336–354. doi:10.1196/ annals.1401.034 Schaal, D. W. (2003). Explanatory reductionism in behavior analysis. In K. A. 
Lattal & P. N. Chase (Eds.), Behavior theory and philosophy (pp. 83–102). New York, NY: Kluwer Academic. Schaal, D. W. (2005). Naming our concerns about neuro science: A review of Bennett and Hacker’s philosophical foundations of neuroscience. Journal of the Experimental Analysis of Behavior, 84, 683–692. doi:10.1901/jeab.2005.83-05 Schreibman, L. (2000). Intensive behavioral/ psychoeducational treatments for autism: Research needs and future directions. Journal of Autism and Developmental Disorders, 30, 373–378. doi:10.1023/A:1005535120023 358

Schultz, W. (2010). Dopamine signals for reward value and risk: Basic and recent data. Behavioral and Brain Functions, 6, 24. doi:10.1186/1744-9081-6-24 Schultz, W., Tremblay, L., & Hollerman, J. R. (2000). Reward processing in primate orbitofrontal cortex and basal ganglia. Cerebral Cortex, 10, 272–283. doi:10.1093/cercor/10.3.272 Skinner, B. F. (1974). About behaviorism. New York, NY: Knopf. Steiner, B., Wolf, S., & Kempermann, G. (2006). Adult neurogenesis and neurodegenerative disease. Regenerative Medicine, 1, 15–28. doi:10.2217/17460751.1.1.15 Sugrue, L. P., Corrado, G. S., & Newsome, W. T. (2004). Matching behavior and the representation of value in the parietal cortex. Science, 304, 1782–1787. doi:10.1126/science.1094765 Taub, E., Miller, N. E., Novack, T. A., Cook, E. W., III, Fleming, W. C., Nepomuceno, C. S., . . . Crago, J. E. (1993). Technique to improve chronic motor deficit after stroke. Archives of Physical Medicine and Rehabilitation, 74, 347–354. Taub, E., Uswatte, G., & Morris, D. M. (2003). Improved motor recovery after stroke and massive cortical reorganization following constraint-induced movement therapy. Physical Medicine and Rehabilitation Clinics of North America, 14(1, Suppl.), S77–S91. doi:10.1016/S1047-9651(02)00052-9 Thompson, T. (2007). Relations among functional systems in behavior analysis. Journal of the Experimental Analysis of Behavior, 87, 423–440. doi:10.1901/ jeab.2007.21-06 Tobler, P. N., Fiorillo, C. D., & Schultz, W. (2005). Adaptive coding of reward value by dopamine neurons. Science, 307, 1642–1645. doi:10.1126/ science.1105370 Tremblay, L., & Schultz, W. (1999). Relative reward preference in primate orbitofrontal cortex. Nature, 398, 704–708. doi:10.1038/19525 van Praag, H. (2008). Neurogenesis and exercise: Past and future directions. NeuroMolecular Medicine, 10, 128–140. doi:10.1007/s12017-008-8028-z van Praag, H., Kempermann, G., & Gage, F. H. (1999). 
Running increases cell proliferation and neurogenesis in the adult mouse dentate gyrus. Nature Neuroscience, 2, 266–270. doi:10.1038/6368 Volkow, N. D., Wang, G. J., Ma, Y., Fowler, J. S., Zhu, W., Maynard, L., . . . Swanson, J. M. (2003). Expectation enhances the regional brain metabolic and the reinforcing effects of stimulants in cocaine abusers. Journal of Neuroscience, 23, 11461–11468. Welsh, J. P., Yamaguchi, H., Zeng, X. H., Kojo, M., Nakada, Y., Takagi, A., . . . Llinás, R. R. (2005).

Behavioral Neuroscience

Normal motor learning during pharmacological prevention of Purkinje cell long-term depression. Proceedings of the National Academy of Sciences of the United States of America, 102, 17166–17171. doi:10.1073/pnas.0508191102 Wolf, S. L., Winstein, C. J., Miller, J. P., Taub, E., Uswatte, G., Morris, D., . . . Nichols-Larsen, D.; EXCITE Investigators. (2006). Effect of constraintinduced movement therapy on upper extremity function 3 to 9 months after stroke: The EXCITE randomized clinical trial. JAMA, 296, 2095–2104. doi:10.1001/jama.296.17.2095 Wyckoff, L. B., Jr. (1952). The role of observing responses in discrimination learning. Psychological Review, 59, 431–442. doi:10.1037/h0053932 Yin, H. H., Knowlton, B. J., & Balleine, B. W. (2004). Lesions of dorsolateral striatum preserve outcome expectancy but disrupt habit formation in instrumental

learning. European Journal of Neuroscience, 19, 181– 189. doi:10.1111/j.1460-9568.2004.03095.x Yin, H. H., Knowlton, B. J., & Balleine, B. W. (2006). Inactivation of dorsolateral striatum enhances sensitivity to changes in the action-outcome contingency in instrumental conditioning. Behavioural Brain Research, 166, 189–196. doi:10.1016/j. bbr.2005.07.012 Zhao, C., Wang, J., Zhao, S., & Nie, Y. (2009). Constraint-induced movement therapy enhanced neurogenesis and behavioral recovery after stroke in adult rats. Tohoku Journal of Experimental Medicine, 218, 301–308. doi:10.1620/tjem.218.301 Zhao, H., Shimohata, T., Wang, J. Q., Sun, G., Schaal, D. W., Sapolsky, R. M., & Steinberg, G. K. (2005). Akt contributes to neuroprotection by hypothermia against cerebral ischemia in rats. Journal of Neuroscience, 25, 9794–9806. doi:10.1523/ JNEUROSCI.3163-05.2005

359

Chapter 16

Stimulus Control and Stimulus Class Formation

Peter J. Urcuioli

The significance of this chapter’s topic can perhaps best be appreciated by reading various comments made as recently as within the past 20 years and as far back as 70 years ago by some prominent psychologists:

A classic problem in the analysis of behavior is to provide an account of how physically dissimilar stimuli can have similar and apparently interrelated effects on behavior. (McIlvane, 1992, pp. 76–77)

Understanding how organisms come to form categories is central to understanding complex human behavior in general and language in particular. (Horne & Lowe, 1997, p. 278)

The problem of stimulus equivalence is a fundamental one . . . for any psychology purporting to deal in a thorough-going manner with adaptive behavior. . . . How can we account for the fact that a stimulus will sometimes evoke a reaction . . . with which it has never been associated? (Hull, 1939, p. 9)

Clark Hull (1939), an early influential behaviorist, proposed two answers for the question he posed. One was primary stimulus generalization. Through this process, reactions explicitly learned to particular objects or stimuli also occur to some extent to new objects or new stimuli perceptually similar to those of training. This well-known and thoroughly researched process (Honig & Urcuioli, 1981) is certainly one contributing factor, but it does not adequately explain “similar and apparently interrelated effects on behavior” by physically dissimilar stimuli (cf. McIlvane, 1992; see also Hall, Mitchell, Graham, & Lavis, 2003; Jenkins, 1963; Urcuioli, 1996).

The distinction can be appreciated by means of a simple example. Chairs differ from one another in various ways (shape, color, material, presence versus absence of arm rests, and so forth), but all share a sufficient number of physically similar features such that new chairs people have never before encountered can evoke previously learned reactions (e.g., saying “chair” or sitting on one; cf. Mackintosh, 2000). By contrast, consider a chair, a lamp, an end table, a bed, and a bureau. Despite their physical dissimilarity, these are all easily recognized as examples of furniture and as having interrelated effects on behavior (e.g., where people go to buy them, where people place them). People’s common reactions to them go beyond any sort of physical resemblances, reflecting instead what they have learned about them (e.g., that they have the same category name or that they can be found inside houses).

Hull (1939) was keenly aware of this and proposed secondary stimulus generalization as a mechanism to account for how classes of disparate objects or stimuli could develop. In essence, he said that explicitly learning the same response to a set of disparate stimuli would generate an equivalence between them such that new behavior learned to a subset of the stimulus class would immediately generalize to the remaining class members (see, e.g., Lowe, Horne, & Hughes, 2005; Spradlin, Cotter, & Baxley, 1973; Urcuioli & Lionello-DeNolf, 2001).
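The logic of secondary stimulus generalization can be made concrete with a small sketch. The Python below is only an illustration under stated assumptions: the stimulus and response labels (S1 to S4, R1 to R4) and both function names are hypothetical placeholders, not part of any published procedure. It groups stimuli by the response each was trained to occasion and then predicts which untrained responses should emerge once one member of each group is taught something new.

```python
from collections import defaultdict


def partition_by_common_response(training):
    """Group stimuli that occasion the same trained response (hypothetical helper)."""
    classes = defaultdict(set)
    for stimulus, response in training.items():
        classes[response].add(stimulus)
    return {frozenset(members) for members in classes.values()}


def predict_transfer(classes, reassignment):
    """Predict emergent responses for class members that received no new training."""
    predictions = {}
    for members in classes:
        new_responses = {reassignment[s] for s in members if s in reassignment}
        if len(new_responses) != 1:
            continue  # no clear prediction without exactly one reassigned response
        (new_response,) = new_responses
        for stimulus in members:
            if stimulus not in reassignment:
                predictions[stimulus] = new_response
    return predictions


# Placeholder labels: two physically dissimilar stimuli per common response.
original_training = {"S1": "R1", "S3": "R1", "S2": "R2", "S4": "R2"}
reassignment = {"S1": "R3", "S2": "R4"}  # new behavior taught to one member per class

classes = partition_by_common_response(original_training)
predicted = predict_transfer(classes, reassignment)
print(sorted(predicted.items()))
```

If classes form as Hull suggested, the stimuli that were never paired with the new responses (here S3 and S4) should nonetheless occasion them; responding at chance to those stimuli would count against class formation.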

DOI: 10.1037/13937-016 APA Handbook of Behavior Analysis: Vol. 1. Methods and Principles, G. J. Madden (Editor-in-Chief) Copyright © 2013 by the American Psychological Association. All rights reserved.


Another quotation might be helpful at this point to set the stage for what is to follow: “Class formation . . . may be the product of any procedure that serves to partition a set of stimuli into subsets of stimuli that are substitutable for one another in certain contexts” (Saunders & Green, 1992, p. 319). This remark is applicable both to stimulus classes whose partitioning reflects shared physical features, such as people versus flowers versus cars versus chairs (Bhatt, Wasserman, Reynolds, & Knauss, 1988), and to stimulus classes whose partitioning reflects common learned associations (e.g., Vaughan, 1988; Vickery & Jiang, 2009). Furthermore, it underscores an important way to identify potential stimulus classes after training or other explicit learning experiences: namely, showing that the stimuli in question are interchangeable or substitutable for one another in certain contexts. I would prefer the adjective “new” to “certain” (cf. Goldiamond, 1962) to emphasize the fact that a stimulus that, as a result of training, becomes a member of a particular class can now “evoke a reaction . . . with which it has never been associated” (Hull, 1939, p. 9). In other words, that stimulus must be shown to occasion new, untrained (emergent) behavior.

I organized this chapter around two desired take-home messages. The first is that disparate stimuli that share a common association are often treated as belonging together, that is, as members of the same stimulus class or category. This message merely reinforces a long-recognized finding in the psychological literature (e.g., Hull, 1939; Jenkins & Palermo, 1964; Peters, 1935; Shipley, 1935). The common association in question can be the occasioning of the same reinforced response, the signaling of the same but distinct reinforcer, or more generally association with the same outcome. I provide examples of each of these effective associations from both human and nonhuman animal research and illustrate how the resulting stimulus classes have been demonstrated.

The second message is that any demonstration of stimulus substitutability or interchangeability requires researchers to know precisely the composition of each class member (viz., what the functional stimuli are) and how they control behavior. Here, my focus narrows somewhat as I draw on research from the animal literature showing that an elusive behavioral effect indicative of stimulus class formation finally made its appearance once it was recognized that the functional stimulus was a bit more complex than researchers originally thought.

Stimulus Control Primer

Humans’ behavior and that of other animals do not occur in an environmental vacuum. To the contrary, certain behavior often routinely occurs in certain circumstances but not in others. Likewise, different behavior is often observed in different circumstances. In a nutshell, these general observations capture what learning psychologists mean by the term stimulus control: Particular circumstances or stimuli, when present, occasion or control particular behavior. Consider studying, for example, ignoring all of the different forms this behavior can take. Studying is occasioned by particular circumstances, for example, those associated with an academic session at a college or university. Stated otherwise, studying occurs more frequently at college during the academic session than it does elsewhere. The academic environment, then, can be said to control this behavior, a point easily appreciated by students who plan to catch up on studying while at home on weekends or during semester breaks only to realize later on how unsuccessful their plans were. A likely reason is that the circumstances (stimuli) associated with home make studying far less likely and other behavior far more likely.

To understand how stimulus control originates, consider young children learning the names of objects. Using picture books, parents typically teach their children the names of pictures to which they point (e.g., to say “dog” to a picture of a dog, “cat” to a picture of a cat) by providing social reinforcement only when the child says the correct name. This simple procedure eventually generates stimulus control over naming; that is, different pictures come to occasion different names (different behavior) by the child.
It also illustrates how discrimination training effectively establishes stimulus control: Different behavior (saying “dog” vs. saying “cat,” respectively) is explicitly reinforced in the presence of different stimuli (picture of a dog vs. picture of a cat). Note, too, that discrimination training also involves nonreinforcement of incorrect behavior (e.g., saying “dog” to a picture of a cat, and vice versa), thus narrowing or sharpening the control any particular stimulus exerts over behavior. Of course, the correct or reinforced behavior learned to a particular stimulus (e.g., a specific dog picture) will then frequently occur or generalize to other perceptually similar stimuli (pictures of other dogs, real dogs, etc.), a process known as stimulus generalization (e.g., Honig & Urcuioli, 1981; Urcuioli, 2003).

Discrimination training can take a variety of forms. For instance, successive discrimination involves presenting different stimuli individually at different times, reinforcing the appropriate behavior and nonreinforcing other behavior to each. Successive discrimination is implied in the preceding picture–name example. By contrast, if the parent were to show a picture of a dog and a cat together and ask the child to point to one or the other (reinforcing the correct choice), the child must differentiate between the simultaneously presented pictures to point appropriately to one of them, illustrating a simultaneous discrimination. In fact, this latter example in its entirety illustrates a conditional discrimination: The picture to which the child must point depends (is conditional) on what the parent requests. If the parent asks for the dog, pointing to the dog rather than to the cat is reinforced; if the parent asks for the cat, pointing to the cat rather than to the dog is reinforced. This instructional method, then, actually requires both a successive and a simultaneous discrimination: To perform correctly, the child must differentiate between the individual words spoken sequentially by the parent (dog vs. cat) and must also differentiate between the concurrently presented animal pictures.
In the stimulus control literature, this conditional discrimination procedure would be called two-alternative matching-to-sample or matching-to-sample for short. In the example, the samples are the successively spoken words dog and cat. The simultaneously presented dog and cat pictures are called comparisons or comparison alternatives. The task is to match the appropriate comparison picture to the previously heard sample, hence the name of the procedure.
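The contingencies of a two-alternative matching-to-sample trial can be sketched in a few lines of Python. This is a minimal illustration, not a description of any particular laboratory's software: the stimulus labels, the function names, and the random left/right placement of the comparisons are all assumptions made for the example.

```python
import random

# Hypothetical arbitrary-matching contingencies: spoken-word sample -> correct picture.
CORRECT_COMPARISON = {"dog (spoken)": "dog picture", "cat (spoken)": "cat picture"}


def make_trial(sample, rng=random):
    """Build one trial: a sample followed by both comparisons in random positions."""
    comparisons = list(CORRECT_COMPARISON.values())
    rng.shuffle(comparisons)  # assumption: left/right placement varies across trials
    return {"sample": sample, "left": comparisons[0], "right": comparisons[1]}


def is_reinforced(trial, chosen_side):
    """A choice is reinforced only if it matches the comparison assigned to the sample."""
    return trial[chosen_side] == CORRECT_COMPARISON[trial["sample"]]


trial = make_trial("dog (spoken)", random.Random(1))
for side in ("left", "right"):
    print(side, trial[side], is_reinforced(trial, side))
```

Because the same two comparisons appear on every trial, only the sample determines which choice is reinforced, which is what makes the discrimination conditional.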

This procedure has many variations, often with procedure-specific names. For example, arbitrary matching (also known as symbolic matching) involves samples and comparisons that have no obvious physical resemblance to one another, as the preceding example illustrates. Identity matching, however, consists of samples and comparisons that are physically identical, as would be the case if the child were required to point to a dog or a cat comparison picture after being shown one or the other picture as the sample stimulus. Finally, successive matching refers to a set of conditional discriminations in which only one comparison (rather than two or more) appears after each sample stimulus. Successive refers to the fact that subjects must not only discriminate between successively (individually) presented samples but, in addition, must discriminate between successively (individually) presented comparisons. Successive matching is sometimes called go–no-go matching because subjects learn that responding (go) to each individually presented comparison is reinforced only when that comparison is preceded by a particular sample. When that same comparison is preceded by a different sample, responding is nonreinforced, and subjects eventually learn not to respond to it (no go) on those occasions.

Successive, simultaneous, and conditional discriminations have been implemented in many ways in research with humans and other animals. The reader will see some examples in figures included in this chapter (e.g., Figures 16.2, 16.7, and 16.10). Not surprisingly, some stimuli used with humans (e.g., written or spoken words) cannot be used with other animals. Nonetheless, these details are less important than the general features of the procedures and the fact that visual or auditory stimuli used with nonhuman animals accommodate their natural sensory abilities.
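The go–no-go contingency of successive matching, described above, can also be written out explicitly. The sketch below is hypothetical throughout: the sample and comparison labels, the set of reinforced pairs, and the function names are assumptions chosen only to show that each comparison appears alone and that the preceding sample determines whether responding to it is reinforced.

```python
from itertools import product

# Hypothetical sample-comparison pairs on which responding is reinforced.
REINFORCED_PAIRS = {("sample A", "comparison X"), ("sample B", "comparison Y")}


def trial_types(samples, comparisons):
    """Every sample is followed, on different trials, by each single comparison."""
    return list(product(samples, comparisons))


def trial_outcome(sample, comparison, responded):
    """Return the programmed consequence for one successive-matching trial."""
    if not responded:
        return "no consequence"
    if (sample, comparison) in REINFORCED_PAIRS:
        return "reinforced"
    return "nonreinforced"


for sample, comparison in trial_types(
    ["sample A", "sample B"], ["comparison X", "comparison Y"]
):
    print(sample, comparison, trial_outcome(sample, comparison, responded=True))
```

With two samples and two comparisons there are four trial types, two on which responding is reinforced (go) and two on which it is not (no go).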
Stimulus Classes: Partitioning by Common Responses

A robust finding in the psychological literature is that physically dissimilar stimuli that occasion the same reinforced response become members of the same stimulus class. This has been demonstrated in a variety of ways, and a good place to start is by describing some recent experiments by Lowe and his associates (e.g., Horne, Lowe, & Harris, 2007; Lowe, Horne, Harris, & Randle, 2002; Lowe et al., 2005) involving young children between 1 and 4 years old. In one study (Lowe et al., 2002, Experiment 1A), children were taught to say “zag” to each of three arbitrary wooden shapes and to say “vek” to each of three other arbitrary wooden shapes with positive social feedback for correct responding (an example of a successive discrimination). For the most part, the shapes bore no obvious resemblance to any real-world object or figure, and moreover, their random assignment across subjects into a zag group and a vek group minimized or eliminated any possibility that the shapes within a particular group would be perceptually more similar to one another than to shapes in the other group. After each child learned to consistently say “zag” and “vek” to the various stimuli, his or her ability to sort the shapes into common-response groups was tested in a category-sort task. The task involved presenting all six shapes together, pointing at one, and saying to the child either “Look at this. Can you give me the others?” or “What is this? Can you give me the others?” (a conditional discrimination example). With the former test instructions, four of nine children correctly selected the zag and the vek shapes virtually without error, whereas the remaining five children were completely unable to sort the shapes into groups. However, when the latter five children were retested using the “What is this?” instruction, all five appropriately sorted the zag and vek shapes, averaging 92.1% correct sorts (range = 83.3%–100%). These findings have been systematically replicated by Miguel, Petursdottir, Carr, and Michael (2008), who also showed that preschool children could not sort shapes into their eventual groups until they were taught to provide common labels (north vs.
south) for those shapes. Furthermore, other research has shown that success on these stimulus class tests requires that children actually learn to say “zag” or “vek” to each shape stimulus (i.e., to give or be able to give a common albeit distinct response to the shapes within each group). For example, Horne, Lowe, and Randle (2004) showed that if young children simply learned to give the researcher individual wooden shapes when the researcher asked for a zag or a vek stimulus, all of them were subsequently unable to correctly sort the shapes into groups when requested to do so using the “Look at this” instruction. Interestingly, most could also not provide the appropriate zag or vek response when asked what each shape stimulus was (i.e., they could not provide the common name response).

The partitioning of physically dissimilar stimuli into distinct classes via this sort of common response training can also be demonstrated in a transfer-of-control or a transfer-of-function test (Dougher & Markham, 1994, 1996; see also Volume 2, Chapter 1, this handbook). The procedure works as follows. First, subjects learn a common reinforced response to one group of disparate stimuli and another reinforced response to a second group of stimuli. After original training has been successfully completed, some new behavior is explicitly trained to one member of each common-response set of stimuli. This second training phase has for obvious reasons been called reassignment training by Wasserman, DeVolder, and Coppage (1992). Finally, subjects are tested with the remaining stimuli to see whether those stimuli, too, will now occasion the same response that has just been explicitly taught to one of their purported class members. If original training had generated stimulus classes defined by the common response conditioned to each class member, then stimulus control over new behavior established to one member during reassignment training should immediately transfer to all other members (hence, transfer of control).

These transfer effects were demonstrated in young children by Lowe et al. (2005). Original (or initial) training with arbitrary wooden shapes was virtually identical to that of Lowe et al. (2002). During subsequent reassignment training, each child was taught to clap to one shape in the zag group and to wave to one shape in the vek group.
When the researcher then asked, “How does this go?” to each of the other shapes, every child clapped or waved in a class-consistent fashion.

Although common naming (in the verbal–spoken meaning of that term) may be a potent way for stimulus classes to develop (Dugdale & Lowe, 1990; Eikeseth & Smith, 1992; Mandell & Sheen, 1994; see also Reese, 1972), it is certainly not the only type of effective response, as illustrated by other research using both humans and nonhuman animals (e.g., Garcia & Rehfeldt, 2008; Horne et al., 2007; Urcuioli, 2006a). This is underscored by Table 16.1, the upper portion of which shows a generalized schematic of common-response training along with reassignment training and a subsequent transfer-of-control test. The hypothesized stimulus classes are shown in the lower portion of the table.

Table 16.1. Procedure to Assess Stimulus-Class Formation on the Basis of Partitioning by Common Reinforced Responses

Common-response training    Reassignment training    Stimulus-class tests
S1 → R1+                    S1 → R3+                 S3 → R3 vs. R4
S2 → R2+                    S2 → R4+                 S4 → R3 vs. R4
S3 → R1+
S4 → R2+

Hypothesized stimulus classes: [S1, S3] and [S2, S4]

Note. S1–S4 represent different stimuli, and R1–R4 represent different reinforced (+) responses. Class-consistent responding (italicized in the original table) is R3 after S3 and R4 after S4, assuming the two hypothesized stimulus classes (shown as circled elements in the original table).

For example, Spradlin et al. (1973, Experiments 1 and 2) initially trained adolescents with intellectual disabilities to press the lower of two windows displaying a star (Response [R] 1) after seeing either a striped circle (Stimulus [S] 1) or a horizontal bar (S3) displayed in an upper window and to press whichever lower window displayed an infinity sign (R2) after seeing either a solid circle (S2) or a vertical bar (S4) displayed in an upper window. (The upper window stimuli were the samples, and the lower window stimuli were the comparisons in this matching-to-sample task.) On each training trial, both comparisons were presented for response choice, one in one lower window and the other in the other window with their positioning randomized across trials. Notice that (a) each reinforced comparison-choice response (e.g., R1) was occasioned by more than just one sample and (b) samples occasioning a common response (e.g., S1 and S3) were no more similar to one another in appearance than samples occasioning a different response (e.g., S1 and S4); indeed, in this experiment, they were most certainly less similar.

To assess stimulus class formation, Spradlin et al. then reinforced the selection of a mushroomlike comparison (R3) after the striped-circle sample (S1) and a chainlike-figure comparison (R4) after the solid-circle sample (S2) during reassignment training. Testing involved presenting, on different trials, the horizontal- and vertical-bar samples (S3 and S4, respectively) followed by the R3 and R4 choice alternatives. These infrequent test trials were inserted among all of the various training trials from original and reassignment training and, unlike the training trials, were nonreinforced. Stimulus class formation via common-response training should yield a preference for R3 after S3, and for R4 after S4, on the test trials (i.e., class-consistent responding), which is precisely what Spradlin et al. (1973, Experiments 1 and 2) found. Their average training and test performances are depicted in the top panel of Figure 16.1.

Using 12 color slides each of people, cars, chairs, and flowers, Wasserman et al. (1992) trained hungry pigeons to make one spatially distinct choice response whenever a slide depicting cars or flowers was shown on a center viewing screen of a four-choice panel (see left side of Figure 16.2) and a different, spatially distinct choice response whenever a slide depicting people or chairs was shown.
Thus, one common response was reinforced with food after each photographic stimulus in two disparate groups of complex stimuli, and an alternative common response was reinforced with food after each photographic stimulus in two other disparate groups of complex stimuli. To evaluate whether this common-response training yielded two stimulus classes—that is, [cars, flowers] and [people, chairs], pigeons were then taught to make new spatial choice responses to one set of slides making up each presumed class (e.g., chairs and flowers; see middle of Figure 16.2). Finally, the remaining sets of slides 365

Peter J. Urcuioli

Figure 16.1. Stimulus class (transfer-of-control) results after common response training. Top panel: Averaged percentage of correct choice (±1 standard error of the mean) of six adolescents with mental retardation on all reinforced training trials (baseline) and average percentage of class-consistent choices (±1 standard error of the mean) on nonreinforced test trials (probes) over two test sessions. Dotted line indicates chance level of performance. Data from Spradlin, Cotter, and Baxley (1973). Bottom panel: Averaged percentage of correct choice of eight pigeons on reassigned training trials (cf. Table 16.1) and average percentage of class-consistent choices on nondifferentially reinforced test trials (probes) over four test sessions. Dotted line indicates chance level of performance. Data from Wasserman, DeVolder, and Coppage (1992).

Figure 16.2. A schematic depiction of the apparatus, reinforced response locations for the stimuli during common-response training and reassignment, and the class-consistent response locations during testing in Wasserman, DeVolder, and Coppage (1992). Panels: Common-response Training (Cars & Flowers; People & Chairs), Reassignment (Chairs; Flowers), and Testing (People ?; Cars ?). Apparatus labels: Viewing Screen, Food Hopper.

(e.g., people and cars) were intermixed among the reassignment training trials in a test phase (see right side of Figure 16.2) to see whether the new spatial-response functions directly trained to the reassigned class members would immediately transfer to the remaining class members. The test trials were nondifferentially reinforced: Food was delivered no matter what spatial choice response pigeons made to each slide shown in testing. Wasserman et al. did this to assess the pigeons’ test-trial preferences based on prior learning and to avoid biasing those preferences in one way or another during testing. The bottom panel of Figure 16.1 shows the data from the second block of four test sessions in the Wasserman et al. (1992) study. The pigeon results correspond nicely with those reported by Spradlin et al. (1973). Although the percentage of class-consistent choices on the stimulus class test trials was not as high as the corresponding accuracy level on the explicitly reassigned training trials, it was nevertheless significantly above the level expected by chance alone (indicated by the dotted line). Moreover, the pigeon test results are perhaps even more impressive considering that the pigeons were tested with a total of 24 different slides (12 showing different people, 12 showing different cars). The adolescent subjects in the Spradlin et al. study were tested with just two sample stimuli. To what extent do the class-consistent choices exhibited in these tests truly reflect class partitioning by common responses in original training? This question was answered by Urcuioli and

Stimulus Control and Stimulus Class Formation

Lionello-DeNolf (2005), who gave common-response training to one group of pigeons but unique-response training to a second, control group before reassignment and testing (see Table 16.2). For the latter one-to-one (OTO) group, each sample stimulus occasioned a different reinforced choice response during initial training; otherwise, its reassignment training and testing (cf. the middle and right portions of Table 16.2, respectively) were identical to those for the common-response or many-to-one (MTO) group. Because there were no common reinforced responses across samples for the OTO group during initial training, there could be no common-response sample classes and, thus, no a priori way to classify these pigeons’ choices in testing as class consistent or class inconsistent (Urcuioli & Lionello-DeNolf, 2005). Consequently, to facilitate a between-group comparison, each pigeon in the OTO group was paired with a pigeon in the MTO group that had the same reinforced sample–comparison relations during reassignment and experienced the same differentially reinforced sample–comparison relations in testing.

Table 16.2
Design to Evaluate the Role of Common Responses in Stimulus Class Formation (cf. Urcuioli & Lionello-DeNolf, 2005, Experiment 1)

             Training
Group    Initial       Reassignment    Stimulus-class tests
MTO      S1 → R1+      S1 → R3+        S3 → R3 vs. R4
         S2 → R2+      S2 → R4+        S4 → R3 vs. R4
         S3 → R1+
         S4 → R2+
OTO      S1 → R1+      S1 → R3+        S3 → R3 vs. R4
         S2 → R2+      S2 → R4+        S4 → R3 vs. R4
         S3 → R5+
         S4 → R6+

Note. S1–S4 represent different stimuli, and R1–R6 represent different reinforced (+) responses. Underlined responses highlight the difference between groups in initial training and the one-to-one sample-comparison response relation for the OTO group. Italicized responses represent class-consistent responding for the MTO group during testing. MTO = many-to-one or common-response group; OTO = one-to-one or unique-response group.

For these tests, half of the common-response (MTO) pigeons were tested on reinforced relations consistent with the hypothesized stimulus classes arising from such training. The italicized responses shown in Table 16.2 indicate class-consistent reinforcement contingencies for these pigeons. For example, if S1 and S3 are members of the same class (cf. Table 16.1) and R3 was reinforced after S1 during reassignment, then reinforcing R3 after S3 during testing is consistent with class membership and should yield greater-than-chance levels of reinforced test trial responding. The remaining pigeons were tested on reinforced relations inconsistent with such classes (not shown in Table 16.2): For them, R4 (the alternative response) was reinforced after S3, and likewise, R3 was reinforced after S4, during testing (Urcuioli & Lionello-DeNolf, 2005). These reinforcement contingencies are inconsistent with class membership and should yield lower-than-chance levels of reinforced test trial responding (at least at the outset of testing). By contrast, pigeons in the OTO group were expected to respond haphazardly (viz., at chance) at the outset of testing because their samples could not have been partitioned into common-response classes given the absence of common-response training. Figure 16.3 shows the first-session test results for the MTO (common-response) pigeons in both the consistent and the inconsistent conditions and for all of the OTO pigeons combined. The above- versus below-chance performances for the two MTO conditions are indicative of stimulus class formation. The chance level of accuracy for the OTO pigeons demonstrates that common-response training was indeed responsible for the different test performance profiles of the MTO pigeons (Urcuioli & Lionello-DeNolf, 2005).
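The logic of the design in Table 16.2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the dictionary representation are hypothetical and are not part of the original procedure. The sketch assumes that stimuli sharing a reinforced response in initial training form a class and that a response reassigned to one class member transfers to its untrained classmates.

```python
from collections import defaultdict


def classes_by_common_response(training):
    """Group stimuli that occasioned the same reinforced response."""
    groups = defaultdict(set)
    for stimulus, response in training.items():
        groups[response].add(stimulus)
    return [frozenset(members) for members in groups.values()]


def predict_test_choices(training, reassignment):
    """Predict the class-consistent choice for each non-reassigned stimulus.

    A prediction exists only when a classmate of the stimulus was
    explicitly reassigned to exactly one new response.
    """
    predictions = {}
    for members in classes_by_common_response(training):
        new_responses = {reassignment[s] for s in members if s in reassignment}
        if len(new_responses) != 1:
            continue  # no classmate was reassigned, so nothing can transfer
        (response,) = new_responses
        for stimulus in members:
            if stimulus not in reassignment:
                predictions[stimulus] = response
    return predictions


# Contingencies taken from Table 16.2.
mto_training = {"S1": "R1", "S2": "R2", "S3": "R1", "S4": "R2"}
oto_training = {"S1": "R1", "S2": "R2", "S3": "R5", "S4": "R6"}
reassignment = {"S1": "R3", "S2": "R4"}

mto_predictions = predict_test_choices(mto_training, reassignment)
oto_predictions = predict_test_choices(oto_training, reassignment)
print(mto_predictions)
print(oto_predictions)
```

Under these assumptions the MTO contingencies yield a prediction for S3 and S4, whereas the OTO contingencies yield none, which mirrors the reasoning that OTO choices cannot be classified a priori as class consistent or class inconsistent.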
Figure 16.3. Average percentage of reinforced choices (±1 standard error of the mean) on the first stimulus class transfer-of-control test for pigeons initially trained on a common-response matching task (MTO group) or a task without common comparison responses (OTO group). MTO test results are shown separately for pigeons whose reinforced test trial choices were either consistent (Cons.) or inconsistent (Incons.) with stimulus class formation. MTO = many to one, or common response; OTO = one to one, or unique response. From “The Role of Common Reinforced Comparison Responses in Acquired Sample Equivalence,” by P. J. Urcuioli and K. M. Lionello-DeNolf, 2005, Behavioral Processes, p. 69. Copyright 2005 by Elsevier B.V. Adapted with permission.

Stimulus Classes: Partitioning by Common Outcomes

In his well-known Law of Effect, Edward L. Thorndike (1898) underscored the point that the consequence or outcome of behavior can influence the future likelihood of that behavior. Indeed, the study of how behavior is affected by its consequences defines the field of operant and instrumental conditioning. Just considering the consequence called positive reinforcement, a little reflection indicates that the reinforcing consequence or outcome for behavior more often than not varies across behavior. Moreover, because reinforced behavior is typically occasioned by antecedent stimuli, those stimuli will also be associated with different outcomes. Given the behavioral potency of reinforcing outcomes and their frequent association with antecedent stimuli, it should come as no surprise that the typical experience of differential outcomes provides another means for generating stimulus classes. Indeed, a considerable body of research has confirmed that disparate stimuli will be partitioned into classes on the basis of a common association with a

particular, albeit distinctive, outcome (cf. Urcuioli, 2005). Stated otherwise and in reference to the introductory quotation by McIlvane (1992), physically dissimilar stimuli can have similar effects on behavior because the responses they occasion lead to the same reinforcing outcome.

One way to demonstrate partitioning of stimuli into classes by common outcomes is to use a transfer-of-control design such as that shown in Table 16.3, which provides the schematics for two phases of training involving different stimuli and different reinforced responses: S1 and S2 in Phase 1 versus S3 and S4 in Phase 2 and, likewise, reinforced responses R1 and R2 in Phase 1 versus R3 and R4 in Phase 2. Both training phases, however, share the use of the same two reinforcing outcomes (Os), O1 and O2. (The reader should note that it is unnecessary to conduct these training phases separately; they can be conducted concurrently.) The critical feature is that O1 is common to two physically dissimilar stimuli (viz., S1 and S3), and O2 is common to two other physically dissimilar stimuli (S2 and S4). If these outcome associations partition the four stimuli into two classes (cf. the lower portion of Table 16.3), then S1 should be interchangeable with S3 (and vice versa), and likewise, S2 should be interchangeable with S4 (and vice versa). Such interchangeability would then permit each stimulus to occasion a response it had never previously occasioned.

Table 16.3
Procedure to Assess Stimulus-Class Formation on the Basis of Partitioning by Common Outcomes

Differential outcome training
Phase 1            Phase 2            Stimulus-class tests
S1 → R1+ (O1)      S3 → R3+ (O1)      S1 → R3 vs. R4
S2 → R2+ (O2)      S4 → R4+ (O2)      S2 → R3 vs. R4
                                      S3 → R1 vs. R2
                                      S4 → R1 vs. R2

Hypothesized stimulus classes: [S1, S3] and [S2, S4]

Note. S1–S4 represent different stimuli; R1–R4 represent different reinforced (+) responses, and O1 and O2 represent different outcomes. Italicized responses represent class-consistent responding assuming the two hypothesized stimulus classes shown as the circled elements.

Research conducted with pigeons, young children, and adults demonstrates just such an effect. For example, Edwards, Jagielo, Zentall, and Hogan (1982, Experiment 2) trained one group of hungry pigeons on two separate identity-matching tasks in which they learned in one task to respond to the same color comparison (red or green) as a preceding color sample (red or green) and in another task to respond to the same black-on-white shape comparison (plus or circle) as a preceding shape sample (plus or circle). For this group, a correct comparison response after one color (and one shape) sample produced peas as the food reinforcer, whereas a correct comparison response after the other color (and other shape) sample produced wheat as the food reinforcer. In other words, the reinforcing outcomes for this group were correlated with particular sample stimuli (viz., peas [O1] with S1 and S3, and wheat [O2] with S2 and S4). For two other groups, the reinforcing outcome was either always the same (a mixture of peas and wheat) or random or uncorrelated with respect to the sample stimuli (e.g., peas for a correct response after S1 on some trials, but wheat for the same correct response after S1 on other trials, and likewise for the correct responses following the remaining three samples). In testing, the samples from the color identity-matching task were swapped with the samples from the shape identity-matching task such that pigeons now responded to shape comparisons after color samples and to color comparisons after shape samples. For the correlated group, the correct (reinforced) responses in testing were those associated with the two sample stimuli that had shared the same reinforcing outcome during training.
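The transfer-of-control predictions in Table 16.3 can be derived mechanically from the training contingencies. The following Python sketch is a hypothetical illustration (the function name and tuple layout are assumptions, not anything used in the studies cited): it treats stimuli that share an outcome as classmates and lists, for each stimulus, the response trained to its classmate.

```python
def class_consistent_tests(training):
    """Map each stimulus to the responses trained to stimuli sharing its outcome.

    `training` is a list of (stimulus, reinforced response, outcome) tuples.
    """
    by_outcome = {}
    for stimulus, response, outcome in training:
        by_outcome.setdefault(outcome, []).append((stimulus, response))

    tests = {}
    for pairs in by_outcome.values():
        for stimulus, _ in pairs:
            # Responses never trained to this stimulus but trained to a classmate.
            tests[stimulus] = sorted(r for other, r in pairs if other != stimulus)
    return tests


# Phase 1 and Phase 2 contingencies from Table 16.3.
differential_training = [
    ("S1", "R1", "O1"),
    ("S2", "R2", "O2"),
    ("S3", "R3", "O1"),
    ("S4", "R4", "O2"),
]

predicted = class_consistent_tests(differential_training)
print(predicted)
```

With the Table 16.3 contingencies, the sketch pairs S1 with R3 and S3 with R1 (via O1), and S2 with R4 and S4 with R2 (via O2). If every trial ended in the same outcome, as in a nondifferential control, all four stimuli would fall into one group and the sketch would no longer single out one response per stimulus.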
For example, if the red comparison response (R1) after the red sample (S1) produced peas (O1) in one training task and the plus comparison (R3) response after the plus sample (S3) also produced peas (O1) in the other training task, then a red comparison response (R1) after the plus sample (S3) in testing was correct and reinforced with peas (O1). Stated otherwise, the class-consistent comparison responses were reinforced with their associated

outcomes during this group’s stimulus class test. The reinforcement contingencies for pigeons in each of the two control groups during testing were matched to those of the correlated group to ensure comparability of test performances across the three groups. If the sample stimuli from training were partitioned into classes as a result of common outcome associations in the correlated group, accuracy on the very first test session should be above the level expected by chance alone (i.e., more than 50% correct). By contrast, the absence of these differential outcome associations in the two control groups should not result in partitioning, and consequently, their first test session accuracies should be at or close to chance. The top panel of Figure 16.4 shows this was precisely the pattern of first-session results that Edwards et al. (1982) observed. A stimulus class interpretation of these results may seem to be a stretch given that the average first-session accuracy for the correlated group, although significantly above chance, was still well below the accuracy on each training task, which was in the neighborhood of 90% correct (data not shown). However, the fact that peas and wheat share the common feature that both are food and both are familiar to pigeons may have diminished the chance of a strong partitioning of the stimuli with which they are associated. Besides, if no partitioning whatsoever occurred, test session accuracies in the correlated group should have been comparable to those in the two control groups, which was clearly not the case. Maki, Overmier, Delos, and Gutmann (1995, Experiment 3) reported transfer results very similar to these in typical 5-year-old children. In their experiment, each child learned two separate arbitrary matching tasks (viz., tasks in which the comparison alternatives were physically dissimilar to the sample stimuli).
Figure 16.4. Stimulus class (transfer-of-control) test results after common outcome training. Top: Averaged percentage of reinforced class-consistent choices (±1 standard error of the mean) over the first two test sessions for six pigeons trained with pea and wheat reinforcers correlated with pairs of sample stimuli in training (Corr.) and the corresponding reinforced choices (±1 standard error of the mean) for pigeons initially trained with a reinforcer mixture (Same) or with peas and wheat uncorrelated with the sample stimuli (Uncorr.). Data from Edwards, Jagielo, Zentall, and Hogan (1982, Experiment 2, Figure 5). Bottom: Percentage of class-consistent (correct) choices averaged over 32 nonreinforced test trials for two children initially trained with different (D) outcomes correlated with pairs of sample stimuli or with those outcomes occurring nondifferentially (ND) with respect to the samples. Data from Maki, Overmier, Delos, and Gutmann (1995, Experiment 3, Figure 7).

For two children, a particular colored chip (e.g., blue) was delivered after a correct response following one sample in each task, and a different colored chip (e.g., red) was delivered after a correct response following the other sample in each task. Each colored chip was later exchangeable for small toys versus food. Thus, the samples in each training task were associated with different reinforcing outcomes. For two other children, every

reinforced comparison response yielded the same colored chip (white), also exchangeable for either toys or food (a nondifferential condition). Later, the samples from one task were swapped for those of the other, and vice versa, on nonreinforced test trials to see whether those samples would preferentially occasion particular comparison responses of the other task. The expectation was that in the differential condition (cf. Table 16.3), the samples would occasion class-consistent (correct) responding in testing. For comparison, test-trial responses in the nondifferential condition were deemed correct by correspondence to those same responses in the differential group. The bottom half of Figure 16.4 shows the average test results for each differential (D) and nondifferential (ND) child. The above-chance levels of accuracy exhibited by the D children demonstrate transfer of control indicative of stimulus class formation. The chance levels of accuracy exhibited by the ND children show that the transfer effects seen in the D children were, indeed, the consequence of the common outcome associations in training. These effects have been systematically replicated in a wide variety of experiments using rats (Kruse, Overmier, Konz, & Rokke, 1983), pigeons (Meehan, 1999; Peterson, 1984; Urcuioli, 1990), children (Schenk, 1994), and adults with mental retardation (e.g., Dube, McIlvane, Maguire, Mackay, & Stoddard, 1989). Response-independent (Pavlovian) outcome associations are also effective in partitioning stimuli into classes (e.g., Honey & Hall, 1989; Shipley, 1935). In one study, Hall et al. (2003, Experiment 1) had university students observe individual shape– color sequences on a computer monitor. During the 32-trial training phase, two of the shapes were always followed by one colored rectangle, and the other two shapes were always followed by a different colored rectangle (see Figure 16.5). 
In a subsequent stimulus class test, each subject learned to make a left versus right keyboard response to each shape stimulus (with feedback for correct and incorrect responses). For some subjects, each correct response was mapped onto shapes that shared the same color (outcome) association in training (consistent condition); for the remaining subjects, each spatial response was mapped onto shapes that had different

Figure 16.5. A schematic depiction of the training contingencies and the reinforced (Test 1) and nonreinforced (Test 2) transfer-of-control tests in Hall, Mitchell, Graham, and Lavis (2003, Experiment 1). R = response. Panels: Training (four shapes followed by Red, Red, Green, and Green, respectively); Stimulus-Class Test 1 (Consistent: Left R+, Left R+, Right R+, Right R+; or Inconsistent: Left R+, Right R+, Left R+, Right R+); Stimulus-Class Test 2 (Red: Left vs. Right R; Green: Left vs. Right R).

color associations in training (inconsistent condition). As predicted by outcome-based partitioning of shapes, average accuracy was higher in the consistent than in the inconsistent condition: 96.4% versus 80.2% correct, respectively. In a second stimulus class test, subjects were given the opportunity to make left versus right keyboard presses (without feedback) to the outcomes themselves. If the outcomes had directly or indirectly become members of the partitioned classes (see also Joseph, Overmier, & Thompson, 1997), then given the motor responses learned during the first stimulus class test, consistent subjects were predicted to make the same left versus right responses to the outcome colors as they had to their associated shapes—which they did, 96.9% of the time. By contrast, inconsistent subjects were predicted to respond haphazardly—and did. Hall et al. (2003, Experiment 3) replicated these results using nonsense syllables as the outcomes following the shape stimuli. Specifically, subjects were more accurate in their left versus right keyboard responses when the same motor response was required to shapes previously associated with the same (as opposed to a different) nonsense syllable. Furthermore, when the consistent group from this first stimulus class test (cf. Figure 16.5) was then required to learn the left versus right responses to the nonsense syllables themselves, the discrimination was more accurate when the reinforced

response for each syllable was the same as (rather than the opposite of) the reinforced response to the shapes associated with that syllable. The latter results support the idea that the syllables (outcomes) were either members of the stimulus classes themselves or evoked by them.

Reversal Test Assessments

Another way to assess partitioning of stimuli into classes is to conduct reversal tests after common-response or common-outcome training. The idea behind these tests is that if the response occasioned by, or the outcome signaled by, one or more stimuli in a class is explicitly reversed vis-à-vis original training, that change should propagate through the other class members. Propagation is operationally defined as an immediate and appropriate reversal of the behavior originally occasioned by those other class members.

Von Fersen and Delius (2000) provided an example of this by successively reversing a four-stimulus auditory discrimination in two dolphins. A schematic of their procedure is shown in Table 16.4. During initial training, one spatial response (left) was reinforced (indicated by a plus sign) after each of two different tonal frequencies (S1 and S3), and another spatial response (right) was likewise reinforced after each of two other tonal frequencies (S2 and S4). (Incorrect responses produced a loud whistle.)

Table 16.4
Reversal Test Procedure Used by Von Fersen and Delius (2000) to Assess Stimulus-Class Formation on the Basis of Partitioning by Common Reinforced Responses

                              Reversal test
Common response training      Leading pair      Trailing pair
S1 → Left+                    S1 → Right+       S3 → Left vs. Right
S2 → Right+                   S2 → Left+        S4 → Left vs. Right
S3 → Left+
S4 → Right+

Hypothesized stimulus classes: [S1, S3] and [S2, S4]

Note. S1–S4 represent different stimuli, and a plus sign indicates reinforced responses. Italicized responses represent class-consistent responding assuming the two hypothesized stimulus classes shown as the circled elements.

After accuracy reached 80% correct or better, these contingencies were reversed by (a) reinforcing the opposite spatial response to just one stimulus in each common-response set for 10 trials (e.g., S1 and S2; viz., the so-called “leading pair” of stimuli), after which (b) the remaining two stimuli (e.g., S3 and S4; viz., the so-called “trailing pair”) were presented for another 10 trials. More important, the first three trailing-pair trials ended without any feedback (i.e., without reinforcement or the loud whistle), so von Fersen and Delius could measure how the explicitly taught leading-pair reversal affected the dolphins’ subsequent responses to the trailing pair uncontaminated by any new learning with the latter pair. Because the first four trailing-pair choices were made before any feedback was obtained on those trials, these four trials were used to assess stimulus class formation. Von Fersen and Delius (2000) conducted 14 such reversals. Figure 16.6 shows the accuracy of each dolphin for the last eight leading-pair trials (open bars) and the first four trailing-pair trials

Figure 16.6. Reversal-test results over the last seven reversals for two dolphins after common-response training. Open bars show average accuracy for the last eight leading trials of the reversals. Black bars show average accuracy for the first four no-feedback trailing trials of the reversals. Data from von Fersen and Delius (2000).

(filled bars) averaged over the last seven reversals. Leading-pair accuracies were approximately 75% correct, which may seem low but is somewhat artificial, reflecting the fact that at the outset of a reversal session, dolphins had no way of knowing that their previously reinforced (correct) responses were no longer correct (i.e., were now errors). Thus, they would make one or more errors during the 10 leading-pair trials that began a reversal session before switching their responses in accordance with the reversal contingencies. More noteworthy are the much higher levels of accuracy (82% and 89% correct) for the no-feedback trailing-pair trials, a result consistent with stimulus class formation.

Kastak, Schusterman, and Kastak (2001, Experiment 1) found similar reversal effects in two sea lions trained on sets of simultaneous (S+ vs. S−) discriminations, each of which involved a choice between a letter and a number, using 10 different letters and 10 different numbers. In one experimental phase, letters were the S+ (reinforced) stimuli, and numbers were the S− (nonreinforced) stimuli for an initial block of trials in a session. After achieving at least 90% accuracy during this initial block, the S+ and S− roles of the stimuli were reversed for a subsequent block of trials. Moreover, each reinforced letter choice produced one type of fish


(herring), whereas each reinforced number choice produced another type (capelin), and a total of 40 within-session reversals were conducted to assess class partitioning by common outcomes. Over the last 10 reversals, the sea lions were 82% and 72% correct overall to the 10 S+ stimuli on their first presentation after a reversal. (The idealized maximum percentage correct was 90% correct because the first reversal trial could not be anticipated.) In short, after one class member (letter or number) was switched from an S+ to an S−, and vice versa, the others followed. Interestingly, the effect obtained by Kastak et al. (2001) required different reinforcing outcomes for the two sets of S+ stimuli. In other words, common association with just the presence versus absence of reinforcement per se was insufficient to induce class partitioning. By contrast, in successive discrimination reversal experiments with pigeons (Vaughan, 1988), adult humans (Sidman, Wynne, Maguire, & Barnes, 1989), and a chimpanzee (Tomonaga, 1999), a within-group association with reinforcement versus nonreinforcement is sufficient to yield reversal effects indicative of stimulus class formation. For example, Vaughan (1988) randomly divided 40 different photographs of trees into two groups and reinforced pecking to any stimulus in one (S+) group; pecking to any stimulus in the other (S−) group was not reinforced. After pigeons learned this discrimination (viz., by pecking at high rates to the S+ stimuli and very low rates to the S− stimuli), the S+ stimuli became the S− stimuli, and vice versa. Once this reversal was learned, the discrimination was again reversed, and this was repeated more than 120 times. Vaughan found that with successive reversals, seeing just a few photographic stimuli at the beginning of a reversal session was sufficient for pigeons to respond appropriately (viz., in a reversed fashion) to most of the remaining slides in that session. 
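The 90% ceiling mentioned for the Kastak et al. (2001) sea lions follows from simple arithmetic: with 10 S+ stimuli, an ideal subject still errs on the one unanticipated trial and is correct on the other nine first presentations. The Python sketch below illustrates that reasoning with a hypothetical "ideal class-based chooser"; the function name, its arguments, and the assumption that a single error suffices to switch classes are illustrative and are not a model proposed in the chapter.

```python
def first_presentation_accuracy(n_stimuli, believed_class, rewarded_class):
    """Percentage correct for a hypothetical ideal class-based chooser.

    The chooser picks from `believed_class` until one unrewarded choice
    shows that the contingencies have reversed, then switches classes
    for every later trial.
    """
    correct = 0
    for _ in range(n_stimuli):
        if believed_class == rewarded_class:
            correct += 1
        else:
            # A single error is assumed to be enough: the reversal
            # propagates to class members not yet presented.
            believed_class = rewarded_class
    return 100 * correct / n_stimuli


# 10 S+ stimuli; before the reversal, letters were S+; after it, numbers are.
ceiling = first_presentation_accuracy(
    10, believed_class="letters", rewarded_class="numbers"
)
print(ceiling)  # 90.0
```

By the same logic, a chooser that had to relearn each stimulus separately would be wrong on every first presentation after a reversal, so first-presentation accuracies of 82% and 72% sit much closer to the class-based ceiling than to that alternative.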
Other Stimulus Class Assessments and the Functional Stimulus

If explicit training or learning experiences generate classes of interchangeable stimuli, a reasonable expectation is that the subject should be able to

match each stimulus in a class to any other stimulus in the same class (Spradlin & Saunders, 1986). The category sorting or category matching-to-sample task used by Lowe et al. (2002), Horne et al. (2007), and Miguel et al. (2008) provides precisely this sort of assessment and, as indicated earlier, has yielded results that confirm this expectation in young children. Sidman et al. (1989) also observed highly accurate matching of this type in both a typically capable adult and an adult with intellectual challenges (see also Dube, McIlvane, Mackay, & Stoddard, 1987; Dube et al., 1989). To conduct these assessments, researchers present the sample to be matched at one spatial location and the comparison alternatives from which a matching selection or selections are made at other spatially distinct locations. Successful (class-consistent) performances generally require that subjects attend solely to the nominal features of the stimuli in question and ignore their ostensibly irrelevant features (such as where they appear and when they appear). At first glance, this consideration would seem to be inconsequential, but it may not be (Dzhafarov & Colonius, 2006). When humans or other animals do not behave in a manner consistent with stimulus classes, it might reflect a lack of correspondence between the stimulus features that actually control their behavior and what the researcher mistakenly believes those features to be. Stated less charitably, the apparent absence of stimulus class formation in a particular situation may reflect conceptual shortcomings on the part of the researcher rather than conceptual limitations on the part of the subject (McIlvane, Serna, Dube, & Stromer, 2000). Consequently, understanding the functional stimulus (viz., what actually influences the subject’s behavior) is of paramount importance, especially when working with nonhuman animals.
To illustrate this point, the top portion of Figure 16.7 depicts an arbitrary matching training task in which pigeons see the sample stimuli on the center key of a three-key display. After the pigeon pecks the sample, it goes off and two comparison alternatives appear simultaneously on the left and right keys (with side-key location counterbalanced across trials; not shown). In this example, pecking the vertical-lines comparison is reinforced (+) after the


Figure 16.7. A schematic depiction of training contingencies for two-alternative arbitrary matching using red (R) and green (G) samples and vertical (V)-line and horizontal (H)-line comparisons and symmetry test trials in which the roles of the colors and lines as samples and comparisons are reversed. The circled elements depict two possible stimulus classes resulting from training.

red sample and pecking the horizontal-lines comparison is reinforced (+) after the green sample. Again, this is called arbitrary matching because the samples have no obvious physical resemblance to either of the comparisons. Nevertheless, the task is readily learned; the question of interest is, what is learned? One possibility is that pigeons learn that red and vertical go together, as do green and horizontal; in other words, that the color and line of each reinforced pair are members of a class, which is indicated by the circled elements shown on the right side of Figure 16.7. If so, then pigeons should now preferentially peck a red comparison when presented with a vertical sample and a green comparison when presented with a horizontal sample, as indicated at the bottom of Figure 16.7. In short, the explicitly taught red → vertical lines and green → horizontal lines conditional relations in training should be symmetrical (Sidman, 2000, 2008). Curiously, symmetry, one of three behavioral indices of an equivalence relation (Sidman & Tailby, 1982), has been very difficult to demonstrate in nonhuman animals (e.g., D’Amato, Salmon, Loukas, & Tomie, 1985; Dugdale & Lowe, 2000; Hogan &

Zentall, 1977; Lionello-DeNolf & Urcuioli, 2002; Lipkens, Kop, & Matthijs, 1988; Sidman et al., 1982; but see Schusterman & Kastak, 1993, and McIntire, Cleary, & Thompson, 1987, for exceptions). Figure 16.8 shows data from experiments with monkeys, baboons, and pigeons in which none exhibited above-chance levels of accuracy on a symmetry test (solid bars) despite highly accurate baseline performances on the arbitrary matching training task (open bars). Notice that in Figure 16.7, I have labeled the members of the possible classes nominally: red (R), vertical lines (V), green (G), and horizontal lines (H). However, these stimuli may not be functional for the animal. After all, in arbitrary matching training, red and green only appeared on the center key and vertical and horizontal only appeared on the left and right keys. Perhaps the stimuli for the pigeon include location. If so, the functional samples would be red-on-the-center-key and green-on-the-center-key, and likewise, the functional comparisons would be vertical-on-the-left (or right)-key and horizontal-on-the-left (or right)-key. Arbitrary matching training, then, would result in learning to match red-on-the-center-key to vertical-on-the-left (or right)-key and green-on-the-center-key to horizontal-on-the-left (or right)-key. However, if this is an accurate portrayal of what is learned, then the ostensible symmetry test is nothing of the sort because it does not involve the necessary interchanges of stimuli (viz., seeing whether pigeons will match vertical-on-the-left [or right]-key to red-on-the-center-key). Instead, the test involves novel stimuli (viz., vertical-on-the-center-key, horizontal-on-the-center-key, red-on-the-left [or right]-key, and green-on-the-left [or right]-key). Consequently, there is no reason to expect symmetry, and the data shown in Figure 16.8 amply confirm this expectation.
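The argument about location can be made concrete with a small sketch. It assumes, purely for illustration, that each functional stimulus is a (nominal stimulus, key location) pair; the function and variable names are hypothetical and are not taken from any of the cited studies. Under that assumption, every stimulus on the ostensible symmetry test is one the pigeon never encountered in training.

```python
# Illustrative assumption: a functional stimulus is a (nominal, location) pair.
SIDE_KEYS = ["left", "right"]


def functional_stimuli(samples, comparisons):
    """Return the (nominal, location) compounds seen in one phase.

    Samples always appear on the center key; each comparison appears on
    either side key across trials (side-key location is counterbalanced).
    """
    stimuli = {(s, "center") for s in samples}
    stimuli |= {(c, side) for c in comparisons for side in SIDE_KEYS}
    return stimuli


# Training (Figure 16.7): colors are samples, lines are comparisons.
training = functional_stimuli(["red", "green"], ["vertical", "horizontal"])

# Symmetry test: the roles are reversed, so lines move to the center key
# and colors move to the side keys.
test = functional_stimuli(["vertical", "horizontal"], ["red", "green"])

novel_in_test = test - training   # compounds never seen in training
shared = test & training          # compounds common to both phases

print(f"{len(novel_in_test)} of {len(test)} test stimuli are novel")
print("shared:", shared)
```

If location is part of the functional stimulus, the set of shared compounds is empty, which is the sense in which the test does not interchange trained stimuli at all.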
Of course, if the functional matching stimuli include their location, then after learning two-alternative arbitrary matching (cf. Figure 16.7), subjects should also be unable to perform accurately if the familiar red and green samples now appear on the left or the right key (with the familiar comparisons on the center and remaining side key) because this change, too, generates novel stimuli. This is precisely what happens (Lionello & Urcuioli, 1998):


Figure 16.9. The average percentage of correct choices in two-alternative matching by pigeons when the sample stimuli appear in their normal, center-key training location and in a novel, side-key test location. Data from Lionello and Urcuioli (1998).

Figure 16.8. Arbitrary matching training results (open bars) and symmetry test results (solid bars) for individual monkeys and baboons (top) and pigeons (bottom). Top panel: Data from Sidman et al. (1982). Bottom panel: From “A Test of Symmetry and Transitivity in the Conditional Discrimination Performances of Pigeons,” by R. Lipkens, P. F. M. Kop, and W. Matthijs, 1988, Journal of the Experimental Analysis of Behavior, 49, p. 405. Copyright 1988 by the Society for the Experimental Analysis of Behavior. Adapted with permission.

Changing the location at which the familiar samples and comparisons appear causes pigeons’ matching accuracies to drop to chance levels (see Figure 16.9). Similar results have been reported for monkeys and for rats (Iversen, 1997; Iversen, Sidman, & Carrigan, 1986; see also Sidman, 1992). In short, a stimulus consistently appearing in one location may not function as the same stimulus when later shown in a different location (Urcuioli, 2007; see also Urcuioli, 2008c). A valid symmetry test requires that either (a) matching performances in training are controlled

only by the nominal attributes of the matching stimuli (i.e., location is ignored) or (b) no stimulus changes location in the shift from training to testing. Lionello-DeNolf and Urcuioli (2000) accomplished the former by varying the locations of the samples and the comparisons throughout training. The sample appeared on either the left or the right key on different trials with the comparisons then appearing on the remaining two keys. After this multiple-location task was acquired, the samples were then shown for the very first time on the center key (and the comparisons on the two adjacent side keys). Despite the change in location, most pigeons continued to perform at high levels of accuracy (i.e., they ignored location). However, Lionello-DeNolf and Urcuioli (2002) subsequently showed that despite multiple-location arbitrary matching training and verification that pigeons ignored stimulus location, they still did not exhibit symmetry in testing. The same (null) result was obtained even when pigeons received additional multiple-location training to ensure that each stimulus was seen in its eventually tested location. Clearly, the culprit here is not simply location, and the same appears to be true for monkeys and baboons as well (Sidman et al., 1982). That said, it would be a mistake to conclude that researchers interested in stimulus class formation need not


worry about the location at which stimuli appear because it may introduce an unwanted source of stimulus control if correlated with the nominal stimuli (cf. McIlvane et al., 2000). Other potentially undesirable sources of stimulus control are when each stimulus appears and whether it appears by itself or with other stimuli (Saunders & Green, 1999). Regarding the latter, just because subjects accurately select a particular comparison from among other concurrently presented comparisons (i.e., simultaneously discriminate between them) does not mean they will appropriately differentiate between those same stimuli when each appears individually (i.e., successively discriminate between them), and vice versa. Regarding the former, the sample stimuli appear first in a trial with the comparisons following them, so their respective ordinal positions (first and second) might very well constitute part of the functional matching stimuli. Indeed, both Pavlovian and operant conditioning provide evidence that animals are highly sensitive to ordinal cues and other forms of temporal information (e.g., Balsam & Gallistel, 2009; D’Amato & Colombo, 1988; Honig, 1981; Miller & Barnet, 1993; Terrace, 1986). These considerations are germane to recent successful demonstrations of symmetry in pigeons (Frank & Wasserman, 2005; Urcuioli, 2008b) using successive or go/no-go matching (Wasserman, 1976; see also Konorski, 1959). Successive matching differs from the n-alternative procedure in that (a) only one comparison is presented after a sample and (b) the single comparison appears at the same location as the preceding sample, which avoids any changes in stimulus location when shifting from training to testing and ensures the same type of required discrimination for both samples and comparisons (viz., successive). The task involves reinforcing responding to one comparison after one sample but not after the other, and vice versa for responding to the alternative comparison. 
Each comparison is usually presented for an extended period of time (e.g., 5 or 10 seconds) so that the rate of responding to it (the number of comparison responses per second) can be measured. With training, subjects confine more and more of their comparison responses to the reinforced (go) trials.
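As a rough sketch of how such response rates might be summarized, the snippet below computes pecks per second on each trial and then a discrimination ratio (go rate divided by the sum of go and no-go rates). The ratio is one common summary index and is an assumption here, not a measure named in this passage; the peck counts are invented for illustration and are not data from any study.

```python
def response_rate(pecks, duration_s):
    """Comparison responses per second on a single trial."""
    return pecks / duration_s


def mean(values):
    return sum(values) / len(values)


def discrimination_ratio(go_rates, nogo_rates):
    """Mean go rate divided by the sum of mean go and no-go rates.

    A value of 0.5 means equal responding on both trial types; values
    near 1.0 mean responding is confined to reinforced (go) trials.
    """
    go, nogo = mean(go_rates), mean(nogo_rates)
    total = go + nogo
    return 0.5 if total == 0 else go / total


COMPARISON_DURATION_S = 5  # the text gives 5 or 10 seconds as examples

# Hypothetical peck counts per trial, invented purely for illustration.
early_go = [response_rate(p, COMPARISON_DURATION_S) for p in (10, 12)]
early_nogo = [response_rate(p, COMPARISON_DURATION_S) for p in (12, 10)]
late_go = [response_rate(p, COMPARISON_DURATION_S) for p in (15, 15)]
late_nogo = [response_rate(p, COMPARISON_DURATION_S) for p in (1, 1)]

early_dr = discrimination_ratio(early_go, early_nogo)
late_dr = discrimination_ratio(late_go, late_nogo)
print(round(early_dr, 3), round(late_dr, 3))
```

With these made-up counts the ratio starts at chance and rises as responding concentrates on go trials, which is the pattern the text describes.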

The upper left section of Figure 16.10 illustrates arbitrary successive matching training used by Urcuioli (2008b). The individually presented samples (red [R] and green [G]) are shown on the left; to their right are the individually presented comparisons (a triangle [T] and a set of horizontal stripes [H]). Some sample–comparison combinations (viz., R → T and G → H) ended in reinforcement, meaning that the first comparison stimulus peck after, say, 5 seconds on these trials produced food. The remaining combinations (viz., R → H and G → T) ended without reinforcement; the comparison simply went off after 5 seconds. Pigeons also received concurrent training on two other successive matching tasks, hue and form identity, shown in the middle and upper right sections of Figure 16.10. Reinforcement in these tasks occurred only on trials in which the comparison physically matched the preceding hue (or form) sample. The original rationale for including the latter tasks (Frank & Wasserman, 2005) was to familiarize pigeons with each stimulus in each ordinal position (i.e., as both a sample and a comparison) in preparation for the subsequent symmetry test. That test, shown in the bottom left section of Figure 16.10, involved presenting T and H first (viz., as samples) and R and G second (viz., as comparisons). The infrequent symmetry probe trials were intermixed among the training trials on the three already learned tasks (Urcuioli, 2008b). Each test probe ended without reinforcement (to avoid new learning on these trials) and was designated positive or negative depending on whether it reversed a reinforced or a nonreinforced arbitrary training combination, respectively. If training had generated stimulus classes consisting of the elements of each reinforced combination—[R, T] and [G, H]—these elements should be interchangeable with one another and yield symmetrical responding. 
In other words, more responding should occur to the R and G comparisons when preceded by the T and H samples, respectively, on symmetry probe trials. In fact, this is precisely what was found (Urcuioli, 2008b, Experiment 3; see also Frank & Wasserman, 2005, Experiment 1). Figure 16.11 shows probe data for three pigeons averaged over their first two test sessions along with their arbitrary matching


Figure 16.10. Successive matching training on arbitrary matching, hue identity matching, and form identity matching, and the four symmetry probe trials in which the arbitrary matching sequences are reversed (cf. Urcuioli, 2008b). Samples and comparisons are shown to the left and right of the arrows, respectively; + and − indicate reinforced and nonreinforced trials, respectively; and pos and neg denote the reverse of the reinforced and nonreinforced arbitrary matching trials, respectively. R = red; G = green; pos = positive; neg = negative.

(training) data from those same sessions. The explicitly trained performances were, of course, highly discriminative: Pigeons responded much more to the comparisons on reinforced (positive) than on nonreinforced (negative) trials. More noteworthy is the finding that they also behaved in a highly discriminative fashion on the symmetry test trials. Specifically, comparison-response rates were much higher on probes that were the reverse of the positive training trials than on probes that were the reverse of the negative training trials. This, then, is another example of stimulus class formation. Given so many past failures to demonstrate symmetry in nonhuman animals using the n-alternative choice procedure, why did the successive matching version succeed? Presenting all stimuli singly and at the same spatial location certainly helped. More important, perhaps, is that half of all successive matching training trials end in nonreinforcement no matter what the subject’s level of performance is. Even if pigeons peck the comparisons only on reinforced trials and never otherwise, they still experience

the nonreinforced combinations as frequently as they did early in training when they pecked equally on all trials. By contrast, the relative frequency of nonreinforcement in n-alternative matching diminishes substantially as subjects learn the task. I have proposed (Urcuioli, 2008b) that the equally frequent nonreinforcement of certain sample–comparison combinations juxtaposed with the reinforcement of other combinations facilitates the formation of stimulus classes whose members are the elements of reinforced combinations. Referring back to Figure 16.10, one such class would contain the R sample and T comparison (because this combination is reinforced), and another class would contain the G sample and the H comparison (because this combination is also reinforced). Note, too, that this means that the R sample and H comparison, and the G sample and the T comparison, are in different classes, as they should be because these combinations are never reinforced. However, this cannot be the entire story because this assumption alone predicts that arbitrary successive


Figure 16.11. Comparison-response rates (in pecks per second) on the arbitrary matching training trials and for the symmetry probe trials (cf. Figure 16.10) over the first two symmetry test sessions. The data are from two pigeons (EXT 2 and EXT 7) run in Urcuioli (2008b) and one pigeon (PRF4) run in Urcuioli (2008a).

matching training by itself should generate the stimulus classes necessary for symmetry, and it does not (Richards, 1988; see also Frank, 2007). After all, successful training (cf. Figure 16.10) appears to require concurrent hue and form identity training, ostensibly to avoid probe–trial combinations in which the stimuli appear for the very first time in novel ordinal positions. However, there are other ways to accomplish this, for example, by training two other arbitrary matching tasks in which pigeons see the eventually tested stimuli in their ordinal test

positions. Interestingly, symmetry does not emerge from such training (Frank, 2007). Clearly, something else is going on here besides ordinal familiarity. That something else brings us back to a consideration of the functional matching stimuli. Apparently, they are not just the nominal stimuli themselves. Rather, pigeons apparently code each successive matching stimulus in terms of “What did I see?” and “When did I see it?” (i.e., the nominal stimulus plus its ordinal position). Thus, the R sample is just that: namely, red in the first ordinal position (R1). Likewise, the T comparison is triangle in the second ordinal position (T2), and so forth. Consequently, the elements of the stimulus classes arising from arbitrary matching (cf. Figure 16.10) should not be represented as [R, T] and [G, H] but rather as [R1, T2] and [G1, H2]. For hue identity matching, the corresponding classes should be [R1, R2] and [G1, G2]; for form identity matching, they should be [T1, T2] and [H1, H2]. The reason I have italicized certain elements is to draw your attention to the fact that some classes share a common member. (The same is true for the classes involving G and H, but I have omitted italics for clarity.) If common members cause their respective stimulus classes to “integrate” or merge (Balsam & Gallistel, 2009; Sidman, Kirk, & Willson-Morris, 1985), the net result of the training depicted in Figure 16.10 is the two four-member classes shown in the top part of Figure 16.12. The solid lines indicate the explicitly trained and reinforced sample–comparison combinations from arbitrary matching (red sample–triangle comparison, or R1 → T2, and green sample–horizontal comparison, or G1 → H2). The dotted lines indicate the two probe relations that demonstrate the observed symmetry effect: greater comparison responding to the triangle sample–red comparison (T1 → R2) and the horizontal sample–green comparison (H1 → G2) probe combinations than to the other sample–comparison test combinations.
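The merger account just described can be sketched in a few lines. The sketch assumes that each reinforced sample–comparison combination forms a two-member class of ordinally coded stimuli and that any classes sharing a member merge; the function names are hypothetical, and the code illustrates the reasoning only, not the analysis used in the cited experiments.

```python
def merge_classes(reinforced_pairs):
    """Merge two-member classes that share a member, transitively."""
    classes = []
    for pair in reinforced_pairs:
        merged = set(pair)
        remaining = []
        for cls in classes:
            if cls & merged:      # a common member: the classes merge
                merged |= cls
            else:
                remaining.append(cls)
        classes = remaining + [merged]
    return classes


def same_class(classes, sample, comparison):
    """True if both ordinally coded stimuli ended up in one class."""
    return any(sample in cls and comparison in cls for cls in classes)


# Reinforced combinations from the training in Figure 16.10, written as
# (sample in position 1, comparison in position 2).
TRAINING = [
    ("R1", "T2"), ("G1", "H2"),   # arbitrary matching
    ("R1", "R2"), ("G1", "G2"),   # hue identity
    ("T1", "T2"), ("H1", "H2"),   # form identity
]

classes = merge_classes(TRAINING)
for cls in classes:
    print(sorted(cls))

# Symmetry probes: does the reversed sequence fall within one class?
for probe in [("T1", "R2"), ("H1", "G2"), ("T1", "G2"), ("H1", "R2")]:
    print(probe, "more responding" if same_class(classes, *probe) else "less responding")
```

Under these assumptions the six trained pairs collapse into two four-member classes, and only the probes that reverse the reinforced arbitrary combinations (T1 → R2 and H1 → G2) link members of the same class.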
Are the assumptions about ordinal position and class merger valid? One way to find out is to modify training such that hue oddity rather than hue identity serves as one of the three concurrent training tasks. In hue oddity, comparison responding is


Figure 16.12. Top: The two four-member stimulus classes hypothesized to result from the concurrent successive matching training depicted in Figure 16.10. Bottom: The two four-member stimulus classes hypothesized to result from the concurrent successive matching training involving hue oddity rather than hue matching. Solid lines connect the elements of the reinforced sample–comparison sequences from arbitrary matching. Dotted lines connect the elements representing the symmetrical versions of either the reinforced arbitrary matching sequences (top) or the nonreinforced arbitrary matching sequences (bottom). R = red; G = green; T = triangle; H = horizontal stripes.

reinforced only when the hue comparison mismatches the prior hue sample. Thus, responding to the green comparison (G2) is reinforced after the red sample (R1) and responding to the red comparison (R2) is reinforced after the green sample (G1). This task also familiarizes subjects with each hue at each ordinal position, so on that account, symmetry should still be observed. By contrast, if the functional matching stimuli contain ordinal information, the exact opposite (antisymmetry) should be observed: Fewer comparison responses should occur on probe trials that reverse the reinforced arbitrary matching training trials. Stated otherwise, more comparison responses should occur on probe trials that reverse the nonreinforced arbitrary matching training sequences. The bottom part of Figure 16.12 provides a visual rationale for this prediction by showing the

merged four-member classes that should result from using hue oddity as one of the three successive matching training tasks. The odd reinforced training elements can be seen in each class: the green comparison (G2) after the red sample (R1) in the class on the left, and the red comparison (R2) after the green sample (G1) in the class on the right. Note, too, that the elements from the reinforced arbitrary matching relations (connected by the solid lines) are the same as those in the stimulus classes from training that involved hue matching (cf. top part of Figure 16.12). With hue oddity training, however, the form sample and hue comparison elements representing the reverse of the reinforced training relations (e.g., T1 and R2) are not in the same class. Rather, each class contains the elements representing the reverse of the nonreinforced arbitrary relations (e.g., T1 and G2); hence, the prediction of antisymmetry. Figure 16.13 shows data that confirm this counterintuitive prediction (Urcuioli, 2008b, Experiment 4). This figure presents probe data averaged over the first two sessions for three pigeons trained with hue oddity rather than hue identity as one of the three successive matching training tasks. Also plotted are their arbitrary matching (training) data from those same test sessions. Notice that each pigeon responded far more frequently on probe trials that were the reverse of the nonreinforced (negative) training trials. I want to emphasize two things before ending this rather lengthy section. The first is that distinctive outcomes are effective in partitioning stimuli associated with them into separate classes, a point I made and illustrated in the section Stimulus Classes: Partitioning by Common Outcomes earlier in this chapter. 
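The antisymmetry prediction can be checked with a short sketch that treats each reinforced combination as a link between two ordinally coded stimuli and asks whether the two elements of a probe are connected through any chain of such links. This connectivity test is simply another way of expressing class merger; the helper name and the pos/neg labels are illustrative assumptions rather than anything taken from the original experiments.

```python
from collections import defaultdict, deque


def connected(pairs, start, goal):
    """Breadth-first search over reinforced pairs treated as undirected links."""
    graph = defaultdict(set)
    for a, b in pairs:
        graph[a].add(b)
        graph[b].add(a)
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for nxt in graph[node] - seen:
            seen.add(nxt)
            queue.append(nxt)
    return False


# Reinforced combinations when hue oddity replaces hue identity.
ODDITY_TRAINING = [
    ("R1", "T2"), ("G1", "H2"),   # arbitrary matching (unchanged)
    ("R1", "G2"), ("G1", "R2"),   # hue oddity: mismatching hue reinforced
    ("T1", "T2"), ("H1", "H2"),   # form identity (unchanged)
]

# Probes labeled by whether they reverse a reinforced (pos) or a
# nonreinforced (neg) arbitrary matching combination.
PROBES = {
    ("T1", "R2"): "pos",
    ("H1", "G2"): "pos",
    ("T1", "G2"): "neg",
    ("H1", "R2"): "neg",
}

predictions = {probe: connected(ODDITY_TRAINING, *probe) for probe in PROBES}
for probe, label in PROBES.items():
    outcome = "more responding" if predictions[probe] else "less responding"
    print(probe, label, outcome)
```

With hue oddity in the training set, the connected probes are the ones labeled neg, which is the reversal of the pattern expected after hue identity training.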
The regular reinforcement of certain sample–comparison combinations versus equally regular nonreinforcement of other combinations throughout successive matching training generates classes containing the elements of a reinforced combination (see also Vaughan, 1988) and permits the eventual emergence of new behavior. Second, although it would be easy for the reader to leave with the message that some training procedures are conducive to stimulus class formation and others are not when working with nonhuman animals, the


main message I wish to convey is why that is so. One reason has just been mentioned (viz., the impact of differential outcomes). Another, more important one has to do with identifying the actual stimuli controlling behavior. Everyone has probably had the experience, at one time or another, of reacting in unexpected ways and, when asked why, claiming to pay attention to certain things others were not fully aware of. When people are made aware, their behavior becomes understandable. The same holds true in research with nonhuman animals. To make sense of their behavior (including why they do or do not exhibit evidence of stimulus class formation), one must know the features of the environment to which they attend. This is another way of saying that researchers must identify the functional stimuli. With nonverbal creatures, researchers cannot (obviously) ask for that information, so the challenge is to find other ways for them to provide that information. This exercise can be lengthy and sometimes daunting, but it is also a very rewarding one when its success helps to reveal interesting and theoretically important phenomena.

Figure 16.13. Comparison-response rates (in pecks per second) for three pigeons (ODD2, ODD5, and ODD8) on the arbitrary matching training trials and for the symmetry probe trials over the first two symmetry test sessions after successive matching training using hue oddity rather than hue identity as one of the training tasks.

Concluding Comments

One of the stimulating and exciting aspects of work in the area of stimulus control is finding that certain types of training and learning experiences yield novel instances of stimulus control. In particular, otherwise familiar stimuli begin to occasion behavior never explicitly reinforced in their presence (viz., emergent behavior). Another way to say this is some forms of stimulus control are derivative: They are untrained, arising from other explicitly trained stimulus control relations. This is no small matter: It means that learning a small number of relations can greatly expand the potential for adaptive behavior far beyond the behavior that was explicitly trained. Besides what I have just said, another take-home message concerns the nature of the training and learning experiences that produce this expansion. Specifically, physically dissimilar stimuli are treated as belonging together—they become interchangeable for one another—if subjects learn the same, differentially reinforced response to them (e.g., giving them the same name or making the same spatial or comparison response to them) or if they are associated with a distinctive outcome. In other words, stimuli associated with a common reinforced response or common outcome become members of the same category or class. Although not especially newsworthy to researchers in the field, this message bears repeating and is an important one to convey to those unfamiliar with the supporting data.


A third take-home message is that identifying the origins of stimulus class formation presupposes that researchers know exactly what the class members consist of. Are they simply the nominal stimuli? Or are they more complex versions that include features such as a stimulus’s spatial or temporal location? If researchers do not know or are mistaken about the nature of the functional stimuli, they run the risk of underestimating the capabilities of their subject—for example, by concluding that they do not categorize when, in fact, they do or can. “Know the functional stimulus” is an especially important message because it is so easily overlooked, something I have experienced firsthand. Stimulus control has long been my primary area of research, yet I have on occasion devoted unnecessarily large amounts of time to understanding certain results (e.g., Urcuioli, 2006b, 2008c) that would have been far less time consuming had I listened to my own message and had a less anthropocentric view of the nature of the functional stimuli. The ability of my recent theory of stimulus class formation (Urcuioli, 2008b) to correctly predict emergent effects originally thought by some to be unique to humans hinges on the proper identification of the functional stimulus. Although theoretical relevance may appear to be just an intellectual exercise (albeit an important one; cf. Skinner, 1935), knowing the functional stimulus has important practical applications, too. For instance, some methods used to teach or to retrain basic language and symbolic skills often consist of procedures for promoting stimulus class formation (e.g., Cowley, Green, & Braunling-McMorrow, 1992; McIlvane, 1992). Their success depends heavily on accurate stimulus definition. 
Variations in the outcomes of these interventions may reflect unrecognized or incidental sources of stimulus control: features of stimuli that from the experimenter’s perspective appear irrelevant or inconsequential but from the subject’s perspective are quite the opposite (McIlvane et al., 2000). Consistently effective interventions will be achieved only by identifying these sources of stimulus control and, if necessary, neutralizing them. In the preceding section of this chapter, I discussed the impact of ordinal information—when a particular stimulus occurs in a sequence of stimuli.

For pigeons and other animals (e.g., Balsam & Gallistel, 2009; Miller & Barnet, 1993), this can be an important component of stimulus control. Is the same true for humans? Do people naturally code stimuli in terms of when as well as in terms of what? The notion of episodic memory, remembering specific past events in people’s lives (Tulving, 1983; see also Zentall, 2006), suggests that they do. Remembering when asked to do so, however, does not mean that performances in tasks of the sort described in this chapter are necessarily affected by the temporal or ordinal aspects of stimuli. To the contrary, emergent effects such as symmetry often displayed by humans in studies of stimulus class formation suggest that those aspects are essentially ignored. However, the failure of some subjects to show these effects (e.g., Sidman et al., 1982, Experiment 3) could mean that for them, these aspects are part of the functional stimuli (cf. McIlvane et al., 2000). Answers to these questions must await future research. “Know the stimulus,” or at least being cognizant of what the functional stimulus might consist of, should provide a helpful guide. In my lab, it has already generated a considerable number of new ideas for developing adequate tests in nonhuman animals for other emergent phenomena associated with stimulus class formation—for example, reflexivity (the untrained ability to match a stimulus to itself) and transitivity (the untrained ability to match A to C after learning to match A to B and B to C). I opened this chapter with quotations I thought were apropos to my subsequent discussion of categorization and stimulus class formation. Perhaps it would be a fitting (and symmetrical—no pun intended) end to this chapter to provide a few other relevant remarks: The ability to form concepts is surely adaptive because concepts allow us to respond appropriately to . . . stimuli after having experience with only a few instances from a given category. 
(Lazareva & Wasserman, 2008, p. 197)

Human language may depend on categorization . . . but categorization does not depend on language. . . . Rather than psycholinguistics, it is the psychology of perception and stimulus discrimination that impinge most directly on categorization. (Herrnstein, 1984, p. 257)

The first quote underscores the behavioral economy humans and other animals achieve by their ability to group together disparate objects sharing some common association. The second reinforces this notion of adaptiveness by highlighting what behavioral psychologists have known for quite some time: Stimulus class formation is not special to humans. I hope that the reader now appreciates this point and that basic reinforcement and stimulus control processes are often sufficient to account for many emergent phenomena. Research on stimulus class formation in nonhuman animals gives researchers another opportunity to see firsthand the mental continuity across species suggested by Charles Darwin (1871/1920) and to appreciate its possible evolutionary origins. This type of work is also indispensable if researchers are to eventually identify the psychological lines of fracture between animals that are capable of language and those that are not and between the variety of the latter (e.g., Thompson & Oden, 2000; Zentall, Wasserman, Lazareva, Thompson, & Ratterman, 2008). Finally, researchers who work with nonhuman animals are apt to label much of the data presented here as evidence of animal cognition. In the introduction to Wasserman and Zentall’s (2006a) extensive (35-chapter) volume on comparative cognition, they stated that “a cognitive process is one that does not merely result from the repetition of a behavior or from the repeated pairing of a stimulus with reinforcement. Cognitive processes often involve emergent (untrained) relations” (pp. 4–5). By these criteria, new behavior and new forms of stimulus control that emerge from other explicitly taught relations, the hallmark of stimulus class formation, also exemplify animal cognition.
Although behavior analysts often eschew terms such as cognition, perhaps they should simply regard such terminology as recognition that derivative forms of behavior and stimulus control that frequently occasion these descriptive terms have evolutionary origins that transcend humans’ extraordinary language and inferential capabilities.

References

Balsam, P. D., & Gallistel, C. R. (2009). Temporal maps and informativeness in associative learning. Trends in Neurosciences, 32, 73–78. doi:10.1016/j.tins.2008.10.004

Bhatt, R. S., Wasserman, E. A., Reynolds, W. F., Jr., & Knauss, K. S. (1988). Conceptual behavior in pigeons: Categorization of both familiar and novel examples from four classes of natural and artificial stimuli. Journal of Experimental Psychology: Animal Behavior Processes, 14, 219–234. doi:10.1037/0097-7403.14.3.219

Cowley, B. J., Green, G., & Braunling-McMorrow, D. (1992). Using stimulus equivalence procedures to teach name-face matching to adults with brain injuries. Journal of Applied Behavior Analysis, 25, 461–475. doi:10.1901/jaba.1992.25-461

D’Amato, M. R., & Colombo, M. (1988). Representation of serial order in monkeys (Cebus apella). Journal of Experimental Psychology: Animal Behavior Processes, 14, 131–139. doi:10.1037/0097-7403.14.2.131

D’Amato, M. R., Salmon, D. P., Loukas, E., & Tomie, A. (1985). Symmetry and transitivity in the conditional relations in monkeys (Cebus apella) and pigeons (Columba livia). Journal of the Experimental Analysis of Behavior, 44, 35–47. doi:10.1901/jeab.1985.44-35

Darwin, C. (1920). The descent of man and selection in relation to sex (2nd ed.). New York, NY: Appleton. (Original work published 1871)

Dougher, M. J., & Markham, M. R. (1994). Stimulus equivalence, functional equivalence, and the transfer of function. In S. C. Hayes, L. J. Hayes, M. Sato, & K. Ono (Eds.), Behavior analysis of language and cognition (pp. 71–90). Reno, NV: Context Press.

Dougher, M. J., & Markham, M. R. (1996). Stimulus classes and the untrained acquisition of stimulus functions. In T. R. Zentall & P. M. Smeets (Eds.), Stimulus class formation in humans and animals (pp. 137–152). New York, NY: Elsevier. doi:10.1016/S0166-4115(06)80107-X

Dube, W. V., McIlvane, W. J., Mackay, H. A., & Stoddard, L. T. (1987). Stimulus class membership established via stimulus-reinforcer relations. Journal of the Experimental Analysis of Behavior, 47, 159–175. doi:10.1901/jeab.1987.47-159

Dube, W. V., McIlvane, W. J., Maguire, R. W., Mackay, H. A., & Stoddard, L. T. (1989). Stimulus class formation and stimulus-reinforcer relations. Journal of the Experimental Analysis of Behavior, 51, 65–76. doi:10.1901/jeab.1989.51-65

Dugdale, N., & Lowe, C. F. (1990). Naming and stimulus equivalence. In D. E. Blackman & H. Lejeune (Eds.), Behaviour analysis in theory and practice: Contributions and controversies (pp. 115–138). Hove, England: Erlbaum.

Dugdale, N., & Lowe, C. F. (2000). Testing for symmetry in the conditional discriminations of language-trained chimpanzees. Journal of the Experimental Analysis of Behavior, 73, 5–22. doi:10.1901/jeab.2000.73-5

Dzhafarov, E. N., & Colonius, H. (2006). Regular minimality: A fundamental law of discrimination. In H. Colonius & E. N. Dzhafarov (Eds.), Measurement and representation of sensations (pp. 1–46). Mahwah, NJ: Erlbaum.

Edwards, C. A., Jagielo, J. A., Zentall, T. R., & Hogan, D. E. (1982). Acquired equivalence and distinctiveness in matching to sample by pigeons: Mediation by reinforcer-specific expectancies. Journal of Experimental Psychology: Animal Behavior Processes, 8, 244–259. doi:10.1037/0097-7403.8.3.244

Eikeseth, S., & Smith, T. (1992). The development of functional and equivalence classes in high functioning autistic children. Journal of the Experimental Analysis of Behavior, 58, 123–133. doi:10.1901/jeab.1992.58-123

Frank, A. J. (2007). An examination of the temporal and spatial stimulus control in emergent symmetry in pigeons. Unpublished doctoral dissertation, University of Iowa.

Frank, A. J., & Wasserman, E. A. (2005). Associative symmetry in the pigeon after successive matching-to-sample training. Journal of the Experimental Analysis of Behavior, 84, 147–165. doi:10.1901/jeab.2005.115-04

Garcia, Y. A., & Rehfeldt, R. A. (2008). The effects of common names and common FR responses on the emergence of stimulus equivalence classes. European Journal of Behavior Analysis, 9, 99–120.

Goldiamond, I. (1962). Perception. In A. J. Bachrach (Ed.), Experimental foundations of clinical psychology (pp. 280–340). New York, NY: Basic Books.

Hall, G., Mitchell, C., Graham, S., & Lavis, Y. (2003). Acquired equivalence and distinctiveness in human discrimination learning: Evidence for associative mediation. Journal of Experimental Psychology: General, 132, 266–276. doi:10.1037/0096-3445.132.2.266

Herrnstein, R. J. (1984). Objects, categories, and discriminative stimuli. In H. L. Roitblat, T. S. Bever, & H. S. Terrace (Eds.), Animal cognition (pp. 233–261). Hillsdale, NJ: Erlbaum.

Hogan, D. E., & Zentall, T. R. (1977). Backward association in the pigeon. American Journal of Psychology, 90, 3–15. doi:10.2307/1421635

Honey, R. C., & Hall, G. (1989). The acquired equivalence and distinctiveness of cues. Journal of Experimental Psychology: Animal Behavior Processes, 15, 338–346. doi:10.1037/0097-7403.15.4.338

Honig, W. K. (1981). Working memory and the temporal map. In N. E. Spear & R. R. Miller (Eds.), Information processing in animals: Memory mechanisms (pp. 167–197). Hillsdale, NJ: Erlbaum.

Honig, W. K., & Urcuioli, P. J. (1981). The legacy of Guttman and Kalish (1956): 25 years of research on stimulus generalization. Journal of the Experimental Analysis of Behavior, 36, 405–445. doi:10.1901/jeab.1981.36-405

Horne, P., & Lowe, C. F. (1997). Toward a theory of verbal behavior. Journal of the Experimental Analysis of Behavior, 68, 271–296. doi:10.1901/jeab.1997.68-271

Horne, P. J., Lowe, C. F., & Harris, F. D. A. (2007). Naming and categorization in young children: V. Manual sign training. Journal of the Experimental Analysis of Behavior, 87, 367–381. doi:10.1901/jeab.2007.52-06

Horne, P. J., Lowe, C. F., & Randle, V. R. L. (2004). Naming and categorization in young children: II. Listener behavior training. Journal of the Experimental Analysis of Behavior, 81, 267–288. doi:10.1901/jeab.2004.81-267

Hull, C. L. (1939). The problem of stimulus equivalence in behavior theory. Psychological Review, 46, 9–30. doi:10.1037/h0054032

Iversen, I. (1997). Matching-to-sample performance in rats: A case of mistaken identity? Journal of the Experimental Analysis of Behavior, 68, 27–45. doi:10.1901/jeab.1997.68-27 Iversen, I. H., Sidman, M., & Carrigan, P. (1986). Stimulus definition in conditional discriminations. Journal of the Experimental Analysis of Behavior, 45, 297–304. doi:10.1901/jeab.1986.45-297 Jenkins, J. J. (1963). Mediated associations: Paradigms and situations. In C. N. Cofer & B. S. Musgrave (Eds.), Verbal behavior and learning (pp. 210–257). New York, NY: McGraw-Hill. doi:10.1037/11178-006 Jenkins, J. J., & Palermo, D. S. (1964). Mediation processes and the acquisition of linguistic structure. Monographs of the Society for Research in Child Development, 29, 141–191. Joseph, B., Overmier, J. B., & Thompson, T. (1997). Food- and nonfood-related differential outcomes in equivalence learning by adults with Prader-Willi syndrome. American Journal on Mental Retardation, 101, 374–386. Kastak, C. R., Schusterman, R. J., & Kastak, D. (2001). Equivalence classification by California sea lions using class-specific reinforcers. Journal of the Experimental Analysis of Behavior, 76, 131–158. doi:10.1901/jeab.2001.76-131 Konorski, J. (1959). A new method of physiological investigation of recent memory in animals. Bulletin de L’Academie Polonaise des Sciences, 7, 115–117. Kruse, J. M., Overmier, J. B., Konz, W. A., & Rokke, E. (1983). Pavlovian conditioned stimulus effects 383

Peter J. Urcuioli

upon instrumental choice behavior are reinforcer specific. Learning and Motivation, 14, 165–181. doi:10.1016/0023-9690(83)90004-8 Lazareva, O. F., & Wasserman, E. A. (2008). Categories and concepts in animals. In H. H. Byrne (Ed.), Learning and memory: A comprehensive reference (Vol. 1, pp. 197–226). Oxford, England: Elsevier. doi:10.1016/B978-012370509-9.00056-5 Lionello, K. M., & Urcuioli, P. J. (1998). Control by sample location in pigeons’ matching to sample. Journal of the Experimental Analysis of Behavior, 70, 235–251. doi:10.1901/jeab.1998.70-235 Lionello-DeNolf, K. M., & Urcuioli, P. J. (2000). Transfer of pigeons’ matching to sample to novel sample locations. Journal of the Experimental Analysis of Behavior, 73, 141–161. doi:10.1901/jeab.2000.73-141 Lionello-DeNolf, K. M., & Urcuioli, P. J. (2002). Stimulus control topographies and tests of symmetry in pigeons. Journal of the Experimental Analysis of Behavior, 78, 467–495. doi:10.1901/jeab.2002.78-467 Lipkens, R., Kop, P. F. M., & Matthijs, W. (1988). A test of symmetry and transitivity in the conditional discrimination performances of pigeons. Journal of the Experimental Analysis of Behavior, 49, 395–409. doi:10.1901/jeab.1988.49-395 Lowe, C. F., Horne, P. J., Harris, F. D. A., & Randle, V. R. L. (2002). Naming and categorization in young children: Vocal tact training. Journal of the Experimental Analysis of Behavior, 78, 527–549. doi:10.1901/ jeab.2002.78-527 Lowe, C. F., Horne, P. J., & Hughes, J. C. (2005). Naming and categorization in young children: III. Vocal tact training and transfer of function. Journal of the Experimental Analysis of Behavior, 83, 47–65. doi:10.1901/jeab.2005.31-04 Mackintosh, N. J. (2000). Abstraction and discrimination. In C. Heyes & L. Huber (Eds.), The evolution of cognition (pp. 123–141). Cambridge, MA: MIT Press. Maki, P., Overmier, J. B., Delos, S., & Gutmann, A. J. (1995). Expectancies as factors influencing conditional discrimination performance of children. 
Psychological Record, 45, 45–71. Mandell, C., & Sheen, V. (1994). Equivalence class formation as a function of the pronounceability of the sample stimulus. Behavioural Processes, 32, 29–46. doi:10.1016/0376-6357(94)90025-6 McIlvane, W. J. (1992). Stimulus control analysis and nonverbal instructional methods for people with intellectual disabilities. International Review of Research in Mental Retardation, 18, 55–109. doi:10.1016/S0074-7750(08)60116-0 McIlvane, W. J., Serna, R. W., Dube, W. V., & Stromer, R. (2000). Stimulus control topography coherence and stimulus equivalence: Reconciling test outcomes 384

with theory. In J. Leslie & D. E. Blackman (Eds.), Experimental and applied analysis of human behavior (pp. 85–110). Reno, NV: Context Press. McIntire, K. D., Cleary, J., & Thompson, T. (1987). Conditional relations by monkeys: Reflexivity, symmetry, and transitivity. Journal of the Experimental Analysis of Behavior, 47, 279–285. doi:10.1901/ jeab.1987.47-279 Meehan, E. (1999). Class-consistent differential reinforcement and stimulus class formation in pigeons. Journal of the Experimental Analysis of Behavior, 72, 97–115. doi:10.1901/jeab.1999.72-97 Miguel, C. F., Petursdottir, A. I., Carr, J. E., & Michael, J. (2008). The role of naming in stimulus categorization by preschool children. Journal of the Experimental Analysis of Behavior, 89, 383–405. doi:10.1901/jeab. 2008-89-383 Miller, R. R., & Barnet, R. C. (1993). The role of time in elementary associations. Current Directions in Psychological Science, 2, 106–111. doi:10.1111/14678721.ep10772577 Peters, H. N. (1935). Mediate association. Journal of Experimental Psychology, 18, 20–48. doi:10.1037/ h0057482 Peterson, G. B. (1984). How expectancies guide behavior. In H. L. Roitblat, T. G. Bever, & H. S. Terrace (Eds.), Animal cognition (pp. 135–147). Hillsdale, NJ: Erlbaum. Reese, H. W. (1972). Acquired distinctiveness and equivalence of cues in young children. Journal of Experimental Child Psychology, 13, 171–182. doi:10.1016/00220965(72)90017-3 Richards, R. W. (1988). The question of bidirectional associations in pigeons’ learning of conditional discrimination tasks. Bulletin of the Psychonomic Society, 26, 577–579. Saunders, R. R., & Green, G. (1992). The nonequivalence of behavioral and mathematical equivalence. Journal of the Experimental Analysis of Behavior, 57, 227–241. doi:10.1901/jeab.1992.57-227 Saunders, R. R., & Green, G. (1999). A discrimination analysis of training structure effects on stimulus equivalence outcomes. Journal of the Experimental Analysis of Behavior, 72, 117–137. 
doi:10.1901/jeab.1999.72-117 Schenk, J. J. (1994). Emergent relations of equivalence generated by outcome-specific consequences in conditional discrimination. Psychological Record, 44, 537–558. Schusterman, R. J., & Kastak, D. (1993). A California sea lion (Zalophus californianus) is capable of forming equivalence relations. Psychological Record, 43, 823–839. Shipley, W. C. (1935). Indirect conditioning. Journal of General Psychology, 12, 337–357. doi:10.1080/00221 309.1935.9920108

Stimulus Control and Stimulus Class Formation

Sidman, M. (1992). Adventitious control by the location of comparison stimuli in conditional discriminations. Journal of the Experimental Analysis of Behavior, 58, 173–182. doi:10.1901/jeab.1992.58-173 Sidman, M. (2000). Equivalence relations and the reinforcement contingency. Journal of the Experimental Analysis of Behavior, 74, 127–146. doi:10.1901/ jeab.2000.74-127 Sidman, M. (2008). Symmetry and equivalence relations in behavior. Cognitive Studies, 15, 322–332. Sidman, M., Kirk, B., & Willson-Morris, M. (1985). Sixmember stimulus classes generated by conditionaldiscrimination procedures. Journal of the Experimental Analysis of Behavior, 43, 21–42. doi:10.1901/ jeab.1985.43-21 Sidman, M., Rauzin, R., Lazar, R., Cunningham, S., Tailby, W., & Carrigan, P. (1982). A search for symmetry in the conditional discrimination of rhesus monkeys, baboons, and children. Journal of the Experimental Analysis of Behavior, 37, 23–44. doi:10.1901/jeab.1982.37-23 Sidman, M., & Tailby, W. (1982). Conditional discrimination vs. matching to sample: An expansion of the testing paradigm. Journal of the Experimental Analysis of Behavior, 37, 5–22. doi:10.1901/jeab.1982.37-5

Tomonaga, M. (1999). Establishing functional classes in a chimpanzee (Pan troglodytes) with a twoitem sequential-responding procedure. Journal of the Experimental Analysis of Behavior, 72, 57–79. doi:10.1901/jeab.1999.72-57 Tulving, E. (1983). Elements of episodic memory. New York, NY: Oxford University Press. Urcuioli, P. J. (1990). Some relationships between outcome expectancies and sample stimuli in pigeons’ delayed matching. Animal Learning and Behavior, 18, 302–314. doi:10.3758/BF03205290 Urcuioli, P. J. (1996). Acquired equivalences and mediated generalization in pigeon’s matching-to-sample. In T. R. Zentall & P. M. Smeets (Eds.), Stimulus class formation in humans and animals (pp. 55–70). Amsterdam, the Netherlands: Elsevier. doi:10.1016/ S0166-4115(06)80103-2 Urcuioli, P. J. (2003). Generalization. In L. Nadel (Ed.-inChief), Encyclopedia of cognitive sciences (Vol. 2, pp. 275–281). London, England: Macmillan. Urcuioli, P. J. (2005). Behavioral and associative effects of differential outcomes in discrimination learning. Learning and Behavior, 33, 1–21. doi:10.3758/ BF03196047

Sidman, M., Wynne, C. K., Maguire, R. W., & Barnes, T. (1989). Functional classes and equivalence relations. Journal of the Experimental Analysis of Behavior, 52, 261–274. doi:10.1901/jeab.1989.52-261

Urcuioli, P. J. (2006a). Responses and acquired equivalence classes. In E. A. Wasserman & T. R. Zentall (Eds.), Comparative cognition: Experimental explorations of animal intelligence (pp. 405–421). New York, NY: Oxford University Press.

Skinner, B. F. (1935). The generic nature of the concepts of stimulus and response. Journal of General Psychology, 12, 40–65. doi:10.1080/00221309.1935. 9920087

Urcuioli, P. J. (2006b). When discrimination fails (or at least falters). Journal of Experimental Psychology: Animal Behavior Processes, 32, 359–370. doi:10.1037/ 0097-7403.32.4.359

Spradlin, J. E., Cotter, V. W., & Baxley, N. (1973). Establishing a conditional discrimination without direct training: A study of transfer with retarded adolescents. American Journal of Mental Deficiency, 77, 556–566.

Urcuioli, P. J. (2007). Sample and comparison location as factors in matching acquisition, transfer, and acquired equivalence. Learning and Behavior, 35, 252–261. doi:10.3758/BF03206431

Spradlin, J. E., & Saunders, R. R. (1986). The development of stimulus classes using match-to-sample procedures: Sample classification vs. comparison classification. Analysis and Intervention in Developmental Disabilities, 6, 41–58. doi:10.1016/0270-4684(86)90005-4 Terrace, H. S. (1986). A nonverbal organism’s knowledge of ordinal position in a serial learning task. Journal of Experimental Psychology: Animal Behavior Processes, 12, 203–214. doi:10.1037/0097-7403.12.3.203 Thompson, R. K. R., & Oden, D. L. (2000). Categorical perception and conceptual judgments by nonhuman primates: The paleological monkey and the analogical ape. Cognitive Science, 24, 363–396. doi:10.1207/ s15516709cog2403_2 Thorndike, E. L. (1898). Animal intelligence: An experimental study of the associative processes in animals. Psychological Monographs, 2(Whole No. 8).

Urcuioli, P. J. (2008a). [Associative symmetry after partially reinforced successive matching training]. Unpublished raw data. Urcuioli, P. J. (2008b). Associative symmetry, “antisymmetry,” and a theory of pigeons’ equivalence-class formation. Journal of the Experimental Analysis of Behavior, 90, 257–282. doi:10.1901/jeab.2008.90-257 Urcuioli, P. J. (2008c). The nature of the response in Simon discriminations by pigeons. Learning and Behavior, 36, 200–209. doi:10.3758/LB.36.3.200 Urcuioli, P. J., & Lionello-DeNolf, K. M. (2001). Some tests of the anticipatory mediated generalization model of acquired sample equivalence in pigeons’ many-to-one matching. Animal Learning and Behavior, 29, 265–280. doi:10.3758/BF03192892 Urcuioli, P. J., & Lionello-DeNolf, K. M. (2005). The role of common reinforced comparison responses in 385

Peter J. Urcuioli

acquired sample equivalence. Behavioural Processes, 69, 207–222. doi:10.1016/j.beproc.2005.02.005 Vaughan, W., Jr. (1988). Formation of equivalence sets in pigeons. Journal of Experimental Psychology: Animal Behavior Processes, 14, 36–42. doi:10.1037/00977403.14.1.36 Vickery, T. J., & Jiang, Y. V. (2009). Associative grouping: Perceptual grouping of shapes by association. Attention, Perception, and Psychophysics, 71, 896–909. doi:10.3758/APP.71.4.896 von Fersen, L., & Delius, J. D. (2000). Acquired equivalences between auditory stimuli in dolphins (Tursiops truncates). Animal Cognition, 3, 79–83. doi:10.1007/s100710000063 Wasserman, E. A. (1976). Successive matching-to-sample in the pigeon: Variation on a theme by Konorski. Behavior Research Methods and Instrumentation, 8, 278–282. doi:10.3758/BF03201713 Wasserman, E. A., DeVolder, C. L., & Coppage, D. J. (1992). Nonsimilarity-based conceptualization

386

in pigeons. Psychological Science, 3, 374–379. doi:10.1111/j.1467-9280.1992.tb00050.x Wasserman, E. A., & Zentall, T. R. (2006a). Comparative cognition: A natural science approach to the study of animal intelligence. In E. A. Wasserman & T. R. Zentall (Eds.), Comparative cognition: Experimental explorations of animal intelligence (pp. 3–11). New York, NY: Oxford University Press. Wasserman, E. A., & Zentall, T. R. (2006b). Comparative cognition: Experimental explorations of animal intelligence. New York, NY: Oxford University Press. Zentall, T. R. (2006). Mental time travel in animals: A challenging question. Behavioural Processes, 72, 173–183. doi:10.1016/j.beproc.2006.01.009 Zentall, T. R., Wasserman, E. A., Lazareva, O. F., Thompson, R. K. R., & Rattermann, M. J. (2008). Concept learning in animals. Comparative Cognition and Behavior Reviews, 3, 13–45. doi:10.3819/ ccbr.2008.30002

Chapter 17

Attention and Conditioned Reinforcement

Timothy A. Shahan

DOI: 10.1037/13937-017 APA Handbook of Behavior Analysis: Vol. 1. Methods and Principles, G. J. Madden (Editor-in-Chief) Copyright © 2013 by the American Psychological Association. All rights reserved.

The ability to appropriately attend to the important features of a complex environment is a critical survival skill. Problems related to the allocation and persistence of attending are associated with various psychological disorders including attention-deficit/hyperactivity disorder (American Psychiatric Association, 1994), autism and other developmental disabilities (Bryson, Wainwright-Sharp, & Smith, 1990; Dube & McIlvane, 1997; Lovaas, Koegel, & Schreibman, 1979), schizophrenia (Nestor & O’Donnell, 1998), and substance abuse (Ehrman et al., 2002; Johnsen, Laberg, Cox, Vaksdal, & Hugdahl, 1994; Lubman, Peters, Mogg, Bradley, & Deakin, 2000; Townshend & Duka, 2001). Although many factors surely contribute to the allocation and persistence of attending, the relation between patterns of attending and the resultant consequences may play an important role.

I begin this chapter with a consideration of attention and its potential relation to existing accounts of behavior maintained by its consequences (i.e., operant, or instrumental, behavior). Next, I review research on how differential consequences affect the allocation and persistence of attending to important environmental stimuli. Finally, I examine the relation between attending to stimuli associated with differential consequences and the traditional concept of conditioned, or secondary, reinforcement: the notion that stimuli associated with reinforcers themselves acquire the ability to reinforce behavior. In considering the concept of conditioned reinforcement, I explore the utility of an alternative framework based on the notion that attending to important stimuli is instrumental in acquiring reinforcers (i.e., obtaining goals), rather than such stimuli becoming reinforcers themselves.

Attention and Behavior

Attention

An appropriate starting point for any treatment of attention is a consideration of how to define attention and circumscribe what is to be discussed. Unfortunately, clear technical definitions of attention are difficult to find. Consider that two book-length treatments of the topic (Pashler, 1998; Styles, 1997) and a chapter on attention in Stevens’s Handbook of Experimental Psychology (Luck & Vecera, 2002) provide no technical definition of the term. Instead, all three sources note William James’s (1890) famous suggestion that “everyone knows what attention is.” All three then move on to suggest that, despite James’s claim, it may be more appropriate to say that no one knows what attention is. Instead, they suggest that the term attention almost certainly refers to more than one psychological phenomenon. The phenomena typically considered to fall under the heading of attention involve limitations in the capacity of cognitive functioning and selectivity of perception.

Even if attention is not a unitary psychological phenomenon, the rest of James’s (1890) definition is instructive as a sort of summary of the set of phenomena captured under the heading. James suggested,

It is the taking possession by the mind, in clear and vivid form, of one out of what seem several simultaneously possible objects or trains of thought. Focalization, concentration of consciousness are of its essence. It implies withdrawal from some things in order to deal effectively with others. (pp. 403–404)

Clearly, defining attention in terms of the even more poorly defined concept of consciousness is less than ideal. In addition, despite James’s (1890) assertion that attention involves the focusing of consciousness, it is becoming increasingly clear that conscious awareness is not necessary for even apparently goal-related shifts in attention (see Dijksterhuis & Aarts, 2010, for review). Nonetheless, James’s (1890) definition does generally capture the properties of capacity limitation and selectivity noted earlier. James’s assertion that people all know what attention is probably stems from the common personal experience that only a small fraction of the stimuli impinging on people’s senses seem to affect them at any given time. James’s definition also suggests that attention appears to be something that organisms do. In this chapter, I explore how attention might be thought of in terms similar to those used to describe other activities in which organisms engage.

A large empirical literature (see Yantis, 2000, for review) has suggested that changes in what an organism is attending to can be driven automatically by properties of stimuli themselves (e.g., salience, novelty, abrupt onset, innate biological significance) or by goal-directed control. The phenomenon of goal-directed control of attention goes by many names, including endogenous, top-down, and voluntary control. A simple demonstration of such control is to keep your gaze fixed on a word in the center of this page and then pay attention to something in the periphery to your right and then to your left. Another example is to listen to a piece of music and then shift your attention to different instruments at different times: first the drums, then the bass, and so on.
In theoretical treatments of goal-directed control, such changes in the allocation of attention have generally been hypothesized to be under the control of what has been called a central administrator (Kahneman, 1973), a central executive (Baddeley, 1986), or a supervisory attentional system (Norman & Shallice, 1986). As noted by Styles (1997), cognitive psychologists have widely recognized that providing a name for an entity in charge of decision making does not escape the homunculus problem and that the names serve only as temporary placeholders for a problem remaining to be solved.

Experimental examinations of goal-directed control of attention almost always use instructions to direct participants how to allocate their attention to particular stimulus features, dimensions, or spatial locations. For example, subjects might be instructed to attend to only the color or shape of an object or to devote 80% of their attention to the color and 20% to the shape (e.g., Bonnel & Prinzmetal, 1998). As another example, subjects in a cued detection task might be asked to keep their gaze fixed at a central location on a computer screen and then to report targets presented to the left or right of the fixation point. Arrows presented at the fixation location and pointing to the left or the right instruct participants where in space to allocate their attention, but on some percentage of the trials the arrow is inaccurate (see Posner, 1980). Detection accuracy or reaction time data from such procedures are consistent with what would be expected if subjects allocate their attention as instructed (e.g., greater accuracy with attended dimension, shorter reaction times with accurate instruction).

Instructions about what to attend to might be thought of as a way to change what Norman (1968) called the pertinence of a stimulus. Norman suggested that in addition to the physical aspects of a stimulus (i.e., stimulus activation), the importance of a stimulus to the organism (i.e., pertinence) is involved in the control of attention. The pertinence of a stimulus reflects the current goals of the organism and its history with a stimulus. For example, your spoken name is a stimulus that likely commands attention, not because of the physical properties of the stimulus per se but because of your history with it. Instructing subjects about how to allocate their attention can be seen as a way of creating language-based goals or quickly providing a relevant history with respect to a stimulus. Although such instructional control of attending is important in demonstrating that attending can be goal directed, it does not document what factors contribute to the pertinence of stimuli or the goal-directed control of attending in the natural environment. Nonetheless, the study of learning and behavior (i.e., conditioning and learning) provides a reasonably well-developed framework for understanding how experience changes the importance of stimuli and how consequences affect the allocation and persistence of goal-directed behavior. This framework might also be useful for understanding the control of attention.

Behavior

Although some may find it distressing that a well-articulated definition of attention was not forthcoming in the preceding section, one could argue that providing a definition of behavior is similarly problematic. Consider the treatment of the terms learning and behavior in Catania’s (1998) influential textbook, Learning:

This book is about learning, but from the start we have to face the fact that we won’t be able to define it. There are no satisfactory definitions of learning. Still, we can study it. (p. 2)

Behavior is no easier to define than learning. We may say glibly that behavior is anything an organism does, but this definition is too global. . . . Let’s not try to resolve this problem. Our aim is to examine some properties of behavior. Although they sometimes share common names, the phenomena of behavior are varied, so we’ll probably do better by considering examples than by attempting definitions. (p. 7)

I hope the parallels between my treatment of attention and Catania’s (1998) treatment of learning and behavior are obvious. As with attention, behavior may be difficult to define, but researchers have nevertheless made progress in understanding some basic principles of the phenomena captured under that heading.

In the study of learning and behavior, it is customary to distinguish between classical (i.e., Pavlovian, respondent) and operant (i.e., instrumental) conditioning. Classical conditioning involves learning the relation between two environmental stimuli (see Chapter 13, this volume). For example, if a tone (i.e., conditioned stimulus [CS]) reliably predicts food delivery (i.e., unconditioned stimulus [US]) that already elicits salivation (i.e., unconditioned response [UR]), then the tone itself will come to elicit salivation (i.e., conditioned response [CR]). A large empirical literature since the time of Pavlov has provided a wealth of information about how organisms learn the predictive relations between stimuli and how initially neutral stimuli acquire the ability to affect behavior (see Mackintosh, 1974, and Rescorla, 1988, for reviews).

Such stimulus–stimulus learning may be contrasted with operant (i.e., instrumental) conditioning, which involves learning the relation between a response and the consequences of that response. For example, a hungry rat may learn to press a lever that produces food or a child may learn to ask nicely for a toy. In both cases, the behavior is instrumental in the production of the consequence, and the future probability of the behavior is changed as a result. Consequences that increase the probability of the behavior are called reinforcers and those that decrease the probability of the behavior are called punishers. Operant behavior roughly corresponds to what is usually called voluntary or goal-directed behavior. A vast literature on operant conditioning with humans and nonhumans has demonstrated that variations in such behavior are lawfully related to the type and arrangement of consequences obtained (see Mazur, 2006, for review).

The fundamental unit of operant behavior is widely believed to be the discriminated operant, embodied by the three-term contingency between (a) the discriminative stimulus context in which the behavior occurs, (b) the behavior itself, and (c) the consequence of the response. Schematically, the three-term contingency is often described as $S^D : B \rightarrow S^R$, where $S^D$ is a discriminative stimulus, $B$ is the behavior, and $S^R$ is a reinforcer for that behavior. An $S^D$ comes to modulate the occurrence of the behavior, or to set the occasion for the behavior, as a result of the consequences experienced for the behavior in its presence. For example, if the lever pressing of a rat produces food when a light is on, but not when it is off, lever pressing will come to occur predominately in the presence of the light.


Although distinguishing between classical and operant conditioning may be customary, the two types of learning clearly interact in most instances of behavior. For example, not only do organisms learn that an $S^D$ signals the contingent relation between the behavior and the reinforcer, they also learn the Pavlovian stimulus–stimulus relation between the $S^D$ and the reinforcer (see Rescorla, 1998, for a review). Such stimulus–stimulus learning within the discriminated operant plays an important role in the motivation and persistence of operant behavior (e.g., Nevin & Grace, 2000), a topic I return to later.

Another way in which classical and operant conditioning interact is through the process of conditioned (i.e., secondary) reinforcement. Stimuli associated with the delivery of reinforcers for operant behavior are commonly believed to become reinforcers themselves as a result of the Pavlovian relation between the stimulus and the existing reinforcer. For example, a click associated with the delivery of a food reinforcer for a rat may also appear to serve as a reinforcer for the lever pressing that produces it (see Williams, 1994, for review).

Although both classical and operant conditioning play a role in the treatment of attention in this chapter, I focus largely on operant conditioning because my goal is to explore how consequences affect the allocation and persistence of attending. Before proceeding to a discussion of how consequences affect attending, I briefly review the matching law and behavioral momentum theory, two influential theories of operant conditioning that play a critical role in the discussion.

Operant behavior as choice: The matching law.

Much of the contemporary study of operant conditioning focuses on how differential consequences affect the allocation of behavior (i.e., choice). This focus on choice is largely the result of the influence of Herrnstein’s (1961, 1970) matching law.
Herrnstein (1961) varied the rate at which two concurrently available operant responses provided food reinforcement for the key pecking of pigeons in a standard operant conditioning preparation. He found that the proportion of responses allocated to one of two options roughly equaled (i.e., matched) the proportion of reinforcers obtained from that option. Herrnstein formalized this relation with the matching law, which states that

$$\frac{B_1}{B_1 + B_2} = \frac{R_1}{R_1 + R_2} \tag{1}$$
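To make the arithmetic of Equation 1 concrete, the following is a minimal Python sketch that is not part of the chapter. The function name `matching_proportion` and the session values are hypothetical and chosen only for illustration; the inputs stand for the reinforcers obtained from each of the two options, and the output is the proportion of behavior that strict matching predicts for the first option.

```python
def matching_proportion(r1: float, r2: float) -> float:
    """Proportion of behavior predicted for option 1 under strict matching.

    r1, r2: reinforcers obtained from options 1 and 2 (same unit for both).
    Implements the right-hand side of Equation 1: R1 / (R1 + R2).
    """
    total = r1 + r2
    if r1 < 0 or r2 < 0 or total == 0:
        raise ValueError("reinforcer counts must be non-negative and not both zero")
    return r1 / total


# Hypothetical session: 40 reinforcers obtained from option 1, 20 from option 2.
share_1 = matching_proportion(40, 20)
print(f"Predicted share of responses on option 1: {share_1:.3f}")
```

With these illustrative values, strict matching predicts that about two thirds of responses go to the first option, because that option supplied two thirds of the obtained reinforcers.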

In Equation 1, $B_1$ and $B_2$ refer to behavior allocated to the two options and $R_1$ and $R_2$ refer to the reinforcers obtained by responding at those options (for a primer on quantitative models of behavior, see Chapter 10, this volume). Thus, when organisms are confronted with a choice situation, they allocate their behavior to the available options in proportion to the reinforcers obtained from those options.

Later, Herrnstein (1970) extended the matching law to behavior occurring in situations in which only one response is explicitly defined. Although only a single response may be defined, the organism could also engage in other unmeasured behavior that may provide other unmeasured sources of reinforcement. This formulation has come to be known as Herrnstein’s hyperbola, or the quantitative law of effect. The quantitative law of effect states that

$$B = \frac{kR}{R + R_e} \tag{2}$$

where B is the rate of the measured behavior, R is the rate of reinforcement for the measured behavior, the parameter k defines asymptotic response rates when all reinforcement is for the measured response, and the parameter Re represents alternative sources of reinforcement (i.e., reinforcement for all other behavior). Thus, the frequency of an operant behavior is described as being determined by the rate of reinforcement for that behavior in the context of reinforcement for all other behavior. Response rates increase with increases in reinforcement rate for the measured behavior (i.e., R), but the degree to which response rates increase with increases in reinforcement rate is determined by the availability of alternative reinforcers (i.e., Re). Within the framework provided by the quantitative law of effect, all operant behavior is choice and is governed by the relative frequency of reinforcement obtained for the different activities in which an organism might engage.

Baum (1974, 1979) extended the matching law to account for common deviations from Equation 1. Although Equation 1 suggests that the distribution of behavior to two options strictly matches the distribution of reinforcers, real organisms are often less than perfectly sensitive to the distribution of reinforcers. The generalized matching law suggests that the ratio of behavior allocated to two options is a power function of the ratio of reinforcers obtained at the options. Thus,

\[
\frac{B_1}{B_2} = b\left(\frac{R_1}{R_2}\right)^{a}, \tag{3}
\]

where B1 and B2 refer to behaviors 1 and 2, respectively, and R1 and R2 refer to reinforcers for B1 and B2. The parameter b represents bias for one option or the other unrelated to variations in relative reinforcement rate. The parameter a represents sensitivity of the allocation of behavior to changes in reinforcement ratios. Values of a less than 1 reflect relative insensitivity of the allocation of behavior to the ratio of reinforcers (i.e., undermatching), whereas values of a greater than 1 reflect hypersensitivity (i.e., overmatching). Equation 3 can also be extended to more than two choice options (see Schneider & Davison, 2005), although the details of such an extension are not of concern here.

Although Equation 3 suggests that choice is a function of relative reinforcement rate, the reinforcers available for different options might differ with respect to any number of parameters (e.g., amount, delay to receipt, quality). Accordingly, Baum and Rachlin (1969) proposed the concatenated matching law, which suggests in its generalized matching law version that choice is dependent on the multiplicative effects of different reinforcement parameters, such that

\[
\frac{B_1}{B_2} = b\left(\frac{R_1}{R_2}\right)^{a_1}\left(\frac{A_1}{A_2}\right)^{a_2}\left(\frac{1/d_1}{1/d_2}\right)^{a_3}\left(\frac{q_1}{q_2}\right)^{a_4}, \tag{4}
\]

with terms as in Equation 3 and additional terms for relative reinforcement amount (A1 and A2), immediacy (1/d1 and 1/d2), quality (q1 and q2), and their respective sensitivity parameters a2, a3, and a4. The overall impact of a variety of parameters of reinforcement on choice can also be collapsed into a central intervening variable called value (i.e., V) such that

\[
\frac{B_1}{B_2} = \left(\frac{V_1}{V_2}\right)^{a}. \tag{5}
\]
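As a rough illustration, Equations 1 through 4 can be written as small Python functions. This is a minimal sketch rather than anything taken from the studies cited: the function names, the argument layout, and every numerical value below are hypothetical and were chosen only to make the arithmetic easy to follow.

```python
def strict_matching(r1, r2):
    """Equation 1: predicted proportion of behavior allocated to option 1."""
    return r1 / (r1 + r2)


def herrnstein_hyperbola(r, k, r_e):
    """Equation 2: response rate B = kR / (R + Re)."""
    return k * r / (r + r_e)


def generalized_matching(r1, r2, a=1.0, b=1.0):
    """Equation 3: behavior ratio B1/B2 = b * (R1/R2) ** a."""
    return b * (r1 / r2) ** a


def concatenated_matching(rate, amount, delay, quality, sens, b=1.0):
    """Equation 4: behavior ratio from rate, amount, immediacy, and quality.

    rate, amount, delay, and quality are (option 1, option 2) pairs;
    sens is the tuple of sensitivity exponents (a1, a2, a3, a4).
    """
    a1, a2, a3, a4 = sens
    immediacy_ratio = (1 / delay[0]) / (1 / delay[1])
    return (b
            * (rate[0] / rate[1]) ** a1
            * (amount[0] / amount[1]) ** a2
            * immediacy_ratio ** a3
            * (quality[0] / quality[1]) ** a4)


# Hypothetical reinforcer rates (reinforcers per hour) for two options.
r1, r2 = 60.0, 20.0

# Strict matching: three quarters of the behavior goes to option 1.
print(strict_matching(r1, r2))

# Undermatching (a < 1) pulls the behavior ratio below the 3:1 reinforcer ratio.
print(generalized_matching(r1, r2, a=0.8))

# With the same R, richer alternative reinforcement (larger Re) lowers B.
print(herrnstein_hyperbola(50.0, k=100.0, r_e=10.0))
print(herrnstein_hyperbola(50.0, k=100.0, r_e=50.0))
```

Setting a = 1 and b = 1 in `generalized_matching` gives the same allocation as Equation 1, expressed as a ratio rather than a proportion.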

Thus, in its most general form, the matching law suggests that the allocation of behavior to the available choice options is governed by the relative value of the reinforcers associated with those options. The matching law has provided a quantitative framework for describing how differential consequences govern the allocation of behavior and has provided insights into how the value of reinforcers and stimuli associated with reinforcers might be calculated. Although the matching law was developed largely with data generated in operant conditioning preparations with nonhumans, it has since been shown to have widespread applicability to broader issues in decision making, especially the apparently irrational, suboptimal, or impulsive decision making of humans (see Herrnstein, 1990; Herrnstein, Rachlin, & Laibson, 1997; and Madden & Bickel, 2010, for reviews). Furthermore, the matching law and its derivatives have provided a useful framework within which to conduct analyses of the neuroscience of decision making and reward valuation (e.g., Loewenstein & Seung, 2006; McClure, Laibson, Loewenstein, & Cohen, 2004; Sugrue, Corrado, & Newsome, 2004). Perhaps the matching law may also provide a useful framework for characterizing how differential consequences affect decisions about the allocation of attention.

Persistence of operant behavior: Behavioral momentum theory. Behavioral momentum theory (e.g., Nevin & Grace, 2000) addresses how the conditions of reinforcement under which an operant behavior is trained affect the persistence of that behavior when it is disrupted. The theory suggests that there are two important aspects of operant behavior: response rates under steady-state conditions and resistance to change under disruption conditions.
The theory suggests that response rate is governed by the contingent relation between the response and the reinforcer it produces in a manner consistent with Herrnstein’s (1970) single-response version of the matching law (i.e., Equation 2). Resistance to change is governed by the Pavlovian relation between the discriminative stimulus context in which the behavior occurs and the reinforcers obtained in that context. Resistance to change is typically studied by arranging different conditions of reinforcement
(e.g., reinforcement rates or magnitudes), presented one at a time and each in the presence of a distinctive stimulus (i.e., a multiple schedule of reinforcement). Once steady-state responding is achieved in the presence of both stimuli, a disruptor is introduced (e.g., satiation of the reinforcer, extinction). The resultant decrease in behavior relative to the predisruption baseline level provides the measure of resistance to change. Relatively smaller decreases in responding compared with baseline reflect greater resistance to change. Although simple response rate under steady-state conditions has traditionally been used to measure the strengthening effects of reinforcers, behavioral momentum theory suggests that resistance to change provides a better measure of response strength. The reason is that response rates are well known to be influenced by operations (e.g., pacing contingencies such as differential reinforcement of low rate) that are generally not considered to affect response strength (e.g., Nevin, 1974; Nevin & Grace, 2000). Higher rates of reinforcement in a stimulus context would generally be expected to produce both higher response rates and greater resistance to change—and they do (see Nevin, 1992). However, because response rates and resistance to change are governed separately by the operant response– reinforcer and Pavlovian stimulus–reinforcer relations, respectively, it is possible to simultaneously decrease response rates and yet increase resistance to change. In practice, this has been demonstrated by adding reinforcers to the stimulus context that are not contingent on the operant response. The noncontingent reinforcers decrease baseline response rates by degrading the contingent response– reinforcer relation but increase resistance to change by improving the Pavlovian stimulus–reinforcer relation signaled by the discriminative stimulus context (e.g., Nevin, Tota, Torquato, & Shull, 1990). 
Quantitatively, behavioral momentum theory suggests that relative resistance to change in the presence of two stimuli is a power function of the relative rate of reinforcement obtained in the presence of those stimuli such that

\[
\frac{m_1}{m_2} = \left(\frac{R_1}{R_2}\right)^{b}, \tag{6}
\]

where m1 and m2 are resistance to change of responding in the presence of stimuli 1 and 2, R1 and R2 refer to the rates of primary reinforcement obtained in the presence of those stimuli, and the parameter b reflects sensitivity of relative resistance to change to variations in relative reinforcement rates (Nevin, 1992). Equation 6 and related quantitative models making up behavioral momentum theory have provided a general framework within which to understand how reinforcement experienced in the presence of a stimulus governs the persistence of operant behavior in the presence of that stimulus. The framework provided by the theory for characterizing the persistence of operant behavior has been found to have broad generality across reinforcer types, settings, and species ranging from fish to humans (e.g., Ahearn, Clark, Gardenier, Chung, & Dube, 2003; Cohen, 1996; Grimes & Shull, 2001; Harper, 1999; Igaki & Sakagami, 2004; Mace et al., 1990; Quick & Shahan, 2009; Shahan & Burke, 2004).
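The measurement logic and Equation 6 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: expressing disrupted responding as a proportion of baseline is only one way of quantifying the decrease described above, and the component names, response rates, reinforcement rates, and the value b = 0.5 are all hypothetical.

```python
def proportion_of_baseline(disrupted_rate, baseline_rate):
    """Responding during disruption relative to the predisruption baseline.

    Values closer to 1 mean a smaller decrease, i.e., greater resistance
    to change.
    """
    return disrupted_rate / baseline_rate


def relative_resistance_to_change(r1, r2, b):
    """Equation 6: m1/m2 = (R1/R2) ** b."""
    return (r1 / r2) ** b


# Hypothetical response rates (responses per minute) in the two components
# of a multiple schedule, before and during a disruptor such as extinction.
baseline = {"rich": 80.0, "lean": 60.0}
disrupted = {"rich": 60.0, "lean": 30.0}

retained = {
    name: proportion_of_baseline(disrupted[name], baseline[name])
    for name in baseline
}

# The component that keeps the larger share of its baseline responding
# is the one showing greater resistance to change.
more_resistant = max(retained, key=retained.get)
print(retained, more_resistant)

# Hypothetical reinforcement rates of 120 and 30 per hour with b = 0.5.
predicted_ratio = relative_resistance_to_change(120.0, 30.0, b=0.5)
print(predicted_ratio)
```

With b less than 1, a fourfold difference in reinforcement rate predicts a smaller than fourfold difference in relative resistance to change.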

Allocation and Persistence of Attention and Operant Behavior: Shared Mechanisms?

Operant behavior and at least part of what is called attention appear to be goal-related activities. Although the goal-directed aspects of attention have generally been manipulated with instructions, insights provided by quantitative theories of operant behavior might be useful in understanding how goals in the natural environment help to govern attention. For example, both behavior and attention appear to be limited resources requiring allocation decisions. Thus, the quantitative framework for choice provided by matching theory might also provide a first step toward replacing placeholder concepts, such as a central administrator, with a quantitative theory of how experience translates into the differential allocation of attending. In addition, the framework provided by behavioral momentum theory for understanding the persistence of operant behavior might be useful for understanding the persistence of attending under conditions of disruption. Nonetheless, the suggestion that consequences might similarly affect what is typically called attention and what is typically called behavior would require empirical evidence. Behavior analysts know how consequences affect what traditionally falls under the heading of operant behavior, but what about phenomena more likely to be categorized under the heading of attention?

Attention and Differential Consequences

Human Attention and Differential Consequences

Until recently, there was nearly no experimental analysis of how variations in consequences affect performance of humans in attention tasks. As noted earlier, studies of goal-directed attention have almost always relied on instructions to produce changes in goal-directed attention. Nonetheless, research by Gopher (1992) suggested that the successful allocation of attention appears to be a trainable skill that improves with practice and differential consequences. For example, participants were exposed to a demanding space-based game designed to simulate the complexities of divided attention encountered by fighter pilots. The simulation was complex enough that the participants often initially expressed panic and adopted suboptimal strategies. Participants were instructed to focus their attention on a subset of aspects of the situation and to periodically shift their attention to other aspects. The participants received feedback and point-based rewards for successful performance, thus encountering differential consequences with changes in their allocation of attention. As a result, the participants avoided adopting suboptimal strategies and became proficient at the highly demanding tasks. Such results are consistent with the suggestion that the allocation of attention can be affected by experiencing differential consequences. A more recent set of experiments examining how differential consequences affect attention used a negative priming procedure (Della Libera & Chelazzi, 2006). Negative priming refers to a phenomenon whereby selectively attending to a target stimulus accompanied by an irrelevant distractor during a prime trial results in poorer performance (i.e., increased reaction time) when the distractor stimulus is used as a target shortly thereafter during a probe test. For example, attending to a red shape
and ignoring a concurrently presented green distractor shape in a prime trial leads to an increased reaction time when participants are required to attend to the previously ignored green shape in the next trial (i.e., probe). The typical interpretation of negative priming is that it results from a switching of selective visual attention to the target and away from the distractor during the prime that persists until the probe test (see Pashler, 1998, for a review). Pashler (1998) suggested that negative priming might occur as a result of an error-correction learning mechanism because a distractor item is unhelpful at one time and therefore typically unlikely to be helpful shortly after. If Pashler’s suggestion is correct, it is reasonable to expect that negative priming should be affected by changing the consequences of attending to the target stimulus during the prime. Della Libera and Chelazzi (2006) examined how differential consequences affected negative priming in a visual selective attention task. In one experiment, prime stimuli were global numbers consisting of appropriately arranged local numbers (e.g., multiple instances of smaller 6s at the local level arranged into a larger number 5 at the global level). Before the presentation of a prime stimulus, the letter G or the letter L instructed the subjects to attend to the global or local level, respectively. A correct response was defined as reporting the number at the instructed level (either global or local). Thus, the number at the other level became the to-be-ignored distractor. For example, if the prime was a global 5 made up of local 6s, choosing the number 5 was correct after the instruction G and choosing the number 6 was correct after the instruction L. To manipulate the consequences of attending to the target stimulus, correct responses to primes were followed by either a high (€0.10) or a low (€0.01) payoff, as reported on the computer screen. 
The probe stimulus was then presented 400 milliseconds later. Subjects were to report the number in the probe stimulus, regardless of whether it occurred at the global level (e.g., a 5 made up of local Xs) or the local level (i.e., an X made up of local 5s). In this procedure, negative priming would be evidenced by longer reaction times when the number in the probe stimulus occurred at the same level of the distractor stimulus in the prime (e.g., subject attended to
local-level 6s and ignored a global-level 5 in the prime, but in the probe trial, the target was at the global level). Consistent with the hypothesis that consequences can affect the allocation of attention, negative priming occurred only when detecting the target stimulus in the prime produced the larger payoff. In other words, the highly reinforced allocation of attention to the target in the prime produced a greater shift of attention to the target and away from the distractor that was apparent in the subsequent probe test. A second experiment using the same basic paradigm extended the result to form-based stimuli and same–different choice-based responding. Thus, shifts in attention as measured by a common selective attention paradigm appear to be affected by differential consequences.

A subsequent set of experiments by Della Libera and Chelazzi (2009) showed that differential consequences in selective visual attention tasks could have relatively long-term effects on how attention is allocated. In both experiments, subjects were exposed to a 3-day training phase in which performance on a selective visual attention task produced different magnitude reinforcers for two sets of stimuli. Specifically, trials started with a red or a green cue that signaled the target in an immediately following display. The display consisted of three different nonsense shapes. One of the shapes was a black comparison and was presented next to two overlapping shapes, one of which was red and the other of which was green. The subjects’ task was to determine whether the shape of the previously cued color (i.e., either red or green) matched the black comparison. Thus, the other colored shape served as a distractor stimulus to be ignored. Correct responses were more likely to lead to a higher payoff (€0.10) for four shapes and a lower payoff (€0.01) for four different shapes.
Two of the high-payoff stimuli resulted in the higher payoff when they served as the target and were correctly detected. The other two high-payoff stimuli resulted in the higher payoff when they served as the distractor stimulus and were correctly ignored. The same was true for the low-payoff stimuli—two stimuli resulted in the lower payoff when they were correctly detected as targets and the other two resulted in the lower payoff when they were correctly ignored as distractors.
An additional eight neutral shapes were equally as likely to lead to the higher and lower payoffs as targets or distractors. Five days after the training phase with the differential payoff, subjects were exposed to either the same task (Experiment 1) or a different visual search task (Experiment 2), both using the stimuli from the training phase but in the absence of reward. In Della Libera and Chelazzi’s (2009) Experiment 1, stimuli associated with a high payoff when serving as the target in the training phase were more difficult to ignore when they appeared as the distractor during the subsequent test. However, stimuli associated with a low payoff when serving as the target in training were more easily ignored when they appeared as the distractor during the subsequent test. Complementary results were obtained with stimuli that had served as distractors during the training phase. Stimuli that were associated with a high payoff when they were ignored as distractors in training produced less interference when they continued to serve as distractors in the test, but stimuli previously associated with a low payoff when they had been ignored in training continued to interfere with test performance as distractors. In Experiment 2, the testing phase was a visual search task in which a black sample shape was briefly presented and followed by two black comparison shapes. The subjects’ task was to report whether the sample stimulus was present in the comparison display. The results showed that detections in the comparison display were faster if the sample had previously been associated with a higher payoff as a target in the training phase. Together, the results from these two experiments suggested that differential reinforcement for attending to or ignoring stimuli in a selective attention task can have relatively long-lasting effects on attending to or ignoring those stimuli later.
An experiment by Raymond and O’Brien (2009) examined how differences in payoff for choosing stimuli in a choice task affected subsequent recognition of those stimuli in an attentional blink procedure. Subjects were first given a series of choices between sets of two computer-generated faces. Half of the 24 face stimuli were associated with either a gain (plus 5 pence) or a loss (minus 5 pence) at a
high probability (.8) or a low probability (.2). The other half of the stimuli were neutral and associated with no consequence. During exposure to this choice condition, subjects learned to choose the stimuli associated with a higher probability of a gain and a lower probability of a loss. Next, the subjects were exposed to a standard attentional blink procedure in the absence of explicit consequences. In the attentional blink procedure, subjects were shown a series of four brief stimuli in succession (85 milliseconds per stimulus). The first stimulus at Time 1 was an oval made up of either circles or squares followed immediately by a masking stimulus. The masking stimulus occupied the same space as the first stimulus and was used to stop processing of the first stimulus and eliminate afterimages. The masking stimulus was followed by either a short (200-millisecond) or long (800-millisecond) delay, during which the screen was blank. At Time 2, immediately after the delay, the face stimuli from the earlier choice task or novel face stimuli were presented and followed immediately by a masking stimulus. Finally, the subjects were asked to report by pressing buttons whether the stimulus at Time 1 had been made up of circles or squares and then whether the face at Time 2 had been familiar or novel. An attentional blink effect in this procedure is evidenced by poorer performance on the face recognition task at Time 2 after a shorter delay between Time 1 and Time 2. The usual interpretation of the attentional blink effect is that it results from the fact that attention is a limited resource that is temporarily taxed by performance at Time 1 and is therefore not available again for some small amount of time thereafter. Thus, long-delay (800-millisecond) and short-delay (200-millisecond) conditions were chosen such that attention would typically be fully available or taxed, respectively. 
The results showed that regardless of whether attention was fully available or taxed, recognition of faces at Time 2 was better for faces previously producing a gain or a loss than for previously neutral stimuli. Thus, stimuli that needed to be chosen or avoided because of their associated consequences in the choice task produced improved recognition performance at Time 2. Most important, a typical attentional blink effect was obtained when attention was taxed with the

shorter delay, but only for neutral stimuli or stimuli previously associated with a loss. The attentional blink effect was eliminated for stimuli that were previously associated with a gain. Thus, the availability of attention for stimuli in a taxing situation appears to depend on the consequences encountered as a result of previous choices involving those stimuli such that attention was more readily available for stimuli previously associated with a gain. The studies reviewed in this section generally support the conclusion that consequences encountered as a result of attending to particular features of the environment have an impact on the allocation of attention in a manner similar to their impact on simple operant behavior. Thus, existing theoretical accounts of operant behavior may be useful for describing how differential consequences affect the goal-directed control of attention.

Conditioning, Learning, and Attention

Unlike the relatively recent focus on the impact of consequences on the allocation of human attention, theories of conditioning and learning have a long history of assuming reinforcement-related changes in attention. For example, theories of discrimination learning have commonly assumed that changes in attending to relevant stimulus dimensions are based on the experience of differential reinforcement in the presence of stimuli (e.g., Blough, 1969; Lovejoy, 1965; Mackintosh, 1975; Mackintosh & Little, 1969; Nevin, Davison, & Shahan, 2005; Sutherland & Mackintosh, 1971; Zeaman & House, 1963). Many such models have also been closely related to the study of Pavlovian conditioning. Phenomena such as blocking (Kamin, 1968), overshadowing (Pavlov, 1927), and the relative validity effect (Wagner, Logan, Haberlandt, & Price, 1968) show that CSs that are more salient or are better predictors of USs reduce conditioning to jointly available stimuli that are less salient or are poorer predictors of a US. The Rescorla–Wagner (1972) model describes these cue competition effects as being the result of competition between stimuli for a limited amount of associative value available from a US. Other theories, however, describe cue competition effects in terms of inferred changes in attending to stimuli with experience rather than changes in US effectiveness.
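As a point of reference for the attention-based alternatives, the Rescorla–Wagner account of blocking mentioned above can be sketched as a short simulation. The update rule used here is the standard textbook form of the model, in which each cue present on a trial changes by its salience times the shared prediction error; it is not spelled out in this chapter. The cue names, salience values, trial counts, and asymptote are hypothetical.

```python
def rescorla_wagner(trials, alpha, beta=1.0, lam=1.0):
    """Simulate associative strengths over a list of reinforced trials.

    trials: sequence of tuples naming the cues present on each trial.
    alpha:  dict mapping each cue to its (hypothetical) salience.
    beta:   learning-rate parameter tied to the US (assumed value).
    lam:    asymptote of associative value supported by the US.
    """
    strengths = {cue: 0.0 for cue in alpha}
    for cues in trials:
        # All cues present on a trial share one prediction error, which is
        # what makes them compete for the value available from the US.
        prediction_error = lam - sum(strengths[c] for c in cues)
        for c in cues:
            strengths[c] += alpha[c] * beta * prediction_error
    return strengths


alpha = {"A": 0.3, "X": 0.3}

# Blocking group: A alone is paired with the US, then the AX compound is.
blocking = rescorla_wagner([("A",)] * 50 + [("A", "X")] * 50, alpha)

# Control group: only the AX compound is paired with the US.
control = rescorla_wagner([("A", "X")] * 50, alpha)

# X gains almost nothing in the blocking group because A already predicts
# the US, whereas X and A split the available value in the control group.
print(blocking["X"], control["X"])
```

In this sketch the salience parameters stay fixed across trials. The attention-based theories discussed next instead let a stimulus-specific parameter of this kind change with experience.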

Mackintosh’s (1975) attention theory is based on the fact that stimuli that are better predictors of reward attract increasing attention as conditioning proceeds. Likewise, Lubow’s (1989) conditioned attention theory suggests that attention to reward-predictive stimuli increases in a manner consistent with the principles of classical conditioning. Interestingly, an alternative attention-based model forwarded by Pearce and Hall (1980) predicts many cue competition phenomena with the opposite assumption—that attention to a stimulus decreases when it is a reliable predictor of reinforcement. More recent accounts have allowed for both increases and decreases in attention by proposing multiple attention systems for different types of learning situations (Hogarth, Dickinson, & Duka, 2010). Although an evaluation of the relative merits of these attention-based theories of conditioning is beyond the scope of this chapter, all the theories share the assumptions that attention is critical for learning associations between stimuli and that attention itself depends on previously experienced relations between stimuli and reinforcers. In a general sense, such theories might be thought of as addressing how experience changes what Norman (1968) referred to as the pertinence of stimuli. The approach I explore in this chapter is clearly in the same tradition as these other approaches in asserting that at least part of what is referred to as attention is modified by experience in accordance with existing accounts of conditioning and learning. Specifically, research in my laboratory has been focused on extending existing quantitative accounts of operant conditioning to attending. To do so, my lab has relied on procedures with animals that are aimed at more directly measuring attention.
To study attention more directly with animals, Riley and colleagues (Riley & Leith, 1976; Riley & Roitblat, 1978; Zentall & Riley, 2000) suggested using procedures more similar to those used with humans and likely to produce selective processing as a result of capacity limitations (cf. Mackintosh, 1975). One such procedure is a modified delayed matching-to-sample task used by Maki and Leuin (1972) to study divided attention in pigeons. The pigeons were presented with compound samples consisting of a combination of two elements or samples made up of just one of the elements (see Figure 17.1). In Maki and Leuin, line orientation (i.e., vertical or horizontal) and key color (i.e., red or blue) were used for the two sets of element stimuli. For single-element sample presentations, either a simple color (i.e., red or blue) or line orientation (i.e., white vertical or horizontal lines on a black background) was presented. For compound sample presentations, both a color and a line orientation were presented simultaneously (e.g., a blue key with superimposed white vertical lines). After both single-element and compound sample stimuli, two comparison stimuli were presented. The two comparison stimuli were always single-element stimuli and were either the two line orientations or the two colors. One of the comparison stimuli had been present in the sample, and choice of that stimulus was followed by a food reinforcer. Choice of the other stimulus was not followed by a reinforcer. Because the type of comparison stimuli to be presented is unpredictable, the pigeons must attend to both elements in the compound sample to perform accurately. Using this procedure, Maki and Leuin found that to maintain accuracy at 80% for compound samples, the sample stimuli needed to be presented for longer durations than for element samples. Subsequently, Maki and Leith (1973) found superior accuracy for element-sample trials than for compound-sample trials when the sample duration was fixed at the same value for both types of trials. Many related experiments have obtained a similar result (cf. Gottselig, Wasserman, & Young, 2001; Langley & Riley, 1993; Leith & Maki, 1975; Santi, Grossi, & Gibson, 1982; Zentall, Sherburne, & Zhang, 1997). Non–attention-based hypotheses have been proposed to account for the poorer accuracy with compound samples than with element samples (e.g., generalization decrement, coding decrement), but the data generally support the suggestion that this difference results from the division of attention required for compound sample stimuli (see Roberts, 1998, and Zentall & Riley, 2000, for reviews).

Figure 17.1. Schematic of a divided-attention procedure for pigeons. B = blue; R = red.

The Matching Law and Allocation of Divided Attention

To assess the utility of the matching law for describing the effects of variations in relative reinforcement on the allocation of attending, Shahan and Podlesnik (2006) used a divided-attention procedure with pigeons similar to that of Maki and Leith (1973). Figure 17.2 shows a schematic of the procedure. Trials started with a white trial-ready stimulus, a single peck on which produced a compound-sample stimulus with one of two colors (i.e., green or blue) and one of two line orientations (i.e., vertical or horizontal). The sample stimulus was presented for 5 seconds and was terminated response independently. Next, single-element comparison stimuli that were of either two colors (hereinafter called color trials) or two line orientations (hereinafter called line trials) were presented, and a peck on the comparison stimulus that had appeared in the sample produced a food reinforcer. Across conditions, the probability of reinforcement for correct matches for the two types of comparison stimuli was varied. The overall probability of reinforcement remained at .5, but the ratio of reinforcement probabilities for the color and line dimensions varied across values of 1:9, 1:3, 1:1, 3:1, and 9:1 for the two color and two line dimensions, respectively. The results showed that variations in the ratio of reinforcement for the two dimensions produced orderly changes in accuracy for each dimension. Specifically, accuracy increased for the dimension associated with a higher probability of reinforcement and decreased for the dimension associated with a lower probability of reinforcement. Shahan and Podlesnik (2006) applied the matching law to changes in accuracy with changes in reinforcement allocation using log d as the measure of accuracy. Log d is a common measure in behavioral approaches to signal detection and conditional discrimination performance (e.g., Davison & Nevin, 1999; Davison & Tustin, 1978). Log d is particularly useful because it is bias free and ranges from 0 at chance performance to infinity at perfect performance. These metric properties make it similar to response-rate measures typically used to characterize simple operant responding with the matching law. Log d was calculated separately for color and line trials such that

\[
\log d = 0.5 \log\left[\left(\frac{S1_{\mathrm{corr}}}{S1_{\mathrm{incorr}}}\right)\left(\frac{S2_{\mathrm{corr}}}{S2_{\mathrm{incorr}}}\right)\right], \tag{7}
\]

where S1corr and S1incorr refer to correct and incorrect responses after presentation of one sample (e.g., blue) and S2corr and S2incorr refer to correct and incorrect responses after the other sample (e.g., green).

Figure 17.2. Schematic of the divided-attention procedure for pigeons used by Shahan and Podlesnik (2006). G = green; B = blue.

To characterize changes in relative accuracy with changes in relative reinforcement, Shahan and Podlesnik used this version of the generalized matching law:

\[
\log d_C - \log d_L = a \log\left(\frac{R_C}{R_L}\right) + \log b, \tag{8}
\]

where log dC and log dL refer to log d for color and line trials, respectively, and RC and RL refer to reinforcers obtained on color and line trials, respectively. The parameters a and log b refer to sensitivity of relative accuracy to relative reinforcement and bias in accuracy for one trial type unrelated to relative reinforcement, respectively. Figure 17.3 shows the fits of Equation 8 to the mean data across pigeons. Equation 8 did a good job describing the effects of changes in relative reinforcement on relative accuracy for the two types of comparison trials. With a (i.e., sensitivity to relative reinforcement) equal to 0.57 and log b (i.e., bias) equal to 0.26, Equation 8 accounted for 96% of the variance in the averaged data. Assuming that the changes in relative accuracy with the two types of comparison trials reflect differences in attending to the elements of the compound stimuli, the fact that sensitivity (i.e., the a parameter) was greater than zero suggests that the allocation of attention to the elements was sensitive to the relative reinforcement associated with those elements. The finding that log b was greater than zero reflects the fact that overall accuracy also tended to be greater for colors than for lines across the range of relative reinforcement rates. An alternative interpretation of the changes in accuracy in the Shahan and Podlesnik (2006) experiment is that variations in relative reinforcement for

the two types of comparison stimuli affected behavior only at the choice point without changing attending to the elements of the compound samples. Even if the subjects had attended equally to the elements of the compound samples, variations in relative reinforcement at the comparison stimuli might have resulted in changes in motivation to choose the correct stimulus at the comparison choice point. To assess this alternative account, Shahan and Podlesnik (2007) examined whether changes in the duration of the compound samples altered the effects of differential reinforcement on performance in the same divided-attention task. They compared performance with sample durations of 2.25 seconds and 0.75 seconds and examined sensitivity of both accuracy and choice-response speeds (i.e., 1/latency) at the comparison choice point. If differential reinforcement affected only motivation at the choice point, sensitivity of accuracy to changes in allocation of reinforcement should not depend on the sample duration. Figure 17.4 shows mean sensitivity values for both accuracy and choice-response speeds for the short and long sample durations. Consistent with the findings of Shahan and Podlesnik (2006), accuracy on the two types of comparison trials was sensitive to variations in relative reinforcement. In addition, sensitivity of accuracy was greater for the longer sample duration. However, choice-response speeds were only weakly sensitive to changes in relative reinforcement and did not differ for the short and long sample durations. Although overall

Figure 17.3. Generalized matching law analysis of the effects of differential reinforcement on pigeons’ accuracy in Shahan and Podlesnik’s (2006) divided-attention experiment. The fitted line and parameter values were produced by least-squares regression of Equation 8.

Figure 17.4. Sensitivity (i.e., a in Equation 8) of accuracy and choice-response speeds to differential reinforcement when the sample was either short or long in the Shahan and Podlesnik (2007) experiment on divided attention of pigeons.


Attention and Conditioned Reinforcement

accuracy was lower with the short samples, Shahan and Podlesnik (2007) showed that the lower sensitivity of accuracy with the shorter sample was not likely an artifact of the lower accuracies because it appeared to result from the pigeons attending too much to the element associated with the lower probability of reinforcement. These findings led Shahan and Podlesnik to conclude that changes in motivation at the comparison choice point were likely not responsible for the effects of relative reinforcement on relative accuracy. Further support for the suggestion that differential reinforcement affects attending to the elements of compound sample stimuli in the experiments of Shahan and Podlesnik (2006, 2007) came from Shahan and Quick (2009). Specifically, using an adjusting sample duration procedure similar to that of Maki and Leuin (1972), we found that sample stimuli associated with a lower reinforcement rate require a longer sample to maintain a constant accuracy. This outcome was consistent with the suggestion that effects of differential reinforcement in the divided-attention task are mediated through changes in attending to the sample stimuli. These findings suggest that the allocation of attention of both humans and pigeons is affected by differential consequences in a manner consistent with simple principles of operant conditioning. The choice-based account provided by the matching law appears to provide a useful framework within which to understand how differential consequences affect the allocation of attention. For example, Shahan and Podlesnik (2006, 2007) suggested that the sensitivity and bias parameters of Equation 8 could be useful for providing quantitative descriptions of goal-driven and stimulus-driven aspects of attentional control, respectively. 
They suggested that sensitivity (i.e., a) might be used to capture the effects of variations in relative reinforcement on stimulus pertinence or goal-directed control of attention and that bias in accuracy independent of reinforcement variations (i.e., log b) might reflect the effects of sensory features of the stimuli (i.e., sensory activation; Norman, 1968). Thus, the matching law appears useful as a first approximation to replace concepts such as a central administrator with a quantitative theory of how differential consequences translate

into differential attending. Nonetheless, it is important to note that the applicability of the matching law to attending has only been directly assessed with pigeons. Ultimately, assessing the utility of the matching law for describing goal-directed attention will require direct examinations with humans in standard preparations such as those reviewed earlier. Regardless, the findings reviewed in the preceding section suggest that considering both behavior and attention as limited resources requiring similar allocation decisions based on differential consequences may indeed be useful.
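As a rough illustration of how Equation 8 behaves, the following Python sketch computes the predicted difference in log d for color and line trials and then recovers the two parameters by ordinary least squares, the kind of fit described in the caption of Figure 17.3. Only the parameter values a = 0.57 and log b = 0.26 come from the text. The reinforcer ratios, the function names, and the use of base-10 logarithms are assumptions made for illustration, and the data points are generated from the equation itself rather than taken from the experiment.

```python
import math


def predicted_log_d_difference(r_color, r_line, a, log_b):
    """Equation 8: log d_C - log d_L = a * log(R_C / R_L) + log b.

    Base-10 logarithms are assumed here.
    """
    return a * math.log10(r_color / r_line) + log_b


def fit_equation_8(log_ratios, log_d_differences):
    """Ordinary least-squares fit of the line y = a * x + log_b."""
    n = len(log_ratios)
    mean_x = sum(log_ratios) / n
    mean_y = sum(log_d_differences) / n
    sxx = sum((x - mean_x) ** 2 for x in log_ratios)
    sxy = sum(
        (x - mean_x) * (y - mean_y)
        for x, y in zip(log_ratios, log_d_differences)
    )
    a = sxy / sxx                 # slope: sensitivity to relative reinforcement
    log_b = mean_y - a * mean_x   # intercept: bias toward one trial type
    return a, log_b


# Parameter values reported in the text for the averaged data.
A_REPORTED = 0.57
LOG_B_REPORTED = 0.26

# Hypothetical color:line reinforcer ratios (not the experimental values).
ratios = [(1, 9), (1, 3), (1, 1), (3, 1), (9, 1)]

xs = [math.log10(rc / rl) for rc, rl in ratios]
ys = [
    predicted_log_d_difference(rc, rl, A_REPORTED, LOG_B_REPORTED)
    for rc, rl in ratios
]

# Fitting the noiseless points should return the generating parameters.
a_hat, log_b_hat = fit_equation_8(xs, ys)

for (rc, rl), y in zip(ratios, ys):
    print(f"R_C:R_L = {rc}:{rl}  predicted log d_C - log d_L = {y:.3f}")
print(f"recovered a = {a_hat:.2f}, log b = {log_b_hat:.2f}")
```

With equal reinforcement on the two trial types the log ratio is zero, so the predicted accuracy difference reduces to the bias term log b. A positive log b therefore corresponds to the color advantage described in the text.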

Behavioral Momentum and the Persistence of Attending

Although I am unaware of any experiments with humans or nonhumans examining how differential reinforcement affects the persistence of attending using procedures such as those described earlier, my laboratory has examined the issue using observing responses of pigeons. Observing responses produce sensory contact with stimuli to be discriminated and have been used as an overt analog of attention for more than 50 years (e.g., Wyckoff, 1952). In the typical observing-response procedure, unsignaled periods of primary reinforcement availability for some response alternate unpredictably with periods in which the response does not produce reinforcement (i.e., extinction). A second response (i.e., the observing response) produces brief access to stimuli signaling whether reinforcement is available. A stimulus signaling the availability of reinforcement is referred to as an S+, and a stimulus signaling extinction is referred to as an S−. Figure 17.5 shows an example of an observing-response procedure for pigeons. Responses on the white key on the right produce primary reinforcement (i.e., food) on a variable-interval (VI) schedule of reinforcement. (A VI schedule arranges reinforcement for the first response after a variable amount of time around a mean value.) The periods of VI food reinforcement on the key alternate unpredictably with unsignaled periods in which reinforcement is not available for pecking the key (i.e., extinction is in effect). Pecks on the key on the left (i.e., the observing response) produce 15-second periods of stimuli correlated with the conditions of reinforcement on the food


Figure 17.5. Schematic of an observing-response procedure for pigeons. W = white; FR = fixed ratio; Ext = extinction; G = green; R = red; VI = variable interval.

key. If the VI schedule of food reinforcement is available on the food key, both keys are lighted green for 15 seconds (i.e., S+). If extinction is in effect on the food key, both keys are lighted red for 15 seconds (i.e., S−). Pigeons readily learn to peck the observing key and produce the stimuli differentially associated with the periods of primary reinforcement and extinction. In addition, they learn to discriminate S+ and S− and come to respond on the food key predominately during S+. Responses on the observing key are used as the measure of looking at or attending to the stimuli to be discriminated (see Dinsmoor, 1985, for discussion). Consistent with the research described earlier using the divided-attention procedure, research in my laboratory has shown that the matching law is also applicable to attending as measured with an observing response. For example, when the rate of primary reinforcement associated with a stimulus increases, rats’ observing of that stimulus increases in a manner well described by the single-response version of the matching law (i.e., Equation 2; see Shahan & Podlesnik, 2008b). Furthermore, Shahan, Podlesnik, and Jimenez-Gomez (2006) found that when pigeons are given a choice between two observing keys and the relative rate of S+ presentation is varied between the keys, pigeons allocate their observing in accordance with the generalized matching law (i.e., Equation 3). This result is consistent with the results obtained in the delayed matching-to-sample procedure reported earlier

(i.e., Shahan & Podlesnik, 2006, 2007). Thus, in terms of evaluating the applicability of quantitative accounts of operant behavior to attending, the observing-response procedure appears to yield conclusions similar to the procedures discussed earlier. To examine how differential reinforcement affects the persistence of attending, researchers have used methods such as those typically used to study the resistance to change of simple operant behavior. For example, Shahan, Magee, and Dobberstein (2003) arranged a multiple schedule in which observing responses of pigeons produced different stimuli in the two components of the multiple schedule. In a rich component, observing responses produced an S+ associated with a higher rate of primary reinforcement. In a lean component, observing responses produced an S+ associated with a lower rate of primary reinforcement. Consistent with the matching law data presented earlier, Shahan et al. found that observing occurred at a higher rate in the rich than in the lean component during a predisruption baseline. When performance was disrupted by satiating the pigeons with presession or intercomponent food, observing was more resistant to change in the rich component than in the lean component. Thus, using observing responses as a measure of attention, it appears that stimuli associated with a higher rate of reinforcement command more persistent attending. Furthermore, quantitative analyses revealed that resistance to change of observing and simple operant behavior maintained directly by the food reinforcer were similarly sensitive to the reinforcement ratios arranged (i.e., b in Equation 6). Thus, extending behavioral momentum theory from simple operant behavior to attending appears to be relatively straightforward. Partly on the basis of the observing data, Nevin et al. (2005) and Nevin, Davison, Odum, and Shahan (2007) suggested that the effects of reinforcement on attending are governed by behavioral momentum theory. 
Specifically, the theory is based on a version of behavioral momentum theory that predicts rates of an operant response during baseline and disruption conditions such that

\[ B = k \exp\left[\frac{-x}{(r_s / r_a)^b}\right], \tag{9} \]


where B is response rate, rs is the rate of reinforcement in the presence of the stimulus in which the behavior is occurring, and ra is the overall rate of background reinforcement in the entire session. The parameters k, x, and b correspond to asymptotic response rate, background disruption that reduces responding, and sensitivity to relative reinforcement, respectively. Under baseline conditions, Equation 9 predicts functions relating response rates and reinforcement rates that are nearly indistinguishable from Herrnstein’s (1970) single-option version of the matching law (i.e., Equation 2; see Nevin et al., 2005). When additional terms are included in the numerator for specific disruptors, Equation 9 also captures the finding that responding is more resistant to disruption in contexts associated with higher rates of reinforcement. On the basis of Shahan et al.’s (2003) finding that the persistence of observing (i.e., attending) appears to be affected by reinforcement in a manner similar to simple response rates, Nevin et al. (2005) suggested the following equation:

\[ p(A_s) = \exp\left[\frac{-x}{(r_s / r_a)^b}\right], \tag{10} \]

where p(As) is the probability of attending to a sample stimulus and rs is now the rate of reinforcement associated with attending to the stimulus and ra remains the overall rate of background reinforcement in the session. The other parameters are as in Equation 9. The scalar k is not required because the asymptote of the probability of attending is at 1.0. In effect, Equation 10 assumes that the probability and persistence of attending are governed by reinforcement in the same way as is simple operant behavior. Nevin et al. (2005, 2007) have shown that inferences about changes in attending based on Equation 10 can be used in a larger model addressing the effects of reinforcement on discrimination performance in the steady state and during disruption. The use of Equation 10 in this broader model of stimulus control has provided an account of a wide variety of reinforcement-related effects in the stimulus control literature, including the effects of reinforcement on the persistence of discriminative performance when disrupted (e.g., Nevin, Milo, Odum, & Shahan, 2003). These initial successes

suggest that behavioral momentum theory may indeed be useful for understanding how differential reinforcement contributes to the allocation and persistence of attending. Nonetheless, the research effort directed at examining how reinforcement affects the persistence of attending has just begun. Data from a variety of typical paradigms for studying attention will be required to more fully assess the utility of the approach.

Attention, Conditioned Reinforcement, and Signposts

The research reviewed in the preceding sections suggests that differential reinforcement affects both the allocation and the persistence of attending to stimuli. One interpretation of how differential reinforcement affects attending is that stimuli associated with reinforcers become conditioned reinforcers as a result of Pavlovian conditioning (Mackintosh, 1974; Williams, 1994). From this perspective, the effects of differential reinforcement on attending result from the strengthening of a connection between an initially neutral stimulus and an existing reinforcer. As a result, initially neutral stimuli may come to reinforce behavior or attending that produces contact with them. Thus, quantitative models of simple operant behavior may describe the effects of differential reinforcement on the allocation and persistence of attending because the stimuli being attended to are themselves reinforcers. Although such an account works fairly well when considered only within the context of the effects of differential reinforcement on the allocation of attending, the account encounters some difficulties within the context of the persistence of attending. I consider these issues in turn in the following sections.
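The behavioral momentum expressions in Equations 9 and 10 can be explored numerically. The Python sketch below is only an illustration under stated assumptions: the function names and every parameter value (the reinforcement rates, b, and the two values of the disruption term x) are hypothetical and are not taken from Nevin et al. (2005) or from the observing experiments. It shows the qualitative property described in the text, namely that a stimulus associated with a higher reinforcement rate yields both a higher probability of attending and a smaller proportional loss when disruption increases.

```python
import math


def response_rate(k, x, r_s, r_a, b):
    """Equation 9: B = k * exp(-x / (r_s / r_a) ** b)."""
    return k * math.exp(-x / (r_s / r_a) ** b)


def p_attending(x, r_s, r_a, b):
    """Equation 10: p(A_s) = exp(-x / (r_s / r_a) ** b).

    Same form as Equation 9 without the scalar k, because the
    asymptote of a probability is 1.0.
    """
    return math.exp(-x / (r_s / r_a) ** b)


# Hypothetical values chosen only for illustration.
R_A = 60.0           # overall background reinforcement rate
R_RICH = 120.0       # reinforcement rate associated with the rich stimulus
R_LEAN = 30.0        # reinforcement rate associated with the lean stimulus
B_SENS = 0.5         # sensitivity parameter b
X_BASELINE = 0.1     # background disruption during baseline
X_DISRUPTED = 1.0    # larger disruption term, e.g., during satiation

rich_base = p_attending(X_BASELINE, R_RICH, R_A, B_SENS)
lean_base = p_attending(X_BASELINE, R_LEAN, R_A, B_SENS)
rich_disrupted = p_attending(X_DISRUPTED, R_RICH, R_A, B_SENS)
lean_disrupted = p_attending(X_DISRUPTED, R_LEAN, R_A, B_SENS)

# Proportion of baseline attending that survives disruption.
rich_proportion = rich_disrupted / rich_base
lean_proportion = lean_disrupted / lean_base

print(f"baseline p(A): rich = {rich_base:.3f}, lean = {lean_base:.3f}")
print(f"proportion of baseline: rich = {rich_proportion:.3f}, lean = {lean_proportion:.3f}")
```

Expressing disrupted performance as a proportion of baseline mirrors the usual way resistance to change is summarized. In this sketch the disruptor is represented simply as a larger value of x, which is a simplification of adding disruptor-specific terms to the numerator.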

Conditioned Reinforcement and the Allocation of Attention

To understand how conditioned reinforcement might provide an account of the effects of differential reinforcement on the allocation of attending, consider Shahan and Podlesnik’s (2006) divided-attention experiment. In that experiment, each of the two elements making up the compound sample stimuli in the delayed matching-to-sample procedure


was differentially associated with reinforcement. When the color dimension was associated with a higher probability of reinforcement than the line dimension, the color appeared to maintain more attending than the line. An account based on conditioned reinforcement would suggest that the color served as a better conditioned reinforcer because it was a better predictor of reinforcement than was the line. As a result of being a more potent conditioned reinforcer, the color would be expected to more effectively strengthen attending to it. The resultant data relating variations in relative reinforcement to attending might be expected to conform to the matching law because a variety of quantitative extensions of the generalized matching law (i.e., Equation 3) to choice between conditioned reinforcers predict just this type of effect (see Mazur, 2001, for review). The specific details of those models are not important for the present purposes, but many of the models take the general form

\[ \frac{B_1}{B_2} = b \left(\frac{r_1}{r_2}\right)^{a} \left(\frac{V_1}{V_2}\right), \tag{11} \]

where B1 and B2 refer to behavior that produces contact with two conditioned reinforcers, r1 and r2 refer to frequencies of production of the conditioned reinforcers, and V1 and V2 are summary terms describing the value or strengthening effects of the conditioned reinforcers. The models differ in how value is calculated, but most of the approaches to calculating value share important characteristics with existing theoretical accounts of Pavlovian conditioning (see Chapters 13 and 14, this volume). Assuming that such models can be extended to attention, they predict relative changes in attending to stimuli with changes in relative value such as those obtained in Shahan and Podlesnik (2006, 2007) or even in the human attention experiments reviewed earlier. Furthermore, the models accurately predict changes in the allocation of attending when the relative frequency of reinforcement-associated stimuli is varied as in the concurrent observing-response experiment of Shahan, Podlesnik, and Jimenez-Gomez (2006). Thus, such models of conditioned reinforcement may hold promise as a means to understand how stimuli differentially associated with primary reinforcers come to govern

the allocation of attention. Research evaluating the different models in simple choice situations has documented several important phenomena contributing to how conditioned reinforcers govern choice (see Grace, 1994, and Mazur, 2001, for reviews). A potentially fruitful research endeavor might be to examine whether similar effects occur in tasks designed specifically to study the effects of differential reinforcement on attention. Regardless of the ultimate utility of the models for describing how reinforcement affects the allocation of attention, an interpretation of the effects of reinforcement on the persistence of attending based on conditioned reinforcement encounters some difficulties, as described in the next section.

Conditioned Reinforcement and the Persistence of Attending

As I have noted, the only research I know of examining the effects of differential reinforcement on the persistence of attending is Shahan et al.’s (2003) work with observing responses. Shahan et al. showed that stimuli associated with higher rates of primary reinforcement maintain more persistent attending as measured with observing responses. This outcome could suggest that the stimulus associated with the higher rate of primary reinforcement was a more effective conditioned reinforcer and, hence, maintained more persistent attending. In fact, observing responses are often considered one of the best ways to study the conditioned reinforcing effects of reinforcement-associated stimuli (e.g., Dinsmoor, 1983; Fantino, 1977; Williams, 1994). Regardless, it is important to note that Shahan et al. examined only the effects of differences in primary reinforcement on the persistence of observing. If stimuli associated with differential reinforcement do affect the allocation and persistence of attention because they have themselves become conditioned reinforcers, variations in the rate or value of such stimuli would themselves be expected to contribute to the persistence of attending in the same way as primary reinforcers. However, experiments from my laboratory have shown that putative conditioned reinforcers do not contribute to the persistence of attending in the same way as primary reinforcers.


Shahan and Podlesnik (2005) showed that an observing response that produced more frequent presentations of a food-associated stimulus (i.e., S+) occurred more frequently than an observing response that produced less frequent S+ presentations. More important, the variation in frequency of S+ presentation occurred in the absence of differences in frequency of primary reinforcement. Thus, this result is consistent with the earlier choice results and with what would be expected if S+ presentations strengthen responding in the same fashion as primary reinforcers. Despite the effect of S+ frequency on the rate of observing, however, differences in S+ frequency had no impact on the persistence of observing under conditions of disruption. This result is not what would be expected if S+ presentations functioned as conditioned reinforcers. Additional experiments by Shahan and Podlesnik (2008a) examined the effects of variations in the value of an S+ on observing rates and resistance to change. The value of an S+ was varied by changing the predictive relation between S+ and the delivery of primary reinforcement in a variety of ways. Although the details of the manipulations are not important for present purposes, some of the methods pitted S+ value against the rate of primary reinforcement, and some varied S+ value while holding constant rates of primary reinforcement. The results showed that as with higher frequencies of S+ presentation in Shahan and Podlesnik (2005), higher valued S+ presentations maintained higher rates of observing. However, differences in the value of S+ presentations had no impact on the persistence of observing under conditions of disruption. If higher valued S+ presentations maintained higher observing rates because they served as more potent conditioned reinforcers, then higher valued S+ presentations should also have produced greater resistance to disruption, but they did not. 
Shahan and Podlesnik (2008b) provided a summary quantitative analysis of all these observing and resistance-to-disruption experiments. Although rates of observing were clearly affected by the frequency and value of S+ presentations, resistance to disruption of observing was only an orderly function of the rate of primary reinforcement in the context in which observing was occurring. Resistance to

disruption of observing and frequency or value of S+ presentations had no meaningful relationship. Given that both frequency and value of S+ presentations would be expected to affect resistance to disruption if S+ presentations were serving as conditioned reinforcers, Shahan and Podlesnik concluded that S+ presentations might affect the frequency of observing through some other mechanism. Thus, at present, it appears that the effects of differential reinforcement on the persistence of attending might not be mediated through the process of conditioned reinforcement. In short, stimuli that might be thought to command attention because they have become conditioned reinforcers do not appear to have an impact on the persistence of attending as do other reinforcers. Although the effects of differential reinforcement on the allocation of attending discussed in the preceding section might still be mediated through the process of conditioned reinforcement, it would not be parsimonious to assert different mechanisms for the allocation and persistence of attention. As I show, the debate surrounding the utility of the concept of conditioned reinforcement is considerable. As a result, the generalized matching law–based theories of conditioned reinforcement might also be interpreted in other terms.

Signposts as an Alternative to Conditioned Reinforcement

A long history of skepticism surrounds the concept of conditioned reinforcement. Data traditionally interpreted as reflecting the acquired strengthening effects of stimuli associated with reinforcers may be interpreted in a variety of other ways. Although a review of all these alternatives and the supporting data is beyond the scope of this chapter, Shahan (2010) provided a more thorough treatment of the subject. Many commentators have noted that the effects of stimuli associated with reinforcers on response rates and choice might be more consistent with a signaling or guidance process than with an acquired reinforcement-strengthening process (e.g., Bolles, 1975; Davison & Baum, 2006; Longstreth, 1971; Rachlin, 1976; Staddon, 1983). Although a variety of names have been used for reinforcement-associated stimuli in such an account, I have


adopted signposts or means to an end as suggested by Wolfe (1936), Longstreth (1971), and Bolles (1975). According to a signpost account, stimuli associated with reinforcers guide behavior because they predict where reinforcers are located in space, time, or both. Thus, an observing response that produces contact with a stimulus associated with a primary reinforcer serves to guide the organism to the reinforcer by providing feedback about how or where the reinforcer is obtained. Behavior that produces signposts occurs because the signpost is useful for getting to the primary reinforcer, not because the signpost has acquired the capacity to strengthen behavior in a reinforcement-like manner. In this sense, signposts might also be thought of as a means to an end for acquiring reinforcers. A set of two experiments by Bolles (1961) is useful for contrasting a signpost-based account and a conditioned reinforcement account. In the first experiment, rats responded on two levers that were available at the same time and periodically produced a food pellet accompanied by a click of the pellet dispenser. Receipt of a pellet plus click on one of the levers was predictive of future pellets for pressing that lever for a short time. During an extinction test, the two levers were present, but only one of them produced the click. Consistent with what would be expected if the click had acquired the capacity to function as a reinforcer, the rats preferred pressing the lever that produced the click to pressing the lever that did not. Nonetheless, in the second experiment, receipt of a pellet plus click on one lever predicted that subsequent pellets were more likely on the other lever for a short period. Unlike the first experiment, during the extinction test the rats shifted their preference away from the lever that produced the click. If the click was a conditioned reinforcer, one would expect it to further strengthen the response that produced it during extinction. 
Bolles concluded that previous demonstrations of apparent acquired conditioned-reinforcing effects of stimuli might result not from a reinforcement process, but because such stimuli are signposts for how subsequent food is to be obtained. Davison and Baum (2006) reached a similar conclusion on the basis of a related but more extensive set of experiments with pigeons.

A traditional account of results such as those of Bolles (1961) and Davison and Baum (2006) would suggest that stimuli associated with the delivery of a primary reinforcer might have multiple functions. Specifically, in the Bolles experiments, one might interpret the effect of the click in terms of both its strengthening effects and its discriminative stimulus effects. As I noted, the occurrence of operant behavior is known to be modulated by the presence of a discriminative stimulus in the presence of which it has been reinforced. In the Bolles experiment, the click associated with food delivery might be interpreted to function as a discriminative stimulus for pressing the appropriate lever (i.e., the same lever in the first experiment and the other lever in the second experiment). A similar account might be applied to the related findings of Davison and Baum: When a food-paired stimulus signals that engaging in a response is likely to be reinforced, the stimulus sets the occasion for repeating the response, even if that response is not the one that produced the stimulus. Thus, any conditioned reinforcement–based strengthening effects of the food-paired stimulus would be interpreted as being overridden by such discriminative stimulus effects of the stimulus. Although this account seems plausible enough on its surface, it is difficult to find any compelling data in the literature demonstrating strengthening effects of food-associated stimuli above and beyond such signaling effects (see Bolles, 1975; Davison & Baum, 2006; Longstreth, 1971; Rachlin, 1976; Shahan, 2010; and Staddon, 1983, for reviews). In short, it does not appear that both a signaling and a strengthening function are required to account for effects typically ascribed to conditioned reinforcement; a discriminative or signaling effect appears to be enough. 
The suggestion that stimuli associated with reinforcers might serve as signals for behavior rather than strengthen it is also consistent with some contemporary treatments of associative learning. For example, Gallistel and Gibbon (2002; see also Balsam & Gallistel, 2009) suggested that conditioning occurs not because of a reinforcement-driven backpropagation (i.e., strengthening) mechanism but because organisms learn that some events in the environment predict other events. Gallistel and Gibbon suggested that organisms learn temporal,


spatial, and predictive regularities in the environment in the absence of any reinforcement-like backpropagation process. This suggestion is based on a wealth of data showing that performance in both Pavlovian and operant conditioning preparations is often strikingly different from what would be expected if learning resulted from a strengthening process. As with Gallistel and Gibbon, but based largely on data separate from operant choice experiments, Davison and Baum (2006) have also suggested that what are traditionally referred to as reinforcers of operant behavior have their effects by predicting additional such events and, thus, guiding rather than strengthening behavior. Although a full discussion of such a non–reinforcement-based approach to conditioning and learning is not appropriate here (but see Shahan, 2010), its most important aspect is that it provides mechanisms by which initially neutral stimuli might come to guide behavior as signposts without acquiring conditioned-reinforcing effects. Because this approach also asserts that the effect of primary reinforcers is to guide rather than strengthen behavior, referring to initially neutral stimuli that come to guide behavior as discriminative stimuli would not be appropriate given the traditional meaning of that term. For this reason, I have adopted the term signposts to describe a more general guidance and means-to-an-end effect as an alternative to conditioned reinforcement. Terminological issues aside, even if one rejects the assertion that primary reinforcers do not strengthen behavior, an account of conditioned reinforcement in terms of a signposts or means-to-an-end account is useful for understanding the effects of differential reinforcement on the allocation and persistence of attending. 
From the perspective of a signpost-based account, differential reinforcement might have its impact on the allocation and persistence of attending because attending to stimuli that are more predictive of reinforcers is instrumental in guiding the organism to those reinforcers. From this perspective, interpreting the data from the Shahan and Podlesnik (2005, 2008a) experiments on the persistence of observing responses becomes easier. Although more frequent production of a signpost or production of a more informative signpost might be

expected to maintain more attending to it, such signposts would not be expected to contribute to the strength of attending (as would be predicted by a back-propagation response-strengthening conditioned reinforcement process). In terms of the allocation of attention, discarding the notion that stimuli associated with existing reinforcers themselves become reinforcers has little impact on the utility of the generalized matching law–based theories of conditioned reinforcement. Such theories provide a quantitative description of how choice between stimuli follows changes in the predictive relations between those stimuli and reinforcers. The notion of value can be interpreted to reflect the utility of the stimuli in terms of their efficacy in guiding organisms to reinforcers. The concept of value need not carry any connotation of an acquired reinforcement-like strengthening effect for the models to be useful in describing how such stimuli govern the allocation of behavior and attention. Interpreting the models in such a way is consistent with more general maximization-based economic or regulatory approaches to instrumental action within which the models might be included (e.g., Rachlin, Battalio, Kagel, & Green, 1981; Rachlin, Green, Kagel, & Battalio, 1976).

Summary and Conclusions

Both operant behavior and at least part of what is usually referred to as attention can be characterized as goal-directed activities. Although researchers have only recently started examining the effects of differential consequences on attention, such consequences appear to affect attention in much the same way as they affect simple operant behavior. At this early stage of the investigation, quantitative theories describing how consequences affect the allocation and persistence of operant behavior appear readily applicable to attention. 
Such theories might provide a starting point for replacing placeholder concepts such as a central administrator responsible for the goal-directed control of attention with a quantitative framework for translating experience into differential attending. Finally, one interpretation of the effects of differential reinforcement on attending to stimuli is that stimuli predictive of existing 405

Timothy A. Shahan

reinforcers acquire the capacity to reinforce activity that produces contact with them (i.e., they become conditioned reinforcers). However, an alternative approach suggests that such stimuli serve as signposts that guide behavior to reinforcers and serve as a means to an end for acquiring reinforcers. Thus, differentially attending to stimuli predictive of reinforcers might be considered instrumental in acquiring reinforcers rather than such stimuli strengthening attending through a back-propagation mechanism. Regardless of the ultimate mechanism responsible for the effects of differential reinforcement on attention, the fact remains that both operant behavior and attention are affected by consequences in a manner that is usefully described by existing quantitative models. Because the parameters known to affect behavior in these models are easily manipulated, these theories may lead to practical applications designed to affect attention and other important behavior (see Volume 2, Chapters 5 and 7, this handbook).

References

Ahearn, W. H., Clark, K. M., Gardenier, N. C., Chung, B. I., & Dube, W. V. (2003). Persistence of stereotyped behavior: Examining the effects of external reinforcers. Journal of Applied Behavior Analysis, 36, 439–448. doi:10.1901/jaba.2003.36-439

American Psychiatric Association. (1994). Diagnostic and statistical manual of mental disorders (4th ed.). Washington, DC: Author.

Baddeley, A. D. (1986). Working memory. Oxford, England: Oxford University Press.

Balsam, P. D., & Gallistel, C. R. (2009). Temporal maps and informativeness in associative learning. Trends in Neurosciences, 32, 73–78. doi:10.1016/j.tins.2008.10.004

Baum, W. M. (1974). On two types of deviation from the matching law: Bias and undermatching. Journal of the Experimental Analysis of Behavior, 22, 231–242. doi:10.1901/jeab.1974.22-231

Baum, W. M. (1979). Matching, undermatching, and overmatching in studies of choice. Journal of the Experimental Analysis of Behavior, 32, 269–281. doi:10.1901/jeab.1979.32-269

Baum, W. M., & Rachlin, H. C. (1969). Choice as time allocation. Journal of the Experimental Analysis of Behavior, 12, 861–874. doi:10.1901/jeab.1969.12-861

Blough, D. S. (1969). Attention shifts in a maintained discrimination. Science, 166, 125–126. doi:10.1126/science.166.3901.125

Bolles, R. C. (1961). Is the “click” a token reward? Psychological Record, 11, 163–168.

Bolles, R. C. (1975). Theory of motivation (2nd ed.). New York, NY: Harper & Row.

Bonnel, A.-M., & Prinzmetal, W. (1998). Dividing attention between the color and the shape of objects. Perception and Psychophysics, 60, 113–124. doi:10.3758/BF03211922

Bryson, S. E., Wainwright-Sharp, J. A., & Smith, L. M. (1990). Autism: A developmental spatial neglect syndrome? In J. Enns (Ed.), The development of attention: Research and theory (pp. 405–427). Amsterdam, the Netherlands: Elsevier. doi:10.1016/S0166-4115(08)60468-9

Catania, A. C. (1998). Learning (4th ed.). Upper Saddle River, NJ: Prentice-Hall.

Cohen, S. L. (1996). Behavioral momentum of typing behavior in college students. Journal of Behavior Analysis and Therapy, 1, 36–51.

Davison, M., & Baum, W. M. (2006). Do conditional reinforcers count? Journal of the Experimental Analysis of Behavior, 86, 269–283. doi:10.1901/jeab.2006.56-05

Davison, M., & Nevin, J. A. (1999). Stimuli, reinforcers, and behavior: An integration. Journal of the Experimental Analysis of Behavior, 71, 439–482. doi:10.1901/jeab.1999.71-439

Davison, M., & Tustin, R. D. (1978). The relation between the generalized matching law and signal-detection theory. Journal of the Experimental Analysis of Behavior, 29, 331–336. doi:10.1901/jeab.1978.29-331

Della Libera, C., & Chelazzi, L. (2006). Visual selective attention and the effects of monetary rewards. Psychological Science, 17, 222–227. doi:10.1111/j.1467-9280.2006.01689.x

Della Libera, C., & Chelazzi, L. (2009). Learning to attend and to ignore is a matter of gains and losses. Psychological Science, 20, 778–784. doi:10.1111/j.1467-9280.2009.02360.x

Dijksterhuis, A., & Aarts, H. (2010). Goals, attention, and (un)consciousness. Annual Review of Psychology, 61, 467–490. doi:10.1146/annurev.psych.093008.100445

Dinsmoor, J. A. (1983). Observing and conditioned reinforcement. Behavioral and Brain Sciences, 6, 693–728. doi:10.1017/S0140525X00017969

Dinsmoor, J. A. (1985). The role of observing and attention in establishing stimulus control. Journal of the Experimental Analysis of Behavior, 43, 365–381. doi:10.1901/jeab.1985.43-365

Dube, W. V., & McIlvane, W. J. (1997). Reinforcer frequency and restricted stimulus control. Journal of the Experimental Analysis of Behavior, 68, 303–316. doi:10.1901/jeab.1997.68-303

Ehrman, R. N., Robbins, S. J., Bromwell, M. A., Lankford, M. E., Monterosso, J. R., & O’Brien, C. P. (2002). Comparing attentional bias to smoking cues in current smokers, former smokers, and non-smokers using a dot-probe task. Drug and Alcohol Dependence, 67, 185–191. doi:10.1016/S0376-8716(02)00065-0

Fantino, E. (1977). Conditioned reinforcement: Choice and information. In W. K. Honig & J. E. R. Staddon (Eds.), Handbook of operant behavior (pp. 313–339). Englewood Cliffs, NJ: Prentice-Hall.

Gallistel, C. R., & Gibbon, J. (2002). The symbolic foundations of conditioned behavior. Mahwah, NJ: Erlbaum.

Gopher, D. (1992). The skill of attention control: Acquisition and execution of attention strategies. In D. E. Meyer & S. Kornblum (Eds.), Attention and performance XIV: Synergies in experimental psychology, artificial intelligence, and cognitive neuroscience (pp. 299–322). Cambridge, MA: MIT Press.

Gottselig, J. M., Wasserman, E. A., & Young, M. E. (2001). Attentional trade-offs in pigeons learning to discriminate newly relevant visual stimulus dimensions. Learning and Motivation, 32, 240–253. doi:10.1006/lmot.2000.1081

Grace, R. C. (1994). A contextual model of concurrent-chains choice. Journal of the Experimental Analysis of Behavior, 61, 113–129. doi:10.1901/jeab.1994.61-113

Grimes, J. A., & Shull, R. L. (2001). Response-independent milk delivery enhances persistence of pellet-reinforced lever pressing by rats. Journal of the Experimental Analysis of Behavior, 76, 179–194. doi:10.1901/jeab.2001.76-179

Harper, D. N. (1999). Drug-induced changes in responding are dependent on baseline stimulus-reinforcer contingencies. Psychobiology, 27, 95–104.

Herrnstein, R. J. (1961). Relative and absolute strength of response as a function of frequency of reinforcement. Journal of the Experimental Analysis of Behavior, 4, 267–272. doi:10.1901/jeab.1961.4-267

Herrnstein, R. J. (1970). On the law of effect. Journal of the Experimental Analysis of Behavior, 13, 243–266. doi:10.1901/jeab.1970.13-243

Herrnstein, R. J. (1990). Rational choice theory: Necessary but not sufficient. American Psychologist, 45, 356–367. doi:10.1037/0003-066X.45.3.356

Herrnstein, R. J., Rachlin, H., & Laibson, D. I. (1997). The matching law: Papers in psychology and economics. Cambridge, MA: Harvard University Press.

Hogarth, L., Dickinson, A., & Duka, T. (2010). Selective attention to conditioned stimuli in human discrimination learning: Untangling the effect of outcome prediction, value, arousal and uncertainty. In C. J. Mitchell & M. E. Le Pelley (Eds.), Attention and associative learning (pp. 71–97). Oxford, England: Oxford University Press.

Igaki, T., & Sakagami, T. (2004). Resistance to change in goldfish. Behavioural Processes, 66, 139–152. doi:10.1016/j.beproc.2004.01.009

James, W. (1890). The principles of psychology (Vol. 1). New York, NY: Holt. doi:10.1037/11059-000

Johnsen, B. H., Laberg, J. C., Cox, W. M., Vaksdal, A., & Hugdahl, K. (1994). Alcoholic subjects’ attentional bias in the processing of alcohol-related words. Psychology of Addictive Behaviors, 8, 111–115. doi:10.1037/0893-164X.8.2.111

Kahneman, D. (1973). Attention and effort. Englewood Cliffs, NJ: Prentice Hall.

Kamin, L. J. (1968). “Attention-like” processes in classical conditioning. In M. R. Jones (Ed.), Miami Symposium on the Prediction of Behavior: Aversive stimulation (pp. 9–31). Miami, FL: University of Miami Press.

Langley, C. M., & Riley, D. A. (1993). Limited capacity information processing and pigeon matching-to-sample: Testing alternative hypotheses. Animal Learning and Behavior, 21, 226–232. doi:10.3758/BF03197986

Leith, C. R., & Maki, W. S. (1975). Attention shifts during matching-to-sample performance in pigeons. Animal Learning and Behavior, 3, 85–89. doi:10.3758/BF03209105

Loewenstein, Y., & Seung, H. S. (2006). Operant matching is a generic outcome of synaptic plasticity based on the covariance between reward and neural activity. Proceedings of the National Academy of Sciences of the United States of America, 103, 15224–15229. doi:10.1073/pnas.0505220103

Longstreth, L. E. (1971). A cognitive interpretation of secondary reinforcement. In J. K. Cole (Ed.), Nebraska symposium on motivation (Vol. 19, pp. 33–81). Lincoln: University of Nebraska Press.

Lovaas, O. I., Koegel, R. L., & Schreibman, L. (1979). Stimulus overselectivity in autism: A review of research. Psychological Bulletin, 86, 1236–1254. doi:10.1037/0033-2909.86.6.1236

Lovejoy, E. (1965). An attention theory of discrimination learning. Journal of Mathematical Psychology, 2, 342–362. doi:10.1016/0022-2496(65)90009-X

Lubman, D. I., Peters, L. A., Mogg, K., Bradley, B. P., & Deakin, J. F. W. (2000). Attentional bias for drug cues in opiate dependence. Psychological Medicine, 30, 169–175. doi:10.1017/S0033291799001269

Lubow, R. E. (1989). Latent inhibition and conditioned attention theory. New York, NY: Cambridge University Press. doi:10.1017/CBO9780511529849

Luck, S. J., & Vecera, S. P. (2002). Attention. In H. Pashler & S. Yantis (Eds.), Stevens’s handbook of experimental psychology: Vol. 1. Sensation and perception (3rd ed., pp. 235–286). New York, NY: Wiley.

Mace, F. C., Lalli, J. S., Shea, M. C., Lalli, E. P., West, B. J., Roberts, M., & Nevin, J. A. (1990). The momentum of human behavior in a natural setting. Journal of the Experimental Analysis of Behavior, 54, 163–172. doi:10.1901/jeab.1990.54-163

Mackintosh, N. J. (1974). The psychology of animal learning. London, England: Academic Press.

Mackintosh, N. J. (1975). A theory of attention: Variations in the associability of stimuli with reinforcement. Psychological Review, 82, 276–298. doi:10.1037/h0076778

Mackintosh, N. J., & Little, L. (1969). Intradimensional and extradimensional shift learning by pigeons. Psychonomic Science, 14, 5–6.

Madden, G. J., & Bickel, W. K. (Eds.). (2010). Impulsivity: The behavioral and neurological science of discounting. Washington, DC: American Psychological Association. doi:10.1037/12069-000

Maki, W. S., & Leith, C. R. (1973). Shared attention in pigeons. Journal of the Experimental Analysis of Behavior, 19, 345–349. doi:10.1901/jeab.1973.19-345

Maki, W. S., & Leuin, T. C. (1972). Information-processing by pigeons. Science, 176, 535–536. doi:10.1126/science.176.4034.535

Mazur, J. E. (2001). Hyperbolic value addition and general models of animal choice. Psychological Review, 108, 96–112. doi:10.1037/0033-295X.108.1.96

Mazur, J. E. (2006). Learning and behavior (6th ed.). Upper Saddle River, NJ: Prentice Hall.

McClure, S. M., Laibson, D. I., Loewenstein, G., & Cohen, J. D. (2004). Separate neural systems value immediate and delayed monetary rewards. Science, 306, 503–507. doi:10.1126/science.1100907

Nestor, P. G., & O’Donnell, B. F. (1998). The mind adrift: Attentional dysregulation in schizophrenia. In R. Parasuraman (Ed.), The attentive brain (pp. 527–546). Cambridge, MA: MIT Press.

Nevin, J. A. (1974). Response strength in multiple schedules. Journal of the Experimental Analysis of Behavior, 21, 389–408. doi:10.1901/jeab.1974.21-389

Nevin, J. A. (1992). An integrative model for the study of behavioral momentum. Journal of the Experimental Analysis of Behavior, 57, 301–316. doi:10.1901/jeab.1992.57-301

Nevin, J. A., Davison, M., Odum, A. L., & Shahan, T. A. (2007). A theory of attending, remembering, and reinforcement in delayed matching to sample. Journal of the Experimental Analysis of Behavior, 88, 285–317. doi:10.1901/jeab.2007.88-285

Nevin, J. A., Davison, M., & Shahan, T. A. (2005). A theory of attending and reinforcement in conditional discriminations. Journal of the Experimental Analysis of Behavior, 84, 281–303. doi:10.1901/jeab.2005.97-04

Nevin, J. A., & Grace, R. C. (2000). Behavioral momentum and the law of effect. Behavioral and Brain Sciences, 23, 73–90. doi:10.1017/S0140525X00002405

Nevin, J. A., Milo, J., Odum, A. L., & Shahan, T. A. (2003). Accuracy of discrimination, rate of responding, and resistance to change. Journal of the Experimental Analysis of Behavior, 79, 307–321. doi:10.1901/jeab.2003.79-307

Nevin, J. A., Tota, M. E., Torquato, R. D., & Shull, R. L. (1990). Alternative reinforcement increases resistance to change: Pavlovian or operant contingencies? Journal of the Experimental Analysis of Behavior, 53, 359–379. doi:10.1901/jeab.1990.53-359

Norman, D. A. (1968). Toward a theory of memory and attention. Psychological Review, 75, 522–536. doi:10.1037/h0026699

Norman, D. A., & Shallice, T. (1986). Attention to action: Willed and automatic control of behavior. In R. J. Davidson, G. E. Schwartz, & D. Shapiro (Eds.), Consciousness and self-regulation (Vol. 4, pp. 1–18). New York, NY: Plenum.

Pashler, H. E. (1998). The psychology of attention. Cambridge, MA: MIT Press.

Pavlov, I. P. (1927). Conditioned reflexes. Oxford, England: Oxford University Press.

Pearce, J. M., & Hall, G. (1980). A model for Pavlovian learning: Variations in the effectiveness of conditioned but not of unconditioned stimuli. Psychological Review, 87, 532–552. doi:10.1037/0033-295X.87.6.532

Posner, M. I. (1980). Orienting of attention. Quarterly Journal of Experimental Psychology, 32, 3–25. doi:10.1080/00335558008248231

Quick, S. L., & Shahan, T. A. (2009). Behavioral momentum of cocaine self-administration: Effects of frequency of reinforcement on resistance to change. Behavioural Pharmacology, 20, 337–345. doi:10.1097/FBP.0b013e32832f01a8

Rachlin, H. (1976). Behavior and learning. San Francisco, CA: Freeman.

Rachlin, H., Battalio, R., Kagel, J., & Green, L. (1981). Maximization theory in behavioral psychology. Behavioral and Brain Sciences, 4, 371–417. doi:10.1017/S0140525X00009407

Rachlin, H., Green, L., Kagel, J. H., & Battalio, R. (1976). Economic demand theory and psychological studies of choice. In G. Bower (Ed.), The psychology of learning and motivation (Vol. 10, pp. 129–154). New York, NY: Academic Press.

Raymond, J. E., & O’Brien, J. L. (2009). Selective visual attention and motivation: The consequences of value learning in an attentional blink task. Psychological Science, 20, 981–988. doi:10.1111/j.1467-9280.2009.02391.x

Rescorla, R. A. (1988). Pavlovian conditioning: It’s not what you think it is. American Psychologist, 43, 151–160. doi:10.1037/0003-066X.43.3.151

Rescorla, R. A. (1998). Instrumental learning: Nature and persistence. In M. Sabourin, F. Craik, & M. Robert (Eds.), Advances in psychological science: Vol. 2. Biological and cognitive aspects (pp. 239–257). Hove, England: Psychology Press.

Rescorla, R. A., & Wagner, A. R. (1972). A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement. In A. H. Black & W. F. Prokasy (Eds.), Classical conditioning II: Current research and theory (pp. 64–99). New York, NY: Appleton-Century-Crofts.

Riley, D. A., & Leith, C. R. (1976). Multidimensional psychophysics and selective attention in animals. Psychological Bulletin, 83, 138–160. doi:10.1037/0033-2909.83.1.138

Riley, D. A., & Roitblat, H. L. (1978). Selective attention and related cognitive processes in pigeons. In S. H. Hulse, H. Fowler, & W. K. Honig (Eds.), Cognitive processes in animal behavior (pp. 249–276). Hillsdale, NJ: Erlbaum.

Roberts, W. A. (1998). Principles of animal cognition. Boston, MA: McGraw-Hill.

Santi, A., Grossi, V., & Gibson, M. (1982). Differences in matching-to-sample performance with element and compound sample stimuli in pigeons. Learning and Motivation, 13, 240–256. doi:10.1016/0023-9690(82)90023-6

Schneider, S. M., & Davison, M. (2005). Demarcated response sequences and generalized matching. Behavioural Processes, 70, 51–61. doi:10.1016/j.beproc.2005.04.005

Shahan, T. A. (2010). Conditioned reinforcement and response strength. Journal of the Experimental Analysis of Behavior, 93, 269–289. doi:10.1901/jeab.2010.93-269

Shahan, T. A., & Burke, K. A. (2004). Ethanol-maintained responding of rats is more resistant to change in a context with added non-drug reinforcement. Behavioural Pharmacology, 15, 279–285. doi:10.1097/01.fbp.0000135706.93950.1a

Shahan, T. A., Magee, A., & Dobberstein, A. (2003). The resistance to change of observing. Journal of the Experimental Analysis of Behavior, 80, 273–293. doi:10.1901/jeab.2003.80-273

Shahan, T. A., & Podlesnik, C. A. (2005). Rate of conditioned reinforcement affects observing rate but not resistance to change. Journal of the Experimental Analysis of Behavior, 84, 1–17. doi:10.1901/jeab.2005.83-04

Shahan, T. A., & Podlesnik, C. A. (2006). Divided attention performance and the matching law. Learning and Behavior, 34, 255–261. doi:10.3758/BF03192881

Shahan, T. A., & Podlesnik, C. A. (2007). Divided attention and the matching law: Sample duration affects sensitivity to reinforcement allocation. Learning and Behavior, 35, 141–148. doi:10.3758/BF03193049

Shahan, T. A., & Podlesnik, C. A. (2008a). Conditioned reinforcement value and resistance to change. Journal of the Experimental Analysis of Behavior, 89, 263–298. doi:10.1901/jeab.2008-89-263

Shahan, T. A., & Podlesnik, C. A. (2008b). Quantitative analyses of observing and attending. Behavioural Processes, 78, 145–157. doi:10.1016/j.beproc.2008.01.012

Shahan, T. A., Podlesnik, C. A., & Jimenez-Gomez, C. (2006). Matching and conditioned reinforcement rate. Journal of the Experimental Analysis of Behavior, 85, 167–180. doi:10.1901/jeab.2006.34-05

Shahan, T. A., & Quick, S. (2009, May). Reinforcement probability affects adjusting sample duration in a divided-attention task. In M. Davison (Chair), Divided attention, divided stimulus control. Symposium conducted at the 35th annual meeting of the Association for Behavior Analysis International, Phoenix, AZ.

Staddon, J. E. R. (1983). Adaptive behavior and learning. New York, NY: Cambridge University Press.

Styles, E. A. (1997). The psychology of attention. East Essex, England: Psychology Press. doi:10.4324/9780203690697

Sugrue, L. P., Corrado, G. S., & Newsome, W. T. (2004). Matching behavior and the representation of value in the parietal cortex. Science, 304, 1782–1787. doi:10.1126/science.1094765

Sutherland, N. S., & Mackintosh, N. J. (1971). Mechanisms of animal discrimination learning. New York, NY: Academic Press.

Townshend, J., & Duka, T. (2001). Attentional bias associated with alcohol cues: Differences between heavy and occasional social drinkers. Psychopharmacology, 157, 67–74. doi:10.1007/s002130100764

Wagner, A. R., Logan, F. A., Haberlandt, K., & Price, T. (1968). Stimulus selection in animal discrimination learning. Journal of Experimental Psychology, 76, 171–180. doi:10.1037/h0025414

Williams, B. A. (1994). Conditioned reinforcement: Neglected or outmoded explanatory construct? Psychonomic Bulletin and Review, 1, 457–475. doi:10.3758/BF03210950

Wolfe, J. B. (1936). Effectiveness of token-rewards for chimpanzees. Comparative Psychology Monographs, 12, 1–72.

Wyckoff, L. B., Jr. (1952). The role of observing responses in discrimination learning: Part 1. Psychological Review, 59, 431–442. doi:10.1037/h0053932

Yantis, S. (2000). Goal-directed and stimulus-driven determinants of attentional control. In S. Monsell & J. Driver (Eds.), Attention and performance: Vol. 18: Control of cognitive processes (pp. 73–103). Cambridge, MA: MIT Press.

Zeaman, D., & House, B. J. (1963). The role of attention in retardate discrimination learning. In N. R. Ellis (Ed.), Handbook of mental deficiency: Psychological theory and research (pp. 159–223). New York, NY: McGraw-Hill.

Zentall, T. R., & Riley, D. A. (2000). Selective attention in animal discrimination learning. Journal of General Psychology, 127, 45–66. doi:10.1080/00221300009598570

Zentall, T. R., Sherburne, L. M., & Zhang, Z. (1997). Shared attention in pigeons: Retrieval failure does not account for the element superiority effect. Learning and Motivation, 28, 248–267. doi:10.1006/lmot.1996.0965

Chapter 18

Remembering and Forgetting

K. Geoffrey White

Remembering and forgetting raise the question of how an event at one time influences behavior that occurs at another. In physics, action at a distance used to be a puzzle because it was thought that objects could influence each other only through direct contact. In psychology, the question of action at a temporal distance remains a puzzle. How do people remember and why do people forget? Research on human memory is wide ranging, and naturalistic and laboratory studies of remembering in nonhuman animals are covered by the field called comparative cognition (Roberts, 1998). Sometimes the different abilities reveal fascinating species-specific characteristics (Wasserman, 1993), and sometimes the species similarities owe to procedural limits (White, Juhasz, & Wilson, 1973). This chapter is more constrained in scope. In it, I examine the experimental analysis of nonhuman remembering and forgetting in laboratory procedures in which the retention intervals are typically short and in which parametric variation is a main focus. Can an experimental analysis contribute a solution to the puzzle of action at a temporal distance? In any experiment in which remembering or forgetting is studied, the most important parameter is the temporal distance between original learning and the point of remembering, because the time delay, or retention interval, defines the behavior as remembering or its converse, forgetting. The process used to bridge the temporal gap is loosely referred to as memory, but as I discuss later, this question is a theoretical one that has many different interpretations.

Remembering in both humans and nonhuman animals has been studied for more than 100 years. At about the time that Thorndike (1898) first demonstrated remembering in cats, Ebbinghaus (1885/1964) reported the results of the first systematic experimental study of human remembering. Ebbinghaus recognized the difficulty caused by people’s prior learning experiences for a study of remembering. Accordingly, he used nonsense syllables, an innovation that has influenced the study of human memory to the present day. He also varied the duration of the retention interval from minutes to many days. Since then, an enormous number of studies have been devoted to the empirical and theoretical study of remembering in humans. An accessible account was provided by Baddeley (1997) in his book Human Memory: Theory and Practice. Several specialist journals cover the area, and many reviews in the Annual Review of Psychology have addressed important issues (e.g., Jonides et al., 2008; Nairne, 2002; Roediger, 2008; Wixted, 2004b). The research has identified many variables and conditions that influence remembering, such as the familiarity and repetition of material to be remembered, whether learning is spaced, and whether interfering events occur between learning and recollection. The effects of some variables, however, are relative to the effects of others, making it difficult to establish lawful regularities (Roediger, 2008). None of this extensive research with human subjects is explicitly reviewed here, although it tends to have influenced the direction of research with nonhuman animals.

I thank Glenn Brown for his guidance and advice. DOI: 10.1037/13937-018 APA Handbook of Behavior Analysis: Vol. 1. Methods and Principles, G. J. Madden (Editor-in-Chief) Copyright © 2013 by the American Psychological Association. All rights reserved.


Overview

I begin the chapter with a preliminary description of the main focus of an experimental analysis of remembering. I follow this description with an account of the procedure most often used to study short-term remembering in nonhuman animals (delayed matching to sample) and the advantages of quantifying the forgetting function, the hallmark of a memory study. In the next section, I summarize the empirical research on the effects of variables that influence the forgetting function. These variables are related to the to-be-remembered events: the sample stimuli in delayed matching to sample; the retention interval; the choice response, including its reinforcing consequences; and the intertrial interval. In a subsequent section, I describe the three main behavioral theories of remembering, all of which rely on the effects of reinforcers for remembering and their role in forgetting. In the final section, I consider a behavioral perspective on memory, in which remembering is treated as a discrimination specific to the retention interval at which remembering occurs. My main purpose in this chapter is to document the effects of variables that must be accounted for by successful theories, to demonstrate the empirical and theoretical importance of the consequences for remembering, and to suggest that remembering can be understood in the same terms as discrimination.

Experimental Analysis of Remembering

From a behavioral perspective, an experimental analysis of remembering has mainly been advanced using nonhuman animals for two reasons. The first is the relative ease of specifying the environmental variables that influence the individual’s behavior, especially when variables are manipulated across different values for each individual in the study. The second is that a fundamental element determining the individual’s behavior in any task requiring learning and remembering is the reinforcement contingency. Very few studies with humans can easily control the rewards, or reinforcers, for appropriate completion of the task. When studying remembering in nonhuman animals, the task requirements are made explicit through reinforcement contingencies, and a response language is thus developed (Goldiamond, 1966). For these reasons, nearly all of the studies I review in this chapter examine remembering in nonhuman animals.

An experimental analysis of remembering emphasizes observable behavior and the description of variables of which the behavior is a function. Much of this chapter concerns such variables, but as I noted earlier, the most important variable is the retention interval, that is, the temporal gap between original learning and later remembering. Without a time delay, researchers would be studying perceiving and would see no need to refer to memory. An additional consideration for an experimental analysis of behavior is that the effects of the experimental variables are demonstrated in each individual (Sidman, 1960). The effects of a variable are shown by altering it across several values. Most of the studies reviewed in this chapter follow these principles.

From a behavioral perspective, remembering can be placed in the broader context of discrimination and generalization. Remembering typically requires a discrimination between concurrently available response alternatives. A choice is followed by reinforcement depending on the event or stimulus to be remembered. In recognition procedures, one or more of the stimuli to be remembered is presented again at the time of choice. In recall procedures, the to-be-remembered stimulus is not presented again but is associated with a choice alternative. The discrimination thus involves both the stimulus or event to be remembered and the stimuli associated with the choice alternatives. It is a conditional discrimination because the reinforced choice response is conditional on the to-be-remembered stimulus. Because the to-be-remembered stimulus exerts its influence on the subsequent choice after some delay, accurate remembering shows delayed stimulus control (Catania, 1992), which implies that the discrimination occurs at the time of remembering, not at the time at which the to-be-remembered event was presented (Berryman, Cumming, & Nevin, 1963; White, 2001).


Delayed Matching Task Fifty years ago, two innovative procedures were reported for studying remembering over short delays (generally 20 seconds or less). One, by Peterson and Peterson (1959), was designed to study human memory. The other, by Blough (1959), was designed to study remembering in nonhumans. Although I do not consider research on human memory in this chapter, the two tasks are similar. Peterson and Peterson asked human participants to repeat three consonants after a delay lasting several seconds. During the delay, the participant counted down in threes from a number provided by the experimenter. The result was very rapid forgetting of the consonants, interpreted by the Petersons as trace decay, a process specific to short-term memory. As it happens, the distinction between short-term and longterm forgetting that followed from the Petersons’ experiment has little substance (Nairne, 2002; Suprenant & Neath, 2009). Instead, it is best understood simply in terms of the relative lengths of delays being studied (short vs. long), and the same principles can apply to remembering over both the short and the long term. The study by Peterson and Peterson has been very influential in stimulating a large body of research on human short-term memory ( Jonides et al., 2008), and the theoretical ideas have been transferred to account for remembering over short delays by nonhuman animals (Kendrick, Rilling, & Denny, 1986; Roberts, 1998). Wixted (1989) observed that in the study of nonhuman short-term memory, “the ratio of theory to data often seems unacceptably high, and efforts to identify common empirical principles of memory are relatively rare” (p. 409). I aim to improve this ratio in this chapter. The other innovative procedure, delayed matching-to-sample, was reported by Blough (1959) with pigeons as subjects. As with the task devised by Peterson and Peterson (1959), the delayed matching task was performed trial by trial, with delays lasting for several seconds. 
Each trial began with the presentation of a steady or flickering light on the central response key of a three-key experimental chamber. The two light patterns, the samples, alternated randomly across trials. Responses to the

sample darkened the key and led to a delay interval of several seconds, varying from trial to trial. At the end of the delay, the two patterns were presented as comparison stimuli on the left and right response keys, with their position alternating randomly across trials. Correct choices of the comparison pattern that matched the sample at the beginning of the trial were followed by delivery of grain. A houselight illuminated the chamber throughout the experimental session. Figure 18.1 illustrates a general version of a

Fifth peck at red or green center key starts a dark delay interval Red

b

After the delay interval a choice is made between red and green side keys Red

Green

Correct choices are reinforced Intertrial Interval

Figure 18.1. Illustration of the delayed matching-to-sample procedure involving three response keys that can display different stimuli (e.g., red and green keylights). From “Psychophysics of Remembering: The Discrimination Hypothesis,” by K. G. White, 2002, Current Directions in Psychological Science, 11, p. 142. Copyright 2002 by the Association for Psychological Science. Adapted with permission.


K. Geoffrey White

the delayed matching-to-sample procedure (in which the delay and intertrial interval are usually dark). In Blough’s study, the percentage of correct matching responses decreased with increasing duration of the delay, just as in the experiment with humans reported by Peterson and Peterson. Enhanced accuracy was interestingly correlated with whether different stereotyped behaviors, such as bobbing up and down, occurred during the delay. Blough’s (1959) observations of mediating behaviors suggested a behavioral answer to the question of how the temporal gap was bridged. Mediating behaviors, however, may not always be obvious. Berryman et al. (1963) charted the development of delayed matching to sample but were unable to identify mediating behaviors. Whereas Blough gradually increased the duration of the delays, Berryman et al. ensured that exposure to each of the different delays was equal throughout training. Berryman et al. suggested that the mediating behaviors in Blough’s study were more likely to have developed as a result of gradually lengthening the delays. Apart from a few studies in which responding on a lit center key during the delay differed after two sample stimuli (e.g., Jones & White, 1994; Wasserman, Grosch, & Nevin, 1982) or behavioral observations during the delay were systematically recorded (Urcuioli & DeMarse, 1997), there is a dearth of studies of mediating behavior during the delay. Human short-term memory studies have shown renewed interest in the process of rehearsal and its prevention during the delay (Berman, Jonides, & Lewis, 2009). The study of mediating behaviors in the retention interval of nonhuman forgetting procedures, and their relation to accurate remembering under a variety of conditions already shown to influence accuracy, could be a productive avenue for future research. 
Most studies of delayed matching to sample begin by training the animal to respond to the choice stimuli as a first step, with alternating responses producing reinforcers. The next step is to introduce the sample stimuli, which precede the choice stimuli without delay. After a few days of such training, a very short delay is introduced. In some early studies, training with no delay was followed by a series of test sessions with several different delays (Roberts, 1972). The sudden introduction of nonzero delays can result in chance performance at all nonzero delays for some individuals, or a generalization decrement from the zero delay (Rayburn-Reeves & Zentall, 2009; Sargisson & White, 2001). Similarly, when one delay is arranged for all trials within a session, accuracy can be low or at chance level (Harnett, McCarthy, & Davison, 1984), confounded by large response bias (Jones & White, 1992), and the averaged forgetting function can appear hyperbolic in form. To maintain high accuracy at short delays and minimize the development of response bias at long delays, a successful strategy is to gradually lengthen the delays as training progresses and to retain a zero or near-zero delay in all sets of delays (Jones & White, 1994; Sargisson & White, 2001). The absence of bias at long delays means that the reinforcer proportion remains at its arranged value, typically 0.5, and the discrimination is not influenced by fluctuations in reinforcer proportions. Attempts to control the reinforcer proportion (McCarthy & Davison, 1991) can result in large reductions in the levels of obtained reinforcers at long delays and large bias for other dimensions of the choice between comparison stimuli (Alsop & Jones, 2008; Brown & White, 2009a; Jones & White, 1992; White & Wixted, 1999). Inclusion of a very short delay at all stages of training minimizes response bias at medium and long delays (White, 1985). The procedural advantages of varying delay within experimental sessions yield an interesting major empirical benefit: the mapping of the forgetting function for individual subjects.

Forgetting Over Delays

Blough’s (1959) study can be seen as the beginning of an experimental analysis of remembering.
Delayed matching to sample is the most frequently used procedure in the study of nonhuman short-term remembering, and it has been studied among a wide range of species (White, Ruske, & Colombo, 1996). Examples include humans (Adamson, Foster, & McEwan, 2000; Lane, Cherek, & Tcheremissine, 2005), monkeys (D’Amato, 1973), dolphins (Herman, 1975), mice (Goto, Kurashima, & Watanabe,

Remembering and Forgetting

2010), and rats (Dunnett & Martel, 1990; Harper, 2000; Ruske & White, 1999). The trial-by-trial procedure allows the direct translation of the main procedural elements (sample, delay, choice) into the terms of the cognitive psychology of human remembering (encoding, storage, retrieval). It also allows the study of the effects on matching accuracy in pigeons of variables that correspond to main effects in human short-term remembering such as repetition, rehearsal, proactive interference, retroactive interference, and spaced practice (Roberts, 1972; Roberts & Grant, 1976). Most important, the delayed matching-to-sample procedure allows within-subject variation of the fundamental variable that defines the procedure as a memory procedure: the delay or retention interval. Because the function relating accuracy to delay duration typically decreases with increasing time, the function is referred to as a forgetting function (White, 1985, 2001). In the absence of delay interval variation, a single data point at a given delay confounds potential differences in the intercept of a forgetting function with its slope. I consider the effects on forgetting functions of the sample stimuli, delay interval conditions, choice, and intertrial interval in delayed matching to sample in the sections that follow. Quantifying the forgetting function in terms of the intercept and slope of fitted functions reveals that some variables influence the intercept and others the slope. Additionally, quantification is a hallmark of an experimental analysis of behavior, and the ability to fit functions to data provides another level of analysis in the search for order (Mazur, 2006). An excellent example is the comparison of auditory memory in humans and starlings in a delayed matching task in which samples were pure tones and starling song motifs (Zokoll, Naue, Herrmann, & Langemann, 2008).
The forgetting functions were well described by exponential functions that did not differ in intercept for the tones versus motifs. Repetition of the samples increased intercepts for the starlings but not for the humans, and rate of forgetting was greater for the starlings. Thus, the higher order description in terms of intercepts and slopes of forgetting functions facilitated an illuminating cross-species comparison.

Quantifying Forgetting

The beauty of the forgetting function is that it measures performance across a wide range of levels of accuracy, from high at very short retention intervals to low at very long intervals. Percentage of correct choice in delayed matching to sample is the standard and most basic measure of performance at each retention interval, but increasingly a measure of discriminability is used. The problem with percentage correct is that it is bounded at 100% and can suffer from ceiling effects. By transforming proportion correct (p) to logit p, using logit p = log10[p/(1 − p)], the problem of ceiling effects can be avoided. Logit p is a ratio-based measure that varies on an equal-interval scale, as do measures of discriminability. As a result, the slopes of different forgetting functions can be compared without encountering the problem that slope differences can be generated by non-equal-interval measurement scales (Loftus, 1985; Wixted, 1990). Technically, logit p can be influenced by response bias, whereas the discriminability measures d′, log α, and log d estimate discriminability separately from response bias. Macmillan and Creelman (1991) provided a comprehensive account of signal detection theory’s d′ and of log α from choice theory. Log d (Davison & Tustin, 1978; Nevin, 1981) is the same as log α except that it uses log to base 10. Both logit p and log d express discriminability as the log of the ratio of correct responses to errors and are equal when there is no response bias. The log d measure is easy to calculate: log d = 0.5 · log10[(correct after S1 × correct after S2) / (errors after S1 × errors after S2)], where S1 and S2 are the two sample stimuli. When there are no errors, log d cannot be determined.
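The two measures just defined can be sketched in a few lines of Python. This is a minimal illustration rather than any author's analysis code: the function names, the `correction` argument, and all response counts are assumptions introduced here, and the counts are invented for illustration only.

```python
import math


def logit_p(p):
    """Logit transform of proportion correct: log10[p / (1 - p)].

    Returns None at p = 0 or p = 1, where the ratio is undefined.
    """
    if p <= 0.0 or p >= 1.0:
        return None
    return math.log10(p / (1.0 - p))


def log_d(correct_s1, errors_s1, correct_s2, errors_s2, correction=0.0):
    """log d = 0.5 * log10[(correct S1 * correct S2) / (errors S1 * errors S2)].

    `correction` (a hypothetical argument name) is added to all four cells
    before the ratio is taken. Returns None if any cell is still zero.
    """
    c1, e1, c2, e2 = (n + correction
                      for n in (correct_s1, errors_s1, correct_s2, errors_s2))
    if min(c1, e1, c2, e2) <= 0:
        return None
    return 0.5 * math.log10((c1 * c2) / (e1 * e2))


# Illustrative counts only (not data from any study cited here).
unbiased = log_d(90, 10, 90, 10)   # same accuracy after both samples
biased = log_d(95, 5, 85, 15)      # same overall 90% correct, unequal by sample
pooled = logit_p(180 / 200)        # logit p from the pooled proportion correct
no_errors = log_d(100, 0, 100, 0)  # undefined without a correction
corrected = log_d(100, 0, 100, 0, correction=0.25)
```

With equal accuracy after both samples, `unbiased` and `pooled` agree; when accuracy differs between samples at the same pooled proportion, they separate. With no errors and no correction the function returns `None`, which is the undefined case noted in the text.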
Brown and White (2005b) used a computationally intensive analysis to show that the optimal correction in such cases is achieved by adding 0.25 to the response totals in each of the four cells of the response matrix (correct and error responses after S1 and S2). The form of the mathematical function that best fits the data from delayed matching-to-sample studies (Rubin & Wenzel, 1996) and whether the fits depend on the measure of accuracy (Wickens, 1998) have been extensively discussed. In practice, functions that differentiate different conditions measured in terms of discriminability equally differentiate them when measured in terms of percentage correct, as long as the different levels of chance performance are recognized (zero for discriminability measures and 50% for percentage correct; White, 2001). A function that appeals from a behavioral perspective is the simple exponential decay function, y = a · exp(−b · t), because it is “memoryless.” The exponential function is the only mathematical function that has a constant rate of decrement (b), with the property that the reduction in performance between two times depends only on that temporal distance and not on the level of performance at earlier times (White, 2001). The exponential function is memoryless in that performance does not depend on changes in memory that might result from organismic variables. A practical problem with the simple exponential function, however, is that it underestimates accuracy at longer delays. A better fitting exponential function scales time to the square root (White & Harper, 1996) and retains the memoryless properties of the simple exponential function, that is, y = a · exp(−b · √t) (White, 2001). Power functions have also proven useful in quantifying forgetting functions. Wixted (2004a, 2004b), Wixted and Carpenter (2007), and Wixted and Ebbesen (1991) have argued persuasively in favor of the power function, y = a · (t + 1)^(−b), because of its consistency with the notion of consolidation and Jost’s law (Woodworth & Schlosberg, 1954, pp. 730–731). The power function is difficult to discriminate from the exponential function with time scaled as √t in terms of their accurate description of forgetting functions (White, 2001). Indeed, apart from theoretical reasons, which particular function is used for descriptive purposes does not matter greatly as long as it provides a reasonable fit to the data.
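The difference between the three candidate functions can be made concrete with a short Python sketch. It assumes that b is a positive rate that enters each function with a negative sign, and it uses hypothetical parameter values (a = 2.0, b = 0.5) chosen only for illustration. It compares the proportion of performance retained over equal steps of time, which is one way of reading the constant-rate property described above.

```python
import math


def simple_exponential(t, a, b):
    """y = a * exp(-b * t): constant rate of decrement b in raw time."""
    return a * math.exp(-b * t)


def sqrt_exponential(t, a, b):
    """y = a * exp(-b * sqrt(t)): exponential decay in the square root of time."""
    return a * math.exp(-b * math.sqrt(t))


def power_function(t, a, b):
    """y = a * (t + 1) ** (-b): the power form, equal to a at t = 0."""
    return a * (t + 1.0) ** (-b)


A, B = 2.0, 0.5  # hypothetical intercept and rate, chosen only for illustration


def retained(f, t_start, t_end):
    """Proportion of performance at t_start that remains at t_end."""
    return f(t_end, A, B) / f(t_start, A, B)


# Simple exponential: the same 2-s step costs the same proportion early or late.
exp_early = retained(simple_exponential, 0.0, 2.0)
exp_late = retained(simple_exponential, 8.0, 10.0)

# Power function: the same 2-s step costs proportionally less later on.
pow_early = retained(power_function, 0.0, 2.0)
pow_late = retained(power_function, 8.0, 10.0)

# Square-root exponential: equal steps in sqrt(t) (1 -> 2 -> 3) cost the same proportion.
sqrt_first = retained(sqrt_exponential, 1.0, 4.0)
sqrt_second = retained(sqrt_exponential, 4.0, 9.0)
```

All three functions return the intercept a at t = 0, so under this parameterization a is directly comparable across forms, whereas b is not.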
The advantages of fitting functions to data are that the entire forgetting function can be quantified in terms of the parameters of the fitted function, typically intercept and slope, and that comparisons can be made between different experimental conditions in terms of their effects on either or both of the two parameters (White, 1985; Wixted, 1990). Where functions are fitted to data in this chapter, I used the exponential function in the square root of time. In the following sections, I describe the results of varying the different components of the delayed matching task: the sample stimuli, the retention intervals, the comparison stimuli and choice response, and the intertrial interval. The results of fitting functions to the data from the wide range of studies I summarize suggest some impressive regularities: Variation in attributes of the sample stimulus influences the intercept of the forgetting function, whereas conditions during the retention interval and at the time of remembering influence the slope of the forgetting function.

Variations in Sample Stimuli

The initial impetus for studying the effects of variation in sample-stimulus parameters in studies of nonhuman delayed matching tasks was provided by the analogy with processes of human short-term memory: distinctiveness, complexity, repetition, and rehearsal of the to-be-remembered events. The results, however, established many basic findings that can now be described in terms of their effects on forgetting functions. The main feature that these different variables have in common is that they affect the overall difficulty of discrimination, as shown by changes in the intercept parameter of the forgetting function. In other words, the different aspects of the sample stimuli influence the discrimination independently of time.

Sample Stimulus Disparity

Roberts (1972) showed that percentage of correct matching was overall higher for an easier color discrimination than for a harder color discrimination, although comparison stimuli also differed owing to the identical nature of samples and comparisons. White (1985) described forgetting functions for two levels of wavelength disparity between the samples, with disparity between comparison stimuli held constant. Figure 18.2 shows discriminability, log d, averaged over five pigeons in the experiment, recalculated from data in White’s Table 1, and with the exponential function in the square root of time fitted to the data. The fits suggest that variations in the disparity of the sample stimuli produce a change in the intercepts of the fitted functions without affecting their slopes.
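How intercept and slope are read off a fitted function can be sketched as follows. This is not the nonlinear least-squares procedure used for the figures in this chapter: as a simpler stand-in, it fits a straight line to ln y against √t, which recovers a and b in y = a · exp(−b · √t) only when every y value is positive and which weights residuals differently. The delays are hypothetical, and the "data" are noise-free values generated from the parameter values shown with Figure 18.2 (a = 1.93, b = .49 and a = 2.51, b = .45), not the pigeons' data.

```python
import math


def sqrt_exponential(t, a, b):
    """y = a * exp(-b * sqrt(t)), with b taken as a positive rate."""
    return a * math.exp(-b * math.sqrt(t))


def fit_sqrt_exponential(delays, values):
    """Log-linear fit: regress ln(y) on sqrt(t) by ordinary least squares.

    Returns (a, b). A stand-in for nonlinear least squares; requires y > 0.
    """
    xs = [math.sqrt(t) for t in delays]
    ys = [math.log(y) for y in values]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    a = math.exp(mean_y - slope * mean_x)  # intercept: value at t = 0
    return a, -slope                       # rate of forgetting b


DELAYS = [0.0, 2.0, 5.0, 10.0, 20.0]  # hypothetical delays in seconds

# Parameter values as shown with Figure 18.2.
REPORTED = {"538 vs. 576 nm": (1.93, 0.49), "501 vs. 606 nm": (2.51, 0.45)}

recovered = {}
for label, (a, b) in REPORTED.items():
    synthetic = [sqrt_exponential(t, a, b) for t in DELAYS]  # noise-free values
    recovered[label] = fit_sqrt_exponential(DELAYS, synthetic)
```

Because the synthetic values contain no noise, the fit returns the generating parameters. With real data, discriminability at or below zero at long delays could not be log-transformed, which is one reason a nonlinear fit may be preferred.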

[Figure 18.2 plot: discriminability (log d) against delay interval (s), 0 to 20 s. Fitted parameters: 538 vs. 576 nm, a = 1.93, b = .49; 501 vs. 606 nm, a = 2.51, b = .45.]

Figure 18.2. Discriminability as a function of delay for two conditions of wavelength disparity between sample stimuli. Smooth curves are nonlinear least-squares fits of y = a · exp(−b · √t). Data from White (1985, Table 1).

Fixed-Ratio Requirement

Repeating a to-be-remembered item can increase accuracy. In the delayed matching task, repetition can be achieved by repeated presentations of the sample (Kangas, Vaidya, & Branch, 2010; Roberts & Grant, 1976; Zokoll et al., 2008), extending its duration, or requiring repeated observing responses to the sample. Roberts (1972) varied the fixed ratio (FR) response requirement for pigeons’ pecks on the sample key in a delayed matching-to-sample task in which retention intervals were varied over 0, 1, 3, and 6 seconds. The FR values in different conditions were 1, 5, and 15 (i.e., required 1, 5, and 15 responses, respectively). This manipulation, along with variation in the exposure duration of the sample, was seen as affecting repetition. White (1985, Figure 13) fitted simple exponential functions to the logit p transform of Roberts’ data and found that the intercept of the fitted functions increased systematically and the slope decreased with increasing FR value. Both White (1985) and White and Wixted (1999) compared the effects of FR 1 and FR 5 requirements for sample-key responding across a range of delay intervals and reported higher intercepts for fitted exponential functions for FR 5 than for FR 1 without any systematic change in slope.

Sample Duration

Grant (1976) varied the exposure duration of sample stimuli over four values ranging from 1 second to 14 seconds and also used delays longer than in most studies, up to 60 seconds. His data, transformed to logit p values, are plotted in Figure 18.3. The intercepts of fitted exponential functions in the square root of time increased systematically with increasing sample duration, without an obvious change in slope. Foster, Temple, Mackenzie, DeMello, and Poling (1995) varied independently both the FR requirement (0, 1, 3, 7, 10) and sample duration (2 seconds, 5 seconds, 10 seconds) in a delayed matching procedure with hens as subjects. Although Foster et al. arranged just one delay interval in the delayed matching task, their data clearly demonstrated that increases in both FR and sample duration had independent effects in increasing discriminability.

[Figure 18.3 plot: discriminability (logit p) against delay interval (s), 0 to 60 s, for 14-s, 8-s, 4-s, and 1-s samples.]

Figure 18.3. Discriminability as a function of delay and sample stimulus duration. Smooth curves are nonlinear least-squares fits of y = a · exp(−b · √t). Data from Grant (1976). DRO = differential reinforcement of other behavior; FR = fixed ratio.

Serial Compound Sample Stimuli

In the delayed paired-comparison task arranged by Shimp and Moffitt (1977), two stimuli were presented in succession and with a delay between them. At the same time as the second stimulus was presented, a choice was made available: peck left if the stimuli were the same, or peck right if they differed. The procedure is a choice version of successive matching to sample (Nelson & Wasserman, 1978). In White’s (1974) version of the task, the choice response follows the second stimulus. In all three procedures, lengthening the time between successive presentation of the two stimuli decreases accuracy. The stimulus associated with a correct response is actually a compound or abstract stimulus: same versus different. By increasing the temporal separation between elements of the compound, the discrimination is made more difficult. White and McKenzie (1982) held constant the time between successive stimuli and varied the retention, or delay, interval between the second stimulus and the choice. They also compared the forgetting functions for the same or different compound with forgetting functions for the element stimuli (red and green) making up the compound. These functions differed only in intercept, not in slope. In particular, the intercept for the compound stimuli was lower. That is, the discrimination of same versus different was more difficult than the discrimination of the elements making up the compound, and increasing the retention interval resulted in a similar decrement in discriminability in both cases. A question of interest is whether the function relating discriminability to retention interval has the same slope as the function relating discriminability to the time between successive stimuli. Data relevant to this question were reported by Urcuioli and DeMarse (1997). When pigeons chose left versus right response keys according to whether two successively presented stimuli were the same or different, discriminability with increasing delay between successive stimuli decreased at a faster rate than did discriminability with increasing time between the second stimulus of a pair and the choice. This result suggests that the two intervals have different functions. One relates to the pairing of the elements to form a discriminable compound (Wixted, 1989), and the other relates to the delayed control by the compound over the subsequent choice response.

Categorized Samples

Extensive research has documented the ability of nonhuman animals to discriminate categories of natural and artificial objects at different levels (e.g., Sands, Lincoln, & Wright, 1982; Vonk & MacDonald, 2004; Wasserman, Kiedinger, & Bhatt, 1988). Lazareva and Wasserman (2009) examined the choice responses of pigeons to samples categorized at basic levels (cars, chairs, flowers, people) or superordinate levels (natural vs. artificial) after three delays (0, 1, and 4 seconds). Discriminability was lower for the basic categories at all delays, consistent with the lower intercept for serial compounds than for elements in the same–different discriminations. The forgetting function for basic categories also tended to have a greater slope than that for the superordinate categories, although confirmation of this trend relies on analysis of functions fitted to individual data.

Asymmetrical Samples

Technically, delayed matching to sample is a two-alternative forced-choice procedure. When the samples are symmetrical, there should be no reason to prefer one sample over the other. A variant of the standard procedure involves asymmetrical samples, in which the choice responses are associated with whether a sample was present or absent. Preference to report the absence of the sample increases with increasing duration of the retention interval (Dougherty & Wixted, 1996; Wixted, 1993). Similarly, when samples are two different durations of a stimulus, the tendency to report the shorter of the two durations increases as the retention interval lengthens, the “choose-short” effect (Spetch & Wilkie, 1982, 1983). Gaitan and Wixted (2000) have shown that short durations seem to function in the same way as absent samples. The effect generalizes to number, the “choose-few” effect (Fetterman & MacEwan, 1989), and to the effects of prior training with one of the samples (Grant, 2006). Hypotheses to account for the effect range from subjective shortening (Spetch & Wilkie, 1983) to ambiguity between delay and intertrial intervals (Sherburne, Zentall, & Kaiser, 1998). Ward and Odum (2007) convincingly demonstrated that the choose-short effect can be accounted for in terms of overall control by the sample stimuli and not by mechanisms such as subjective shortening. In a delayed matching task with just one 0-second delay, pigeons chose one comparison stimulus after four generally short sample durations and another comparison after four longer sample durations. The psychometric functions relating choice accuracy to sample duration were asymmetrical, like those of Fetterman (1995). Various disruptors affected accuracy on long-duration trials (a choose-short effect). Ward and Odum used Blough’s (1996) model to analyze their data, owing to its ability to separate stimulus control factors from other influences. Their conclusion is consistent with the emphasis emerging from this chapter, that sample-stimulus variation affects overall stimulus control, as reflected in the intercept of the forgetting functions. Research on the choose-short effect tends to examine functions for the two sample stimuli separately. Of interest is comparing full forgetting functions that plot discriminability as a function of retention interval (thus combining the effects of the two samples) for duration samples and color samples or for a range of duration samples that differ in relative duration. The latter comparison was reported by Fetterman (1995). Reanalysis of Fetterman’s data (White, 2001) showed that for samples that differed in duration but that were otherwise closely separated or more distant, forgetting functions for the easier discriminations were characterized by higher intercepts.

Sample-Specific Responding

When the sample response requirement differs, not only is there a choose-few effect but overall accuracy is also higher than in conditions with the same ratio requirement for sample responses (Fetterman & MacEwan, 1989; Zentall & Sherburne, 1994). Zentall and Sherburne (1994) trained pigeons to respond (FR 10) or not respond (differential reinforcement of other behavior) to color samples in a delayed matching-to-sample task. Their results are replotted in Figure 18.4, with percentage correct transformed to logit p. The main difference in the exponential functions fitted to their transformed data is in the intercepts, not the slopes. That is, sample-specific responding enhances overall stimulus control, independent of the delay.

[Figure 18.4 plot: discriminability (logit p) against delay interval (s), 0 to 4 s, for FR10 vs. DRO on samples and FR10 on both samples.]

Figure 18.4. Discriminability as a function of delay and different sample stimulus response requirements. Smooth curves are nonlinear least-squares fits of y = a · exp(−b · √t). FR10 = fixed-ratio 10 responses; DRO = differential reinforcement of other behavior. Data from Zentall and Sherburne (1994).

Variations in Delay Interval Conditions

Conditions during the retention or delay interval that reduce accuracy are described as instances of retroactive interference (Cook, 1980; Grant, 1988). Such conditions include illuminating a houselight in a normally dark delay, introducing other stimulus events, providing food, and reinforcing responses in an alternative task. The general effect of such intruding events is an increase in the slope of the forgetting function. Competition from reinforcers for other tasks can also result in a reduction in the intercept of the forgetting function. That is, competing reinforcers can generate an overall reduction in discrimination accuracy.

Retroactive Interference

When the houselight is illuminated for the duration of a normally dark delay, matching accuracy plummets, as illustrated in Figure 18.5, for logit p transformations of data reported by Roberts and Grant (1978). The exponential functions fitted to the data in Figure 18.5 do not differ in intercept but have very different slopes, or rates of forgetting. The same effect was reported for pigeons by White (1985) and Harper and White (1997), for monkeys by D’Amato and O’Neill (1971), and for humans with delays filled by interfering noise (Zokoll et al., 2008). Roberts and Grant (1978, Experiment 2) varied the duration of houselight interpolated in a 10-second delay interval and found that accuracy decreased with increasing houselight duration. Other studies that introduced various stimuli in the delay, including food presentations (Jans & Catania, 1980) and geometric forms (Wilkie, Summers, & Spetch, 1981), have reported the same general result. Harper and White (1997) argued that the increased rate of forgetting observed when the houselight is illuminated is caused by the increasing duration of houselight that is otherwise normally correlated with the increasing duration of the retention interval. When they included a constant 1.5-second illumination of the houselight at the end of each delay interval, the forgetting function (not including the shortest delay) had the same slope as the function for delays that were dark throughout. The effects of houselight illumination during the delay for pigeons are not surprising because once the light is on, the birds tend to peck at irrelevant objects and find grain spilled from the hopper when the reinforcers were delivered. Such behaviors during the delay interval, presumably maintained by reinforcers extraneous to the remembering task, are likely to compete with any possible mediating behavior during the delay.

[Figure 18.5 plot: discriminability (logit p) against delay interval (s), 0 to 12 s, for dark delay and houselight in delay.]

Figure 18.5. Discriminability as a function of delay for dark delays and illuminated delays. Smooth curves are nonlinear least-squares fits of y = a · exp(−b · √t). Data from Roberts and Grant (1978).

Competing Reinforcers

An explicit competing alternative in the delay interval was arranged by Brown and White (2005c). Using a standard delayed matching task with red and green sample stimuli and delays ranging over four values from 0.2 second to 12 seconds, they included conditions in which responding on the center key (lit white) was reinforced at variable intervals averaging 15 seconds or 30 seconds, or was not reinforced (extinction). The result from their Experiment 2 is shown in Figure 18.6, with exponential functions in the square root of time fitted to the log d measures. As reinforcers for center-key responding in the delay became more frequent, the intercept of the forgetting function decreased to a small extent (2.60, 2.54, and 2.33 for extinction, variable interval 30, and variable interval 15, respectively), and the rate of forgetting increased (0.53, 0.64, and 0.73, respectively). The increase in the rate of forgetting is consistent with the conclusion that as the retention interval lengthens, the extent to which the competing task interferes with the delayed matching task increases.

[Figure 18.6 plot: discriminability (log d) against delay interval (s), 0 to 12 s, for EXT, VI-30 s, and VI-15 s conditions.]

Figure 18.6. Discriminability as a function of delay and reinforcement for an extraneous task in the delay. Smooth curves are nonlinear least-squares fits of y = a · exp(−b · √t). EXT = extinction; VI = variable interval. Data from Brown and White (2005c).

Conditions for the Choice Response

Typically, the comparison stimuli are assumed to be highly discriminable, although their disparity can influence overall matching accuracy (Jones, 2003; White, 1986). Additionally, the reinforcement contingencies are also assumed to be unambiguous, with no reinforcement for errors. Nonetheless, the reinforcement contingency for correct responses has a powerful effect on delayed matching performance. The absolute probability, magnitude, and delay of reinforcement affect matching accuracy. The signaled magnitude effect is the result of signaling two different reinforcer magnitudes (or probabilities) within sessions, with higher accuracy occurring on trials in which the larger reinforcer is signaled. The relative reinforcer probability for correct choices also influences performance. When different reinforcer probabilities, magnitudes, or other qualitative aspects (e.g., food vs. water) follow the different correct choices, the resulting enhancement in accuracy is called the differential outcomes effect. When disrupting events are introduced at different stages during the delayed matching trials, discriminability is resistant to change in the same way as the response rate of a single operant. Reinforcement of responses on the previous trial influences choice on the current trial, the local proactive interference effect previously thought to result from the influence of the sample stimulus on the previous trial. These various influences are all associated with the effects of the reinforcement of correct choices in the remembering task.

Absolute Frequency and Magnitude of Reinforcement

Instead of reinforcing each correct response, correct responses can be reinforced with a certain probability. When the probability of reinforcement for each correct choice is reduced from 1.0 to 0.5 or 0.2 across blocks of sessions, overall discriminability decreases (Brown & White, 2009b; White & Wixted, 1999). The same result applies when the magnitude of the reinforcers is reduced (Brown & White, 2009b). When the magnitude of reinforcers for correct choices is varied within an experimental session and the different magnitudes are signaled by different cues on each trial, accuracy or discriminability is greater on trials in which the larger reinforcer is signaled. This effect, the signaled magnitude effect, is illustrated in Figure 18.7 from a reanalysis of data reported by McCarthy and Voss (1995). The effect is well documented (Brown & White, 2005a; Jones, White, & Alsop, 1995; Nevin & Grosch, 1990) and reflects a difference in the intercepts of the forgetting functions but not their slopes, as is clear in Figure 18.7. An analogous effect also occurs for signaled probabilities of reinforcement (Brown & White, 2005a).

Delay of Reinforcement

Interest in the effects of delaying the delivery of the reinforcer for correct matching responses was sparked by the possibility that the forgetting function confounds the delay of the choice with a delay of the reinforcer for a correct choice, with both delays measured from the sample. That is, the forgetting function might reflect the influence of the reinforcer delay (Weavers, Foster, & Temple, 1998), a possibility that has been shown to be incorrect. In studies reporting a systematic reduction in discriminability with increasing reinforcer delay measured from the time of choice (McCarthy & Davison, 1986, 1991), reinforcer delays were varied only for a 0-second retention interval (see Sargisson & White, 2003, Figure 1). Sargisson and White (2003) varied retention interval within sessions and reinforcer delays across conditions and observed a substantial and systematic reduction in the intercepts of the forgetting functions with increasing reinforcer delay. Their analysis (Sargisson & White, 2003, Figure 4) demonstrated that varying the retention interval with the delay between sample and reinforcer held constant did not result in a constant level of discriminability. That is, the reduction in discriminability with increasing retention interval duration is not caused by an increase in the temporal distance between the sample and reinforcer.

[Figure 18.7 plot: discriminability (log d) against delay interval (s), 0 to 12 s, for 4.5-s and 1.5-s reinforcers.]

Figure 18.7. Discriminability as a function of delay for different reinforcer magnitudes (duration of access to food signaled within sessions). Smooth curves are nonlinear least-squares fits of y = a · exp(−b · √t). From “Delayed Matching-to-Sample Performance: Effects of Relative Reinforcer Frequency and of Signaled Versus Unsignaled Reinforcer Magnitudes,” by D. McCarthy and P. Voss, 1995, Journal of the Experimental Analysis of Behavior, 63, p. 39. Copyright 1995 by the Society for the Experimental Analysis of Behavior, Inc. Adapted with permission.

Differential Outcomes The signaled magnitude effect and the signaled probability effect both involve the same magnitude or probability of reinforcement for correct choices on a given trial type, but with two different trial types signaled by different cues within the experimental session. An alternative arrangement involves different outcomes for the two correct choices within a session. For example, correct choices of red result in one outcome, and correct choices of green result in another. The two outcomes may differ in quality (food vs. no food, food vs. water) or quantity (magnitude or probability). Compared with separate sessions in which outcomes of correct responses are the same, overall accuracy is higher when outcomes are different. This is known as the differential outcomes effect, and it has been extensively investigated in a range of species, including rats (Savage & Parsons, 1997; Trapold, 1970), dogs (Overmier, Bull, & Trapold, 1971), horses (Miyashita, Nakajima, & Imada, 2000), pigeons (Nevin, Ward, Jimenez-Gomez, Odum, & Shahan, 2009), and humans (Estévez, Overmier, & Fuentes, 2003; Legge & Spetch, 2009). The differential outcomes effect has influenced the direction of theories of discrimination and their account of the role of the stimulus–reinforcer relation (Urcuioli, 2005). Jones and White (1994) reported a within-session differential outcomes effect in pigeons, in a procedure in which trials with different outcomes and trials with same outcomes were differentially signaled. Using the within-sessions procedure, Jones et al. (1995) showed that the differential outcomes effect was very different from the signaled magnitude effect. Whereas the signaled magnitude effect is manifest as a difference in the intercepts of the forgetting functions but not their slopes, the differential outcomes effect is best described as a difference in slopes, whereby the rate of forgetting for trials with different outcomes is less than that for trials with same outcomes.
Figure 18.8 illustrates the difference in rates of forgetting for different- and same-outcome trials, for data taken from the within-sessions procedure of Jones and White (1994). Jones and White (1994) also reported a study in which pigeons acquired the discrimination without any prior experience and with training that included four delays ranging from 0.01 second to 8 seconds from the outset of training. For the first 10 sessions, performance was at chance at all delays and on both

K. Geoffrey White

[Figure 18.8 plot: y-axis, discriminability (logit p, 0 to 2); x-axis, delay interval (0 to 8 s); series, different outcomes and same outcomes.]

Figure 18.8. Discriminability as a function of delay and trials with different and same outcomes, replotted from data reported by Jones and White (1994). Smooth curves are nonlinear least-squares fits of y = a · exp(b · √t).

types of trials. By Session 30, a differential outcomes effect emerged, and by Sessions 60 through 80, it was strongly established. Figure 18.9 shows the result for Bird C5 in Jones and White’s study, plotted as percentage correct over six successive blocks of 10 sessions and a final block of 20 sessions. The main change in the course of the development of the differential outcomes effect was a progressive reduction in the rate of forgetting on different-outcomes trials (Figure 18.9, filled circles).

Resistance to Change The resistance to change of an operant response to extinction or other disruptors depends directly on rate of reinforcement and not on response rate (Nevin & Grace, 2000). Odum, Shahan, and Nevin (2005) applied resistance to change analysis to accuracy in delayed matching to sample by using a novel multiple schedule (or successive discrimination) procedure (also see Nevin, Milo, Odum, & Shahan, 2003). In the presence of two colors (red and green) presented in succession on a center key, responses produced delayed matching trials at variable intervals. The delayed matching trials, with retention intervals that varied in duration, used blue and yellow sample and comparison stimuli. Reinforcers for correct matching responses in red and green components were obtained with different probabilities. Both the responses to red and green stimuli and accuracy in the delayed matching trials were resistant to various disruptors (prefeeding, food in the

[Figure 18.9 plot: seven panels (Days 1-10, 11-20, 21-30, 31-40, 41-50, 51-60, and 61-80); y-axis, proportion correct; x-axis, delay interval (s).]

Figure 18.9. The development of the differential outcomes effect over 80 sessions with one pigeon from the study reported by Jones and White (1994). Proportion correct on different outcomes trials is shown as filled circles and on same outcomes trials as unfilled circles. From “An Investigation of the Differential-Outcomes Effect Within Sessions” by B. M. Jones and K. G. White, 1994, Journal of the Experimental Analysis of Behavior, 61, p. 399. Copyright 1994 by the Society for the Experimental Analysis of Behavior, Inc. Adapted with permission.

intercomponent interval, extinction) in the same way. Thus, accuracy in delayed matching to sample depends on reinforcement rate in the same way as does the rate of a single operant response. Brown and White (2009b) also reached this conclusion by using different measures of the strength of delayed matching performance when reinforcement probability and magnitude were varied, as did Nevin, Shahan, and Odum (2008), who demonstrated behavioral contrast for both response rate and discriminability in delayed matching. Nevin et al. (2009) used the multiple-schedule procedure to study the resistance to change of the differential outcomes effect (see preceding section). Responses in successive red and green components of the multiple schedule led to delayed matching trials with different or same outcomes, respectively. The different outcomes were two probabilities of reinforcers for correct yellow and blue choices. On same-outcome trials, reinforcer probabilities were the same for correct yellow and blue choices. Across three experiments, Nevin et al. observed a consistent differential outcomes effect. Resistance to disruption in the same-outcomes component was greater than in the different-outcomes component only when total reinforcement rate in the same-outcomes component was greater than that in the different-outcomes component. In other words, resistance to change of delayed matching accuracy was not affected by whether accuracy was enhanced by differential outcomes. Nevin et al. also showed a positive relation between the magnitude of the differential outcomes effect and total reinforcers on different-outcome trials as a ratio of total reinforcers on same-outcome trials.

Local Proactive Interference Choice accuracy in delayed matching to sample is lower when the sample on the current trial differs from the sample on the preceding trial than when samples are the same across consecutive trials (Grant, 1975, 2000; Hogan, Edwards, & Zentall, 1981; Roberts, 1980). This intertrial agreement effect is a form of proactive interference because performance on the current trial is influenced by events on the previous trial (A. A. Wright, Urcuioli, & Sands, 1986). I include it here as a condition of the choice response because reinforcers for choice responses on the prior trial influence the choice on the current trial. Edhouse and White (1988) termed it local proactive interference to distinguish it from general proactive interference, in which accuracy is lower with shorter intertrial intervals (see the section Intertrial Interval Conditions later in this chapter). Local proactive interference is manifest as a steeper rate of forgetting (slope) on trials in which consecutive samples differ compared with when they are the same. It is nicely illustrated by the results reported by Williams, Johnston, and Saunders (2006), replotted in Figure 18.10. Williams et al. studied adults with mental retardation in a delayed matching-to-sample task with either two samples in each session or unique samples throughout each session (cf. A. A. Wright, 2007). The exponential functions in the square root of time fitted to the data replotted in Figure 18.10 differ in slope but not intercept and account for 96% of the variance.

[Figure 18.10 plot: y-axis, discriminability (logit p, 0 to 2); x-axis, delay interval (0 to 16 s); series, same consecutive trials and different consecutive trials.]

Figure 18.10. Discriminability as a function of delay for consecutive trials with same or different samples. Smooth curves are nonlinear least-squares fits of y = a · exp(b · √t). From “Intertrial Sources of Stimulus Control and Delayed Matching-to-Sample Performance in Humans,” by D. C. Williams, M. D. Johnston, and K. J. Saunders, 2006, Journal of the Experimental Analysis of Behavior, 86, p. 256. Copyright 2006 by the Society for the Experimental Analysis of Behavior, Inc. Adapted with permission.

Earlier theories of short-term memory in nonhuman animals suggested that proactive interference results from competition between conflicting traces of sample stimuli established on successive trials (Grant, 1975; Roberts & Grant, 1976) or from failure to discriminate the most recently seen sample (D’Amato, 1973). When accuracy levels are high, however, the sample on the previous trial is confounded with the choice. Therefore, what may appear to be an influence of the sample on the prior trial is actually an effect of the prior choice, as demonstrated by Roberts (1980) and Edhouse and White (1988). In these studies, accuracy on the current trial was higher when samples on consecutive trials were the same than when they differed, but only when the choice on the previous trial was correct (and thus reinforced). White, Parkinson, Brown, and Wixted (2004) arranged a reinforcer probability of .75, thus allowing for correct choices that went unreinforced on the previous trial. Accuracy was lower on consecutive trials with different samples than on trials with same samples, but only when the previous correct choice was reinforced. That is, local proactive interference results from reinforcers for correct choices on the previous trial influencing choices on the current trial.

Relative Reinforcer Probability

According to the generalized matching law (Baum, 1974), the log ratio of responses on two choice alternatives is a linear function of the log ratio of reinforcers for the choice responses. The slope of the function estimates the sensitivity of the response ratio to changes in the reinforcer ratios. The intercept provides a measure of bias to one or the other choice alternative. By varying the probability of reinforcement for correct matching separately for choices following each of the two sample stimuli, a pair of matching law functions can be plotted for choices at each retention interval. Consistent with the intuitively plausible view that at long retention intervals, the samples are less effective and choice is predominantly governed by the reinforcers, Hartl and Fantino (1996), Jones and White (1992), White and Wixted (1999), and Sargisson and White (2007a) showed that the matching law functions were steeper at long retention intervals than at short intervals. That is, across retention interval durations, there was an inverse relation between discriminability and sensitivity to reinforcement, consistent with the more general relation in conditional discriminations (Nevin, Cate, & Alsop, 1993; White, 1986). Other studies reporting a direct relation between discriminability and sensitivity to reinforcement (McCarthy & Davison, 1991; McCarthy & Voss, 1995) used a specific procedure to control relative reinforcer probabilities. The controlled reinforcement procedure is designed to maintain equivalence between obtained and arranged reinforcer ratios. When a reinforcer for a correct choice becomes available, it is held until that correct choice occurs, whereas correct choices of the alternate comparison go unreinforced. 
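The generalized matching law described above is commonly written as log(B1/B2) = a · log(R1/R2) + log c, where B and R are responses and reinforcers on the two alternatives, a is sensitivity, and log c is bias. The following sketch shows how slope and intercept would be recovered at two retention intervals. The reinforcer ratios and the two sensitivity values are hypothetical and were chosen only to mimic the reported pattern of steeper functions at long retention intervals. The symbol names are notational assumptions, not taken from the chapter.

```python
import math


def log_response_ratio(r1, r2, sensitivity, log_bias=0.0):
    """Generalized matching law: log(B1/B2) = a * log(R1/R2) + log c."""
    return sensitivity * math.log10(r1 / r2) + log_bias


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical reinforcer ratios R1/R2 arranged across conditions.
reinforcer_ratios = [1 / 9, 1 / 3, 1.0, 3.0, 9.0]
log_r = [math.log10(r) for r in reinforcer_ratios]

# Hypothetical sensitivities: low at a short retention interval,
# higher at a long retention interval.
short_ri = [log_response_ratio(r, 1.0, sensitivity=0.2) for r in reinforcer_ratios]
long_ri = [log_response_ratio(r, 1.0, sensitivity=0.8) for r in reinforcer_ratios]

slope_short, bias_short = fit_line(log_r, short_ri)
slope_long, bias_long = fit_line(log_r, long_ri)
```

The fitted slope is the sensitivity estimate and the fitted intercept is the bias estimate for each retention interval.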
The controlled reinforcement procedure, however, reduces the number of reinforcers obtained at long delays for which there is low discriminability and many errors and generates a left–right bias that constrains the sensitivity of the choice between comparison stimuli (e.g., red vs. green) to variation in the relative probability of reinforcers for choices (Jones & White, 1992). When the controlled reinforcement procedure is used, low or near-zero sensitivity to reinforcement at long retention intervals results from a bias to choose one key (e.g., left) when (color-correlated) comparison stimuli alternate, thus generating indifference between comparisons independently of the reinforcer ratio arranged for choices. In the extreme, when discriminability in delayed matching is zero, choice between comparisons is expected to follow the usual matching law pattern, as it does when reinforcer probabilities are independent for the two choices. Sargisson and White (2007a) varied both relative reinforcer probability and reinforcer delay in delayed matching to sample. An increase in the delay of reinforcers from the choice reduces discriminability (see the section Delay of Reinforcement earlier in this chapter). Sargisson and White asked whether this reduction was a result of weakened contingency discriminability (knowing “what reinforcer goes with what response”; Davison & Nevin, 1999, p. 445) or impaired conditional discrimination owing to weakened association between sample and choice (White, 2002). They observed increasing sensitivity to the biasing effects of reinforcement as both retention interval and reinforcer delay increased, consistent with the general principle that factors that weaken the discrimination by weakening the association between sample and choice will also increase the biasing effect of reinforcers on choice.

Intertrial Interval Conditions In studies of human memory, accurate performance is facilitated by spaced learning (Baddeley, 1997). Similarly, longer intervals between trials in nonhuman delayed matching to sample result in higher matching accuracy (Edhouse & White, 1988; Kraemer & Roberts, 1984; Nelson & Wasserman, 1978; Roberts, 1980; Roberts & Kraemer, 1982; White, 1985).
This effect was thought to be the result of decreasing interference from events on the previous trial, either through diminishing influence of competing traces (Grant, 1975; Roberts & Grant, 1976) or enhanced temporal discrimination of the most recently experienced sample (D’Amato, 1973). Edhouse and White (1988) varied both intertrial interval duration and intertrial agreement and argued that the two effects were independent. Whereas the intertrial agreement effect, or local proactive interference, is manifest as a difference in the slope of forgetting functions, the intertrial spacing effect, or general proactive interference, influences only the intercept of forgetting functions. This conclusion was confirmed by White’s (1985) fitting of simple exponential functions to data from the study by Roberts and Kraemer (1982). For intertrial intervals of 4, 8, 16, and 32 seconds, intercepts increased systematically, whereas slopes did not change. When a normally dark intertrial interval is illuminated, the trial-spacing effect is eliminated (Santi, 1984), but the intertrial agreement effect persists (Edhouse & White, 1988, Experiment 2). Grant (2000) demonstrated persistence of the intertrial agreement effect over intertrial intervals of as long as 60 seconds. His claim that the effect was underestimated at short intertrial intervals is difficult to evaluate, however, owing to the mixing of different intertrial intervals within sessions, a procedure that tends to result in the averaging of intervals (Roberts & Kraemer, 1982). What is needed is a study in which the intertrial agreement effect is examined over very long intertrial intervals.

Applications: Drug Effects A main area in which the quantitative analysis of forgetting has been applied concerns the effects of various drugs on behavior (see also Chapter 23, this volume). Much of the work has been published in neuroscience journals. By fitting a mathematical function such as the exponential function to accuracy or discriminability measures, the forgetting function can be summarized in terms of its intercept (initial discriminability) and slope (rate of forgetting). Many drugs influence the neurotransmitter mechanisms presumed to be associated with remembering.
A good example is the cholinergic antagonist, scopolamine, which reduces initial discriminability without affecting rate of forgetting in many drug studies of the cholinergic hypothesis for Alzheimer’s disease (White & Ruske, 2002). The reduction in initial discriminability caused by administration of scopolamine can be reversed by administration of agonists (Harper, 2000; Ruske, Fisher, & White, 1997). Glucose administration can also reverse scopolamine-induced deficits as well as the reduction in initial discriminability that results from reducing the sample-response ratio requirement from five to one (Parkes & White, 2000). A second area of interest concerns recreational drugs. For example, Harper, Wisnewski, Hunt, and Schenk (2005) studied the effects of amphetamine, cocaine, and 3,4-methylenedioxymethamphetamine (ecstasy) on delayed matching performance in rats. In all cases, initial discriminability decreased with increasing dose, without affecting rate of forgetting. Lane et al. (2005) reported the first quantitative analysis of the effects of marijuana on forgetting functions in human delayed matching to sample. It is noteworthy that this carefully conducted study is one of the few to have shown an effect of drug administration on rate of forgetting but not initial discriminability, an effect that Lane et al. attributed to disruption of cannabinoid receptor function in the hippocampus. A third area of interest concerns drugs used in clinical settings. Here, too, there are instances of change in rate of forgetting as well as in initial discriminability. Examples are the effects of the antipsychotic chlorpromazine (Watson & Blampied, 1989) and the barbiturate phenobarbital (Watson & White, 1994). Increasing doses of the dopamine agonist methylphenidate, widely used to treat attention deficit disorder, reduce initial discriminability without affecting rate of forgetting (F. K. Wright & White, 2003).

Behavioral Theories of Remembering Cognitive theories of short-term remembering in nonhumans are not considered here. They rely on mechanisms such as trace decay and rehearsal and temporal distinctiveness, which remain in vogue in current theorizing about human short-term memory (Jonides et al., 2008; Suprenant & Neath, 2009) but which have proven less fruitful in studies with nonhumans.
Unlike cognitive theories, behavioral theories of remembering and forgetting are characterized by inclusion of reinforcement as a major determinant of performance. Accordingly, they can account for reinforcer influences on the forgetting function, including the signaled magnitude effect, the differential outcomes effect, and the effects of absolute reinforcer probability. The three behavioral theories I briefly summarize in this section were all published in the Journal of the Experimental Analysis of Behavior. All incorporate well-established principles of reinforcer control (specifically delay reduction, behavioral momentum, and the matching law), and all are able to predict forgetting functions, that is, a reduction in discriminability with increasing retention interval. The mechanisms proposed by the different models to predict the forgetting functions differ, however, and might ultimately provide the main basis for comparisons between the models. These mechanisms are delay reduction (Wixted, 1989), diffusion (White, 2002; White & Wixted, 1999), and disruption of attention to the stimulus as coded during the retention interval (Nevin, Davison, Odum, & Shahan, 2007).

Wixted (1989) Fantino (1977) proposed that the discriminative strength of a stimulus is given by the extent to which its onset reduces the delay to primary reinforcement. Wixted (1989) recognized that the sample stimuli in delayed matching to sample signal the presentation of the comparison stimuli that are intermittently associated with reinforcement. That is, in relation to the overall delay between one reinforcement and the next, onset of the sample reduces the delay by an amount that approximates the delay interval t. Following the delay-reduction formulation, Wixted represented the discriminative strength of the sample as (dr + γ)/(t + α). The term dr is the delay reduction quantity and equals the total time T between reinforcements minus the average delay d from onset of the sample to the choice, and dr happens to equal the intertrial interval. The constants γ and α allow for differential effectiveness of the intertrial interval and delay. A third parameter accounts for the discriminative strength of all other stimuli. Wixted calculated the strength of a sample relative to all other stimuli and predicted the proportion of correct choices by weighting reinforcer proportions


for correct matching responses by the discriminative strength of the sample. The model predicts that as discriminative control by the sample decreases, control by the reinforcement proportion increases (cf. Jones & White, 1992). Fits of the model to a wide range of data accounted for high proportions of the variance in the data. In most cases, the major independent variable was delay interval duration, and the model did an excellent job of predicting the forgetting functions. One prediction of interest was the linear relation between proportion correct and the ratio of the intertrial interval to the delay interval. This relation was reported by Roberts and Kraemer (1982) from a comprehensive manipulation of both parameters and was predicted by Wixted’s model with high accuracy. The relation seems to fall out naturally from the delay reduction approach because the delay reduction quantity (T − d) is in most cases equivalent to the intertrial interval.
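A minimal sketch of the quantities named in this section may help. The sample-strength expression (dr + γ)/(t + α) and the idea of a third parameter for all other stimuli come from the description above. The rule used here to combine relative strength with the reinforcer proportion is an assumption made only for illustration, since the chapter does not give the exact equation, and all parameter values are hypothetical.

```python
def sample_strength(delay_reduction, t, gamma, alpha):
    """Discriminative strength of the sample: (dr + gamma) / (t + alpha)."""
    return (delay_reduction + gamma) / (t + alpha)


def relative_strength(strength, other_strength):
    """Strength of the sample relative to all other stimuli (the third parameter)."""
    return strength / (strength + other_strength)


def proportion_correct(rel, reinforcer_proportion):
    """ASSUMED combination rule, not taken from the chapter.

    With weight `rel` the sample controls the choice (correct); otherwise
    the choice follows the reinforcer proportion for that response.
    """
    return rel + (1.0 - rel) * reinforcer_proportion


# Hypothetical values. The intertrial interval stands in for dr, as the
# text notes that the two are equivalent in most cases.
ITI = 20.0
GAMMA, ALPHA, OTHER = 1.0, 1.0, 5.0
delays = [0.0, 2.0, 4.0, 8.0, 16.0]

predicted = [
    proportion_correct(
        relative_strength(sample_strength(ITI, t, GAMMA, ALPHA), OTHER), 0.5
    )
    for t in delays
]
```

Under these assumptions the predicted proportion correct falls as the delay grows and approaches the reinforcer proportion, which mirrors the stated trade-off between control by the sample and control by reinforcement.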

White and Wixted (1999) In a blend of signal detection theory and matching law, White and Wixted (1999) proposed a model that does not include the decision criterion of signal detection theory and applies the matching law to discrete trial-by-trial choices based on reinforcer distributions. They assumed that on each trial, the choice between comparison stimuli matches the proportion of reinforcers obtained in the past by those choices given a particular value of stimulus effect. The stimulus effect dimension varies from trial to trial, the probability of which is given by a pair of normal distributions, one for each sample. The reinforcer distributions that predict the choice responses are derived by multiplying the stimulus effect distributions by the arranged reinforcer probabilities. The model can be implemented in a spreadsheet by using normal distribution functions (also see Wixted & Gaitan, 2002). With only two free parameters (the distance between the stimulus effect distributions and their variance), the model predicts the inverse relation between discriminability and sensitivity of the ratio of choice responses to variation in the reinforcer ratio reported by Jones and White (1992) and Sargisson and White (2007a). The model also predicts the proactive interference effects of reinforcing the choice on the previous trial

(White et al., 2004) and the asymmetrical effects of retention intervals in signal detection versus recognition procedures (White & Wixted, 2010). To predict a reduction in discriminability with increasing delay, that is, the forgetting function, White and Wixted (1999) assumed that the variance of the distributions increased with increasing delay. White (2002) addressed the model’s shortcoming in not specifying the precise relation between variance and delay. He showed that the mathematical form of the forgetting function could be predicted by a specific diffusion function describing the increase over time of the variances of the stimulus effect distributions. The resulting model retains only two parameters. The distance between distribution means predicts the intercept of the forgetting function, and the rate of diffusion of the variances of the distributions predicts the slope of the forgetting function. Empirical evidence for the diffusion function has not yet been reported, however. Another shortcoming of the White and Wixted (1999) model is that because choices are based on ratios of reinforcers, the model cannot predict the increase in discriminability when the absolute probability or magnitude of reinforcement is increased. Brown and White (2009b) addressed this problem by including a parameter for extraneous reinforcement. The model’s overall strength is that it is based solely on distributions of reinforcer probabilities as well as extraneous reinforcement. The model’s potential weakness is its inability to predict the forgetting function without making an additional assumption about a diffusion process.
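The core computation of the White and Wixted (1999) model, as described above, can be sketched numerically. This is an illustrative reading rather than the authors' implementation: it assumes equal-variance normal stimulus-effect distributions centered at plus and minus half the distance parameter, equal numbers of trials with each sample, and a simple midpoint-rule integration. The function names and parameter values are hypothetical.

```python
import math


def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def p_choose_1(x, distance, sd, r1, r2):
    """Matching at stimulus-effect value x.

    Choice 1 is made in proportion to the reinforcers it has obtained at x.
    Reinforcers for choice 1 follow sample 1, and those for choice 2 follow
    sample 2, so each reinforcer distribution is the stimulus-effect density
    multiplied by the arranged reinforcer probability.
    """
    got_1 = r1 * normal_pdf(x, distance / 2.0, sd)
    got_2 = r2 * normal_pdf(x, -distance / 2.0, sd)
    return got_1 / (got_1 + got_2)


def p_correct_sample_1(distance, sd, r1=1.0, r2=1.0, steps=4000):
    """P(choice 1 | sample 1), averaged over the sample-1 distribution."""
    lo = -distance / 2.0 - 6.0 * sd
    hi = distance / 2.0 + 6.0 * sd
    dx = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * dx  # midpoint of each integration step
        total += normal_pdf(x, distance / 2.0, sd) * p_choose_1(x, distance, sd, r1, r2) * dx
    return total


# Hypothetical parameter values: a larger variance stands in for a longer delay.
narrow = p_correct_sample_1(distance=2.0, sd=1.0)
wide = p_correct_sample_1(distance=2.0, sd=2.0)
biased = p_correct_sample_1(distance=2.0, sd=2.0, r1=0.9, r2=0.3)
```

With equal reinforcer probabilities, accuracy drops as the variance increases, which corresponds to the assumption that variance grows with delay. Making reinforcers richer for one choice shifts responding toward that choice.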

Nevin et al. (2007) Behavioral momentum theory (Nevin & Grace, 2000) suggests that response rate relative to a baseline is inversely related to the ratio of the reinforcer rate $r_s$ correlated with the stimulus situation in which responding is measured to the overall average reinforcer rate $r_a$. The reinforcer ratio is raised to an exponent $b$, which measures the sensitivity of resistance to change to the reinforcer ratio. Nevin et al. (2007) proposed that attending to samples, $p(A_s)$, and to comparison stimuli, $p(A_c)$, is given by the following equations in which $x$ and $z$ are general background disruptors and $q_t$ and $v_t$ are disruptors specific to the retention interval $t$: $p(A_s) = \exp[(-x \cdot q_t)/(r_s/r_a)^b]$ and $p(A_c) = \exp[(-z \cdot v_t)/(r_c/r_s)^b]$. With variation in a parameter for sample–stimulus discriminability $d_s$ and different levels of background disruptors, $x$ and $z$, the model predicts forgetting functions that differ in intercept and that are generally exponential in form. With $d_s$ held constant, and variation in the retention interval disruptors, $q$ and $v$, the model predicts forgetting functions that vary in slope. Varying the parameters to reflect disruption of attention to different components of the task allows the successful prediction of the effects of relative and absolute reinforcer probability. The model has difficulty in providing accurate quantitative predictions for the signaled magnitude effect (McCarthy & Voss, 1995) and the differential outcomes effect (Nevin et al., 2009), but over a wide range of other data, the model makes impressively accurate predictions by assuming different levels of attention to samples, coded representations of the samples in the retention interval, and the comparisons. The Nevin et al. (2007) model has several features in common with Wixted’s (1989) model. One similarity is in the effects of the sensitivity parameters γ and α in Wixted’s model and the effects of $x$ and $z$ in Nevin et al.’s model. Another is the partitioning of the probability of attending versus not attending. A difference, however, is that Wixted’s model is based on reinforcer proportions and does not predict the change in discriminability that occurs when absolute reinforcer rate is varied (Brown & White, 2009b), whereas the Nevin et al. model is able to satisfactorily predict the reduction in discriminability with reduced overall reinforcer probability (Brown & White, 2005a).

Remembering As Discrimination White (1985, 2001, 2002) has argued that remembering is a discrimination specific to the retention interval at which it occurs.
In effect, the discrimination involves a compound consisting of the sample and comparison stimuli and also the delay that forms part of the context for remembering. That is, remembering is specific to the delay. Remembering at one delay may be independent of remembering the same event at a much longer, or shorter, delay. To study delay-specific discrimination, White and Cooney (1996) trained pigeons in delayed matching tasks with 0.1-second and 4-second delays mixed randomly within sessions. In one set of conditions, choices of red and green comparison stimuli at the short delay were reinforced with different probabilities, creating a strong bias to choose the comparison associated with the higher reinforcer probability. Choices at the long delay were nondifferentially reinforced, and the bias at the short delay did not generalize to choices at the long delay. In another set of conditions, strong reinforcer biases at the long delay did not generalize to the nondifferentially reinforced choices at the short delay. In other words, performance at one delay was independent of factors influencing remembering at another.

Temporal Independence

The conclusion that the discrimination made at one time may be independent of the discrimination made at another, that is, temporal independence, was supported by the result of another delayed matching task in which reinforcers at a particular delay were omitted (White, 2001). The result was a reduction in discriminability at the delay without reinforcers. This result was not surprising. What was surprising, however, was the increase in discriminability at longer delays. Compared with functions for which reinforcers were included at all delays, the result demonstrated that performance at one delay was independent of whether discriminability was higher or lower at a preceding delay.

Temporal independence was also reported by Nakagawa, Etheridge, Foster, Sumpter, and Temple (2004). In one condition, they reinforced correct choices at an intermediate delay, and choices at both shorter and longer delays went unreinforced. The result was a nonmonotonic forgetting function, with highest discriminability at the intermediate delay.

Discriminations made at one retention interval may be independent of discriminations made at another retention interval, just as two discriminations about the spatial aspects of stimuli may be independent. Fetterman (1996) discussed the advantages of treating remembering in the same terms as discriminations between proximal stimuli. Temporal distance is a dimension of the stimulus complex that influences behavior along with other physical aspects of the event to be remembered and the stimulus context. To illustrate, Sargisson and White (2007b) made the discrimination of delay intervals an explicit requirement in a delayed matching task in which sample stimuli were a cross and a square and comparison stimuli were red and green. Following the cross, choices of red were reinforced at 1-second delays and choices of green were reinforced at 4-second delays. Following the square, choices of green were reinforced at 1-second delays and choices of red were reinforced at 4-second delays. After extensive training in this procedure, probe tests were conducted at 10 delays between 1 second and 4 seconds. The results from a replication in a later study (White & Sargisson, 2011) in which probe tests were included in a maintained test, averaged over four pigeons, are shown in Figure 18.11. The functions demonstrate conjoint control of comparison-stimulus choice by both the sample and the delay duration.

Figure 18.11. Proportion of choices of green (y-axis, 0.0 to 1.0) given square and cross samples as a function of delay interval (x-axis, retention interval, 1 to 4 s) during maintained testing in which, given square, choices of green were reinforced at 1-second delay and choices of red were reinforced at 4-second delay, and given cross, choices of red were reinforced at 1-second delay and choices of green were reinforced at 4-second delay. Smooth curves are nonlinear least-squares fits of y = a · exp(b · √t). From “Maintained Generalization of Delay-Specific Remembering,” by K. G. White and R. J. Sargisson, 2011, Behavioural Processes, 87, p. 312. Copyright 2011 by Elsevier. Reprinted with permission.
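The caption of Figure 18.11 describes the smooth curves as nonlinear least-squares fits of y = a · exp(b · √t). The sketch below shows one simple way such a function could be fitted. It is an illustration under stated assumptions, not the procedure used in the original study: it linearizes the model as ln y = ln a + b · √t and applies ordinary least squares, which is a simpler stand-in for a true nonlinear fit and requires every y to be positive. The function names and the data values are hypothetical.

```python
import math


def fit_exp_sqrt(delays, values):
    """Estimate a and b in y = a * exp(b * sqrt(t)).

    Assumption: the model is linearized as ln(y) = ln(a) + b * sqrt(t)
    and fitted by ordinary least squares. This is not the nonlinear
    least-squares fit named in the caption, only an approximation to it.
    """
    xs = [math.sqrt(t) for t in delays]
    ys = [math.log(y) for y in values]  # every y must be > 0
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                      # slope on the sqrt(t) axis
    a = math.exp(mean_y - b * mean_x)  # intercept, back-transformed
    return a, b


def predict(a, b, t):
    """Evaluate y = a * exp(b * sqrt(t)) at delay t (seconds)."""
    return a * math.exp(b * math.sqrt(t))


# Hypothetical, noise-free values for illustration only (not study data).
delays = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
true_a, true_b = 0.95, -0.8
values = [predict(true_a, true_b, t) for t in delays]

a_hat, b_hat = fit_exp_sqrt(delays, values)
print(round(a_hat, 3), round(b_hat, 3))
```

A negative b gives a function that falls with delay and a positive b gives one that rises, so the same form can describe both the square and the cross functions. With noisy proportions, the log transform weights small values more heavily than a direct nonlinear fit would, so the two methods can give somewhat different estimates.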

Delay-Specific Remembering

Evidence for delay-specific matching comes from two studies in which training in delayed matching tasks included two delays and two different cues during the sample and delay interval, one correlated with the short delay and the other with the long delay (MacDonald & Grant, 1987; Wasserman et al., 1982). When the relation between the delays and the cues was switched in probe tests, accuracy at the long delay when it was cued by the short cue was higher than when it was cued by the long cue. More interesting, in the miscue condition of the probe tests, accuracy at the long delay was actually higher than at the short delay. Accuracy does not depend on how much time has passed but on the combination of the delay duration and stimulus conditions at the time of remembering. As Wixted (1989) noted,

This interesting finding suggests that the strength of a discriminative stimulus may be delay specific only when one retention interval is employed. That is, a generalization gradient of discriminative strength may be conditioned around a particular delay such that it is strongest at the baseline delay and weaker at other delays (longer or shorter). (p. 416)

Wixted’s (1989) suggestion was later confirmed by Sargisson and White (2001). They trained inexperienced pigeons in delayed matching to sample with just one delay from the outset of training: 0, 2, 4, or 6 seconds for different groups. Once a discrimination criterion had been attained, a single session was conducted with reinforced training trials and unreinforced probe trials with different delays between 0 and 10 seconds, including the training delay. The results are shown in Figure 18.12. The delay-interval functions tend to peak at the training delay, reminiscent of generalization gradients along spatial dimensions (Honig & Urcuioli, 1981). They also flatten as the training delay becomes longer, the likely result of the scalar property of time, where two intervals at long delays are less discriminable than otherwise equally spaced intervals at short delays. The curve described by White (2001) fitted to the data is a combination of a negative exponential function, which describes the effect of temporal distance, and a generalization component, which follows Shepard’s (1987) exponential law of generalization. The resulting double exponential function, similar to the mathematical forgetting function suggested by Wickelgren (1969), closely fits data that follow an exponential function of √t and perfectly fits data that follow negative exponential functions when the training delay T = 0. The composite function retains the characteristics of independent variation in intercept and slope of forgetting functions. Both parameters are influenced by temporal distance and generalization components. The reasonable fit of the equation reinforces the notion that remembering is a delay-specific discrimination with generalization along the temporal dimension.

Figure 18.12. Discriminability, log d, as a function of delay in probe trials with different delays (x-axis, probe delay, 0 to 10 s), after exclusive training with just one delay (T), shown in separate panels for training delays T = 0, 2, 4, and 6 s. Curves are predictions from an equation with temporal distance and generalization components (White, 2001). From “Generalization of Delayed Matching-to-Sample Performance Following Training at Different Delays,” by R. J. Sargisson and K. G. White, 2001, Journal of the Experimental Analysis of Behavior, 75, p. 12. Copyright 2001 by the Society for the Experimental Analysis of Behavior, Inc. Reprinted with permission.

Direct Remembering

Considerable interest in the question of memory has been sparked by the theoretical question of how to bridge the temporal gap between events and subsequent behavior and also by the practical problems (neurological, legal, aging, everyday) that arise when memory goes wrong. An important task of psychology is to address the practical issues. Perhaps, however, new insights for dealing with the practical problems will follow from an approach in which the temporal gap does not have to be bridged. Watkins (1990) complained that mediationist theories of memory that rely on a representation of an event embodied in a memory trace to bridge the temporal gap are flawed. He argued in favor of bringing out the role of the stimulus environment in determining memory.

In the study of perception, two very general approaches have been taken. In one, perception involves active construction and the processing of information by the brain. In the other, perception is direct, as advocated by James J. Gibson (1979). The notion that remembering, too, might be direct is consistent with an emphasis on environmental causes of remembering and forgetting (Hackenberg, 1993). Briefly, in a theory of direct remembering, the individual system is tuned to resonate to information available at the time of retrieval through prior learning and evolution (White, 1991). If remembering is direct, the forgetting function reflects increasing temporal distance in the same way that errors of depth perception reflect increasing spatial distance. Similarly, errors of memory follow the same principles as errors of perception such as geometrical illusions, for which, as Gibson explained, the information creating the error is actually in the environment. Gibson, a self-confessed behaviorist, made a significant contribution to the psychology of perception, and the extension of his views to memory has the potential to bring new light to many unresolved questions about remembering.

Conclusion

The experimental analysis of remembering has succeeded in its description of the effects of a range of variables on the function defining the relation between accuracy and temporal distance. Different parameters of the sample stimulus, such as its duration, repetition, and complexity, influence the intercept of the forgetting function. Conditions during the retention interval and at the time of remembering influence the slope of the forgetting function. Reinforcement parameters can influence both intercept and slope. Reinforcement variables have similar effects on both accuracy of remembering and the strength of a single response. Together, these findings support a general view that the complex making up the sample, the delay interval, and the choice is an integrated behavioral unit. Thus, remembering is a discrimination at the time of the choice response and follows the same principles that govern discrimination and generalization of other behavior. In general, the effect of the delay between the sample and the comparison stimuli is to make the discrimination more difficult (by analogy with the effect of spatial distance). By treating remembering as an integrated unit of behavior, seeking processes that bridge the temporal gap becomes unnecessary. The temporal gap is a component of the compound discriminative stimulus and is the most relevant aspect of the individual’s environment when it comes to remembering.

References

Adamson, C., Foster, T. M., & McEwan, J. S. A. (2000). Delayed matching to sample: The effects of sample-set size on human performance. Behavioural Processes, 49, 149–161. doi:10.1016/S0376-6357(00)00087-5

Alsop, B., & Jones, B. M. (2008). Reinforcer control by comparison-stimulus color and location in a delayed matching-to-sample task. Journal of the Experimental Analysis of Behavior, 89, 311–331. doi:10.1901/jeab.2008-89-311

Baddeley, A. (1997). Human memory: Theory and practice (Rev. ed.). Hove, England: Psychology Press.

Baum, W. M. (1974). On two types of deviation from the matching law: Bias and undermatching. Journal of the Experimental Analysis of Behavior, 22, 231–242. doi:10.1901/jeab.1974.22-231

Berman, M. G., Jonides, J., & Lewis, R. L. (2009). In search of decay in verbal short-term memory. Journal of Experimental Psychology: Learning, Memory, and Cognition, 35, 317–333. doi:10.1037/a0014873

Berryman, R., Cumming, W. W., & Nevin, J. A. (1963). Acquisition of delayed matching in the pigeon. Journal of the Experimental Analysis of Behavior, 6, 101–107. doi:10.1901/jeab.1963.6-101

Blough, D. S. (1959). Delayed matching in the pigeon. Journal of the Experimental Analysis of Behavior, 2, 151–160. doi:10.1901/jeab.1959.2-151

Blough, D. S. (1996). Error factors in pigeon discrimination and delayed matching. Journal of Experimental Psychology: Animal Behavior Processes, 22, 118–131. doi:10.1037/0097-7403.22.1.118

Brown, G. S., & White, K. G. (2005a). On the effects of signalling reinforcer probability and magnitude. Journal of the Experimental Analysis of Behavior, 83, 119–128. doi:10.1901/jeab.2005.94-03

Brown, G. S., & White, K. G. (2005b). The optimal correction for estimating extreme discriminability. Behavior Research Methods, 37, 436–449. doi:10.3758/BF03192712

Brown, G. S., & White, K. G. (2005c). Remembering: The role of extraneous reinforcement. Learning and Behavior, 33, 309–323. doi:10.3758/BF03192860

Brown, G. S., & White, K. G. (2009a). Measuring discriminability when there are multiple sources of bias. Behavior Research Methods, 41, 75–84. doi:10.3758/BRM.41.1.75

Brown, G. S., & White, K. G. (2009b). Reinforcer probability, reinforcer magnitude, and the reinforcement context for remembering. Journal of Experimental Psychology: Animal Behavior Processes, 35, 238–249. doi:10.1037/a0013864

Catania, A. C. (1992). Learning (3rd ed.). Englewood Cliffs, NJ: Prentice-Hall.

Cook, R. G. (1980). Retroactive interference in pigeon short-term memory by a reduction in ambient illumination. Journal of Experimental Psychology: Animal Behavior Processes, 6, 326–338. doi:10.1037/0097-7403.6.4.326

D’Amato, M. R. (1973). Delayed matching and short-term memory in monkeys. In G. H. Bower (Ed.), The psychology of learning and motivation: Advances in research and theory (Vol. 7, pp. 227–269). New York, NY: Academic Press.

D’Amato, M. R., & O’Neill, W. (1971). Effect of delay-interval illumination level on matching behavior in the capuchin monkey. Journal of the Experimental Analysis of Behavior, 15, 327–333. doi:10.1901/jeab.1971.15-327

Davison, M., & Nevin, J. A. (1999). Stimuli, reinforcers, and behavior. Journal of the Experimental Analysis of Behavior, 71, 439–482. doi:10.1901/jeab.1999.71-439

Davison, M. C., & Tustin, R. D. (1978). The relation between the generalized matching law and signal-detection theory. Journal of the Experimental Analysis of Behavior, 29, 331–336. doi:10.1901/jeab.1978.29-331

Dougherty, D. H., & Wixted, J. T. (1996). Detecting a nonevent: Delayed presence-versus-absence discrimination in pigeons. Journal of the Experimental Analysis of Behavior, 65, 81–92. doi:10.1901/jeab.1996.65-81

Dunnett, S. B., & Martel, F. L. (1990). Proactive interference effects on short-term memory in rats: I. Basic parameters and drug effects. Behavioral Neuroscience, 104, 655–665. doi:10.1037/0735-7044.104.5.655

Ebbinghaus, H. (1964). Memory: A contribution to experimental psychology (H. A. Ruger & C. E. Bussenius, Trans.). New York, NY: Dover. (Original work published 1885)

Edhouse, W. V., & White, K. G. (1988). Sources of proactive interference in animal memory. Journal of Experimental Psychology: Animal Behavior Processes, 14, 56–70. doi:10.1037/0097-7403.14.1.56

Estévez, A. F., Overmier, B., & Fuentes, L. J. (2003). Differential outcomes effect in children: Demonstration and mechanisms. Learning and Motivation, 34, 148–167. doi:10.1016/S0023-9690(02)00510-6

Fantino, E. (1977). Conditioned reinforcement: Choice and information. In W. K. Honig & J. E. R. Staddon (Eds.), Handbook of operant behavior (pp. 313–339). Englewood Cliffs, NJ: Prentice-Hall.

Fetterman, J. G. (1995). The psychophysics of remembered duration. Animal Learning and Behavior, 23, 49–62. doi:10.3758/BF03198015

Fetterman, J. G. (1996). Dimensions of stimulus complexity. Journal of Experimental Psychology: Animal Behavior Processes, 22, 3–18. doi:10.1037/0097-7403.22.1.3

Fetterman, J. G., & MacEwen, D. (1989). Short-term memory for responses: The “choose-small” effect. Journal of the Experimental Analysis of Behavior, 52, 311–324. doi:10.1901/jeab.1989.52-311

Foster, T. M., Temple, W., Mackenzie, C., DeMello, L. R., & Poling, A. (1995). Delayed matching-to-sample performance of hens: Effects of sample duration and response requirements during the sample. Journal of the Experimental Analysis of Behavior, 64, 19–31. doi:10.1901/jeab.1995.64-19

Gaitan, S. C., & Wixted, J. T. (2000). The role of “nothing” in memory for event duration in pigeons. Animal Learning and Behavior, 28, 147–161. doi:10.3758/BF03200250

Gibson, J. J. (1979). The ecological approach to visual perception. Boston, MA: Houghton Mifflin.

Goldiamond, I. (1966). Perception, language, and conceptualization rules. In B. Kleinmuntz (Ed.), Problem solving: Research method and theory (pp. 183–224). New York, NY: Wiley.

Goto, K., Kurashima, R., & Watanabe, S. (2010). Delayed matching-to-position performance in C57BL/6N mice. Behavioural Processes, 84, 591–597. doi:10.1016/j.beproc.2010.02.022

Grant, D. S. (1975). Proactive interference in pigeon short-term memory. Journal of Experimental Psychology: Animal Behavior Processes, 1, 207–220. doi:10.1037/0097-7403.1.3.207

Grant, D. S. (1976). Effect of sample presentation time on long-delay matching in the pigeon. Learning and Motivation, 7, 580–590. doi:10.1016/0023-9690(76)90008-4

Grant, D. S. (1988). Sources of visual interference in delayed matching-to-sample with pigeons. Journal of Experimental Psychology: Animal Behavior Processes, 14, 368–375. doi:10.1037/0097-7403.14.4.368

Grant, D. S. (2000). Influence of intertrial interval duration on the intertrial agreement effect in delayed matching-to-sample with pigeons. Animal Learning and Behavior, 28, 288–297. doi:10.3758/BF03200262

Grant, D. S. (2006). Asymmetrical sample training and asymmetrical retention functions in one-to-one and many-to-one matching in pigeons. Learning and Motivation, 37, 209–229. doi:10.1016/j.lmot.2005.06.003

Hackenberg, T. D. (1993). Commonsense and conventional wisdom. Journal of the Experimental Analysis of Behavior, 60, 457–460. doi:10.1901/jeab.1993.60-457

Harnett, P., McCarthy, D. C., & Davison, M. C. (1984). Delayed signal detection, differential reinforcement, and short-term memory in the pigeon. Journal of the Experimental Analysis of Behavior, 42, 87–111. doi:10.1901/jeab.1984.42-87

Harper, D. N. (2000). An assessment and comparison of the effects of oxotremorine, D-cycloserine, and bicuculline on delayed matching-to-sample performance in rats. Experimental and Clinical Psychopharmacology, 8, 207–215. doi:10.1037/1064-1297.8.2.207

Harper, D. N., & White, K. G. (1997). Retroactive interference and rate of forgetting in delayed matching-to-sample performance. Animal Learning and Behavior, 25, 158–164. doi:10.3758/BF03199053

Harper, D. N., Wisnewski, R., Hunt, M., & Schenk, S. (2005). (±)3,4-methylenedioxymethamphetamine, d-amphetamine and cocaine impair delayed matching-to-sample performance via an increase in susceptibility to proactive interference. Behavioral Neuroscience, 119, 455–463. doi:10.1037/0735-7044.119.2.455

Hartl, J., & Fantino, E. (1996). Choice as a function of reinforcement ratios in delayed matching to sample. Journal of the Experimental Analysis of Behavior, 66, 11–27. doi:10.1901/jeab.1996.66-11

Herman, L. M. (1975). Interference and auditory short-term memory in the bottlenosed dolphin. Animal Learning and Behavior, 3, 43–48. doi:10.3758/BF03209097

Hogan, D. E., Edwards, C. A., & Zentall, T. R. (1981). Delayed matching in the pigeon: Interference produced by the prior delayed matching trial. Animal Learning and Behavior, 9, 395–400. doi:10.3758/BF03197849

Honig, W. K., & Urcuioli, P. J. (1981). The legacy of Guttman and Kalish (1956): 25 years of research on stimulus generalization. Journal of the Experimental Analysis of Behavior, 36, 405–445. doi:10.1901/jeab.1981.36-405

Jans, J. E., & Catania, A. C. (1980). Short-term remembering of discriminative stimuli in pigeons. Journal of the Experimental Analysis of Behavior, 34, 177–183. doi:10.1901/jeab.1980.34-177

Jones, B. M. (2003). Quantitative analysis of matching-to-sample performance. Journal of the Experimental Analysis of Behavior, 79, 323–350. doi:10.1901/jeab.2003.79-323

Jones, B. M., & White, K. G. (1992). Stimulus discriminability and sensitivity to reinforcement in delayed matching-to-sample. Journal of the Experimental Analysis of Behavior, 58, 159–172. doi:10.1901/jeab.1992.58-159

Jones, B. M., & White, K. G. (1994). An investigation of the differential-outcomes effect within sessions. Journal of the Experimental Analysis of Behavior, 61, 389–406. doi:10.1901/jeab.1994.61-389

Jones, B. M., White, K. G., & Alsop, B. (1995). On two effects of signaling the consequences for remembering. Animal Learning and Behavior, 23, 256–272. doi:10.3758/BF03198922

Jonides, J., Lewis, R. L., Nee, D. E., Lustig, C. A., Berman, M. G., & Moore, K. S. (2008). The mind and brain of short-term memory. Annual Review of Psychology, 59, 193–224. doi:10.1146/annurev.psych.59.103006.093615

Kangas, B. D., Vaidya, M., & Branch, M. N. (2010). Titrating-delay matching-to-sample in the pigeon. Journal of the Experimental Analysis of Behavior, 94, 69–81. doi:10.1901/jeab.2010.94-69

Kendrick, D. F., Rilling, M. E., & Denny, M. R. (Eds.). (1986). Theories of animal memory. Hillsdale, NJ: Erlbaum.

Kraemer, P. J., & Roberts, W. A. (1984). Short-term memory for visual and auditory stimuli in the pigeons. Animal Learning and Behavior, 12, 275–284. doi:10.3758/BF03199968

Lane, S. D., Cherek, L. M. L., & Tcheremissine, O. V. (2005). Marijuana effects on human forgetting functions. Journal of the Experimental Analysis of Behavior, 83, 67–83. doi:10.1901/jeab.2005.22-04

Lazareva, O. F., & Wasserman, E. A. (2009). Effects of stimulus duration and choice delay on visual categorization in pigeons. Learning and Motivation, 40, 132–146. doi:10.1016/j.lmot.2008.10.003

Legge, E. L. G., & Spetch, M. L. (2009). The differential outcomes effect (DOE) in spatial localization: An investigation with adults. Learning and Motivation, 40, 313–328. doi:10.1016/j.lmot.2009.03.002

Loftus, G. R. (1985). Evaluating forgetting curves. Journal of Experimental Psychology: Learning, Memory, and Cognition, 11, 817–820. doi:10.1037/0278-7393.11.1-4.817

MacDonald, S. E., & Grant, D. S. (1987). Effects of signaling retention interval length on delayed matching-to-sample in pigeons. Journal of Experimental Psychology: Animal Behavior Processes, 13, 116–125. doi:10.1037/0097-7403.13.2.116

Macmillan, N. A., & Creelman, C. D. (1991). Detection theory: A user’s guide. New York, NY: Cambridge University Press.

Mazur, J. E. (2006). Mathematical models and the experimental analysis of behavior. Journal of the Experimental Analysis of Behavior, 85, 275–291. doi:10.1901/jeab.2006.65-05

McCarthy, D., & Davison, M. (1986). Delayed reinforcement and delayed choice in symbolic matching to sample: Effects on stimulus discriminability. Journal of the Experimental Analysis of Behavior, 46, 293–303. doi:10.1901/jeab.1986.46-293

McCarthy, D. C., & Davison, M. (1991). The interaction between stimulus and reinforcer control on remembering. Journal of the Experimental Analysis of Behavior, 56, 51–66. doi:10.1901/jeab.1991.56-51

McCarthy, D., & Voss, P. (1995). Delayed matching-to-sample performance: Effects of relative reinforcer frequency and of signaled versus unsignaled reinforcer magnitudes. Journal of the Experimental Analysis of Behavior, 63, 33–51. doi:10.1901/jeab.1995.63-33

Miyashita, Y., Nakajima, S., & Imada, H. (2000). Differential outcome effect in the horse. Journal of the Experimental Analysis of Behavior, 74, 245–253. doi:10.1901/jeab.2000.74-245

Nairne, J. S. (2002). Remembering over the short-term: The case against the standard model. Annual Review of Psychology, 53, 53–81. doi:10.1146/annurev.psych.53.100901.135131

Nakagawa, S., Etheridge, R. J. M., Foster, T. M., Sumpter, C. E., & Temple, W. (2004). The effects of changes in consequences on hens’ performance in delayed-matching-to-sample tasks. Behavioural Processes, 67, 441–451. doi:10.1016/j.beproc.2004.07.005

Nelson, K. R., & Wasserman, E. A. (1978). Temporal factors influencing the pigeon’s successive matching-to-sample performance: Sample duration, intertrial interval, and retention interval. Journal of the Experimental Analysis of Behavior, 30, 153–162. doi:10.1901/jeab.1978.30-153

Nevin, J. A. (1981). Psychophysics and reinforcement schedules. In M. L. Commons, J. E. Mazur, J. A. Nevin, & H. Rachlin (Eds.), Quantitative analyses of behavior: Vol. 1. Discriminative properties of reinforcement schedules (pp. 3–27). Hillsdale, NJ: Erlbaum.

Nevin, J. A., Cate, H., & Alsop, B. (1993). Effects of differences between stimuli, responses and reinforcer rates on conditional discrimination performance. Journal of the Experimental Analysis of Behavior, 59, 147–161. doi:10.1901/jeab.1993.59-147

Nevin, J. A., Davison, M., Odum, A. L., & Shahan, T. A. (2007). A theory of attending, remembering, and reinforcement in delayed matching to sample. Journal of the Experimental Analysis of Behavior, 88, 285–317. doi:10.1901/jeab.2007.88-285

Nevin, J. A., & Grace, R. C. (2000). Behavioral momentum and the law of effect. Behavioral and Brain Sciences, 23, 73–90. doi:10.1017/S0140525X00002405

Nevin, J. A., & Grosch, J. (1990). Effects of signaled reinforcer magnitude on delayed matching-to-sample performance. Journal of Experimental Psychology: Animal Behavior Processes, 16, 298–305. doi:10.1037/0097-7403.16.3.298

Nevin, J. A., Milo, J., Odum, A. L., & Shahan, T. A. (2003). Accuracy of discrimination, rate of responding, and resistance to change. Journal of the Experimental Analysis of Behavior, 79, 307–321. doi:10.1901/jeab.2003.79-307

Nevin, J. A., Shahan, T. A., & Odum, A. L. (2008). Contrast effects in response rate and accuracy of delayed matching to sample. Quarterly Journal of Experimental Psychology: Human Experimental Psychology, 61(A), 1400–1409. doi:10.1080/17470210701557597

Nevin, J. A., Ward, R. D., Jimenez-Gomez, C., Odum, A. L., & Shahan, T. A. (2009). Differential outcomes enhance accuracy of delayed matching to sample but not resistance to change. Journal of Experimental Psychology: Animal Behavior Processes, 35, 74–91. doi:10.1037/a0012926

Odum, A. L., Shahan, T. A., & Nevin, J. A. (2005). Resistance to change of forgetting functions and response rates. Journal of the Experimental Analysis of Behavior, 84, 65–75. doi:10.1901/jeab.2005.112-04

Overmier, J. B., Bull, J. A., III, & Trapold, M. A. (1971). Discriminative cue properties of different fears and their role in response selection in dogs. Journal of Comparative and Physiological Psychology, 76, 478–482. doi:10.1037/h0031403

Parkes, M., & White, K. G. (2000). Glucose attenuation of memory impairments. Behavioral Neuroscience, 114, 307–319. doi:10.1037/0735-7044.114.2.307

Peterson, L. R., & Peterson, M. (1959). Short-term retention of individual items. Journal of Experimental Psychology, 58, 193–198. doi:10.1037/h0049234

Rayburn-Reeves, R., & Zentall, T. R. (2009). Animal memory: The contribution of generalization decrement to delayed conditional discrimination retention functions. Learning and Behavior, 37, 299–304. doi:10.3758/LB.37.4.299

Roberts, W. A. (1972). Short-term memory in the pigeon: Effects of repetition and spacing. Journal of Experimental Psychology, 94, 74–83. doi:10.1037/h0032796

Roberts, W. A. (1980). Distribution of trials and intertrial retention in delayed matching to sample with pigeons. Journal of Experimental Psychology: Animal Behavior Processes, 6, 217–237. doi:10.1037/0097-7403.6.3.217

Roberts, W. A. (1998). Principles of animal cognition. Boston, MA: McGraw-Hill.

Roberts, W. A., & Grant, D. S. (1976). Studies of short-term memory in the pigeon using the delayed matching-to-sample procedure. In D. L. Medin, W. A. Roberts, & R. T. Davis (Eds.), Processes of animal memory (pp. 79–112). Hillsdale, NJ: Erlbaum.

Roberts, W. A., & Grant, D. S. (1978). An analysis of light-induced retroactive inhibition in pigeon short-term memory. Journal of Experimental Psychology: Animal Behavior Processes, 4, 219–236. doi:10.1037/0097-7403.4.3.219

Roberts, W. A., & Kraemer, P. J. (1982). Some observations of the effects of intertrial interval and delay on delayed matching to sample in pigeons. Journal of Experimental Psychology: Animal Behavior Processes, 8, 342–353. doi:10.1037/0097-7403.8.4.342

Roediger, H. L. (2008). Relativity of remembering: Why the laws of memory vanished. Annual Review of Psychology, 59, 225–254. doi:10.1146/annurev.psych.57.102904.190139

Rubin, D. C., & Wenzel, A. E. (1996). One hundred years of forgetting: A quantitative description of retention. Psychological Review, 103, 734–760. doi:10.1037/0033-295X.103.4.734

Ruske, A. C., Fisher, A., & White, K. G. (1997). Attenuation of scopolamine-induced deficits in delayed-matching performance by a new muscarinic agonist. Psychobiology, 25, 313–320.

Ruske, A. C., & White, K. G. (1999). Facilitation of memory performance by a novel muscarinic agonist in young and old rats. Pharmacology, Biochemistry and Behavior, 63, 663–667. doi:10.1016/S0091-3057(99)00037-4

Sands, S. F., Lincoln, C. E., & Wright, A. A. (1982). Pictorial similarity judgments and the organization of visual memory in the rhesus monkey. Journal of Experimental Psychology: General, 111, 369–389. doi:10.1037/0096-3445.111.4.369

Santi, A. (1984). The trial spacing effect in delayed matching-to-sample by pigeons is dependent upon the illumination condition during the intertrial interval. Canadian Journal of Psychology/Revue Canadienne de Psychologie, 38, 154–165. doi:10.1037/h0080830

Sargisson, R. J., & White, K. G. (2001). Generalization of delayed matching-to-sample performance following training at different delays. Journal of the Experimental Analysis of Behavior, 75, 1–14. doi:10.1901/jeab.2001.75-1

Sargisson, R. J., & White, K. G. (2003). The effect of reinforcer delays on the form of the forgetting function. Journal of the Experimental Analysis of Behavior, 80, 77–94. doi:10.1901/jeab.2003.80-77

Sargisson, R. J., & White, K. G. (2007a). Remembering as discrimination in delayed matching to sample: Discriminability and bias. Learning and Behavior, 35, 177–183. doi:10.3758/BF03193053

Sargisson, R. J., & White, K. G. (2007b). Timing, remembering, and discrimination. Journal of the Experimental Analysis of Behavior, 87, 25–37. doi:10.1901/jeab.2007.25-05

Savage, L. M., & Parsons, J. (1997). The effect of delay interval, intertrial interval, amnestic drugs, and differential outcomes on matching-to-position in rats. Psychobiology, 25, 303–312.

Shepard, R. N. (1987). Toward a universal law of generalization for psychological science. Science, 237, 1317–1323. doi:10.1126/science.3629243

Sherburne, L. M., Zentall, T. R., & Kaiser, D. H. (1998). Timing in pigeons: The choose-short effect may result from “confusion” between delay and intertrial intervals. Psychonomic Bulletin and Review, 5, 516–522. doi:10.3758/BF03208831

Shimp, C. P., & Moffitt, M. (1977). Short-term memory in the pigeon: Delayed pair-comparison procedures and some results. Journal of the Experimental Analysis of Behavior, 28, 13–25. doi:10.1901/jeab.1977.28-13

Sidman, M. A. (1960). Tactics of scientific research. New York, NY: Basic Books.

Spetch, M. L., & Wilkie, D. M. (1982). A systematic bias in pigeons’ memory for food and light durations. Behaviour Analysis Letters, 2, 267–274.

Spetch, M. L., & Wilkie, D. M. (1983). Subjective shortening: A model of pigeons’ memory for event duration. Journal of Experimental Psychology: Animal Behavior Processes, 9, 14–30. doi:10.1037/0097-7403.9.1.14

Surprenant, A. M., & Neath, I. (2009). Principles of memory. New York, NY: Psychology Press.

Thorndike, E. L. (1898). Animal intelligence: An experimental study of the associative processes in animals. Psychological Review: Monograph Supplements, 2, i–109.

Trapold, M. A. (1970). Are expectancies based upon different positive reinforcing events discriminably different? Learning and Motivation, 1, 129–140. doi:10.1016/0023-9690(70)90079-2

Urcuioli, P. J. (2005). Behavioral and associative effects of differential outcomes on discrimination learning. Learning and Behavior, 33, 1–21. doi:10.3758/BF03196047

Urcuioli, P. J., & DeMarse, T. B. (1997). Memory processes in delayed spatial discriminations: Response intentions or response mediation? Journal of the Experimental Analysis of Behavior, 67, 323–336. doi:10.1901/jeab.1997.67-323

Vonk, J., & MacDonald, S. E. (2004). Levels of abstraction in orangutan (Pongo abelii) categorization. Journal of Comparative Psychology, 118, 3–13. doi:10.1037/0735-7036.118.1.3

Ward, R. D., & Odum, A. L. (2007). Disruption of temporal discrimination and the choose-short effect. Learning and Behavior, 35, 60–70. doi:10.3758/BF03196075

Wasserman, E. A. (1993). Comparative cognition: Beginning the second century of the study of animal intelligence. Psychological Bulletin, 113, 211–228. doi:10.1037/0033-2909.113.2.211

Wasserman, E. A., Grosch, J., & Nevin, J. A. (1982). Effects of signaled retention intervals on pigeon short-term memory. Animal Learning and Behavior, 10, 330–338. doi:10.3758/BF03213719

Wasserman, E. A., Kiedinger, R. E., & Bhatt, R. S. (1988). Conceptual behavior in pigeons: Categories, subcategories, and pseudocategories. Journal of Experimental Psychology: Animal Behavior Processes, 14, 235–246. doi:10.1037/0097-7403.14.3.235

Watkins, M. J. (1990). Mediationism and the obfuscation of memory. American Psychologist, 45, 328–335. doi:10.1037/0003-066X.45.3.328

Watson, J. E., & Blampied, N. M. (1989). Quantification of the effects of chlorpromazine on performance under delayed matching to sample in pigeons. Journal of the Experimental Analysis of Behavior, 51, 317–328. doi:10.1901/jeab.1989.51-317

Watson, J. E., & White, K. G. (1994). The effect of phenobarbital on rate of forgetting and proactive interference in delayed matching to sample. Psychobiology, 22, 31–36.

Weavers, R., Foster, T. M., & Temple, W. (1998). Reinforcer efficacy in a delayed matching-to-sample task. Journal of the Experimental Analysis of Behavior, 69, 77–85. doi:10.1901/jeab.1998.69-77

White, K. G. (1974). Temporal integration in the pigeon. British Journal of Psychology, 65, 437–444. doi:10.1111/j.2044-8295.1974.tb01417.x

White, K. G. (1985). Characteristics of forgetting functions in delayed matching to sample. Journal of the Experimental Analysis of Behavior, 44, 15–34. doi:10.1901/jeab.1985.44-15

White, K. G. (1986). Conjoint control of performance in conditional discriminations by successive and simultaneous stimuli. Journal of the Experimental Analysis of Behavior, 45, 161–174. doi:10.1901/jeab.1986.45-161

White, K. G. (1991). Psychophysics of direct remembering. In J. A. Commons, M. C. Davison, & J. A. Nevin (Eds.), Models of behavior: Signal detection (pp. 221–237). New York, NY: Erlbaum.

White, K. G. (2001). Forgetting functions. Animal Learning and Behavior, 29, 193–207. doi:10.3758/BF03192887

White, K. G. (2002). Psychophysics of remembering: The discrimination hypothesis. Current Directions in Psychological Science, 11, 141–145. doi:10.1111/1467-8721.00187

White, K. G., & Cooney, E. B. (1996). The consequences of remembering: Independence of performance at different retention intervals. Journal of Experimental Psychology: Animal Behavior Processes, 22, 51–59. doi:10.1037/0097-7403.22.1.51

White, K. G., & Harper, D. N. (1996). Quantitative reanalysis of lesion effects on rate of forgetting in macaques. Behavioural Brain Research, 74, 223–227. doi:10.1016/0166-4328(95)00172-7

White, K. G., Juhasz, J. B., & Wilson, P. J. (1973). Is man no more than this? Evaluative bias in interspecies comparison. Journal of the History of the Behavioral Sciences, 9, 203–212. doi:10.1002/1520-6696(197307)9:33.0.CO;2-F

White, K. G., & McKenzie, J. (1982). Delayed stimulus control: Recall for single and relational stimuli. Journal of the Experimental Analysis of Behavior, 38, 305–312. doi:10.1901/jeab.1982.38-305

White, K. G., Parkinson, A. E., Brown, G. S., & Wixted, J. T. (2004). Local proactive interference in delayed matching to sample: The role of reinforcement. Journal of Experimental Psychology: Animal Behavior Processes, 30, 83–95. doi:10.1037/0097-7403.30.2.83

White, K. G., & Ruske, A. C. (2002). Memory deficits in Alzheimer’s disease: The encoding hypothesis and cholinergic function. Psychonomic Bulletin and Review, 9, 426–437. doi:10.3758/BF03196301

White, K. G., Ruske, A. C., & Colombo, M. (1996). Memory procedures, performance and processes in pigeons. Cognitive Brain Research, 3, 309–317. doi:10.1016/0926-6410(96)00016-X

White, K. G., & Sargisson, R. J. (2011). Maintained generalization of delay-specific remembering. Behavioural Processes, 87, 310–313. doi:10.1016/j.beproc.2011.06.004

White, K. G., & Wixted, J. T. (1999). Psychophysics of remembering. Journal of the Experimental Analysis of Behavior, 71, 91–113. doi:10.1901/jeab.1999.71-91

White, K. G., & Wixted, J. T. (2010). Psychophysics of remembering: To bias or not to bias? Journal of the Experimental Analysis of Behavior, 94, 83–94.

Wickelgren, W. A. (1969). Associative strength theory of recognition memory for pitch. Journal of Mathematical Psychology, 6, 13–61. doi:10.1016/0022-2496(69)90028-5

Wickens, T. D. (1998). On the form of the retention function: Comment on Rubin and Wenzel (1996): A quantitative description of retention. Psychological Review, 105, 379–386.

Wilkie, D. M., Summers, R. J., & Spetch, M. L. (1981). Effect of delay-interval stimuli on delayed symbolic matching to sample in the pigeon. Journal of the Experimental Analysis of Behavior, 35, 153–160. doi:10.1901/jeab.1981.35-153

Williams, D. C., Johnston, M. D., & Saunders, K. J. (2006).
Intertrial sources of stimulus control and delayed matching-to-sample performance in humans. Journal of the Experimental Analysis of Behavior, 86, 253–267. doi:10.1901/jeab.2006.67-01 Wixted, J. T. (1989). Nonhuman short-term memory: A quantitative reanalysis of selected findings. Journal of the Experimental Analysis of Behavior, 52, 409–426. doi:10.1901/jeab.1989.52-409 Wixted, J. T. (1990). Analyzing the empirical course of forgetting. Journal of Experimental Psychology: Learning, Memory, and Cognition, 16, 927–935. doi:10.1037/0278-7393.16.5.927

Remembering and Forgetting

Wixted, J. T. (1993). A signal detection analysis of memory for nonoccurrence in pigeons. Journal of Experimental Psychology: Animal Behavior Processes, 19, 400–411. doi:10.1037/0097-7403.19.4.400 Wixted, J. T. (2004a). On common ground: Jost’s (1897) law of forgetting and Ribot’s (1891) law of retrograde amnesia. Psychological Review, 111, 864–879. doi:10.1037/0033-295X.111.4.864 Wixted, J. T. (2004b). The psychology and neuroscience of forgetting. Annual Review of Psychology, 55, 235–269. doi:10.1146/annurev.psych.55.090902. 141555 Wixted, J. T., & Carpenter, S. K. (2007). The Wickelgren power law and the Ebbinghaus savings function. Psychological Science, 18, 133–134. doi:10.1111/ j.1467-9280.2007.01862.x Wixted, J. T., & Ebbesen, E. B. (1991). On the form of forgetting. Psychological Science, 2, 409–415. doi:10.1111/j.1467-9280.1991.tb00175.x Wixted, J. T., & Gaitan, S. C. (2002). Cognitive theories as reinforcement history surrogates: The case of likelihood ratio models of human recognition memory. Animal Learning and Behavior, 30, 289–305. doi:10.3758/BF03195955

Woodworth, R. S., & Schlosberg, H. (1954). Experimental psychology (rev. ed.). New York, NY: Holt, Rinehart & Winston. Wright, A. A. (2007). An experimental analysis of memory processing. Journal of the Experimental Analysis of Behavior, 88, 405–433. doi:10.1901/jeab.2007.88-405 Wright, A. A., Urcuioli, P. J., & Sands, S. F. (1986). Proactive interference in animal memory research. In D. F. Kendrick, M. E. Rilling, & M. R. Denny (Eds.), Theories of animal memory (pp. 101–125). Hillsdale, NJ: Erlbaum. Wright, F. K., & White, K. G. (2003). Effects of methylphenidate on working memory in pigeons. Cognitive, Affective and Behavioral Neuroscience, 3, 300–308. doi:10.3758/CABN.3.4.300 Zentall, T. R., & Sherburne, L. M. (1994). The role of differential sample responding in the differential outcomes effect involving delayed matching by pigeons. Journal of Experimental Psychology: Animal Behavior Processes, 20, 390–401. doi:10.1037/00977403.20.4.390 Zokoll, M. A., Naue, N., Herrmann, C. S., & Langemann, U. (2008). Auditory memory: A comparison between humans and starlings. Brain Research, 1220, 33–46. doi:10.1016/j.brainres.2008.01.049

437

Chapter 19

The Logic and Illogic of Human Reasoning

Edmund Fantino and Stephanie Stolarz-Fantino

The Oxford Dictionary of Psychology defines logic in a general way as “rational thinking as distinct from irrationality” (Colman, 2009, p. 429). It defines rational choices or decisions as “those that are in the best interests of the agent who makes them” and that “maximize expected utility” (Colman, 2009, p. 636). It characterizes rational beliefs as being internally consistent, rational preferences as being transitive, and rational inferences as obeying the rules of logic, the set of rules according to which valid conclusions can be drawn from sets of premises. Reason was similarly defined by neurologist Donald B. Calne (1999) in his book Within Reason:

Reason is built upon a platform of logical induction (observations lead to conclusions which allow predictions) and logical deduction (if a and b are two classes and a is contained in b, then x is in a implies that x is in b). Reason assigns priority to observation over theory (Galileo’s knife) and simplicity over complexity (Ockham’s [sic] razor); it also demands consistency, coherence, and efficiency. (p. 286)

Given that human beings are considered the most rational of animals (e.g., Huber, 2009), their logical lapses can be astonishing. For example, in his book How We Know What Isn’t So, a discussion of the “fallibility of human reason in everyday life,” Gilovich (1991) pointed out that in the United States, more people believed in extrasensory perception than in evolution and that astrologers outnumbered astronomers 20-fold. Every day’s newspaper reveals examples of questionable beliefs held by experienced professionals as well as by poorly informed laypeople. These examples of illogic do not necessarily reflect lack of intellectual tools for dealing with the relevant data. Instead, as Gilovich argued, it is more likely that “our questionable beliefs derive primarily from the misapplication or overutilization of generally valid and effective strategies for knowing” (p. 2). In this chapter, we present examples that support this view and that illustrate factors that help or hinder human decision making and problem solving.

Behavioral Approaches to Decision-Making Errors

Behavioral approaches have proven fruitful in studying errors of decision making. Several examples are discussed in the sections that follow.

DOI: 10.1037/13937-019 APA Handbook of Behavior Analysis: Vol. 1. Methods and Principles, G. J. Madden (Editor-in-Chief) Copyright © 2013 by the American Psychological Association. All rights reserved.

Probability Matching

Some decision-making problems to which participants respond in a nonoptimal fashion were developed in the laboratory. Perhaps the most intensely studied is probability matching, whose provenance dates back at least to the pioneering studies of Humphreys (1939). In probability matching (or probability learning) experiments, participants are typically given repeated trials of a binary choice procedure in which one outcome pays off with a probability of p and the other pays off with a probability of 1 − p. For values of p different from .5, the participant should always choose the outcome with the higher probability; that is, when p is more than .5, the outcome associated with this probability should always be chosen. Assume that a green button is associated with a payoff probability of .75, and a red button is associated with a probability of .25. The participant will be correct on 75% of trials by always choosing green. Any other systematic strategy will lead to a lesser payoff. Despite these costs, however, human participants typically match their choices to the respective probabilities, choosing the green button on 75% of the trials and the red button on the other 25% of trials. The result of this probability matching is a decrease in correct outcomes; instead of being correct on 75% of the trials, the probability matcher is correct on (.75 × .75) + (.25 × .25) = .625, or 63% of trials. This result has been shown in literally scores of studies with human participants (whereas pigeons and rats tend to respond optimally). Probability matching may persist over hundreds of trials (see Myers, 1976, for a discussion of the early work).

Fantino and Esfandiari (2002) found that participants moved from matching to something between matching and maximizing over the course of two 96-trial blocks. The most nearly optimal responding was by participants who received accurate task-related instructions regarding the actual probabilities of reinforcement associated with each of the two stimuli. A second group also showed significant improvement: They were asked to advise others on the best strategy to follow. Presumably, once they had articulated an effective strategy for increasing reinforcement, they began to follow it more consistently themselves.

Other decision-making problems have been developed with paper-and-pencil scenarios, for example, those investigating the sunk-cost effect and base-rate neglect. To study these problems using a behavioral approach, it is desirable to devise behavioral analogues. We discuss these two problems because in each case analogous data have been collected from both humans and pigeons. Moreover, in one case the results for the two species appear comparable (sunk cost), whereas in the other case they are ostensibly not (base rate).
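The arithmetic in the probability-matching example above (green pays off with probability .75, red with .25) can be checked with a short sketch. The function name and its parameters are illustrative choices, not part of any cited procedure; the only values taken from the text are the two payoff probabilities.

```python
def expected_accuracy(p_high: float, choice_rate_high: float) -> float:
    """Expected proportion of correct trials in a two-option task.

    p_high: probability that the higher-paying option is correct on a trial.
    choice_rate_high: proportion of trials on which that option is chosen.
    (Both names are hypothetical, introduced only for this illustration.)
    """
    p_low = 1.0 - p_high
    return choice_rate_high * p_high + (1.0 - choice_rate_high) * p_low


P_GREEN = 0.75  # payoff probability for the green button, from the text

maximizing = expected_accuracy(P_GREEN, 1.0)    # always choose green
matching = expected_accuracy(P_GREEN, P_GREEN)  # choose green on 75% of trials

print(f"maximizing: {maximizing:.3f}")
print(f"matching:   {matching:.3f}")
```

Because expected accuracy is linear in the choice rate, any mixture of the two buttons falls between .25 and .75, so exclusive choice of the richer option is the best strategy whenever p differs from .5.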

Sunk-Cost Effect

People become more likely to persist in questionable courses of action once they have made an investment. This sunk-cost effect has interested researchers because it involves the inclusion of past costs into decision making, which counters the maxim that choices should be based on marginal costs and benefits, that is, on an assessment of costs and benefits from the current point onward. Although Arkes and Ayton (1999) showed that there are no clear examples of sunk-cost behavior among nonhumans, certain lines of research with human participants have suggested the possibility that nonhuman animals could display this effect. For example, reinforcement history has been shown to affect suboptimal persistence in an investment (Goltz, 1992, 1999). Both the partial reinforcement extinction effect (Goltz, 1992) and behavioral momentum (Goltz, 1999) have been implicated as mechanisms through which reinforcement history could result in persistence. Thus, to explore conditions of uncertainty and reinforcement history under which pigeons might persist in a losing course of action, Navarro and Fantino (2005) designed a procedure that models the sunk-cost decision scenario. They defined such a scenario as one in which an investment has been made toward a goal, negative feedback concerning the investment has been received, and the investor can persist in the investment or abandon it in favor of a new one. In their procedure, pigeons began a trial by pecking on a key for food. The schedule on the food key arranged a course of action with initially good prospects that turned unfavorable. On a given trial, one of four fixed-ratio (FR) schedules was in effect: short (10), medium (40), long (80), or extra long (160). On half the trials, the short ratio was in effect; on a quarter of the trials, the medium ratio was in effect; and on a quarter of the trials, either of the two long ratios was in effect. 
With these parameters, after the pigeons emitted the response number required by the short ratio, if no reinforcement had occurred (because one of the longer ratios happened to be in effect), then the initially easy endeavor became more arduous— the expected number of responses to food was now greater (70) than it had been at the onset of the trial (45).
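The two expected values just quoted (45 responses at trial onset, 70 after the short ratio has been completed without food) can be reproduced with the sketch below. It assumes that the quarter of trials assigned to "either of the two long ratios" is split evenly between FR 80 and FR 160; that split is an assumption, adopted because it yields the reported figures. The function and variable names are hypothetical.

```python
from fractions import Fraction

# Fixed-ratio requirements and trial probabilities as described in the text.
# ASSUMPTION: the two long ratios each occur on 1/8 of trials.
SCHEDULES = {
    10: Fraction(1, 2),
    40: Fraction(1, 4),
    80: Fraction(1, 8),
    160: Fraction(1, 8),
}


def expected_remaining(responses_made: int) -> Fraction:
    """Expected further responses to food, given none delivered so far."""
    # Only ratios larger than the responses already emitted are still possible.
    live = {fr: p for fr, p in SCHEDULES.items() if fr > responses_made}
    total = sum(live.values())
    if total == 0:
        raise ValueError("no schedule requires more responses than already made")
    return sum((fr - responses_made) * p for fr, p in live.items()) / total


at_onset = expected_remaining(0)     # also the expected cost of a fresh trial
after_short = expected_remaining(10)  # cost of persisting past 10 responses

print(f"expected responses at trial onset: {float(at_onset):.0f}")
print(f"expected further responses after 10 unreinforced: {float(after_short):.0f}")
```

Under these assumptions, starting a new trial costs an expected 45 responses while persisting costs an expected 70, which is the sense in which escape becomes the better option once the short ratio has been exhausted (ignoring any cost of the escape response itself).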

Navarro and Fantino (2005) gave pigeons the option of escaping the now less favorable endeavor by allowing them to peck an escape key that initiated a new trial. If the short ratio (FR 10) was not in effect on a given trial, then once 10 responses had been emitted the optimal choice was to peck the escape key (and begin anew on the food key). That is, the expected ratio given escape was lower than the expected ratio given persistence. Notice that at this choice point, the pigeons encountered a sunk-cost decision scenario. Namely, they had made an initial investment, they had received negative feedback (no reinforcement), and they could either persist in the venture or abandon it in favor of a better one.

This general procedure allowed examination of the role of uncertainty in the sunk-cost effect in two ways. One way was through the presence or absence of stimulus changes. If a stimulus change occurred at the moment when escape became optimal, then the economics of the situation should have been more salient than if no stimulus change had occurred. Navarro and Fantino (2005) hypothesized that pigeons responding on this procedure with no stimulus change would persist more than pigeons responding on this procedure with a stimulus change present. The results supported their hypothesis: When stimulus changes were absent, most of the pigeons persisted to the end of every trial. When changes were present, however, all pigeons escaped as soon as it became optimal (this trend appeared once behavior had become stable).

A second way to manipulate uncertainty is by varying the difference between the expected value of persisting and the expected value of escaping. The closer these expected values were to each other, the less salient the advantage of escaping was and the more likely the pigeons should have been to persist. The results again supported the hypothesis: As the advantage of escaping decreased (although still being optimal), persistence rose. 
Additionally, by modifying this procedure for use with human participants, previous findings with human participants could be extended to a novel format. The experiments with pigeons were replicated with humans (Navarro & Fantino, 2005, 2007) in a computer simulation. In the human experiments, presses on the computer keys were the operant and hypothetical money was the reinforcer, and the same contingencies were used. The human data mirrored those of the pigeons. These results suggest that at least two factors that contribute to the sunk-cost effect (economic salience and presence of discriminative stimuli) may affect decision making of both nonhuman animals and humans in a similar manner.

The sunk-cost effect is arguably of more than academic interest. All people have likely experienced situations in which they have persisted at an endeavor long after it was prudent to continue, and the sunk-cost effect has been used to help understand investments gone awry, such as the Concorde supersonic airplane (indeed, there is the phrase Concorde fallacy) and the Vietnam War. However, U.S. society highly values persistence in pursuit of one’s goals. Rachlin (2000) has argued that persistence is the backbone of self-control and the avoidance of impulsive decision making (see de la Piedad, Field, & Rachlin, 2006, for a discussion). The iconic American inventor Thomas Edison is reported to have said, “Many of life’s failures are people who did not realize how close they were to success when they gave up.” The trick, of course, is in discriminating when to persist. People’s ability to discriminate well depends on how much relevant information they have in hand. Given adequate information (or discriminative stimuli), people and pigeons appear to avoid the sunk-cost effect.

Base-Rate Neglect

This robust phenomenon results from the fact that people typically underweight the importance of base rates in decision tasks involving two or more sources of information (e.g., Goodie & Fantino, 1996; Tversky & Kahneman, 1982). In base-rate experiments, participants are generally provided with information about base rates, which concern how often each of two outcomes occurs in the general population, and case-specific information, such as witness testimony or the results of a diagnostic medical test. Typically, the participant’s task is to select the more likely of the two outcomes or to provide a verbal or written estimate of the probability of one or both outcomes. An iconic base-rate problem
described by Tversky and Kahneman (1982) is the taxicab problem; a variant appears next with values that should make the solution transparent:

A cab was involved in a hit and run accident at night. Two cab companies, the Green and the Blue, operate in the city. You are given the following data: (a) 67% of the cabs in the city are Blue and 33% are Green. (b) A witness identified the cab as Green. The court tested the reliability of the witness under the same circumstances that existed on the night of the accident and concluded that the witness correctly identified each one of the two colors 50% of the time and failed 50% of the time. What is the probability that the cab involved in the accident was Blue rather than Green?

For this example, if both pieces of information (base rates of the two cab types and witness accuracy) were considered, it would be clear that the probability that the cab was Blue is 67%. For less transparent values, the information would be combined according to Bayes’s theorem to find the precise probability (Birnbaum, 1983). Participants in most studies are not expected to calculate the exact values using Bayes’s theorem. However, they might be expected to use both sources of information and come up with an approximation of the correct answer. Instead, in most studies participants overweight the case-specific information and ignore, or at least underweight (neglect), the base-rate information. Thus, in this simple example, participants who neglect the base rate tend to assert that the probability is 50% because the witness is uninformative.

As with the sunk-cost effect, the robustness of base-rate neglect is not simply of academic interest. Striking examples have been reported involving assessments of school psychologists (Kennedy, Willis, & Faust, 1997), physicians (e.g., Eddy, 1982), and AIDS counselors (Gigerenzer, Hoffrage, & Ebert, 1998). Misunderstanding of the importance of base rates has real implications for people’s lives. 
For example, Eddy (1982) found that when physicians were asked to estimate the likelihood that a woman with a positive mammogram had breast cancer, most participants estimated the likelihood as 75%, which was very near the test’s sensitivity. In fact, the correct response was close to 8% because of breast cancer’s low base rate.

Can researchers learn something valuable about the variables that control base-rate neglect by adopting a behavioral approach? For example, what if, instead of giving participants the relevant information, as is done in paper-and-pencil versions of the task, researchers had participants experience both the base rate and the accuracy of the case-cue information in a behavioral task over many trials? Would base-rate neglect still occur? To this end, Stolarz-Fantino and Fantino (1990) suggested using a modified matching-to-sample procedure as a base-rate analogue. In the typical matching-to-sample procedure, the sample appears on a single lit key and is one of two colors, here blue and green. After the sample is extinguished, two comparison stimuli appear, blue and green. The human’s or pigeon’s task is to pick the stimulus that matches the sample. In contrast, in the modified procedure used in the base-rate analogues from our laboratory, matching the sample 100% of the time is not the correct response. Instead, selection of the blue or green comparison stimulus is correct a certain percentage of the time (resembling the binary choices of probability-matching experiments, discussed earlier). Consider this example, which is illustrated in Figure 19.1: After a blue sample, selection of blue is correct 67% of the time, and selection of green is correct 33% of the time; after a green sample, selection of blue is again correct 67% of the time, and selection of green is correct 33% of the time. In this example, the sample is completely uninformative: It should not acquire discriminative stimulus properties in selecting either comparison stimulus. 
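Both the taxicab variant and the uninformative-sample example reduce to the same Bayesian calculation, which the following sketch works through. The function name and argument names are hypothetical. The first call uses only values given in the text; the second uses purely illustrative values (1% base rate, 80% hit rate, 10% false-positive rate) that are assumptions and are not taken from Eddy (1982) or from this chapter.

```python
def posterior(prior: float, p_report_given_true: float,
              p_report_given_false: float) -> float:
    """Bayes's theorem for a binary hypothesis H given one report.

    prior: base rate P(H)
    p_report_given_true: P(report | H)
    p_report_given_false: P(report | not H)
    """
    joint_true = prior * p_report_given_true
    joint_false = (1.0 - prior) * p_report_given_false
    return joint_true / (joint_true + joint_false)


# Taxicab variant from the text: 67% of cabs are Blue; the witness said
# "Green" and is correct about each color 50% of the time.
# H = "the cab was Blue", so the report "Green" is an error under H.
p_blue = posterior(prior=0.67, p_report_given_true=0.50,
                   p_report_given_false=0.50)
print(f"P(Blue | witness says Green) = {p_blue:.2f}")

# ILLUSTRATIVE ASSUMPTION, not data: a rare condition with a fairly
# accurate test still gives a low probability after a positive result.
p_rare = posterior(prior=0.01, p_report_given_true=0.80,
                   p_report_given_false=0.10)
print(f"P(condition | positive test) = {p_rare:.3f}")
```

When the report is equally likely under both hypotheses, the likelihood terms cancel and the posterior equals the base rate, which is why 67% is the transparent answer to the taxicab variant.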
Note that the values in this matching-to-sample example are the same as those in the earlier taxicab problem, in which the witness testimony (sample stimulus) was uninformative and the base rate of blue cabs in the city was 67%. Thus, Tversky and Kahneman’s (1982) taxicab problem was converted into a nonverbal delayed matching-to-sample procedure into which different base rates and case-cue reliabilities could be programmed, as illustrated in Figure 19.1. From a behavioral perspective, this problem involves multiple sources of stimulus control. Goodie and Fantino (1995, 1996) with humans and Hartl and Fantino (1996) with pigeons explored this behavioral base-rate problem, with a variety of values in different conditions.

[Figure 19.1 (tree diagram): From Start, the sample is blue on 50% of trials and green on 50% of trials. After either sample, the comparison stimuli are blue and green; the percentage of choices rewarded is 67% for blue and 33% for green.]

Figure 19.1. Matching-to-sample analogue of the base-rate problem. From “An Experientially Derived Base-Rate Error in Humans,” by A. S. Goodie and E. Fantino, 1995, Psychological Science, 6, p. 103. Copyright 1995 by John Wiley & Sons. Adapted with permission.

How did human participants do when the sample was uninformative, as in Figure 19.1? Under these circumstances, they should never have picked green, even when the sample was green, because blue was correct more often. With knowledge of probability matching, however, one might expect that green would have been chosen on 33% of trials because it was correct 33% of the time. Or, if participants’ choices mirrored those in a single-trial paper-and-pencil version of the taxicab problem (that is, if they showed base-rate neglect), one might expect that green would have been chosen after a green sample on 50% of trials because, in this example, the sample was uninformative (correct 50% of the time). These possibilities, and the actual data obtained for this combination of base rate and sample reliability, are shown in Figure 19.2. In fact, the data show that the green sample was matched on 56% of trials, indicating base-rate neglect. This behavioral base-rate neglect persisted over the 400 trials studied, even when the underweighting of base rates caused the participants to lose money (Goodie & Fantino, 1995, Experiment 2). Pigeons, however, chose optimally; they learned to respond exclusively on the stimulus that was more likely to result in reinforcement.

[Figure 19.2 (bar chart): Percent choices matching green sample. Optimal, 0; probability matching, 33; sample accuracy, 50; observed, 56.]

Figure 19.2. Matching to the green sample according to whether participants select optimally, probability match, or commit base-rate neglect. The observed data are shown on the right. From “An Experientially Derived Base-Rate Error in Humans,” by A. S. Goodie and E. Fantino, 1995, Psychological Science, 6, p. 104. Copyright 1995 by John Wiley & Sons. Adapted with permission.

The results from other conditions supported the same general pattern: For humans, the sample
information was overweighted and the base rates were neglected (although not generally ignored); for pigeons, choices were controlled by both sample accuracy and base rates. Hartl and Fantino (1996) and Stolarz-Fantino and Fantino (1995) proposed that differences in learning histories between humans and pigeons may have been responsible for the differences in the results. That is, from early childhood, humans are exposed to many situations in which matching things that are in some way the same is reinforced. Laboratory pigeons lack a comparable history, and its absence enables them to learn the optimal pattern of choice in tasks such as that of Hartl and Fantino. To strengthen this interpretation, it would be desirable to show (a) that humans will not neglect base rates when tested on problems in which prior learning is not likely to interfere and (b) that pigeons would show base-rate neglect if given, for example, a history of matching comparable to that of humans. With respect to the former, Goodie and Fantino (1996, 1999) demonstrated that human participants would not display base-rate neglect when symbolic matching-to-sample tasks were used in place of the usual identity matching-to-sample tasks used in the prior research. For example, when the sample was a line orientation (vertical or horizontal) and the comparison stimuli were colors (blue and green), base-rate neglect did not occur. Instead, participants displayed probability matching (Goodie & Fantino, 1996). When the symbolic matching-to-sample task involved a previously learned relationship, however, base-rate neglect occurred (e.g., when the sample was the word blue or the word green and the comparison stimuli were the colors blue and green). Similarly, when human participants had been exposed to base rates without samples (i.e., when there were no competing sources of stimulus control), they were later sensitive to base rates when a matching-to-sample procedure was introduced (Case, Fantino, & Goodie, 1999). 
To complete the story that base-rate neglect may result from prior learning, Fantino, Kanevsky, and Charlton (2005) gave pigeons an extensive history of pretraining (more than 100 sessions) with informative case cues. During these sessions, sample accuracy was 100%; that is, the pigeons’ matching
responses were always reinforced and nonmatches were never reinforced. After this pretraining, the pigeons displayed base-rate neglect when confronted with problems that varied base rates and sample accuracies. As Fantino, Kanevsky, and Charlton (2005) concluded, “After a substantial history of matching, pigeons are likely to neglect base rates, whereas the relatively ‘uneducated’ pigeon is aptly sensitive to the multiple sources of stimulus control present in the matching-to-sample task” (p. 825). How do results from behavioral analogues of base-rate problems compare with those from more typical paper-and-pencil tasks used in most research on this topic? Stolarz-Fantino, Fantino, and Van Borst (2006, Experiment 2) gave college students two sessions of training—providing computer-based practice, with feedback about the accuracy of each answer—on typical base-rate questions of the type used by Tversky and Kahneman (1982); the questions varied across four levels of base rate and two levels of witness accuracy. In a third session, participants were tested on the combinations used in training and on some novel base-rate problems. Accuracy improved over the course of training; however, this improvement was not reflected in performance on the novel problems. Analysis of explanations for participants’ answers showed that compared with an untrained control group, the trained students were more aware of the importance of considering both base rates and witness accuracy when estimating likelihood. However, this awareness was not reflected in the accuracy of their estimates. Given that hundreds of trials were needed for Goodie and Fantino (1999) to overcome their participants’ tendency to neglect base rates, perhaps this should not be surprising. In ordinary experience, base rates are learned by integrating events that occur over a long period of time, and the base rates of many important events are unknown (Fiedler, 2000). 
For example, a woman is unlikely to be aware of the base rate for having breast cancer that applies to a person of her age; she is more likely to be aware of the sensitivity of the mammogram (i.e., how likely it is to detect cancer if she has it). People usually have more experience with the more salient case-cue information,
much as Goodie and Fantino’s (1996) participants had extensive experience with matching similar stimuli. Although participants in the typical paper-and-pencil laboratory task receive both pieces of information simultaneously, their past experience may lead them to rely more on the case cue (the witness) than on the base rate. Experimentation with a behavioral analogue suggests that making base rates more salient, as in the study by Case et al. (1999), should be a productive way to induce their use, and research with the more traditional paper-and-pencil task (e.g., Stolarz-Fantino et al., 2006) supports this view.
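To make the cost of behavioral base-rate neglect concrete, the sketch below computes the expected proportion of rewarded choices on green-sample trials in the uninformative-sample condition described earlier in this section (blue rewarded 67% of the time, green 33%). The function name and dictionary are hypothetical; the four choice rates are the ones reported for Figure 19.2, and the resulting reward rates are derived here rather than reported in the source.

```python
P_BLUE_CORRECT = 0.67   # blue comparison rewarded regardless of sample (from the text)
P_GREEN_CORRECT = 0.33  # green comparison rewarded regardless of sample


def reward_rate(p_choose_green: float) -> float:
    """Expected proportion of rewarded trials if green is chosen at this rate."""
    return (p_choose_green * P_GREEN_CORRECT
            + (1.0 - p_choose_green) * P_BLUE_CORRECT)


# Proportion of green choices after a green sample under each account.
accounts = {
    "optimal": 0.00,
    "probability matching": 0.33,
    "sample accuracy": 0.50,
    "observed": 0.56,
}

rates = {name: reward_rate(p_green) for name, p_green in accounts.items()}
for name, rate in rates.items():
    print(f"{name:>20}: {rate:.3f}")
```

Under these values, every additional green choice lowers the expected reward rate, so the observed 56% matching earns less than either probability matching or exclusive choice of blue.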

Conjunction Effect

The conjunction effect is another classic fallacy addressed by Tversky and Kahneman (1983). Under certain conditions, participants judge the conjunction of two events as more probable than the less likely of the two events assessed separately. One example is Tversky and Kahneman’s “Linda” problem, in which participants were asked to rank the relative likelihood of a set of statements about Linda after reading a description of her background:

Linda is 31 years old, single, outspoken and very bright. She majored in philosophy. As a student, she was deeply concerned with issues of discrimination and social justice, and also participated in anti-nuclear demonstrations.

Linda is a teacher in elementary school.
Linda works in a bookstore and takes Yoga classes.
Linda is active in the feminist movement.
Linda is a psychiatric social worker.
Linda is a member of the League of Women Voters.
Linda is a bank teller.
Linda is an insurance salesperson.
Linda is a bank teller and is active in the feminist movement. (p. 297)

As reported by Tversky and Kahneman (1983), 85% of responding undergraduates reported the following order of likelihoods: Linda is active in the feminist movement > Linda is a bank teller and is active in the feminist movement > Linda is a bank teller. It is a violation of the rules of class inclusion for Linda to be more likely to be a feminist bank teller than to be a bank teller (who may or may not be a feminist). It may also be considered a violation of the monotonicity axiom of probability, with implications for judgments of preference (Zizzo, Stolarz-Fantino, Wen, & Fantino, 2000). In this sense, the conjunction effect can be considered a fallacy of reasoning.

University undergraduates are unlikely to be unfamiliar with the rules of class inclusion; however, the conjunction effect appears widespread. Tversky and Kahneman (1983) reported that samples of students (from the University of California, Berkeley, and from Stanford University) who had completed several statistics courses performed better than previous samples, but even then, 36% of respondents committed the fallacy. Tversky and Kahneman also reported that 60 of 62 physicians, when tested on a medical question similar in structure to the Linda problem, rated the conjunction of two symptoms as more likely than the less likely symptom alone. According to the investigators, when the logical inconsistency was pointed out, most of the physicians “appeared surprised and dismayed to have made an elementary error of reasoning” (p. 302). Logic students are also not immune to the effect. Stolarz-Fantino, Fantino, and Kulik (1996, Experiment 3) found conjunction errors in the responses of 43% of a sample of University of California, San Diego (UCSD), students who had just completed a course in logic and who were tested in their logic classroom. Thus, participants’ responses must be due at least in part to lack of recognition that the task calls for the application of principles of logic, class inclusion, or probability. 
As one of Tversky and Kahneman’s participants is reported to have said, “I thought you only asked for my opinion” (p. 300). The conjunction effect may occur because of the application (or, in this case, overapplication) of some form of averaging or weighted-averaging model (see Fantino, Kulik, Stolarz-Fantino, & Wright, 1997, for a discussion). For example,

Fantino and Stolarz-Fantino

participants may average the likelihood scores they have assigned to two statements to reach an estimated likelihood for their conjunction. Anderson (1981) has found that averaging the perceived value of stimuli is a strategy used in making many kinds of judgments; thus, it would not be surprising for it to be misapplied in situations that appear similar to those in which averaging would normally be used. There is also evidence from behavioral (Fantino & Savastano, 1996) and decision-making (Zizzo, 2001) tasks that human participants respond more to compound stimuli than they do to individual stimuli, and this tendency may contribute to the conjunction effect. In line with the possibility that stimulus control may affect the strategy participants apply to rating the conjunction questions, Stolarz-Fantino, Fantino, Zizzo, and Wen (2003, Experiments 3 and 4) reported that conjunction errors were least likely when the following two conditions were met: (a) The framing description was omitted (the personality sketch of Linda in the earlier example), and (b) the conjunction question was preceded by a set of logic questions (rather than opinion questions or no other questions at all). Presumably, the logic questions set the occasion for regarding the conjunction judgment as a problem in logic; in contrast, the personality sketch may set the occasion for treating the conjunction judgment as an impression formation task, one in which the averaging strategy is frequently applied (e.g., Anderson, 1965). According to Hertwig and Gigerenzer (1999), participants are far less inclined to rate conjunctions as more likely than their components when the task is presented in terms of frequencies rather than in terms of probabilities. They pointed out that the word probability has a wide variety of interpretations when used in colloquial speech.
Thus, use of the terms likelihood and probability does not necessarily set the occasion for participants to apply the relevant problem-solving practices to typical conjunction questions. Fantino, Stolarz-Fantino, and Navarro (2003) reported on a repeated-trials study intended to induce students to treat the conjunction task as one to which the principles of probability, class inclusion, or both should be applied. Participants rated 40 conjunction questions presented by computer; 20 were presented in a likelihood format

and 20 in a frequency format. Some questions asked participants to rate the conjunction of two events; others involved rating the conjunction of three events. An example of a likelihood question is the following: Krista is 20 years old and is a student at a small, liberal arts college. On a scale of 0 to 100, there is a likelihood of 20 that Krista is majoring in education, a likelihood of 70 that she walks for exercise, and a likelihood of 30 that she plays in a campus musical group. Please indicate the likelihood of the statement about Krista that appears below by entering a number from 0 to 100 to represent likelihood—for example, “0” would be virtually impossible, and “100” virtually certain. What is the likelihood that Krista is an education major and also plays in a musical group? An example of a frequency question is the following: Eric is 23 years old and is a student at a university in North Carolina. Out of 100 people like Eric, 30 are majoring in biology, 70 play intramural sports, and 20 do volunteer work with children. Out of these 100 people, to how many would the statement below apply? Indicate your answer by entering a number from 0 to 100. Out of 100 people like Eric, how many do all three of the following: major in biology, play intramural sports, and do volunteer work with children? After each response, participants in the feedback condition saw a statement telling them whether their answer was “in the range that is considered correct,” or (if the conjunction was rated higher than the likelihood or frequency of one of the component statements) “too high to be in the range that is considered correct.” Those in the no-feedback condition did not receive feedback on their responses. In addition, participants received the

The Logic and Illogic of Human Reasoning

questions in two groupings (Phase 1 and Phase 2) in one of three orders: (a) likelihood questions first, followed by frequency questions; (b) frequency questions first, followed by likelihood questions; or (c) mixed presentation of the two formats. Would receiving feedback help participants avoid the conjunction effect? More important, would answering questions in the frequency format first help them avoid errors on the likelihood problems? The results were positive on both counts: Participants who received feedback answered more questions correctly in both phases than those who received no feedback. Participants who received frequency questions first scored higher than those who received likelihood questions first; in addition, likelihood-first participants improved when rating the frequency conjunctions in Phase 2, and frequency-first participants continued to do well on their Phase 2 likelihood questions. Most important, students in the mixed-presentation condition scored similarly to those in the frequency-first condition, supporting the possibility that presenting questions in the frequency format can lead participants to recognize the questions as requiring the application of class-inclusion principles, even when those questions appear in the less transparent likelihood format.

Can Stronger Incentives Lead to More Optimal Performance?

A first step toward understanding the irrationality observed in some domains of human reasoning and decision making that would be obvious to a behavior analyst, or nearly any psychologist for that matter, is to ascertain whether the person is adequately motivated. An inadequately motivated participant may pay insufficient attention to the task at hand or might be distracted by a variety of competing activities and alternative sources of reinforcement in the immediate environment.
This concern is especially apt considering that many studies involving decision-making errors are carried out with paper and pencil in classrooms filled with dozens—or even hundreds—of students. The competing activities include using laptops and iPods, talking with friends, completing reading for an upcoming class, and many more; any of these are arguably more

important to participants than determining the answers to arcane questions posed by a visiting experimenter. Perhaps a more rational view of the human thinker and decision maker would emerge if the experimental questions were posed in a more personal manner with adequate incentives and more restricted access to alternative activities. The issue of adequate incentives was taken up by Hertwig and Ortmann (2001) in a Behavioral and Brain Sciences target article that focused on differences between the experimental techniques typically used by economists and those used by psychologists studying, among other areas, behavioral decision making and reasoning. Two of the techniques cited as more common among economists—the use of repeated trials and pay for clearly defined performance criteria—are also typical of research in the tradition of behavior analysis. Thus, an examination of these topics seems especially relevant. The issue of incentives has spurred a lively debate among psychologists and economists. Many economists have doubted the value of results obtained without adequate incentives. Indeed, Camerer and Hogarth (1999) asserted that unless a study was financially motivated, it was “essentially impossible to have it published in an economics journal” (p. 35). Psychologists, however, have often been content to conduct their studies of logical problems without incentives or with hypothetical incentives. The virtues of omitting incentives are obvious. For example, where an economist might use real monetary incentives to encourage rational decision making, the psychologist would save that money by omitting the incentive or asking the participants to imagine receiving money. Moreover, where hypothetical money is concerned, there is no limit to the amounts that can be used. For example, Rachlin, Brown, and Cross (2000) were able to investigate temporal and probability discounting using standard amounts that ranged from $10 to $1,000,000. 
So the issue of the role of incentives potentially has both theoretical and methodological implications. In a commentary to the Hertwig and Ortmann (2001) target article, Fantino and Stolarz-Fantino (2001) took the position that hypothetical rewards could be as effective as real ones and that their use incurred no loss in external validity. We now review the studies that led to this


conclusion as well as newer data that convinced us that—for certain kinds of studies—the economists were on solid ground after all.

When Monetary Rewards Do Not Improve Performance

In our human decision-making laboratory, we have had three opportunities to compare performance in comparable situations in which some participants received hypothetical money and others received real money for optimal responding. These studies have involved three different issues: observing behavior, base-rate neglect, and the conjunction fallacy. We discuss each briefly in turn. In studies of observing, participants have the opportunity to produce discriminative stimuli that are correlated with the schedules of reinforcement in effect. For example, in a typical procedure, two schedules alternate; the same stimulus is associated with each (a mixed schedule). In the procedure developed by Wyckoff (1952), performance of an observing response produces a stimulus associated with the schedule in effect. Producing the observing stimulus has no effect on the receipt of reinforcement; it only signals which schedule is currently active. In behavioral terminology, it changes the mixed schedule to a multiple schedule. Results from numerous studies have shown that under most circumstances, only the stimulus associated with the richer of the two schedules maintains observing (for recent discussions and data, see Escobar & Bruner, 2009; Fantino & Silberberg, 2010). In one of several studies undertaken to ascertain the generality of this finding, Case and Fantino (1989) found that it did not matter if points were worth money (either $0.05 or $0.25). There was no difference in the rate or pattern of observing among participants whose points were exchangeable for money and those whose points were not. Likewise, in studies of base-rate neglect with human participants, reinforcing correct answers with a variety of incentives—ranging from points to varying amounts of real money—had no significant effect on participants’ tendency to neglect the base rates (Goodie & Fantino, 1995, Experiment 2).
As robust as the conjunction effect has proven to be, one wonders how much careful consideration

participants give their responses, given the aforementioned difficulties with studies carried out in large classrooms. In one of our studies on the conjunction fallacy (Stolarz-Fantino et al., 2003, Experiment 5), we studied participants individually for six trials (rather than the usual one trial) and varied the incentive for correct answers. In one condition, the experimenter sat next to the participant with a roll of $1 bills. Participants received $3 for each answer that did not constitute an example of the conjunction fallacy. Participants could earn up to $18 in just a few minutes by not committing the fallacy. However, participants in this high-incentive condition did no better than those in the no-incentive control group or in any other of the groups we studied. Moreover, there was no sign of improvement over the six trials. Thus, inattentiveness to the task owing to a lack of appropriate incentive does not appear to be responsible for the prevalence of the conjunction effect (seen in as many as 80% of participants in some studies). Although the rewards offered human research participants are meager compared with those offered pigeons maintained at 80% of their free-feeding weights, on the basis of the results from these three sets of studies, concluding that real money was not critical in generating meaningful responding in studies with human participants seemed reasonable (Fantino & Stolarz-Fantino, 2001). Presumably, there are other reinforcers for human participants in these situations, including, in different studies, the receipt of points, compliance with the experimenter’s instructions, or feedback that the response was correct (Goodie, 2001). Moreover, some experimental tasks, particularly those involving reasoning, may be intrinsically reinforcing to college students. For example, some studies have used video games, which are known to maintain high rates of participation (e.g., Case, Ploog, & Fantino, 1990). 
In a review of 74 studies that varied in the amount of incentive, Camerer and Hogarth (1999) found that incentives improved performance of easy, effort-responsive tasks but did not help performance of more difficult tasks. At any rate, in our studies the use of real money as a replacement or additional reward appeared to provide no additional benefit to performance. That the effects of rewards may not be


additive is consistent with a classic literature on incentives that suggests that there is an optimal level of motivation beyond which performance declines (Yerkes & Dodson, 1908). Indeed, recent work by Mobbs et al. (2009) found that participants were less successful at capturing “prey” in a reward-pursuit task under conditions of higher reward (about $10.00 per successful capture) than under conditions of lower reward (about $1.00 per successful capture).

The Special Case of Economic Games

It may be that monetary payments are not necessary for optimizing participants’ performance when other incentives are present. However, economists who stress the importance of carrying out studies with real money have often focused on a different area of research: economic games. Literally hundreds of studies have been published on the Prisoner’s Dilemma game and its variants, conducted by social scientists of all stripes as well as by biologists (see Barash, 2003, Chapter 3, and Rachlin, 1997, for discussions of the Prisoner’s Dilemma). Economists have developed other games involving the distribution of money. Most common among these are the dictator game and the ultimatum game, which we introduce briefly (for a fuller treatment, see Camerer, 2003; see also Chapter 20, this volume). These games ask a participant to divide an amount of money, usually between the participant and a second person. Economists have argued that the distribution of funds when hypothetical money is the currency may tell researchers little about the distribution of real money. Therefore, they would argue that studies using hypothetical money have weak external validity (i.e., limited applicability to real-world decisions). If correct, this weak external validity would diminish the value of studies of economic distribution games conducted with hypothetical rewards. In the dictator game, a participant (the allocator) is asked to divide a sum of money, say $100, between him- or herself and another person (the recipient, typically an unseen stranger). The game is usually a one-trial game, and the allocator’s decision is honored (hence the dictator aspect of the game). Dictator game offers vary across participants; they are sometimes equitable (here $50 for the allocator

and $50 for the recipient) and sometimes completely self-interested ($0 for the recipient). Other participants offer the recipient around $30, which approximates the mean offer (30%) in many studies (Camerer, 2003; Forsythe, Horowitz, Savin, & Sefton, 1994). The ultimatum game is more complex because the recipient plays an active role. The allocator (again in a one-trial game) is told to make an offer to divide the money; if the offer is accepted by the recipient, the offered distribution will be honored. However, the recipient has the option to reject the offer, in which case neither the allocator nor the recipient receives anything. No negotiation is permitted. Participants tend to be somewhat more generous in the ultimatum game (certainly there are no offers of $0) than in the dictator game. Recipients tend to accept offers that are at least 30% of the total (here $30) and to reject offers below 20% (Camerer, 2003). In neither the ultimatum game nor the dictator game do participants make their decision on the strict basis of maximizing their dollar amount as would be expected by a strict adherence to economic rationality (Forsythe et al., 1994; Güth, Schmittberger, & Schwarze, 1982). The rationale for the prediction is simple. In the dictator game, the offer should be $0, but only a minority of offers are at or near zero. In the ultimatum game, the offer should be just above zero, say $1, because the recipient should accept any nonzero offer as being better than nothing. In fact, though, recipients nearly always reject near-zero offers, and the proposers, presumably anticipating this, nearly always offer considerably more (often 40%–50% of the pie). Although from a strict economic perspective these results suggest nonrational decision making, whether anyone would really expect the kind of rational behavior posited by the theory is not clear. 
A wealth of other variables (including cultural and historical ones) are presumably modulating the influence of the distribution of dollars. As Güth et al. (1982) pointed out, “Subjects often rely on what they consider a fair or justified result. Furthermore, the ultimatum aspect cannot be completely exploited since subjects do not hesitate to punish if their opponent asks for ‘too much’” (p. 384).
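The payoff rules of the two games, and the contrast between the strict income-maximizing prediction and a recipient who punishes low offers, can be sketched as follows. This is an illustrative sketch only: the function names, the $100 pie, and the 20% rejection threshold used for the hypothetical "punishing" recipient are assumptions chosen to echo the figures discussed above, not a model proposed by the cited authors.

```python
def dictator_payoffs(pie, offer):
    """Dictator game: the allocator's division is always honored."""
    if not 0 <= offer <= pie:
        raise ValueError("offer must be between 0 and the size of the pie")
    return pie - offer, offer  # (allocator, recipient)


def ultimatum_payoffs(pie, offer, accepted):
    """Ultimatum game: a rejected offer leaves both players with nothing."""
    if not 0 <= offer <= pie:
        raise ValueError("offer must be between 0 and the size of the pie")
    return (pie - offer, offer) if accepted else (0, 0)


def income_maximizing_recipient(offer):
    """Strict economic prediction: accept any nonzero offer."""
    return offer > 0


def punishing_recipient(offer, pie, min_share_percent=20):
    """Hypothetical recipient who rejects offers below a share of the pie."""
    return offer * 100 >= min_share_percent * pie


pie = 100
# A $1 offer succeeds against a strictly income-maximizing recipient ...
print(ultimatum_payoffs(pie, 1, income_maximizing_recipient(1)))   # (99, 1)
# ... but leaves both players with nothing against a punishing recipient.
print(ultimatum_payoffs(pie, 1, punishing_recipient(1, pie)))      # (0, 0)
# A more generous offer is accepted by the punishing recipient.
print(ultimatum_payoffs(pie, 40, punishing_recipient(40, pie)))    # (60, 40)
# In the dictator game the recipient has no say.
print(dictator_payoffs(pie, 0))                                    # (100, 0)
```

Under these assumptions, an allocator who anticipates rejection earns more by offering well above the minimum, which is consistent with the observation that proposers rarely make near-zero offers.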


Research on economic distribution games permits researchers to ask whether the distributions differ depending on whether the money being divided is real or hypothetical. Camerer and Hogarth (1999) reported that the answer depends on the game. For the Dictator Game, less generous offers are made with real money. Camerer and Hogarth speculated that with real money, the nonfinancial reinforcers in the situation are relatively less effective. Research in our laboratory has compared the effects of real and hypothetical money in a series of studies using the sharing game, a variant of the dictator game developed by Kennelly and Fantino (2007). In this game, the participant makes a series of binary choices (for self and the other player) in which one outcome yields higher amounts for both players (with the chooser receiving the smaller of the payoffs) and the other outcome yields lower amounts for both players (with the chooser receiving the larger of the payoffs). For example, the choice might be between (a) $7 for oneself and $9 for the other or (b) $5 for oneself and $3 for the other. On any given trial, the participant is constrained to select between the two displayed outcomes. Unlike the ultimatum and dictator games, which do not constrain participants’ choices in this way, the sharing game allows one to distinguish between those who prefer to maximize their earnings and those who prefer a maximized relative advantage over the other. Thus, in the preceding example, choosing the $7–$9 option can be considered optimal (earnings maximization) and choosing the $5–$3 option can be considered competitive (sacrifice $2 to obtain more than the other participant). What about participants who prefer an exactly equal distribution? Kennelly and Fantino (2007) gave participants 20 trials; each choice was always repeated once. 
Thus, the first and second trials offered the same set of outcomes, as did the third and fourth, and so on, which enabled participants to alternate choices on successive trials, enabling a more equal division of the money. Half of the participants were distributing real money, and half were distributing hypothetical money. Half of the participants in each of these conditions were

told that the other participant was a student, and the other half were told that the other participant was a computer. The results showed that male participants chose optimally (i.e., maximized their earnings) more frequently than female participants, who tended to distribute the funds more equally. Surprisingly, in each of three experiments, whether participants were told that the anonymous other was another student or a computer did not matter, even when real money was being divided. The fact that participants would be equitable or competitive with a computer (rather than maximizing their real-money earnings) may or may not be labeled irrational, but in any case it underscores the complexity of choice. In addition, using such terminology as game and player in one experiment was associated with an increased frequency of competitive behavior relative to other experiments, which used more neutral language. Relevant to our discussion of real and hypothetical outcomes, participants in the real-money conditions chose optimally significantly more often than those in the hypothetical-money conditions: 68% of the time, compared with 50% of the time. In a subsequent experiment, Fantino and Kennelly (2009) found a within-subjects effect of real versus hypothetical money in which participants who were switched from a hypothetical to a real-money condition became more optimal (and less likely to equalize). Overall, though, equality was the modal strategy used, especially in the hypothetical money conditions (and especially for female participants). To recapitulate, the effect of monetary incentives on the quality of performance in tasks involving decision making appears to depend on the nature of the task. When the task is an economic distribution activity in which money is being divided, it is clear that the distributions selected will be different with real, as opposed to hypothetical, money.
This does not mean that it is always necessary to use real money, a daunting prospect when large sums are being distributed. First, in situations for which qualitative assessments—for example, determining the direction of an effect—are adequate, hypothetical money may be used profitably. Second, there are naturalistic rewards other than real money. For


example, Fantino, Gaitan, Kennelly, and Stolarz-Fantino (2007) used time out from a tedious task as a resource to be divided in the ultimatum and dictator games and found that it could produce distributions comparable in optimality to those made with real money (and more optimal than those made with hypothetical money). As an explanation for human participants’ often poor performance on a host of decision and logical tasks, however, it appears that lack of adequate incentive is not a major culprit.

Altruism: Does It Exist and Is It Rational?

The sharing game discussed earlier is potentially valuable as a tool to study altruistic behavior. For example, consider the following choice: Player 1 receives $10 and Player 2 receives $10, or Player 1 receives $0 and Player 2 receives $100. Suppose further that real money is involved (the experimenter has a cashbox filled with $10 and $100 bills) and that the two players will never meet and would remain anonymous. We have in fact now conducted an approximation of this study three times (although once with hypothetical money only), each time with a different experimenter (Shawn Charlton, Jamaal Clarke, and Art Kennelly). All three times about 20% of our participants (UCSD students) chose altruistically; that is, they distributed $100 to the anonymous second participant while taking nothing for themselves. Male participants were far more likely to select the altruistic option. Perhaps this gender difference reflects our earlier finding that female participants are more likely to make egalitarian selections in the sharing game. We do not believe that these results necessarily argue for true altruism in those 20% of participants who select the altruistic option. Several explanations of this choice are consistent with traditional reinforcement views. To begin with, following the Golden Rule (“Do unto others as you would have them do unto you”) may have been reinforced by parents, friends, members of the clergy, and so forth.
Principles of conditioned reinforcement, stimulus control, and rule-governed behavior are among those that could be brought to bear (see Fantino & Stolarz-Fantino, 2002, for a discussion).

As an example of how humans may reason, consider the following short passage from the Nobel Laureate Jose Saramago’s (1997) masterpiece Blindness. An individual goes suddenly blind in the first page of the novel. A good Samaritan helps him arrive home safely, and this dialogue occurs (note that the dialogue shifts between speakers at the capitalizations): “You cannot imagine how grateful I am, Don’t thank me, today it’s you, Yes, you’re right, tomorrow it might be you” (Saramago, 1997, p. 4). This is a particularly ironic example of karma because moments later the good Samaritan steals the blind man’s car and becomes blind himself moments after the theft. We suspect most readers interpret this last attack of blindness as fitting, as in the familiar expression, “What goes around comes around.” If this interpretation is to be taken seriously, however, one would need to know whether people really do hold such opinions. In other words, if this explanation of instances of altruistic behavior has merit, then at a minimum, researchers need to explore the prevalence of the belief that what goes around comes around. The answer to this exploration has important methodological and theoretical implications. If many people believe in the kind of reciprocity expressed by “What goes around comes around,” then it would appear impossible to demonstrate altruism. Some investigators (e.g., Hoffman, McCabe, & Smith, 1998) have pointed out that if the potential altruists and their recipients are anonymous to one another, then no possibility of reciprocity exists. In that event, it could at least be argued that any altruistic choices must represent true altruism (i.e., altruism not explainable in terms of benefit to the putative altruist). If people generally believe that reciprocity is a fact of life, however, then it would appear impossible to demonstrate true altruism whether or not the anonymity of the altruist is preserved. 
In fact, we questioned several hundred UCSD undergraduates about their agreement with the statement “What goes around comes around” (Fantino & Stolarz-Fantino, 2010). Of respondents, 86% agreed with this statement, suggesting at least implicit agreement with the idea that altruistic acts may eventually be reciprocated, even when the


recipient is totally unknown to the actor. Moreover, we found at least a weak correlation between degree of belief in this notion and the degree of altruism expressed in an economic distribution scenario. Despite our contention that it may be impossible to demonstrate true altruism, it is clearly of interest to identify the conditions (including reinforcement history) that promote altruistic choices and to identify the characteristics of individuals who make altruistic choices. Important too is a determination of generalized altruism: Are those who make altruistic choices in one set of circumstances (e.g., in the sharing game) more likely to make them in other contexts (e.g., when responding to social dilemma scenarios)? Is altruism rational? Perhaps it is not rational (in an economic sense) in the case in which no reciprocity is possible. Where a broader view of reciprocity exists, however, giving up something to help someone else may be regarded as a good idea and a rational act.

Rule-Governed Behavior

Accounting for such seemingly altruistic acts in behavioral terms requires a consideration of the impact of rules on behavior. Behavioral psychologists have shown interest in the role of rules in choice and problem solving. B. F. Skinner (1969) defined a rule as a contingency-specifying stimulus and argued that behavior under the control of rules may differ from behavior controlled directly by the behavioral contingencies specified by the rules. Thus, when one learns to speak Italian by following a set of grammatical rules in a textbook, one’s Italian will differ from spoken Italian learned directly by the consequences supplied by an Italian-speaking community (see Hineline & Wanchisen, 1989). Specifically, rule-governed (instructed) behavior may be less sensitive to changes in environmental contingencies than is contingency-shaped behavior.
Research by Catania and his colleagues (e.g., Catania, Shimoff, & Matthews, 1989; Shimoff, Catania, & Matthews, 1981) has shown this to be the case with respect to responding to unannounced changes in schedules of reinforcement; Baron and Galizio (1983) found evidence that instructions can exert strong control over responding, even overriding

schedule contingencies. In a study with 3.5- to 5.5-year-old children, Vaughan (1985) found that children instructed about the correct sequence of a chain of responses made fewer errors in acquisition; however, their behavior did not hold up well when the instruction stimuli were discontinued. In another study with children, Michael and Bernstein (1991) studied preschool children over many sessions on a matching-to-sample task. The children were assigned to one of three training conditions: instructed, contingency shaped, and imitation. They found that participants in the instructed and imitation conditions acquired the relations in fewer sessions than did those in the contingency-shaped condition; however, when the rules changed, children in the instructed and imitation conditions were slower to adapt. Michael and Bernstein pointed out that children in the instructed and imitation conditions behaved nearly identically and that this might be an indication that they engaged in self-instruction. This conceptualization is consistent with findings by Lowe and his colleagues (Bentall & Lowe, 1987; Horne & Lowe, 1993) that verbally proficient humans form covert rules about situations they encounter (see also Rosenfarb, Newland, Brannon, & Howey, 1992). Kudadjie-Gyamfi and Rachlin (2002) investigated rule-governed and contingency-governed behavior on a self-control task. They found that participants provided with a hint about the contingencies performed better on the self-control task (and were thus closer to maximizing their earnings) but were less sensitive to an unsignaled change in the contingencies than participants who were not given the hint. Rules do not always produce optimal behavior, however. Hackenberg and Joker (1994) examined the effects of instructions on humans’ choices in a diminishing-returns procedure. Instructions were initially accurate in that they corresponded to the optimal rate of reinforcement (points later exchangeable for money).
Across blocks of sessions, however, the contingencies were altered systematically in ways that departed from the instructions. Participants continued to follow the instructions, despite their growing inaccuracy with respect to the contingencies, which resulted in substantial losses in potential earnings. Rules can therefore be either


beneficial or detrimental, depending on the contingencies they specify. Indeed, many people acquire useful rules from their parents, teachers, and other role models. However, history is replete with examples of rule following going dreadfully wrong. It is important to assess behaviorally the conditions under which rules are aptly and flexibly applied. What is the effect of rule learning on problem-solving tasks? Luchins (1942) investigated this question using the classic water jar problem. In the typical Luchins task, adults and children were presented with the case of three water jars that held specified amounts of water. They were asked to determine how to obtain a particular amount of water by using only the three jars. For example, participants might be told that Jars A, B, and C held 14, 163, and 25 quarts of water, respectively, and that their task was to determine how to measure out exactly 99 quarts of water. Participants were given 2.5 minutes in which to find the solution (B − A − C − C, or B − A − 2C); they then solved several more problems (involving different sized jars) that could be solved using the same rule. Later in the task, participants encountered a problem that could be solved either by the same rule or by a more direct method: A + C. Most participants failed to see the shortcut, which was used readily by participants without experience with the original rule, and continued to make use of the more complex rule. Many were embarrassed at having missed the obvious solution when it was pointed out to them. Luchins and Luchins (1950) conducted additional experiments that supported the view that experience with a rule, even when it is self-generated, can hinder subsequent problem solving. Fantino, Jaworski, Case, and Stolarz-Fantino (2003, Experiment 1) assessed the effects of rule use in a task that resembled that of Luchins (1942). During Phase 1, participants (college students) were randomly assigned to one of three groups.
The instructed-rule group was given an equation and told that it could be used to solve all of the problems. The induced-rule group received the same problems and general instructions but were not told that a single rule could solve the problems and were not given an equation. Their instructions were more analogous to those received by participants in the

Luchins study. The changing-rules group received the same general instructions, but different problems, each of which required a unique solution. Participants solved as many problems as they could during the allotted time. During Phase 2, participants in each group were assigned to one of three test conditions; again, they solved as many problems as they could within the allotted time. In one condition, the problems could be solved by a novel rule that was different from the one used by those in the instructed- or induced-rule groups during Phase 1. In a second condition, the Phase 1 rule would work to solve the problems, but they could be solved more efficiently by using a shortcut. This condition was most similar to that of the Luchins studies. In the third condition, all problems had unique solutions. Was there evidence of rigidity as a result of previous problem solving? Some rigidity occurred in the second condition (most similar to that of Luchins, 1942), in which the old rule still applied but for which a shortcut was available. However, this impairment was generally short lived. Of greater interest were the results from the first condition (all problems could be solved by a single rule different from the one used in Phase 1). Students from the instructed-rule group performed better in this condition than did those from the other two groups. Perhaps the instruction in the Phase 1 rule made these participants more sensitive to the possibility of finding an effective rule during Phase 2, at least when there was an effective rule to be found; they were no better at solving changing-rules problems in Phase 2 than were participants from the other Phase 1 conditions. Similar studies have been carried out with schoolchildren as participants. Kanevsky (2006) studied the effect of instructed and induced rules on the solution of math problems by sixth-grade students. 
For example, in one study students practiced solving problems that involved making round trips of some distance a certain number of times per hour, day, week, and so forth (X = 2 × distance × number of trips). Students in the instructed-rule group were told how to solve such problems, whereas those in the induced-rule group were not. Students in the changing-rules group solved a set of problems

that all involved different rules. Students in all three groups received feedback on their answers. Then students in all three groups solved transfer problems that required a novel rule; these problems involved finding out how much time was spent per unit of work (X = 60 T/N, where T is time and N is the number of work units). Kanevsky found that students in the induced-rule group did best on the novel problems during the transfer phase. Students in the instructed-rule group and the changing-rules group performed similarly. However, although students in the instructed-rule and induced-rule groups both improved over the course of the transfer phase, those in the changing-rules condition did not. Sasada (2003) used a similar design with fourth-grade students and verbal analogy problems and also found better transfer to problems solved by a novel rule after induced-rule training than after instructed-rule training. Sasada found no difference between instructed- and induced-rule conditions in college-age participants. Overall, these results suggest that rule-based problem solving (whether instructed or induced) need not be inflexible, but that induced rule learning may be especially effective in promoting flexible problem solving in children. The Kanevsky (2006) and Sasada studies measured accuracy of subsequent problem solving rather than efficiency (the measure used by Fantino, Jaworski, et al., 2003). The effect of the dependent measure and the possibility of developmental differences in the utility of instructed rules are subjects of current investigation.
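The arithmetic of the Luchins (1942) water jar example discussed earlier in this section can be checked with a short sketch. Only the 14-, 163-, and 25-quart problem with its 99-quart target comes from the text. The function names, the limit of two uses per jar, and the second set of jar sizes (15, 39, and 3 quarts with a target of 18) are illustrative assumptions chosen so that both the practiced rule and the A + C shortcut work; they are not details taken from the studies.

```python
from itertools import product


def practiced_rule(a, b, c):
    """Quantity obtained by the practiced rule B - A - 2C."""
    return b - a - 2 * c


def find_solutions(jars, target, max_uses=2):
    """Enumerate signed jar combinations that yield the target.

    Each jar may be added or subtracted up to max_uses times (a
    hypothetical limit). Results are sorted so that the solutions
    using the fewest pours come first.
    """
    names = ("A", "B", "C")
    solutions = []
    for coeffs in product(range(-max_uses, max_uses + 1), repeat=3):
        total = sum(k * size for k, size in zip(coeffs, jars))
        if total == target and any(coeffs):
            pours = sum(abs(k) for k in coeffs)
            solutions.append((pours, dict(zip(names, coeffs))))
    return sorted(solutions, key=lambda item: item[0])


# Problem from the text: the practiced rule reaches 99 quarts.
print(practiced_rule(14, 163, 25))

# Hypothetical test problem: the rule still works, but A + C is shorter.
for pours, combo in find_solutions((15, 39, 3), 18):
    print(pours, combo)
```

Sorting by number of pours puts the direct method ahead of the practiced rule, which is the comparison that participants with a history of using B − A − 2C tended to miss.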

What Can Gambling Behavior Tell Researchers About Nonoptimal Decision Making?

In studies of the sunk-cost effect and base-rate neglect described earlier, behavioral methodology was applied to problems in decision making that had not previously been viewed from a behavioral perspective. The area of gambling is perhaps different in the sense that the gambling context can be seen as an obvious showcase for basic principles of reinforcement including incentives, schedules of reinforcement, discriminative stimuli, conditioned reinforcement, the partial reinforcement effect, and competing sources of stimulus control. As with the sunk-cost effect and base-rate neglect (and other fallacies in decision making), the case can be made that gambling, like persistence and focusing on case cues, is sometimes a rational course of action. For example, when resources are insufficient to sustain life, gambling what is left may be a better strategy than not doing so. Indeed, the budget rule of behavioral ecology predicts that organisms should shift from being risk averse to being risk prone as budgets are depleted (e.g., Caraco, 1981). A review of the budget rule and its relation to gambling may be found in Fantino, Navarro, and O’Daly (2005). Here, we focus on some of the factors that appear to maintain problem gambling. One potent factor is immediacy of reinforcement. Preference for variable over fixed delays with the same arithmetic means has been demonstrated in countless studies with nonhumans but can be shown with humans as well (see Fantino, Navarro, & O’Daly, 2005, for a review). For example, consider a pigeon choosing between a variable-interval (VI) 15-second schedule and a fixed-interval (FI) 15-second schedule as the outcomes in a concurrent-chains schedule (see Figure 19.3). The VI schedule consists of some interreinforcement intervals that are considerably longer or shorter than 15 seconds.

Figure 19.3. Concurrent-chains schedule in which the organism chooses between reinforcement outcomes arranged by variable-interval (VI) and fixed-interval (FI) schedules with the same mean interreinforcement interval.

If the occasional short interreinforcement intervals in the VI schedule have a disproportionate effect on choice, then the pigeon should prefer the VI 15 seconds even though the mean interreinforcement intervals on the two schedules are equal. Herrnstein (1964) found just that: a strong preference for the VI schedule. Researchers are finding the same effect with human participants (college students) choosing between mixed- and fixed-ratio schedules of reinforcement (Meyer, Schley, & Fantino, 2011; see also Locey, Pietras, & Hackenberg, 2009). This prospect of an immediate payoff is what may underlie much of the appeal of gambling (see Madden, Ewan, & Lagorio, 2007).
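One way to see why occasional short intervals could have a disproportionate effect is to assume that the value of a reinforcer falls off hyperbolically with its delay. The sketch below works through that assumption. The hyperbolic form, the discounting parameter k = 1.0, and the four VI intervals are illustrative choices, not values reported in the studies cited above.

```python
def hyperbolic_value(delay, k=1.0, amount=1.0):
    """Discounted value of a reinforcer under an assumed hyperbolic form."""
    return amount / (1.0 + k * delay)


def mean_value(delays, k=1.0):
    """Average discounted value across equally likely delays."""
    return sum(hyperbolic_value(d, k) for d in delays) / len(delays)


# Hypothetical VI 15-s schedule: four equally likely intervals, mean 15 s.
vi_intervals = [1, 5, 15, 39]
fi_interval = 15
assert sum(vi_intervals) / len(vi_intervals) == fi_interval

vi_value = mean_value(vi_intervals)
fi_value = hyperbolic_value(fi_interval)

print(f"VI 15 s value: {vi_value:.3f}")
print(f"FI 15 s value: {fi_value:.3f}")
print("VI preferred" if vi_value > fi_value else "FI preferred")
```

Because the assumed discounting curve is convex, the average value over variable delays is never lower than the value at the mean delay, so under this assumption the variable schedule comes out ahead for any positive k even though the arithmetic means are equal.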

Does Lack of Self-Control Contribute to Nonoptimal Decision Making?

Many of society’s problems stem from a preoccupation with short-term gain. Large-scale problems are exacerbated by the tendency to place too much weight on the immediate contingencies associated with a societal challenge (be it mitigating global warming and environmental pollution, improving infrastructure, or working to reorganize the health care system) and give too little consideration to the long-term contingencies, both positive and negative. Likewise, people often act against their own best interests by making choices that serve immediate desires or convenience rather than those that serve their future health and happiness. Insofar as lack of self-control can be viewed as underlying many nonadaptive behaviors, discounting may be seen as a mechanism by which impulsive behavior can be justified (Fantino & Stolarz-Fantino, 2008). One can argue that problem gambling results from impulsiveness. If this were true, then problem gamblers would be expected to display a tendency to discount delayed rewards more steeply; conversely, one would also expect them to show shallower discounting for less probable rewards. Support for this idea comes from research showing that people with pathological gambling and other addictions do, in fact, show steeper discounting functions than control participants (e.g., Alessi & Petry, 2003; Bickel, Odum, & Madden, 1999; Dixon, Marley, & Jacobs, 2003; Petry, 2001). Pathological gambling tends to

occur in association with substance abuse, nicotine dependence, and mood, anxiety, and personality disorders (Petry, Stinson, & Grant, 2005). What kinds of discounting functions are observed in gamblers who do not meet the criteria for pathological gambling? In a study comparing gambling and nongambling college students, Holt, Green, and Myerson (2003) found no difference between the groups in the steepness of their temporal discounting functions. They did, however, find a difference between the groups in probability discounting, with the gambling students discounting unlikely rewards less steeply, thus showing less sensitivity to risk. Similar differences in probability discounting across people diagnosed with pathological gambling and matched control participants have been reported by Madden, Petry, and Johnson (2009). Holt et al. interpreted these results as evidence against a simple view of impulsivity as a trait encompassing both risk taking and the inability to delay gratification. Shead, Callan, and Hodgins (2008) also gave a probability-discounting task (in this case, with a real monetary incentive) to college students who had varying degrees of gambling experience. They found no significant correlation between participants’ degree of problem gambling severity and their probability discounting of either gains or losses, a result consistent with their participants’ low to moderate gambling experience. However, participants had also completed the Gambling Expectancy Questionnaire (Stewart & Wall, 2005) and had been categorized as either relief expectancy gamblers (those who expected gambling to relieve negative mood), reward expectancy gamblers (those who expected gambling to enhance positive mood), or nonexpectancy gamblers (those who had neither expectation). Shead et al. found that reward expectancy gamblers discounted probabilistic gains less steeply and probabilistic losses more steeply than did participants in the other categories.
Thus, there seems to be some evidence for a relationship between gambling experience and probability discounting in people who do not display pathological gambling. With temporal discounting, the relationship with gambling may be more complex because the steeper temporal discounting found among some problem

gamblers may result from associated problems such as substance abuse (Petry & Casarella, 1999). However, recent work by Weatherly, Marino, Ferraro, and Slagle (2008) investigated the question of whether participants’ rates of temporal discounting are associated with their behavior in an actual gambling task. Participants were given the equivalent of $10 in tokens that they could use to play a slot machine. Weatherly et al. found that temporal discounting rates were significantly predictive of the number of tokens inserted into the machine. In any event, discounting continues to be an area of interest as investigators search for explanations of how gambling develops into a problem behavior. Little has been done to support the popular notion that individuals who are lucky enough to win early gambles are more likely to develop into problem gamblers than those who are unlucky at the outset. If anything, Goltz’s (1992) research on persistence of commitment suggested that gamblers should be more likely to continue in the face of intermittent wins. Kassinove and Schare (2001) found no effect of a big win (in this case, $10) on resistance to extinction—that is, whether participants persisted playing a slot machine simulation in the face of repeated no-win trials. Weatherly, Sauter, and King (2004) also studied the effect of a big win on simulated slot machine play; the students in their study were selected to have little or no gambling experience. Weatherly et al. found no significant difference in resistance to extinction among students who had a big win on the fifth trial, those who had a small win on the 25th trial, and those who never had a winning trial. However, participants who had a big win on the first trial persisted far less than those who had a big win on the fifth trial, much as would be expected on the basis of Goltz’s research. Manipulating an individual’s early gambling history in a naturalistic setting is difficult. 
Figure 19.4. Concurrent-chains procedure in which the organism chooses between reinforcement outcomes arranged by a fixed-ratio (FR) 10 on one side and an outcome that has an equal probability of FR 10 and FR 90 on the other side. VI = variable interval.

[Figure 19.5: bar chart. Vertical axis: Phase 2: 10/90 preference (0 to 0.8). Horizontal axis: Phase 1: training condition. Bars: 10 vs 10/90, 0.49; 90 vs 10/90, 0.65; no training, 0.58.]

Figure 19.5. Mean response proportion for the mixed (gambling) option as a function of a history of preference for a fixed option (left bar) or a mixed option (center bar) and with no relevant history (right bar).

However, it is very easy to do so for the pigeon. In our laboratory, we gave three groups of pigeons different historical gambling contexts (Fantino & Stolarz-Fantino, 2009). One group chose between an FR 10 and a mixed-ratio schedule that was FR 10 on half the trials and FR 90 on the other half (presented as the outcomes on concurrent chains; Figure 19.4). Predictably, these pigeons more often chose the simple FR 10. A second group chose between an FR 90 and the same mixed FR 10–FR 90. These pigeons chose the mixed schedule more often. A third group had no pretraining. All three groups were then exposed to the choice of FR 50 versus mixed FR 10–FR 90. These conditions remained in effect until their preferences were stable. The results are shown in Figure 19.5: The group that had chosen FR 10 most often in pretraining had the lowest preference for the mixed schedule; the participants chose it significantly less often than the participants in the other two conditions. The group that had chosen the mixed schedule more often in pretraining showed the highest preference for the mixed schedule, and preferences for control participants

were intermediate. Thus, it appears, at least for pigeons in this task, that historical context does make a difference in future preference for risky outcomes.

Summary and Conclusions: Factors Promoting Logical and Illogical Reasoning

As illustrated in the examples discussed previously, human reasoning is neither invariably logical nor invariably illogical, even on different examples of the same task. However, we can make five generalizations that may help to reconcile some seemingly contradictory findings.

First, suboptimal responding can be lessened by making task contingencies more transparent, as Fantino and Esfandiari (2002) found when they informed some participants of the probabilities of reinforcement associated with each stimulus in their investigation of probability matching. Similarly, Navarro and Fantino (2005, 2007) found that both pigeons and human participants learned to avoid the sunk-cost effect when distinctive stimuli allowed them to discriminate the point at which they should escape a situation in which the prospects for reinforcement were becoming unfavorable.

Second, feedback as to the correctness or accuracy of responses appears to be helpful in improving performance on problem-solving tasks. However, it needs to be reasonably specific to be effective. For example, Fantino, Stolarz-Fantino, and Navarro (2003) found that feedback was useful in helping participants avoid the conjunction effect; this feedback was more specific than that used by Stolarz-Fantino et al. (2003) in their (unsuccessful) attempt to reinforce correct responding with financial rewards. For feedback to work, participants must understand the task requirements well enough to make use of it.

Third, financial incentives are evidently neither necessary nor sufficient for making participants’ reasoning more logical, although, as discussed by Camerer and Hogarth (1999), they may help performance of easy tasks that are responsive to effort. However, as observed by Fantino and Kennelly (2009), and widely believed by economists, behavior in economic distribution games may display greater optimality when real money is at stake.

Fourth, rule-governed responding is acquired more quickly than contingency-shaped responding but is less sensitive to changes in contingencies. Thus, rules can help or hinder correct responding, depending on their applicability to the problem at hand.

Finally, some nonoptimal behavior results from generalization of responses that are effective in other contexts. Thus, for example, a history of matching led to base-rate neglect by human participants in Goodie and Fantino’s (1995, 1996, 1999) studies, and pigeons given a similar history by Fantino, Kanevsky, and Charlton (2005) behaved in the same way. Similarly, misapplication of an averaging rule can lead to apparently illogical responding in judging the likelihood of conjunctions.

The responses of human participants on reasoning tasks may seem irrational, especially in instances in which nonhumans respond more optimally. However, application of behavioral methods, including the study of verbal and rule-governed behavior, has great potential to help illuminate some of the puzzling findings of research in human reasoning.

References

Alessi, S. M., & Petry, N. M. (2003). Pathological gambling severity is associated with impulsivity in a delay discounting procedure. Behavioural Processes, 64, 345–354. doi:10.1016/S0376-6357(03)00150-5

Anderson, N. H. (1965). Averaging versus adding as a stimulus-combination rule in impression formation. Journal of Experimental Psychology, 70, 394–400. doi:10.1037/h0022280

Anderson, N. H. (1981). Foundations of information integration theory. New York, NY: Academic Press.

Arkes, H. R., & Ayton, P. (1999). The sunk cost and Concorde effects: Are humans less rational than lower animals? Psychological Bulletin, 125, 591–600. doi:10.1037/0033-2909.125.5.591

Barash, D. P. (2003). The survival game. New York, NY: Times Books.

Baron, A., & Galizio, M. (1983). Instructional control of human operant behavior. Psychological Record, 33, 495–520.

Bentall, R. P., & Lowe, C. F. (1987). The role of verbal behavior in human learning: III. Instructional effects in children. Journal of the Experimental Analysis of Behavior, 47, 177–190. doi:10.1901/jeab.1987.47-177

Bickel, W. K., Odum, A. L., & Madden, G. J. (1999). Impulsivity and cigarette smoking: Delay discounting in current, never, and ex-smokers. Psychopharmacology, 146, 447–454. doi:10.1007/PL00005490

Birnbaum, M. (1983). Base rates in Bayesian inference: Signal detection analysis of the cab problem. American Journal of Psychology, 96, 85–94. doi:10.2307/1422211

Calne, D. B. (1999). Within reason: Rationality and human behavior. New York, NY: Pantheon Books.

Camerer, C. F. (2003). Behavioral game theory: Experiments in strategic interaction. New York, NY: Russell Sage Foundation.

Camerer, C. F., & Hogarth, R. M. (1999). The effects of financial incentives in experiments: A review and capital-labor-production framework. Journal of Risk and Uncertainty, 19, 7–42. doi:10.1023/A:1007850605129

Caraco, T. (1981). Energy budgets, risk and foraging preferences in dark-eyed juncos (Junco hyemalis). Behavioral Ecology and Sociobiology, 12, 213–217. doi:10.1007/BF00299833

Case, D. A., & Fantino, E. (1989). Instructions and reinforcement in the observing behavior of adults and children. Learning and Motivation, 20, 373–412. doi:10.1016/0023-9690(89)90003-9

Case, D. A., Fantino, E., & Goodie, A. S. (1999). Base-rate training without case cues reduces base-rate neglect. Psychonomic Bulletin and Review, 6, 319–327. doi:10.3758/BF03212337

Case, D. A., Ploog, B. O., & Fantino, E. (1990). Observing behavior in a computer game. Journal of the Experimental Analysis of Behavior, 54, 185–199. doi:10.1901/jeab.1990.54-185

Catania, A. C., Shimoff, E., & Matthews, B. A. (1989). An experimental analysis of rule-governed behavior. In S. C. Hayes (Ed.), Rule-governed behavior: Cognition, contingencies, and instructional control (pp. 119–150). New York, NY: Plenum Press.

Colman, A. M. (2009). Oxford dictionary of psychology (3rd ed.). Oxford, England: Oxford University Press.

de la Piedad, X., Field, D., & Rachlin, H. (2006). The influence of prior choices on current choice. Journal of the Experimental Analysis of Behavior, 85, 3–21. doi:10.1901/jeab.2006.132-04

Dixon, M. R., Marley, J., & Jacobs, E. A. (2003). Delay discounting by pathological gamblers. Journal of Applied Behavior Analysis, 36, 449–458. doi:10.1901/jaba.2003.36-449

Eddy, D. M. (1982). Probabilistic reasoning in clinical medicine: Problems and opportunities. In D. Kahneman, P. Slovic, & A. Tversky (Eds.), Judgment under uncertainty: Heuristics and biases (pp. 249–267). Cambridge, England: Cambridge University Press.

Escobar, R., & Bruner, C. A. (2009). Observing responses and serial stimuli: Searching for the reinforcing properties of the S−. Journal of the Experimental Analysis of Behavior, 92, 215–231. doi:10.1901/jeab.2009.92-215

Fantino, E., & Esfandiari, A. (2002). Probability matching: Encouraging optimal responding in humans. Canadian Journal of Experimental Psychology/Revue Canadienne de Psychologie Expérimentale, 56, 58–63. doi:10.1037/h0087385

Fantino, E., Gaitan, S., Kennelly, A., & Stolarz-Fantino, S. (2007). How reinforcer type affects choice in economic games. Behavioural Processes, 75, 107–114. doi:10.1016/j.beproc.2007.02.001

Fantino, E., Jaworski, B. A., Case, D. A., & Stolarz-Fantino, S. (2003). Rules and problem solving: Another look. American Journal of Psychology, 116, 613–632. doi:10.2307/1423662

Fantino, E., Kanevsky, I. G., & Charlton, S. (2005). Teaching pigeons to commit base-rate neglect. Psychological Science, 16, 820–825. doi:10.1111/j.1467-9280.2005.01620.x

Fantino, E., & Kennelly, A. (2009). Sharing the wealth: Factors influencing resource allocation in the sharing game. Journal of the Experimental Analysis of Behavior, 91, 337–354. doi:10.1901/jeab.2009.91-337

Fantino, E., Kulik, J., Stolarz-Fantino, S., & Wright, W. (1997). The conjunction fallacy: A test of averaging hypotheses. Psychonomic Bulletin and Review, 4, 96–101. doi:10.3758/BF03210779

Fantino, E., Navarro, A., & O’Daly, M. (2005). The science of decision-making: Behaviours related to gambling. International Gambling Studies, 5, 169–186. doi:10.1080/14459790500303311

Fantino, E., & Savastano, H. (1996). Humans’ responses to novel stimulus compounds and the effects of training. Psychonomic Bulletin and Review, 3, 204–207. doi:10.3758/BF03212419

Fantino, E., & Silberberg, A. (2010). Revisiting the role of bad news in maintaining human observing behavior. Journal of the Experimental Analysis of Behavior, 93, 157–170. doi:10.1901/jeab.2010.93-157

Fantino, E., & Stolarz-Fantino, S. (2001). Behavioral and economic approach to decision making: A common ground. Behavioral and Brain Sciences, 24, 407–408.

Fantino, E., & Stolarz-Fantino, S. (2002). The role of negative reinforcement; or: Is there an altruist in the house? Behavioral and Brain Sciences, 25, 257–258. doi:10.1017/S0140525X02290056

Fantino, E., & Stolarz-Fantino, S. (2008). Gambling: Sometimes unseemly, not what it seems. Analysis of Gambling Behavior, 2, 61–68.

Fantino, E., & Stolarz-Fantino, S. (2009, May). Principles of choice and their applications. Paper presented at the 35th Annual Convention of the Association for Behavior Analysis International, Phoenix, AZ.

Fantino, E., & Stolarz-Fantino, S. (2010). Grandparental altruism: Expanding the sense of cause and effect. Behavioral and Brain Sciences, 33, 22–23.

Fantino, E., Stolarz-Fantino, S., & Navarro, A. (2003). Logical fallacies: A behavioral approach to reasoning. Behavior Analyst Today, 4, 109–117.

Fiedler, K. (2000). Beware of samples! A cognitive–ecological sampling approach to judgment biases. Psychological Review, 107, 659–676. doi:10.1037/0033-295X.107.4.659

Forsythe, R., Horowitz, J. L., Savin, N. E., & Sefton, M. (1994). Fairness in simple bargaining experiments. Games and Economic Behavior, 6, 347–369. doi:10.1006/game.1994.1021

Gigerenzer, G., Hoffrage, U., & Ebert, A. (1998). AIDS counseling for low-risk clients. AIDS Care, 10, 197–211. doi:10.1080/09540129850124451

Gilovich, T. (1991). How we know what isn’t so: The fallibility of human reason in everyday life. New York, NY: Free Press.

Goltz, S. M. (1992). A sequential learning analysis of decisions in organizations to escalate investments despite continuing costs or losses. Journal of Applied Behavior Analysis, 25, 561–574. doi:10.1901/jaba.1992.25-561

Goltz, S. M. (1999). Can’t stop on a dime: The roles of matching and momentum in persistence of commitment. Journal of Organizational Behavior Management, 19, 37–63. doi:10.1300/J075v19n01_05

Goodie, A. S. (2001). Are scripts or deception necessary when repeated trials are used? On the social context of psychological experiments. Behavioral and Brain Sciences, 24, 412.

Goodie, A. S., & Fantino, E. (1995). An experientially derived base-rate error in humans. Psychological Science, 6, 101–106. doi:10.1111/j.1467-9280.1995.tb00314.x

Goodie, A. S., & Fantino, E. (1996). Learning to commit or avoid the base-rate error. Nature, 380, 247–249. doi:10.1038/380247a0

Goodie, A. S., & Fantino, E. (1999). What does and does not alleviate base-rate neglect under direct experience. Journal of Behavioral Decision Making, 12, 307–335. doi:10.1002/(SICI)10990771(199912)12:43.0.CO;2-H

Güth, W., Schmittberger, R., & Schwarze, B. (1982). An experimental analysis of ultimatum games. Journal of Economic Behavior and Organization, 3, 367–388. doi:10.1016/0167-2681(82)90011-7

Hackenberg, T. D., & Joker, V. R. (1994). Instructional versus schedule control of humans’ choices in situations of diminishing returns. Journal of the Experimental Analysis of Behavior, 62, 367–383. doi:10.1901/jeab.1994.62-367

Hartl, J., & Fantino, E. (1996). Choice as a function of reinforcement ratios in delayed matching to sample. Journal of the Experimental Analysis of Behavior, 66, 11–27. doi:10.1901/jeab.1996.66-11

Herrnstein, R. J. (1964). Aperiodicity as a factor in choice. Journal of the Experimental Analysis of Behavior, 7, 179–182. doi:10.1901/jeab.1964.7-179

Hertwig, R., & Gigerenzer, G. (1999). The “conjunction fallacy” revisited: How intelligent inferences look like reasoning errors. Journal of Behavioral Decision Making, 12, 275–305. doi:10.1002/(SICI)10990771(199912)12:43.0.CO;2-M

Hertwig, R., & Ortmann, A. (2001). Experimental practices in economics: A methodological challenge for psychologists? Behavioral and Brain Sciences, 24, 383–403.

Hineline, P., & Wanchisen, B. (1989). Correlated hypothesizing and the distinction between contingency-shaped and rule-governed behavior. In S. C. Hayes (Ed.), Rule-governed behavior: Cognition, contingencies, and instructional control (pp. 221–268). New York, NY: Plenum Press.

Hoffman, E., McCabe, K. A., & Smith, V. L. (1998). Behavioral foundations of reciprocity: Experimental economics and evolutionary psychology. Economic Inquiry, 36, 335–352. doi:10.1111/j.1465-7295.1998.tb01719.x

Holt, D. D., Green, L., & Myerson, J. (2003). Is discounting impulsive? Evidence from temporal and probability discounting in gambling and non-gambling college students. Behavioural Processes, 64, 355–367. doi:10.1016/S0376-6357(03)00141-4

Horne, P. J., & Lowe, C. F. (1993). Determinants of human performance on concurrent schedules. Journal of the Experimental Analysis of Behavior, 59, 29–60. doi:10.1901/jeab.1993.59-29

Huber, L. (2009). Degrees of rationality in human and non-human animals. In S. Watanabe, A. P. Blaisdell, L. Huber, & A. Young (Eds.), Rational animals, irrational humans (pp. 3–21). Tokyo, Japan: Keio University Press.

Humphreys, L. G. (1939). Acquisition and extinction of verbal expectations in a situation analogous to conditioning. Journal of Experimental Psychology, 25, 294–301. doi:10.1037/h0053555

Kanevsky, I. G. (2006). Role of rules in transfer of mathematical word problems. Dissertation Abstracts International: Section B. Sciences and Engineering, 67(6), 3491.

Kassinove, J. I., & Schare, M. (2001). Effects of the “near miss” and the “big win” on persistence at slot machine gambling. Psychology of Addictive Behaviors, 15, 155–158. doi:10.1037/0893-164X.15.2.155

Kennedy, M. L., Willis, W. G., & Faust, D. (1997). The base-rate fallacy in school psychology. Journal of Psychoeducational Assessment, 15, 292–307. doi:10.1177/073428299701500401

Kennelly, A., & Fantino, E. (2007). The sharing game: Fairness in resource allocation as a function of incentive, gender, and recipient types. Judgment and Decision Making, 2, 204–216.

Kudadjie-Gyamfi, E., & Rachlin, H. (2002). Rule-governed versus contingency-governed behavior in a self-control task: Effects of changes in contingencies. Behavioural Processes, 57, 29–35. doi:10.1016/S0376-6357(01)00205-4

Locey, M. L., Pietras, C. J., & Hackenberg, T. D. (2009). Human risky choice: Delay sensitivity depends on reinforcer type. Journal of Experimental Psychology: Animal Behavior Processes, 35, 15–22. doi:10.1037/a0012378

Luchins, A. S. (1942). Mechanization in problem-solving: The effect of Einstellung. Psychological Monographs, 54(6, Whole No. 248).

Luchins, A. S., & Luchins, E. J. (1950). New experimental attempts at preventing mechanization in problem solving. Journal of General Psychology, 42, 279–297. doi:10.1080/00221309.1950.9920160

Madden, G. J., Ewan, E. E., & Lagorio, C. H. (2007). Toward an animal model of gambling: Delay discounting and the allure of unpredictable outcomes. Journal of Gambling Studies, 23, 63–83. doi:10.1007/s10899-006-9041-5

Madden, G. J., Petry, N. M., & Johnson, P. S. (2009). Pathological gamblers discount probabilistic rewards less steeply than matched controls. Experimental and Clinical Psychopharmacology, 17, 283–290. doi:10.1037/a0016806

Meyer, S. F., Schley, D. R., & Fantino, E. (2011). The role of context in risky choice. Behavioural Processes, 87, 100–105.

Michael, R. L., & Bernstein, D. J. (1991). Transient effects of acquisition history on generalization in a matching-to-sample task. Journal of the Experimental Analysis of Behavior, 56, 155–166. doi:10.1901/jeab.1991.56-155

Mobbs, D., Hassabis, D., Seymour, B., Marchant, J. L., Weiskopf, N., Dolan, R. J., & Frith, C. D. (2009). Choking on the money: Reward-based performance decrements are associated with midbrain activity. Psychological Science, 20, 955–962. doi:10.1111/j.1467-9280.2009.02399.x

Myers, J. L. (1976). Probability learning and sequence learning. In W. K. Estes (Ed.), Handbook of learning and cognitive processes (Vol. 3, pp. 171–205). Hillsdale, NJ: Erlbaum.

Navarro, A. D., & Fantino, E. (2005). The sunk cost effect in pigeons and humans. Journal of the Experimental Analysis of Behavior, 83, 1–13. doi:10.1901/jeab.2005.21-04

Navarro, A. D., & Fantino, E. (2007). The role of discriminative stimuli in the sunk cost effect. Revista Mexicana de Análisis de la Conducta, 33, 19–29.

Petry, N. M. (2001). Pathological gamblers, with and without substance use disorders, discount delayed rewards at high rates. Journal of Abnormal Psychology, 110, 482–487. doi:10.1037/0021-843X.110.3.482 Petry, N. M., & Casarella, T. (1999). Excessive discounting of delayed rewards in substance abusers with gambling problems. Drug and Alcohol Dependence, 56, 25–32. doi:10.1016/S0376-8716(99)00010-1 Petry, N. M., Stinson, F. S., & Grant, B. F. (2005). Comorbidity of DSM-IV pathological gambling and other psychiatric disorders: Results from the National Epidemiologic Survey on Alcohol and Related Conditions. Journal of Clinical Psychiatry, 66, 564–574. doi:10.4088/JCP.v66n0504 Rachlin, H. (1997). Self and self control. Annals of the New York Academy of Sciences, 818, 85–97. Rachlin, H. (2000). The science of self-control. Cambridge, MA: Harvard University Press. Rachlin, H., Brown, J., & Cross, D. (2000). Discounting in judgments of delay and probability. Journal of Behavioral Decision Making, 13, 145–159. doi:10.1002/ (SICI)1099-0771(200004/06)13:23.0.CO;2-4 Rosenfarb, I. S., Newland, M. C., Brannon, S. E., & Howey, D. S. (1992). Effects of self-generated rules on the development of scheduled-controlled behavior. Journal of the Experimental Analysis of Behavior, 58, 107–121. doi:10.1901/jeab.1992.58-107 Saramago, J. (1997). Blindness. New York, NY: Harcourt Brace. Sasada, K. (2003). The effects of different instructions on learning and transfer with verbal analogy problems. Unpublished master’s thesis, University of California, San Diego. Shead, N. W., Callan, M. J., & Hodgins, D. C. (2008). Probability discounting among gamblers: Differences across problem gambling severity and affect-regulation expectancies. Personality and Individual Differences, 45, 536–541. doi:10.1016/j. paid.2008.06.008


Shimoff, E., Catania, A. C., & Matthews, B. A. (1981). Uninstructed human responding: Sensitivity of lowrate performance to schedule contingencies. Journal of the Experimental Analysis of Behavior, 36, 207–220. doi:10.1901/jeab.1981.36-207 Skinner, B. F. (1969). Contingencies of reinforcement: A theoretical analysis. New York, NY: AppletonCentury-Crofts. Stewart, S. H., & Wall, A. (2005). Ontario Problem Gambling Research Centre final report: Mood priming of reward and relief gambling expectancies in different subtypes of gamblers. Guelph, Ontario, Canada: Ontario Problem Gambling Research Centre. Stolarz-Fantino, S., & Fantino, E. (1990). Cognition and behavior analysis: A review of Rachlin’s Judgment, decision, and choice. Journal of the Experimental Analysis of Behavior, 54, 317–322. doi:10.1901/ jeab.1990.54-317 Stolarz-Fantino, S., & Fantino, E. (1995). The experimental analysis of reasoning: A review of Gilovich’s “How we know what isn’t so.” Journal of the Experimental Analysis of Behavior, 64, 111–116. doi:10.1901/jeab.1995.64-111 Stolarz-Fantino, S., Fantino, E., & Kulik, J. (1996). The conjunction fallacy: Differential incidence as a function of descriptive frames and educational context. Contemporary Educational Psychology, 21, 208–218. doi:10.1006/ceps.1996.0017 Stolarz-Fantino, S., Fantino, E., & Van Borst, N. (2006). Use of base rates and case cue information in making likelihood estimates. Memory and Cognition, 34, 603–618. doi:10.3758/BF03193583 Stolarz-Fantino, S., Fantino, E., Zizzo, D. J., & Wen, J. (2003). The conjunction effect: New evidence for robustness. American Journal of Psychology, 116, 15–34. doi:10.2307/1423333 Tversky, A., & Kahneman, D. (1982). Evidential impact of base rates. In D. Kahneman, P. Slovic, & A.

Tversky (Eds.), Judgment under uncertainty: Heuristics and biases (pp. 153–160). Cambridge, England: Cambridge University Press. Tversky, A., & Kahneman, D. (1983). Extensional versus intuitive reasoning: The conjunction fallacy in probability judgment. Psychological Review, 90, 293–315. doi:10.1037/0033-295X.90.4.293 Vaughan, M. E. (1985). Repeated acquisition in the analysis of rule-governed behavior. Journal of the Experimental Analysis of Behavior, 44, 175–184. doi:10.1901/jeab.1985.44-175 Weatherly, J., Marino, J., Ferraro, F. R., & Slagle, B. (2008). Temporal discounting predicts how people gamble on a slot machine. Analysis of Gambling Behavior, 2, 135–141. Weatherly, J. N., Sauter, J. M., & King, B. M. (2004). The “big win” and resistance to extinction when gambling. Journal of Psychology, 138, 495–504. doi:10.3200/JRLP.138.6.495-504 Wyckoff, L. B., Jr. (1952). The role of observing responses in discrimination learning: Part I. Psychological Review, 59, 431–442. doi:10.1037/ h0053932 Yerkes, R. M., & Dodson, J. D. (1908). The relation of strength of stimulus to rapidity of habit-formation. Journal of Comparative Neurology and Psychology, 18, 459–482. doi:10.1002/cne.920180503 Zizzo, D. J. (2001). Choices between simple and compound lotteries: Experimental evidence and neural network modelling (Department of Economics Discussion Paper 57). Oxford, England: University of Oxford. Zizzo, D. J., Stolarz-Fantino, S., Wen, J., & Fantino, E. (2000). A violation of the monotonicity axiom: Experimental evidence on the conjunction fallacy. Journal of Economic Behavior and Organization, 41, 263–276. doi:10.1016/S01672681(99)00076-1


Chapter 20

Self-Control and Altruism

Matthew L. Locey, Bryan A. Jones, and Howard Rachlin

Self-control and altruism are both dimensions of choice. When a person chooses a distant or temporally extended reward such as good health over a relatively closer and temporally constricted reward such as smoking a cigarette, the person’s act is said to be self-controlled rather than impulsive. Correspondingly, when a person chooses a reward for another person or a group of other people at a cost to him- or herself, as when giving anonymously to charity, the person is said to be behaving altruistically rather than selfishly. Our main purpose in this chapter is to compare these two dimensions of behavior, to show how they are and are not analogous. We make this comparison in two conceptual contexts: the economic view and teleological behaviorism. Actually, both contexts are economic, and both are teleological. They are economic in the sense that they both rely on the principle of maximization of value; they are both teleological in the sense that they both focus on choice as a function of outcome, that is, on final rather than efficient causes. The main difference between the two contexts is that from the economic view, both behavior and outcome are individual discrete events, whereas from the view of teleological behaviorism, both behavior and outcome are patterns of events. Given that any act, no matter how brief (even a pigeon’s peck), may be subdivided into subacts and that any pattern, no matter how extended, may be taken as a unit, the difference is more a matter of perspective than of essence. Teleological behaviorism is thus an extension of the basic economic view rather than an opposing account. Nevertheless, we show that the two contexts have different (if not incompatible) implications.

From the economic view, both self-control and altruism may be understood in terms of discount functions (reward value as a function of the delay to its receipt or as a function of social distance to the receiver) and their interaction. We summarize research on delay and social discount functions, use discount functions to describe self-control and altruism in the laboratory, and relate those studies to self-control and altruism in people’s everyday lives. Discount functions (their interaction and their integration), however, are not sufficient to account for many complex cases of self-control and altruism. Such cases are better understood from a teleological viewpoint, as patterns of behavior in which the value of the pattern may be greater (or less) than the sum of the values of its component acts.

Economic View

According to the economic view of choice, an organism’s preferences may be expressed in terms of a utility function. Utility functions specify the value of choice alternatives in terms of their amounts, probabilities, and delays. From the economic viewpoint, organisms faced with constraints (contingencies) that limit available alternatives maximize utility by choosing the highest utility alternative. If the utility function and the constraints are known, it should be possible, according to the economic view, to

This article was prepared with the assistance of National Institute on Drug Abuse Grant DA02652021. DOI: 10.1037/13937-020 APA Handbook of Behavior Analysis: Vol. 1. Methods and Principles, G. J. Madden (Editor-in-Chief) Copyright © 2013 by the American Psychological Association. All rights reserved.


predict which alternative will be chosen (Kagel, Battalio, & Green, 1995; Rachlin, 1989). According to this view, an alternative consists not of a single isolated event but of a series of events strung out in time, each with its own amount, probability, and delay. A person with alcoholism deciding whether to drink a glass of scotch considers not just the immediate consequences but also tomorrow’s hangover; the effect that drinking this particular scotch will have on the tendency to drink another one and another one; the enjoyment obtained from those future drinks; the series of hangovers they entail; their effects on health, social acceptance, and work performance; and so on for the remainder of the person’s life (Becker & Murphy, 1988). Fortunately for the economist trying to explain choice (if not for the person with alcoholism), delayed events are reduced in effective amount (discounted) often to the degree that in practice only the initial consequences of a given choice (only the initial events within each alternative) need be considered. That is, delayed rewards are discounted: the greater the delay, the greater the discount. For example, a pigeon will typically choose the sequence of four food pellets, 10-second delay, two food pellets (4p–10s–2p) over the sequence of three food pellets, 10-second delay, and eight food pellets (3p–10s–8p). The initial rewards count for much more than the delayed ones. The delay discounting of both humans and pigeons may be expressed in terms of this hyperbolic discount equation (Mazur, 1987):

$$v_{\text{delay}} = \frac{V}{1 + k_{\text{delay}} D^{s}} \qquad (1)$$

where V is the value of the reward if it were obtained immediately, D is the delay of the reward, k_delay is a constant measuring the degree of delay discounting (the greater k_delay is, the more a given delayed reward is discounted, i.e., the less it is worth at the present moment), and s is a constant measuring subjective delay (Green & Myerson, 2004). If an alternative consists of more than one reward in sequence, then v_delay is the sum of the values of each reward in the sequence discounted by its delay from the present moment (Mazur, 1986). For the pigeon’s alternatives in the preceding paragraph (assuming for simplicity that V = number of pellets obtained, that the time taken to consume the pellets may be ignored, and that k_delay = s = 1.0), the value of the 4p–10s–2p sequence would be [4/(1 + 0) + 2/(1 + 10)] or 4.18 units; the value of the 3p–10s–8p sequence would be [3/(1 + 0) + 8/(1 + 10)] or 3.73 units. The pigeon is predicted to choose the 4p–10s–2p alternative totaling six pellets and forgo the 3p–10s–8p alternative totaling 11 pellets. According to Equation 1, which roughly predicts choice in such situations, the initial one-pellet difference in reward amount overwhelms a six-pellet difference 10 seconds later. Although humans are apparently much less myopic than pigeons (delaying rewards for days, weeks, and years), when humans are thirsty or hungry and choosing among water or food rewards (as pigeon subjects usually are), their myopia (as measured by the obtained k_delay in Equation 1) is comparable to that of pigeons (Forzano & Logue, 1994).
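The pellet arithmetic above can be reproduced with a short sketch. The function names (`hyperbolic_value`, `sequence_value`) are illustrative and not taken from the chapter, and the parameter values k_delay = s = 1.0 are the simplifying assumptions stated in the text rather than fitted estimates.

```python
def hyperbolic_value(amount, delay, k=1.0, s=1.0):
    """Equation 1: present value of one reward of size `amount` after `delay`."""
    return amount / (1.0 + k * delay ** s)


def sequence_value(rewards, k=1.0, s=1.0):
    """Sum the discounted values of a list of (amount, delay) pairs."""
    return sum(hyperbolic_value(amount, delay, k, s) for amount, delay in rewards)


# Each pair is (pellets, delay in seconds measured from the choice point).
small_total = [(4, 0), (2, 10)]  # 4p-10s-2p, six pellets overall
large_total = [(3, 0), (8, 10)]  # 3p-10s-8p, eleven pellets overall

v_small = sequence_value(small_total)
v_large = sequence_value(large_total)
print(round(v_small, 2), round(v_large, 2))  # 4.18 3.73
```

Representing each alternative as a list of (amount, delay) pairs keeps the sketch close to the text's description of an alternative as a series of events, each with its own amount and delay.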

Hyperbolic Delay Discounting by Humans

Rachlin, Raineri, and Cross (1991) obtained delay discount functions with humans by asking them to make a series of choices among hypothetical rewards of different delays and amounts of money. For example, Rachlin et al. asked, “Which would you prefer, A) $1,000 now or B) $1,000 one year from now?” Of course, everyone preferred A. Then they asked, “Which would you prefer, A) $950 now or B) $1,000 one year from now?” If the person still preferred alternative A, they asked, “Which would you prefer, A) $900 now or B) $1,000 one year from now?” and so on, going down in steps all the way to “Which would you prefer, A) $0 now or B) $1,000 one year from now?” At this point, people preferred B. The crucial datum in this procedure was the point at which the person crossed over from preference for A to preference for B. Rachlin et al. assumed that the person was indifferent between the crossover amount now and $1,000 1 year from now. For example, if a person preferred $800 now to $1,000 a year from now but $1,000 a year from now to $750 now, the crossover point was $775, that is, indifference between $775 now and $1,000 a year from now. Rachlin et al. then repeated the procedure with a series of delays ranging from 1 month through 50 years. In terms of Equation 1, V = $1,000, D = the tested delay, v_delay = the obtained crossover point, and k_delay and s were constants to be estimated from the data. Figure 20.1 shows a delay discount function obtained by Rachlin et al. (1991) using this method.¹ The points are median crossover points (of 40 Stony Brook University undergraduates) at each delay. As one would expect, the longer the delay, the lower the crossover point. The fit of the line to the points is, as you can see, very good.²

Figure 20.1. Delay discounting: The median amount of immediate money equivalent to $1,000 after various delays (x-axis: delay in months; y-axis: dollar amount). The solid line is the best-fitting version of Equation 1 (k_delay = 0.02, s = 0.9, R² = .996). From “Subjective Probability and Delay,” by H. Rachlin, A. Raineri, and D. Cross, 1991, Journal of the Experimental Analysis of Behavior, 55, p. 240. Copyright 1991 by the Society for the Experimental Analysis of Behavior, Inc. Adapted with permission.

Most laboratory studies of human delay discounting have used hypothetical monetary rewards (Green & Myerson, 2004), but the results with small amounts of real cash have been similar to those obtained with equally small hypothetical rewards (Madden et al., 2003). This is not to say that type or amount of reward makes no difference in delay discounting. A weak currency (the Polish zloty) was discounted more steeply than a relatively strong currency (the U.S. dollar; Ostaszewski, Green, & Myerson, 1998). With noncash rewards, luxuries were discounted more steeply than necessities (Chapman & Elstein, 1995; Raineri & Rachlin, 1993). Large monetary amounts or highly valued commodities (large V values) are discounted relatively less steeply than small amounts of money or less valued commodities (Green, Myerson, & McFadden, 1997; Kirby, 1997; Raineri & Rachlin, 1993); for example, $100 delayed by a year is equivalent on average to $58 now (a discount of 42%), but $10,000 delayed by a year is equivalent to $7,500 now (a discount of 25%), and $1 million delayed by a year is equivalent to $900,000 now (a discount of only 10%). This decrease in relative delay discounting with increase in amount is called the amount effect. This effect is highly reliable and is found with hypothetical money and hypothetical commodities as well as with small amounts of real money (Kirby, 1997). However, a delay-discounting amount effect has not generally been found with nonhuman subjects (Green, Myerson, Holt, Slevin, & Estle, 2004).
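Two steps in this section lend themselves to a small worked sketch: locating the crossover point from a descending series of choices, and reading the amount effect as a change in the discounting constant. The helper names (`crossover_point`, `implied_k`) and the sample response list are hypothetical. Solving Equation 1 for k from a single indifference point, with s fixed at 1 and delay measured in years, is a simplification for illustration only and is not the curve-fitting procedure used in the studies cited.

```python
def crossover_point(responses):
    """Estimate the indifference point from a descending titration.

    `responses` is a list of (immediate_amount, chose_immediate) pairs.
    The estimate is the midpoint between the smallest immediate amount
    that was still chosen and the largest immediate amount that was rejected.
    """
    accepted = [amount for amount, chose in responses if chose]
    rejected = [amount for amount, chose in responses if not chose]
    return (min(accepted) + max(rejected)) / 2.0


def implied_k(V, v, D, s=1.0):
    """Solve Equation 1, v = V / (1 + k * D**s), for k."""
    return (V / v - 1.0) / D ** s


# Hypothetical participant: takes $800 now over $1,000 in a year, but not $750.
responses = [(1000, True), (900, True), (800, True), (750, False), (0, False)]
print(crossover_point(responses))  # 775.0

# Amount effect, using the one-year equivalents quoted in the text.
for V, v in [(100, 58), (10_000, 7_500), (1_000_000, 900_000)]:
    print(V, round(implied_k(V, v, D=1.0), 2))
```

Under these assumptions the implied constant falls as the delayed amount grows, which is one way of restating the amount effect described above.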

Self-Control in the Economic View

With such strong preferences for a slightly larger immediate reward and such weak preferences for a much larger delayed reward, how is self-control possible in the everyday lives of humans and nonhumans? People (and even pigeons) do often choose larger–later over smaller–sooner rewards. Colleges are full of students even though the rewards of obtaining a college education are delayed by years (assuming that going to college is not governed solely by love of knowledge); people do walk past bakeries although tempted to walk in; and people with alcoholism do refuse a drink. Given the economic model, how is such behavior possible?

¹ Although there are problems with hypothetical rewards in the laboratory, real rewards create their own set of problems. With hypothetical rewards, participants are asked to imagine a real-world context and to choose within that context. Real monetary rewards obtained in the laboratory may impose a context of narrow maximization of monetary amount that may differ from the context, implied by the instructions, of primary interest to the experimenter. A participant in the laboratory, trying to imagine what she or he would do in a hypothetical real-world situation, may come closer to that situation than one simply trying to maximize monetary reward. Moreover, comparisons between delay discounting of real and hypothetical rewards have generally failed to find a difference in either degree or functional form (Kirby, 1997; Madden, Begotka, Raiff, & Kastern, 2003). A comparison of real and hypothetical rewards in social discounting also failed to find any significant differences (Locey, Jones, & Rachlin, 2011).

² The function drawn in Figure 20.1 is fitted to the median crossover points of the group. However, in all discounting studies with humans reported in this chapter, discount functions were also obtained for each participant individually. In all cases, the median R² for the individual participant fits was greater than .90 and the medians of the parameters of the individual functions closely matched the parameters of the median functions (such as those in Figure 20.1). This concordance suggests that the results are not artifacts of averaging.

According to the economic view, the answer is by prior commitment (Ainslie, 1992). According to Equation 1 (assuming again that k_delay = s = 1.0), pigeons prefer the sequence with the higher immediate reward and the lower overall reward (4p–10s–2p) to the sequence with the higher overall reward and the lower immediate reward (3p–10s–8p). Equation 1, however, predicts that if a long-enough common delay is added before both sequences, preference will reverse. For example, suppose a 60-second delay is added before both the 4p–10s–2p and 3p–10s–8p sequences. Recalculating with the extra delay added in, the value of the 60s–4p–10s–2p sequence becomes [4/(1 + 60) + 2/(1 + 70)], or 0.09 units, and the value of the 60s–3p–10s–8p sequence becomes [3/(1 + 60) + 8/(1 + 70)], or 0.16 units. The initial 60-second delay greatly reduced the value of both sequences, but the relative values of the two have reversed, from a slight (53%) preference for 4p–10s–2p over 3p–10s–8p to a stronger (63%) preference for 60s–3p–10s–8p over 60s–4p–10s–2p, and have become close to a preference based on the overall amount of food, [11/(6 + 11)], or 65%, in favor of the larger overall reward. Such reversals do occur. They are powerful evidence for the hyperbolic form of Equation 1 as opposed to the exponential form (v = V/e^{kD}), which predicts no such reversals (Ainslie, 1992; Rachlin, 2006).

Now suppose a pigeon starts with a 60-second delay and time passes so that the common delay before the two reward sequences diminishes with time. At the beginning, with the full 60-second delay still to come, the pigeon prefers 60s–3p–10s–8p to 60s–4p–10s–2p; however, 60 seconds later, if offered a new choice between the now immediate 3p–10s–8p and 4p–10s–2p sequences, the pigeon will change its mind and choose the 4p–10s–2p sequence (the sequence with fewer total pellets). Similarly, in human life, people promise themselves in the morning to eat and drink moderately at a party that evening. When evening comes, however, they change their minds and overeat or overdrink. (That evening is only part of the sequence; the next morning when people step on the scale or have the hangover, they experience the next part.)

Suppose, however, that at the very beginning (at the point at which the common delay was still 60 seconds), the pigeon could commit itself to the sequence it then prefers (3p–10s–8p). At that point, suppose the pigeon is given the opportunity (by pecking a third key) to prevent the 4p–10s–2p sequence from being offered later, after the 60-second delay has elapsed. Research has shown that pigeons tend to take this kind of option, thus committing to the larger overall reward (Rachlin & Green, 1972). Procedures such as this, in which an organism may choose to completely avoid a later alternative, use the strict commitment method.

Another method of commitment studied with pigeons is called punishment commitment (Green & Rachlin, 1996). At the beginning of the common delay, at the point at which the pigeon prefers the larger overall reward sequence, the pigeon is offered the option of attaching punishment to the more tempting but smaller overall reward sequence. If the pigeon does choose the punishment option, it will not be strictly prevented from choosing the tempting sequence later, but if it does choose and obtain that sequence (4p–10s–2p in our example), it will be punished afterward (by a long blackout in which all lights in the chamber are turned off and no rewards are obtainable). The addition of punishment to the tempting sequence makes it much less tempting. With the punishment in place, pigeons will usually choose and obtain the larger overall reward. The pigeon is essentially in the same position as the person with alcoholism offered the option of taking the drug Antabuse before attending a party. If the person takes the drug and then has an alcoholic drink, a very painful reaction ensues. Unlike people with alcoholism, who typically refuse to take the drug, pigeons, before the long common delay, tend to choose the punishment option, the only effect of which is to add punishment to one alternative. Given their own later preference for the smaller–sooner alternative, the pigeons can obtain the overall larger reward (the reward they preferred earlier) only by choosing the punishment option.

Similarly, humans commit themselves to exercise regularly by subscribing to lengthy health club memberships in which they lose money if they stop going. People may make bets with their friends that they will achieve a certain weight at a certain time; they sign contracts with punitive clauses for failing to complete a job by a certain time.
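The preference reversal that motivates commitment can be checked numerically. The sketch below compares the hyperbolic form of Equation 1 with the exponential form for the two pellet sequences, before and after a 60-second common delay is added. The function names and the exponential rate constant of 1.0 are assumptions chosen for illustration; only the pellet amounts, the delays, and k_delay = s = 1.0 come from the text.

```python
import math

SMALL = [(4, 0), (2, 10)]  # 4p-10s-2p: (pellets, delay in seconds)
LARGE = [(3, 0), (8, 10)]  # 3p-10s-8p


def hyperbolic(amount, delay, k=1.0, s=1.0):
    """Equation 1."""
    return amount / (1.0 + k * delay ** s)


def exponential(amount, delay, k=1.0):
    """Exponential discounting, v = V / e**(k * D)."""
    return amount * math.exp(-k * delay)


def sequence_value(rewards, common_delay, discount):
    """Value of a reward sequence that starts after `common_delay` seconds."""
    return sum(discount(amount, delay + common_delay) for amount, delay in rewards)


def share_for_larger(common_delay, discount):
    """Relative value of the larger-total sequence, v_L / (v_L + v_S)."""
    v_small = sequence_value(SMALL, common_delay, discount)
    v_large = sequence_value(LARGE, common_delay, discount)
    return v_large / (v_large + v_small)


for name, discount in (("hyperbolic", hyperbolic), ("exponential", exponential)):
    at_zero = share_for_larger(0, discount)
    at_sixty = share_for_larger(60, discount)
    print(name, round(at_zero, 2), round(at_sixty, 2))
```

With the hyperbolic form the share for the larger-total sequence moves from below one half to above it when the common delay is added. With the exponential form the common delay multiplies both sequence values by the same factor, so the share does not change and no reversal is predicted.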

Commitment in one form or another is the fundamental way the economic view sees self-control (Ainslie, 1992, 2001). Consider the following pair of alternatives for a pigeon: a four-pellet reward after a 4-second blackout versus a one-pellet reward before a 4-second blackout (4s–4p vs. 1p–4s). According to Equation 1 (still assuming that consumption time may be ignored and that k_delay = s = 1), the value of the former sequence (larger–later reward) is 0.80 units, and that of the latter sequence (smaller–sooner) is 1.00 unit (the blackout after the small reward is ignored). The pigeon prefers the smaller–sooner reward. Now consider a choice between triplets of these alternatives strung together over the same time period: 4s–4p–4s–4p–4s–4p versus 1p–4s–1p–4s–1p–4s. Starting at the beginning of the larger–later sequence, there are three four-pellet rewards: one after 4 seconds, one after 8 seconds, and one after 12 seconds. Summing the values of the three rewards (0.80 + 0.44 + 0.31), the larger–later sequence increases in value to 1.55 units. Summing the values of the three one-pellet rewards, one immediately, one after 4 seconds, and one after 8 seconds (1 + 0.20 + 0.11), the smaller–sooner sequence increases in value to 1.31 units, less than the value of the larger–later sequence. As the second and third rewards are added, the initial delay to the larger–later reward weighs less relative to its amount, and the pigeon comes to prefer the larger–later sequence. Now suppose that it was not possible to choose between the two individual rewards; suppose that the pigeon was committed to choose between whole sequences. The pigeon would then choose the larger–later sequence, the self-controlled choice. Ainslie (2001) called commitment to such sequences bundling.

In laboratory experiments, rats and humans tended to increase choices of larger–later over smaller–sooner rewards when rewards were bundled as described in the preceding paragraph, that is, when subjects were required to choose not between individual rewards but between reward sequences (Ainslie & Monterosso, 2003, with rats, and Kirby & Guastello, 2001, and Kudadjie-Gyamfi & Rachlin, 1996, with humans). If, in everyday human life, it was possible to commit oneself to choose among bundles rather than on a case-by-case basis, people would certainly do better at self-control. If, for example, a person with alcoholism could decide now how much he or she would drink over the next month rather than whether to have a drink now (and be strictly committed to that decision), he or she could more easily control the alcoholism. Everyday life, however, rarely offers people the opportunity to commit themselves in this strict way. In later sections, we outline another kind of commitment, soft commitment, that is more common in everyday life. First, we must make specific the relationship between discounting and self-control.

Individual Differences in Self-Control

Imagine two people, John and Mary; John’s k_delay is 0.1 and Mary’s is 0.3 (delay measured in days). Figure 20.2 (axis reversed from that in Figure 20.1 to show time passing rather than delay) shows two alternative rewards, each with two discount functions, one set for John (solid lines) and one set for Mary (dashed lines). The two alternative rewards (the vertical bars) loom in the future: a smaller–sooner reward ($60 in 1 week) and a larger–later reward ($100 in 15 days). Because John’s k_delay is less than Mary’s, John’s discount functions (the solid lines) are shallower than Mary’s (the dashed lines). At the point marked NOW, John prefers the larger–later reward to the smaller–sooner reward; the solid line subtended from the $100 reward is now higher than the one subtended from the $60 reward. At NOW, he can institute a commitment procedure to remove or attach punishment to the smaller–sooner alternative. If he institutes either of these commitment procedures, he will actually obtain the larger–later reward, the one he now prefers. However, if he waits 5 days, the solid lines will have crossed, and he will prefer the smaller–sooner reward. At that point, it will be too late for him to commit himself; he will have no incentive to do it. For Mary, it is already too late. She now prefers the smaller–sooner reward and, given the alternatives as stated, will obtain that reward. (Had she been offered the alternatives 2 days earlier, she would still have been at a point before her discount functions cross, preferring the larger reward, and she would have been able to commit to it.)

Figure 20.2. Differences in discounting: Delay discounting for two different monetary rewards: $60 in 7 days (shorter bar) and $100 in 15 days (taller bar); x-axis: time in days from NOW; y-axis: dollar amount. The curved lines indicate the increasing value of those rewards as time passes and the rewards become more imminent. The dashed lines indicate reward values with a steep discount function (k_delay = 0.3 for Mary). The solid lines indicate reward values with a more shallow discount function (k_delay = 0.1 for John). The circle indicates the point at which the $60 reward becomes equally as valuable to John as the $100 reward (2 days before the $60 reward delivery).

The underlying reason that John can now use commitment and Mary cannot is that delay has less of a discounting effect on reward value for John than for Mary. According to the economic view, the lower a person’s k_delay is, the shallower that person’s delay discount functions are, the more time the person has to institute commitment procedures, and the more self-control the person should show. On the basis of this argument, people with low k_delay values should be expected to have more self-control than people with high k_delay values. This expectation has been reliably confirmed in many settings.
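The John and Mary comparison can be made concrete by solving for the day on which the two discount functions cross. The sketch below assumes s = 1 (the Figure 20.2 caption does not state s) and uses illustrative names (`value`, `crossover_day`). Setting 60/(1 + k(7 − t)) equal to 100/(1 + k(15 − t)) and solving for t, the number of days after NOW, gives t = 1/k + (100·7 − 60·15)/(100 − 60).

```python
SS = (60.0, 7.0)    # smaller-sooner reward: $60 in 7 days
LL = (100.0, 15.0)  # larger-later reward: $100 in 15 days


def value(amount, delay_days, k, s=1.0):
    """Equation 1 with delay measured in days."""
    return amount / (1.0 + k * delay_days ** s)


def crossover_day(k, ss=SS, ll=LL):
    """Days after NOW at which the two curves cross, assuming s = 1.

    A negative result means the curves crossed before NOW, so the
    smaller-sooner reward is already preferred.
    """
    (a_s, t_s), (a_l, t_l) = ss, ll
    return 1.0 / k + (a_l * t_s - a_s * t_l) / (a_l - a_s)


for name, k in (("John", 0.1), ("Mary", 0.3)):
    now_ss = value(SS[0], SS[1], k)
    now_ll = value(LL[0], LL[1], k)
    print(name, round(now_ss, 1), round(now_ll, 1), round(crossover_day(k), 2))
```

Under these assumptions John's curves cross 5 days after NOW, which is 2 days before the $60 delivery, so he still has time to commit. Mary's cross a little under 2 days before NOW, which is why commitment is no longer attractive to her.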
The k_delay values of people addicted to heroin, cocaine, or alcohol; people who smoke; and people who gamble, as measured by the procedure described earlier, are significantly higher than the k_delay values of those who do not (Alessi & Petry, 2003; Baker, Johnson, & Bickel, 2003; Bickel, Odum, & Madden, 1999; Coffey, Gudleski, Saladin, & Brady, 2003; Kirby, Petry, & Bickel, 1999; Madden, Bickel, & Jacobs, 1999; Madden, Petry, Badger, & Bickel, 1997; Mitchell, 1999; Petry, 2001; Vuchinich & Simpson, 1998; see also the recent volume edited by Madden & Bickel, 2009). Moreover, the k_delay values of young children are higher than those of young adults, and the k_delay values of young adults are higher than those of older adults (Green, Fry, & Myerson, 1994). The simple test of delay discounting described earlier correlates remarkably with failure of self-control in everyday human life (see Volume 2, Chapters 7 and 8, this handbook).

Teleological Behaviorism

Certainly, it is possible to achieve self-control by strict commitment or punishment commitment. People can avoid going into the bakery by walking down another street; they can avoid eating the pint of ice cream by locking the refrigerator and giving the key to their spouse. However, another person may actually walk right by the bakery without going in and leave the pint of ice cream untouched in an unlocked refrigerator. How do people manage to do this? The answer is, by establishing patterns in their lives that are valuable as such, independent of the sum of the values of their individual components. The kind of behaviorism that sees patterns of behavior as crucial in everyday life as well as in the psychology laboratory is called teleological behaviorism (Rachlin, 1992, 1994, 1995b).

Teleological View of Self-Control

Consider still another experiment with pigeons (Siegel & Rachlin, 1995): A hungry pigeon faces a pair of lit keys, one red and one green.3 If the pigeon pecks the red key, it receives two food pellets (2p); if the pigeon pecks the green key, a brief blackout of 4 seconds ensues, but after the blackout, four food pellets are delivered (4s–4p). Calculating values from Equation 1, as before, the value of the immediate two-pellet reward is 2 units [2/(1 + 0)]; the value of the delayed four-pellet reward is 0.8 units [4/(1 + 4)], a strong relative value of 71% [2/(2 + 0.8)] in favor of the smaller–sooner reward over the larger–later one. Now consider the following change: The pigeon must peck 30 times distributed

3. In this experiment, as in all other experiments with pigeon subjects described in this chapter, behavior is reported after many weeks of daily sessions and hundreds of exposures to the experimental contingencies.


Self-Control and Altruism

in any way between the two keys; after those 30 pecks, the very next peck produces a reward. If that 31st peck was on the red key, the pigeon obtains the immediate two-pellet reward; if that 31st peck was on the green key, the pigeon obtains the delayed four-pellet reward. It takes about 30 seconds for a pigeon to make the initial 30 pecks. Using Equation 1 to calculate the value of the two rewards before the pigeon starts pecking, the value of the smaller– sooner reward (30s–2p) is 0.065 units [2/(1 + 30)], and the value of the larger–later reward (30s–4s–4p) is 0.11 units [4/(1 + 34)], a relative value of 64% [0.11/(0.11 + 0.065)] in favor of the larger–later reward. Thus, with an added 30-second common delay, the larger–later reward is nearly twice as valuable as the smaller–sooner one—a complete reversal of the pigeon’s preference without the common 30-second delay. Indeed, pigeons begin the 30-peck sequence by preferring the larger–later reward in a ratio of 2:1. However, the economic theory predicts that as the 30-peck sequence progresses, preference for the larger–later reward should weaken and eventually reverse. That is, the pigeons should begin by pecking the green key (30s–4s–4p) but, as the 30 pecks progress, should eventually switch over, end up pecking the red key, and obtain the smaller– sooner reward (2p). However, contrary to economic theory, the pigeons almost never switched over. The pigeons persisted in pecking on the key they started on and thereby obtained the larger reward. Siegel and Rachlin call this persistence soft commitment. Rachlin (1995a, 2000) argued that it is by means of soft commitment, the persistence of behavioral patterns, that humans solve most of the self-control problems in their everyday lives (i.e., if they do solve them). The gestalt psychologists said that the whole is greater than the sum of its parts (Koffka, 1935). Teleological behaviorism extends that gestalt dictum. 
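The arithmetic in the pigeon example can be reproduced in a few lines. This is a minimal sketch that assumes Equation 1 with k = 1 and s = 1 (the values implied by the bracketed calculations in the text) and delay in seconds; the function names are hypothetical. It also locates the point in the 30-second common delay at which the economic account predicts the reversal.

```python
def value(pellets, delay_s, k=1.0):
    """Equation 1 with s = 1: v = V / (1 + k * D)."""
    return pellets / (1.0 + k * delay_s)


def share(a, b):
    """Relative value of option a against option b."""
    return a / (a + b)


# Choice with no common delay: 2 pellets now versus 4 pellets after 4 s.
ss_now, ll_now = value(2, 0), value(4, 4)

# Choice viewed from the start of the 30-s common delay.
ss_30, ll_30 = value(2, 30), value(4, 34)


def reversal_point(common_delay=30):
    """Largest remaining common delay (whole seconds) at which the
    smaller-sooner reward is worth at least as much as the larger-later one."""
    for remaining in range(common_delay, -1, -1):
        if value(2, remaining) >= value(4, remaining + 4):
            return remaining
    return None


print(round(share(ss_now, ll_now), 2))  # share favoring smaller-sooner, no common delay
print(round(share(ll_30, ss_30), 2))    # share favoring larger-later, 30-s common delay
print(reversal_point())                 # seconds of common delay left at the reversal
```

With these assumptions the two rewards are equal in value when about 3 seconds of the common delay remain, which is where the economic account places the switch that the pigeons almost never made.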
On this view, the value of an activity may be greater than the sum of the values of its parts. A person with alcoholism prefers to be sober, healthy, and socially accepted and to perform well at his or

her job than to be drunk all the time, unhealthy, and socially rejected and to perform poorly at his or her job. At the same time, over the next few minutes, he or she prefers to have a drink than to not have one. If, over successive brief intervals, the person with alcoholism always does what he or she prefers at the moment, she or he will always be drinking. A quotation from the comedian Dick Cavett makes a corresponding point about smoking: Once, as [Cavett and Jonathan Miller of “Beyond the Fringe”] waited backstage together at the 92nd St. Y in New York City, [Cavett] pointed disapprovingly at [Miller’s] lit cigarette. [Miller said,] “I know these will kill me, I’m just not convinced that this particular one will kill me.” (Cavett, 2009, p. 10) The problem for the person with alcoholism as well as for the person who smokes is how to make choices over the longer time span and avoid making choices, as Jonathan Miller did, on a case-by-case basis. The reason why people have trouble bringing their behavioral patterns into line with abstract and temporally extended behavioral contingencies is that the value of a desired pattern’s particular component (refusing the drink or the cigarette) may be much less than that of its alternative (drinking the drink or smoking the cigarette).4 As Jonathan Miller implied, each cigarette refusal has virtually no value in itself relative to its alternative. Refusing a particular cigarette is worth nothing compared with smoking it. Moreover, individual cigarette refusals are almost never reinforced—not immediately, not conditionally, not after a delay. If a person refuses a single cigarette, he or she does not wake up 3 weeks later, suddenly a healthier and happier person. To realize the value of a cigarette refusal, the person must put together a long string of them. The concept that the value of a behavioral pattern may differ from the sum of the values of its parts is not just an empty slogan borrowed from gestalt psychology. 
It is the very basis of a consistent

4. Many behavioral patterns are encoded as verbal rules in everyday speech and in texts. Such rules may be nested within metarules such as “Obey your parents.” Occasionally, such rules are enforced wholly by external rewards and punishments of individual components. More frequently, verbal rules serve as complex discriminative stimuli for the intrinsically valuable patterns of behavior, such as living a healthy life, being discussed here (Baum, 1994; Rachlin, 2000).


Locey, Jones, and Rachlin

approach to the problem of self-control and, as we show, to altruism.

Delay Discount Functions in Teleological Behaviorism

In previous sections, we used Equation 1 to calculate the value of individual food rewards. Equation 1 expresses the value of a single reward at a single delay. Where rewards occurred in sequence, we added up their values to arrive at the value of the sequence as a whole. For pigeons in the laboratory presented with simple sequences of discrete and clearly identifiable food rewards, Equation 1 predicts choice well. When the value of a sequence of rewards differs from the sum of the values of its components (as teleological behaviorism asserts is often the case), however, Equation 1 cannot be straightforwardly applied. We have seen that Equation 1 could not predict the behavior of pigeons in the Siegel and Rachlin (1995) soft commitment procedure (in which 30 pecks were required for reinforcement on either key). The same is true for many complex human choice situations, in the laboratory as well as in the real world. Consider a series of rewards arranged in either increasing or decreasing order. Equation 1 predicts that the decreasing order, in which the larger rewards are less delayed than the smaller rewards, will be preferred to the increasing order, in which the smaller rewards are less delayed than larger rewards. Yet people often prefer an increasing series of monetary rewards to a decreasing series (Loewenstein & Prelec, 1992). On the negative side, people prefer a sequence of more intense pain followed by less intense pain to a sequence of less intense pain followed by more intense pain (Kahneman, 1999), even though Equation 1 predicts the reverse. As one goes from the laboratory to more complex real-life situations, Equation 1 breaks down still further. The order of dishes in a preferred restaurant meal, of notes in a good song, of movements in a good dance, or of chapters in a good book does not go from best to worst over time or more exciting to less exciting, as Equation 1 predicts. Rather, the overall patterns of these events are crucial to their value. Then what, it may justifiably be asked, is the point of measuring people’s kdelays by Equation 1?

Let us take a step back and approach this question from a theoretical perspective. Behaviorists do not believe that hyperbolic discount functions obtained in the laboratory are represented as such in people’s nervous systems. Hyperbolic discount functions are not used by people to evaluate alternatives. Rather, a discount function is a convenient measure by which the experimenter or another observer may gauge the temporal extent of the contingencies that control a person’s choices. In one of Jerry Seinfeld’s stand-up routines, he imagines himself as two people: One is the nighttime Jerry who lives carelessly, spends money, drinks, and stays up late; the other is next-morning Jerry who has a hangover and needs to get up early to go to work. Nighttime Jerry has contempt for next-morning Jerry; he cares nothing for him and makes no sacrifices for him. Next-morning Jerry of course hates nighttime Jerry, the cause of all his suffering. As Seinfeld says, the only way next-morning Jerry can take revenge against nighttime Jerry is to stay in bed, lose his job, and run out of money so that nighttime Jerry will not be able to live his wild life. (In real cases of addiction, this is exactly what happens.) What would it mean then if this imagined Jerry learned self-control? For a teleological behaviorist, self-control consists of temporally extended patterns of behavior conforming to abstract environmental contingencies (e.g., “Always try to understand the other person’s point of view”). Formation of such patterns involves cooperation between nighttime Jerry, next-morning Jerry, and next-day, next-week, next-month Jerry. These guys need to make friends with each other. The wider and more abstract the pattern necessary to describe Jerry’s behavior is (the more self-control Jerry has) the shallower his delay discount function is. 
It is important to note, though, that the shallowness of Jerry’s delay discount function is not the immediate cause of his self-control but a measure of it (just as the temperature of a gas is not the cause of molecular motion but a measure of it).

Social Discounting

The view of delay discounting expressed in the previous section might be called a social theory of delay


discounting.5 It sees people at future times connected to their present selves as analogous to different people connected to each other socially. People have overlapping interests with their relatives and friends, who are separated from them by social space. The greater the social distance is between a person and any particular other person and the fewer common interests they have between them, the less cooperative each will be with the other. Similarly, individuals may be seen as having overlapping interests with their future selves, their selves separated from them by time; the farther into the future (or the past) and the fewer common interests between them and their future selves, the less cooperative they will be with their future selves and the less self-controlled their behavior will be. The fictional Jerry of the previous section would constitute an extreme case in which a few hours is far enough into the future to prevent cooperation. This idea, that temporal separation is analogous to social distance, may be found in Aristotle’s Nicomachean Ethics (Rachlin, 1994). In modern philosophy, this idea has been advanced by Parfit (1984); in social psychology, by Trope (Trope & Liberman, 2003) and others; in behavioral psychology, by Ainslie (1992) and Rachlin (1989); and in economics, by Simon (1995). Simon suggested extending the economic model to three dimensions: (a) current consumption by the person, (b) consumption by the same person at later times (delay discounting), and (c) consumption by other people (social discounting). For Simon, discount functions such as that shown in Figure 20.1 should be three dimensional: “Instead of a one-dimensional maximizing entity, or even the two-dimensional individual who allocates intertemporally, this model envisages a three-dimensional surface with an interpersonal ‘distance’ dimension replacing the concept of altruism” (p. 367).
For Simon, generosity to another person (or altruism) should be no more surprising than generosity to one’s own self at future times (self-control).

Hyperbolic Social Discounting With Humans

Rachlin and colleagues (Rachlin, 1989; Raineri & Rachlin, 1993) speculated that given the parallel between delay and social distance, social discount functions would have the same hyperbolic form as delay discount functions:

$$v_{social} = \frac{V}{1 + k_{social} N}, \qquad (2)$$

where V is the value of an immediate reward given to Person A; vsocial is the value to Person A of that reward given to Person B; N is the social distance between Person A and Person B; and ksocial is a constant that differs among individuals and measures degree of social discounting. (The exponent s in Equation 1 has been found to equal 1.0 in social discount functions.) Experiments in the behavioral laboratories at Stony Brook have reliably obtained hyperbolic social discount functions that conform to Equation 2 (Jones & Rachlin, 2006, 2009; Rachlin & Jones, 2008a, 2008b, 2009). The general procedure of these experiments was as follows: Students (Stony Brook University undergraduates) were asked to imagine that they had made a list of the 100 people closest to them (but not to actually make the list). Order on the list (N, ranging from 1 to 100) was taken as a measure of social distance. Students were then asked to choose hypothetically between a smaller amount of money for themselves and a larger amount for a person at a given social distance. For example, a student might prefer $75 for him- or herself to $75 for the 10th person on his or her list but prefer $75 for the 10th person on the list to $5 for him- or herself. Just as with delay discounting, at some crossover amount between $75 and $5 for him- or herself, the student must be indifferent between an amount for him- or herself and a (usually) larger amount ($X) for the 10th person on the list. In our studies, as N increased, the crossover amount decreased. That is, as social distance increased, students were willing to forgo less money for themselves to give a fixed amount to another person. The function thus obtained, relating social distance to amount of money forgone, is a social discount function. Figure 20.3 shows functions from two experiments.

5. The focus of this and the following sections is social discounting and its relation to altruism. Experimental work on social discounting has so far been done almost wholly in our laboratory at Stony Brook University. However, an extensive literature also exists on the experimental analysis of social cooperation. An excellent review of this work can be found in Schmitt (1998).


[Figure 20.3 plot. y-axis: $ Forgone to Give $75 to Person N (0 to 80); x-axis: Social Distance (N) (0 to 100); legend entries: Jones & Rachlin (2006), Rachlin & Jones (2008a).]

Figure 20.3. Social discounting: The median amount of hypothetical money forgone to give $75 to another person at various social distances using the Jones and Rachlin (2006) procedure (open squares) and the Rachlin and Jones (2008b) procedure (filled circles). The dashed and solid lines are the best-fitting versions of Equation 2 for each of those experiments, respectively. From “Social Discounting and Delay Discounting,” by H. Rachlin and B. A. Jones, 2008, Journal of Behavioral Decision Making, 21, p. 32. Copyright 2008 by John Wiley & Sons. Adapted with permission.

In Rachlin and Jones’s (2008b) Experiment 1, the hypothetical alternatives were $75 for Person N and nothing for the participant versus $X for the participant and nothing for Person N (as in the preceding example). In Jones and Rachlin’s (2006) earlier experiment, the hypothetical alternatives were a shared $150 ($75 for the participant and $75 for Person N) versus $X (plus $75) for the participant and nothing for Person N. The points in Figure 20.3 are median amounts of money forgone in the Rachlin and Jones (2008b) experiment as functions of social distance. The solid line is Equation 2 fitted to those points. The dashed line is Equation 2 fitted in the same way to the data of the Jones and Rachlin (2006) experiment. The two social discount functions virtually overlap (ksocial ≈ 0.05 in both experiments). The congruence of the functions of Figure 20.3, obtained with differing procedures, indicates that hyperbolic social discounting, as with hyperbolic delay discounting, is a robust measure.
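A short sketch shows what a social discount function of this kind looks like numerically. It uses Equation 2 with the ksocial of about 0.05 reported above; setting V to the nominal $75 and the particular list of N values are assumptions made for illustration, and the function name is hypothetical.

```python
def social_value(amount, n, k_social):
    """Equation 2: v_social = V / (1 + k_social * N)."""
    return amount / (1.0 + k_social * n)


K_SOCIAL = 0.05  # approximate fitted value reported in the text
V = 75.0         # assumption: undiscounted value set to the nominal $75

# Illustrative social distances, not necessarily those used in the experiments.
for n in (1, 2, 5, 10, 20, 50, 75, 100):
    print(f"N = {n:3d}: amount forgone is about ${social_value(V, n, K_SOCIAL):.2f}")
```

Under these assumptions the predicted amount forgone falls from roughly $71 for the closest person to roughly $12 for the 100th person on the list.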

Social Distance and Physical Distance

One obvious difference between delay discounting and social discounting is the difference in the kind of scale used to measure delay and social distance. Delay is measured on a ratio scale that can be infinitely divided and with which all the operations of arithmetic apply; on a ratio scale, the difference between 1 and 5 equals the difference between 6 and 10. Social distance, however, is measured on an ordinal scale with which only the order of the numbers has any meaning. The person at N = 1 is closer to the participant than the person at N = 2, and the person at N = 2 is closer than the person at N = 3, but the difference in closeness between the first and second person may be twice as big as, half as big as, nearly equal to, a very tiny fraction of, or a gigantic multiple of the difference between the second and third person (Stevens, 1946). To get some idea of those differences, Rachlin and Jones (2009) asked 44 Stony Brook undergraduates, as before, to imagine they had made a list of the 100 people closest to them. The students were then told to imagine standing on a vast field with those 100 people, each person at a physical distance proportional to his or her social distance. Then, for a series of social distances (1, 2, 5, 10, 20, 50, and 100), the students were asked to report their physical distance from that person. The obtained relation between ordinal and physical social distances took the form of the power function $F = 0.19N^{2.2}$, where F is distance in feet. The greater-than-1.0 exponent of this function indicates that as ordinal distance (N) increased, the physical distance between adjacent values of N also increased. People seem to have tried, as it were, to push more distant people further away and keep closer people still closer. Solving the above equation for N gives $N = 3.3F^{0.45}$. Then, substituting in Equation 2,

$$v = \frac{V}{1 + k' F^{0.45}}. \qquad (3)$$
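The step from the power function to Equation 3 can be checked by inverting the power function directly. The sketch below uses only the rounded constants printed in the text (0.19 and 2.2); the function names are hypothetical. With these rounded constants the prefactor of the inverse comes out near 2.1 rather than 3.3, which may simply reflect rounding in the published fit. Either way, the prefactor is absorbed into k' and the exponent of about 0.45 in Equation 3 is unaffected.

```python
A = 0.19  # coefficient of the fitted power function, as printed
B = 2.2   # exponent of the fitted power function, as printed


def feet_from_rank(n):
    """Physical distance in feet for ordinal social distance N: F = A * N**B."""
    return A * n ** B


def rank_from_feet(feet):
    """Inverse of the power function: N = (F / A)**(1 / B)."""
    return (feet / A) ** (1.0 / B)


inverse_exponent = 1.0 / B                  # exponent on F after inversion
inverse_prefactor = (1.0 / A) ** (1.0 / B)  # constant multiplying F**inverse_exponent

print(round(inverse_exponent, 2))
print(round(inverse_prefactor, 2))
print(round(rank_from_feet(feet_from_rank(10)), 6))  # round trip should return 10
```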

Equation 3 predicts that when social distance is measured by physical distance on a ratio scale, social discounting will be hyperbolic, but the social distance term will have a fractional exponent (as does the ratio-scale delay term in delay discounting). To test this prediction, Rachlin and Jones (2009) obtained social discount functions with a new group of 64 Stony Brook students. Again, instead of a series of ordinal distances, Jones and Rachlin (2007) asked students to imagine themselves on a vast field with all of their friends and relatives and, instead of obtaining crossover points at a series of N values, they obtained crossover points for individuals at the following series of distances on the field: 1 foot, 2 feet, 10 feet, 100 feet, 100 yards, 1 mile, and 10 miles. Figure 20.4 shows Equation 3 fit to the median crossover points with V fixed at 90 (and the exponent fixed at 0.45 as predicted from the prior experiment).6 The good fit of the line to the points in Figure 20.4 ($R^2 = .98$) indicates that social discounting is not an artifact of an ordinal scale and that the social discounting measure is precise enough and broad enough to predict results in one set of conditions from results obtained in another set of conditions. In the next section, we further extend the predictive ability of the social discounting measure. Results from delay discounting and social discounting, taken individually, are used to predict the outcome of a procedure that pits one kind of discounting against the other.

[Figure 20.4 plot. y-axis: $ Forgone to Give $75 to Person N (0 to 90); x-axis: Social Distance (feet), logarithmic (1 to 10,000).]

Figure 20.4. Social discounting with ratio scale: The median amount of hypothetical money forgone to give $75 to another person at various social distances measured in feet. The curved line is the best-fitting version of Equation 3 with V fixed at 90 and s fixed at 0.45 (kdelay = 0.21, $R^2 = .98$). Note the logarithmic x-axis.

Social and Delay Discounting

In the typical delay discounting procedure, the amount of an immediate monetary reward is titrated against a fixed amount of money at a series of delays. In the social discounting procedure, the amount of an immediate monetary reward is titrated against a fixed amount of money at a series of social distances. Rachlin and Jones (2008b) eliminated the immediate monetary reward and directly titrated the delay of a fixed amount of money ($75) against that same fixed amount ($75) at a series of social distances. The crossover point in this experiment represented the delay (D) that reduced the value of $75 as much as did a social distance of N. In other words, the delay obtained makes the value of a delayed reward equal to the value of a reward to another person at a given social distance. Setting these values equal to each other in Equations 1 and 2,

$$v_{delay} = v_{social}, \quad \frac{V}{1 + k_{delay} D^{s}} = \frac{V}{1 + k_{social} N}, \quad \text{and} \quad D = cN^{1/s}, \qquad (4)$$

where $c = (k_{social}/k_{delay})^{1/s}$. The relation between the delay and the social distance at the crossover point in this experiment is predicted to be a power function. A power function plotted in log-log coordinates is a straight line with a slope equal to the exponent. Because the exponent s in delay discounting has been found to be less than 1.0, the slope of the function with an exponent of 1/s should be greater than 1.0. Figure 20.5 shows the median crossover point as a function of social distance plotted on log-log coordinates. The points are fitted very well by a straight line ($r^2 = .96$) with a slope greater than 1.0 (1/s = 1.5) as predicted. The obtained value of s of 0.75 (1/1.5) is within the range of s values obtained for undergraduates and young adults in other delay discounting studies (Green, Myerson, & Ostaszewski, 1999; Myerson & Green, 1995). Again, Rachlin and Jones were able to predict parameters of behavior in one condition from behavior in other conditions, validating the social discounting measurement procedure.
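The algebra behind Equation 4 can be verified with a brief sketch. The parameter values here are assumptions chosen only for illustration (ksocial = 0.05 as reported earlier, a hypothetical kdelay of 0.01, and a hypothetical s of 0.8), and the function names are invented. The code confirms that the delay given by Equation 4 equates the two discounted values and that the log-log slope equals 1/s.

```python
import math

K_SOCIAL = 0.05  # approximate value reported earlier in the chapter
K_DELAY = 0.01   # hypothetical value, for illustration only
S = 0.8          # hypothetical delay exponent (less than 1.0)


def equivalent_delay(n, k_social=K_SOCIAL, k_delay=K_DELAY, s=S):
    """Equation 4: D = c * N**(1/s), with c = (k_social / k_delay)**(1/s)."""
    c = (k_social / k_delay) ** (1.0 / s)
    return c * n ** (1.0 / s)


def loglog_slope(n1, n2):
    """Slope of log D against log N between two social distances."""
    rise = math.log(equivalent_delay(n2)) - math.log(equivalent_delay(n1))
    run = math.log(n2) - math.log(n1)
    return rise / run


d10 = equivalent_delay(10)
v_delay = 75.0 / (1.0 + K_DELAY * d10 ** S)   # Equation 1 at the equivalent delay
v_social = 75.0 / (1.0 + K_SOCIAL * 10)       # Equation 2 at N = 10

print(round(v_delay, 4), round(v_social, 4))  # the two values should agree
print(round(loglog_slope(1, 100), 4))         # should equal 1 / S
```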

6. Fixing V at 90 reflects the fact that many participants preferred to give $75 to people at very close social distances (N = 1 or 2) than to receive $75 themselves. This resulted in y-axis intercepts (at N = 0) greater than the nominal $75. See Rachlin and Jones (2008b) for a discussion of this phenomenon.


[Figure 20.5 plot. y-axis: Delay (days), logarithmic (1 to 10,000); x-axis: Social Distance (N), logarithmic (1 to 100).]

Figure 20.5. Delay and social distance: The median delay at which $75 was equivalent in value to $75 given immediately to another person at a social distance of N. (Note log scale.) From “Social Discounting and Delay Discounting,” by H. Rachlin and B. A. Jones, 2008, Journal of Behavioral Decision Making, 21, p. 40. Copyright 2008 by John Wiley & Sons. Adapted with permission.

Social Discounting in Dictator, Ultimatum, and Public Goods Games

Dictator games are very simple. Two people play. One player is given an endowment of a certain amount of money and must decide how much of it, if any, to give to the other player. The second player has no recourse if she or he feels she or he has been treated unfairly. Economic theory is clear on what the first player should do: He or she should give nothing. A gift of any amount above zero violates basic economic maximization principles (simply understood). Yet in this game, people (usually undergraduates) do give significant amounts of their initial endowment to other people. For example, with an endowment of $10, the mean amount given is roughly $2, and a significant number of players split the endowment 50:50 (Camerer, 2003). Such

giving is generally regarded as altruistic, but it is not out of line with the data of social discounting. Undergraduates typically place a random classmate at around the 75th person on their list (N = 75).7 Note in Figure 20.3 that the crossover point (the amount subjects were willing to forgo to give $75 to that person) was around $15, or 20% of the $75. Social discounting results are thus consistent with dictator game results (see also Chapter 19, this volume, for an alternative interpretation). The ultimatum game is similar to the dictator game except that in the ultimatum game, if the second person (the receiver of the money) refuses the money, neither giver nor receiver gets anything. Economic theory says that givers should give receivers some minimal amount (say $0.25 out of a $10 endowment) and that receivers should take it; otherwise, they get nothing. Yet, in these situations, undergraduates usually give about $4 of a $10 endowment to classmates. That is a wise decision; if givers give less than 40% of their endowment, receivers are likely to refuse the money (Camerer, 2003). The difference between the 40% to 50% of endowment typically given to classmates in ultimatum games and the 20% to 30% given in dictator games is easily explained as the result of the possibility that low donations will be punished in the former and the lack of such possibility in the latter. Rachlin and Jones (2009) found that as social distance increased, the amount given in both dictator and ultimatum games decreased hyperbolically—as Equation 2 predicts—but the slope of the ultimatum hyperbola was less than the slope of the dictator hyperbola; the amount given was greater for ultimatum games than for dictator games across all social distances tested. Receivers expressed willingness to accept lower amounts from those close to them than from those at greater social distances. 
For example, in an ultimatum game, your mother (presumably close to you) would give you a very high percentage of her endowment, but you would accept virtually any

7. Fifty Stony Brook undergraduates were given instructions similar to those given in Rachlin and Jones (2009), asking them to imagine everyone they knew to be on a vast field in which physical distance was based on social distance. These students were then asked to assign a physical distance measure for people at various social distances (1, 10, 100) and for a randomly selected classmate. On average, a random classmate was placed at a distance corresponding to the ordinal social distance (an N value) of 75.


amount she gave you, no matter how low. However, a distant cousin (presumably not close to you) would give you much less than your mother would, but you would quickly punish your cousin (at a cost to yourself) if he or she gave you too much less. This phenomenon, called altruistic punishment, is pervasive in ultimatum games. It occurs even when the giver is a perfect stranger with whom the receiver can expect to have no further contact (Camerer, 2003). What do receivers gain to offset their own loss from punishing givers for unfair distributions? The effect that such punishment may have on a giver’s behavior may benefit people who interact with him or her down the line, but how would it benefit a receiver who will likely never see the giver again? Behavioral theory, simply considered, seems to predict the same selfish behavior as does economic theory. Why choose no reward over a small one? Teleological behaviorism, however, looks beyond maximization of reward at a narrow time, place, and social space. According to teleological behaviorism, people may learn to choose long-term patterns of behavior as wholes and refuse to make decisions on a case-by-case basis, not just because it is more efficient to make decisions wholesale than retail, but because if people make decisions on a case-by-case basis, no matter how far into the future they integrate the particular delayed alternatives—even from now to the end of their lives—they may fail to exhibit the highest valued behavioral patterns. As we argued previously, a series of high-valued particular choices may result in a low-valued pattern of choices; conversely, a series of low-valued particular choices may result in a high-valued pattern. Thus, in individual cases, people will often behave altruistically (nonmaximally for them as individuals), but the long-term patterns of their behavior may be highly valued (not just for society, but for them as individuals). 
A common feature of particular acts of self-control and particular acts of altruism is that they are both low-valued components of high-valued abstract patterns (such as moderate eating and drinking or obeying the golden rule). Decision theorists (e.g., Fehr & Fischbacher, 2003) have explained particular altruistic acts by

givers in dictator games and by receivers in ultimatum games in terms of an inherited sense of fairness. It is as though each particular act is reinforced by so much money plus so much fairness. This, however, is a category mistake (Ryle, 1949)—as with comparing a collie with a dog or a dance step with a dance. It is certainly possible for people to find innate value in patterns of acts that are fair. We argue in the next section that such is the case. Individual acts, though, may be fair only provisionally. A 50:50 split might not be fair if the previous split was 100:0 either way. In general, fairness is a property of patterns of acts (of acts in their temporal as well as social context), not of individual acts. As in dictator and ultimatum games, players in public goods games receive an initial endowment and may contribute any part of it. In a public goods game, however, there are many players. The contribution goes into a common pool that is augmented proportionally to the total contribution and then distributed equally to all players. The contingencies in such games parallel those in everyday life of contributing to public radio or television, voting, not littering, recycling, and so forth. In a public goods game played by 98 Stony Brook undergraduates (Jones & Rachlin, 2009), each player received (a hypothetical) $100. Then (again hypothetically) a box was passed around the room, and the player could put any fraction of the $100 (from all to none) into the box. Then, as the instructions said, the experimenter would double the amount in the box and distribute it equally to all of the players regardless of how much money they initially contributed. The median amount students indicated that they would put into the box was $25, but a significant number of students contributed $50 or $100, and many indicated $0. The amount contributed may be seen to be a measure of degree of altruism—the same trait measured by social discount functions. 
Therefore, it should not be surprising that the amount contributed by a participant in the public goods game correlated significantly (and negatively) with the slope of that participant’s social discount function. The more generous a person was in the public goods game, the shallower that person’s social discount function tended to be. (However, there was no correlation between public goods game


contribution and the slope of a person’s individual delay discount function.)8
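The contingencies of the public goods game described above can be sketched as a payoff calculation. The sketch assumes the rules as stated in the text (a $100 endowment, a box whose contents are doubled and shared equally among all 98 players); the function name and the three scenarios are illustrative only.

```python
def payoffs(contributions, endowment=100.0, multiplier=2.0):
    """Final holdings for each player: what they kept plus an equal share
    of the multiplied common pool."""
    n = len(contributions)
    equal_share = multiplier * sum(contributions) / n
    return [endowment - c + equal_share for c in contributions]


N_PLAYERS = 98

nobody_gives = payoffs([0.0] * N_PLAYERS)
everybody_gives = payoffs([100.0] * N_PLAYERS)
one_free_rider = payoffs([0.0] + [100.0] * (N_PLAYERS - 1))

print(nobody_gives[0])               # each player keeps the endowment
print(everybody_gives[0])            # each player ends with double the endowment
print(round(one_free_rider[0], 2))   # the lone non-contributor
print(round(one_free_rider[1], 2))   # each of the 97 contributors

# Return to a player from each dollar he or she personally contributes.
print(round(2.0 / N_PLAYERS, 4))
```

Because each contributed dollar returns only about 2 cents to the contributor, giving nothing maximizes an individual's payoff whatever the others do, yet everyone does better when all contribute than when none do. This is the sense in which each contribution is a low-valued component of a high-valued pattern.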

Altruism and Self-Control in Evolution and Learning

Fehr and Fischbacher (2003) defined altruism as “costly acts that confer economic benefits on other individuals” (p. 786). Given this definition, and evolutionary theory as it stands, it is hard to see how people might have inherited an altruistic tendency. People with a tendency to sacrifice their own fitness for that of others must by definition reduce their chances of survival, and hence their chances of reproduction, relative to selfish people. Altruists should thus die out. Some evolutionary biologists have argued, however, that although, in evolutionary history, altruists would indeed die out vis-à-vis selfish individuals, groups composed mostly of altruists (within which individuals cooperate with each other) would have a better chance of surviving than groups composed mostly of selfish people (just as, e.g., basketball teams composed of unselfish players would score more points, all else equal, than teams composed only of selfish players). A conflict thus arises between inherited behavioral patterns beneficial to the individual and inherited behavioral patterns beneficial to the group. The notion that patterns beneficial to the group may win out over patterns beneficial to the individual is called group selection. The crucial variable for group selection is rate of reproduction (or replacement). When the rate of replacement of individuals within groups is low relative to the rate of replacement of groups within the larger population, group selection has been shown to work in the evolution of altruism (Boyd, Gintis, Bowles, & Richerson, 2005). One may then ask, what is the inherited tendency in individuals that results in altruistic behavioral patterns? One possibility is innately shallow social discount functions. If discount functions were

the kind of things one could directly inherit, people born with shallow functions would just tend to be more altruistic than people born with steep functions. However, as we argued previously, this seems unlikely. Individual social discounting is not as rigid as such a tendency would imply. Social discounting varies with circumstances. A more likely candidate is a tendency to ignore the consequences of individual acts and to behave in accordance with the consequences of temporally extended patterns of acts. The innate behavior of organisms is often patterned (eating, sleeping, aggression, sex, etc.); these patterns can evolve over the lifetimes of individual organisms into highly complex forms in response to environmental contingencies (Rachlin, 1995b; Teitelbaum, 1977). Since Thorndike (1911), behavior has been noted to evolve over the course of individual lifetimes by a process akin to natural selection of species. Staddon and Simmelhag (1971) made a modern argument to this effect. Baum (1994) and Rachlin (2000) extended this argument to selection, within organisms’ lifetimes, of behavioral patterns (groups of acts). That is, organisms may learn to choose the highest valued pattern over the highest valued single alternative through a reinforcement process akin to group selection of organisms in evolutionary biology. Such learning would serve for self-control as we conceive it here as well as for altruism.

Limitations of the Analogy Between Self-Control and Altruism

We have argued here that self-control may be viewed as a kind of altruism in which the place of other people is taken by the person at other times. If this were more than a common tendency to group behavior into patterns, there could exist a single fundamental discounting process governing both kinds of discounting, or one kind of discounting might be fundamental and the other derived from it.

8 In this experiment, both public goods and social discounting contributions were completely anonymous. Participants made their choices together in a large classroom and were identified only by number. However, one may question whether participants in this and other social discounting experiments understood that their contributions would be anonymous as regards the receiver. Some unpublished data from our laboratory indicate that they did understand this (Locey & Rachlin, 2008). Social discounting was measured in three ways: (a) the standard way described previously, (b) with instructions indicating explicitly that contributions would be anonymous, and (c) with instructions asking participants to imagine that the receiver was looking over their shoulder while they made their choices. As expected, participants were significantly more generous when they imagined they were observed than when their choices were anonymous. Contributions of the standard group were also significantly less than those of the observed group and did not differ from those of the anonymous group. This evidence shows that participants in standard social discounting experiments assume that receivers will not know who they are.

[Figure 20.6 plot: median k value (vertical axis, logarithmic, 0.001 to 1) as a function of magnitude (horizontal axis: low, medium, high), with separate lines labeled SOCIAL and DELAY.]

Figure 20.6. Magnitude effects: The medians of the best-fitting k values of individual participant discount functions at each of three magnitudes ($15, $150, and $150,000 for kdelay in Equation 1 and $7.50, $75, and $75,000 for ksocial in Equation 2). From “Social Discounting and Delay Discounting,” by H. Rachlin and B. A. Jones, 2008, Journal of Behavioral Decision Making, 21, p. 37. Copyright 2008 by John Wiley & Sons. Adapted with permission.
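The k values summarized in Figure 20.6 are steepness parameters of hyperbolic discount functions. As a minimal sketch, the code below assumes that Equation 1 has the form v = V / (1 + k·D^s) and that Equation 2 has the form v = V / (1 + k·N). Those forms are not restated in this section, so they are an assumption here, and the function names and all parameter values are hypothetical and are not data from the studies cited.

```python
def delay_discounted_value(V, k, D, s=1.0):
    """Assumed form of Equation 1: v = V / (1 + k * D**s)."""
    return V / (1.0 + k * D ** s)


def social_discounted_value(V, k, N):
    """Assumed form of Equation 2: v = V / (1 + k * N)."""
    return V / (1.0 + k * N)


def doubling_ratio(value_fn, x):
    """Discounted value at 2x divided by discounted value at x."""
    return value_fn(2 * x) / value_fn(x)


# Hypothetical parameter values, chosen only for illustration.
V = 75.0        # undiscounted amount
K = 0.05        # steepness of discounting
S_DELAY = 0.5   # fractional exponent on delay (assumed)

# A larger k yields a lower discounted value at the same social distance.
shallow = social_discounted_value(V, 0.01, 20)
steep = social_discounted_value(V, 0.10, 20)
print(f"k=0.01: {shallow:.2f}   k=0.10: {steep:.2f}")

# Far from the origin, doubling N roughly halves the social value,
# whereas doubling D with s < 1 reduces the delayed value by less.
social_ratio = doubling_ratio(lambda n: social_discounted_value(V, K, n), 1000)
delay_ratio = doubling_ratio(lambda d: delay_discounted_value(V, K, d, S_DELAY), 1000)
print(f"social: {social_ratio:.2f}   delay: {delay_ratio:.2f}")
```

Under these assumed forms, a steeper function (larger k) simply means that value falls off faster with delay or social distance. The ratios are computed on the raw numbers only; they do not imply that delay and social distance share a common scale.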

Some lines of evidence, which we summarize in this section, have shown that the two kinds of discounting, although both hyperbolic, measure the extent of different types of behavioral patterns.

First, the steepness of both delay and social discounting varies with the magnitude of the undiscounted amount (V in Equations 1 and 2). As indicated previously, Raineri and Rachlin (1993) found that smaller undiscounted amounts of money and other commodities were discounted more steeply by delay than were larger amounts. The exact opposite is the case for social discounting. Rachlin and Jones (2008b) found steeper social discounting with larger rewards than with smaller. Figure 20.6 shows median k values for delay and social discounting as a function of amount (V). Note that the lines go in opposite directions.9

Second, there is a difference of scale in the degree to which people cooperate with themselves at later times and the degree to which they cooperate with others. The unitary exponent s in Equation 2 (i.e., if Equation 2 had a parameter for sensitivity to social distance that corresponded to the delay sensitivity parameter, s, in Equation 1) means that (beyond a certain point) doubling social distance (N) halves vsocial, but because of the fractional s in Equation 1, doubling delay has less of an effect on vdelay. This would seem to indicate a greater sensitivity, on average, to social distance than to delay. As we said previously, though, arithmetic operations such as multiplication and division of an ordinal scale have little meaning. Delays, whether measured on ordinal or ratio scales, are incommensurate with social distances. Comparing the two types of discounting directly is therefore not possible.

We can compare the two forms of discounting indirectly, however, in terms of self-control and social cooperation. Brown and Rachlin (1999) devised a repeated Prisoner’s Dilemma–type game that could be played in two ways, alone or together (i.e., by one person or two people). In Prisoner’s Dilemma games, a player may cooperate or defect. In a two-player game, if both players cooperate, both gain moderately high rewards; if both players defect, both gain moderately low rewards; if one cooperates and one defects, the defector gains a very high reward and the cooperator gains a very low reward. The best pattern of behavior for both players over a series of Prisoner’s Dilemma trials is to always cooperate and always earn the moderately high reward. However, because it is to the interest of each player that the other player cooperates, each player may punish the other player’s defection by defecting him- or herself (as with Punch and Judy, an arms race, or a rocket exchange between Israel and Hamas). In that case, both players will end up repeatedly defecting and earning the moderately low reward. Students in the alone version of the Brown and Rachlin (1999) game played a Prisoner’s Dilemma game against a tit-for-tat strategy.
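The payoff structure just described, played against an opponent that copies the player’s own previous choice, can be sketched as follows. The numerical payoffs, the function name, and the assumption that the opponent cooperates on the first trial are all hypothetical; they are not taken from Brown and Rachlin (1999) and serve only to show the ordering of outcomes.

```python
# Hypothetical payoffs to the player, keyed by (player choice, opponent choice).
# "C" = cooperate, "D" = defect. Only the ordering matters:
# very high (5) > moderately high (3) > moderately low (1) > very low (0).
PAYOFF = {
    ("C", "C"): 3,
    ("C", "D"): 0,
    ("D", "C"): 5,
    ("D", "D"): 1,
}


def play_against_tit_for_tat(choices, first_opponent_move="C"):
    """Total payoff for a sequence of choices against a tit-for-tat opponent.

    The opponent's move on each trial is the player's own move on the
    previous trial; its first move is an assumption of this sketch.
    """
    total = 0
    opponent = first_opponent_move
    for choice in choices:
        total += PAYOFF[(choice, opponent)]
        opponent = choice  # tit-for-tat mirrors this choice on the next trial
    return total


n_trials = 10
always_cooperate = play_against_tit_for_tat("C" * n_trials)
always_defect = play_against_tit_for_tat("D" * n_trials)
one_defection = play_against_tit_for_tat("CCCCDCCCCC")

print(always_cooperate, always_defect, one_defection)
```

With these illustrative numbers, defecting pays more than cooperating on any single trial whatever the opponent does on that trial, yet a pattern of consistent cooperation earns more over the series than either consistent defection or a single isolated defection.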
A tit-for-tat strategy simply and mechanically reflects a player’s choice on the previous trial; if the player cooperates on trial n, the other player will cooperate on trial n + 1; if the player defects on trial n, the other player will defect on trial n + 1. When playing against tit-for-tat, players are essentially playing against themselves on the next trial (their next-trial self). Playing a Prisoner’s Dilemma game against tit-for-tat is thus a self-control problem; the best choice on the present trial is to defect, but the best choice over a series of trials is to cooperate. Students playing the alone version of the game eventually learned to cooperate most of the time. In the crucial condition of the experiment, students first played in the alone condition for a number of trials, eventually learning to cooperate most of the time versus their future selves. Then Brown and Rachlin formed pairs of students (each of whom had been cooperating in the alone condition) and paired them in a together (two-person) game. They almost immediately began defecting against each other at a high rate. In other words, although students could learn to cooperate with their future (next-trial) selves, such learning did not transfer to cooperation with other people. The self-control learned in this situation did not transfer to social cooperation.

9 The same holds for probability discounting vis-à-vis delay discounting. That is, the effect of amount on the steepness of probability discounting, as with that of social discounting, goes in the reverse direction of the effect of amount on steepness of delay discounting (Green & Myerson, 2004).

In retrospect, such behavior makes sense. Hamilton’s (1964a, 1964b) kin selection theory predicts that altruism will be greater with greater genetic overlap (degree of kinship) between giver and receiver. Except for clones or people who are severely inbred, people’s genetic overlap with themselves is at least twice as great as their genetic overlap with any other person. Kinship correlates strongly with social distance. The greater the genetic overlap with someone else is, the closer the person should feel to the other person (all else being equal), and the more altruistic the person should be toward him or her. Rachlin and Jones (2008a) found that steepness of social discounting varied directly with genetic distance; social discount functions with nonrelatives as receivers were steeper than social discount functions with relatives as receivers. Students were willing to forgo significantly more money for the benefit of relatives than for the benefit of nonrelatives, even at the same social distance. People’s high generosity to themselves (100% genetic overlap) relative to their generosity to other people (50% or less genetic overlap) might thus have been expected.

Even ignoring genetic tendencies, it seems safe to assume that people’s common interests with themselves at later times overlap more with their present interests than with those of even their closest friends or relatives. Rachlin and Jones’s (2008a) undergraduate students’ interests and those of their later selves would certainly fall out of congruence over time, but it would take a very long time for them to fall as far out of congruence as they must be with those of a random classmate. Thus, it is far easier to cooperate with oneself (even at different times) than it is to cooperate with another person.

Finally, in several of the experiments we have described, both social and delay discount functions were obtained for all subjects. One might expect individuals with steep delay discount functions to also have steep social discount functions. To a certain extent, this was the case. Figure 20.7 plots the correlation (on log-log axes) between ksocial and kdelay for 33 students (Rachlin & Jones, 2008b). Steepness of social discounting correlated significantly (but barely) with steepness of delay discounting (r = .35). In hindsight, the weakness of the correlation should have been expected. Delay discounting measures self-control; social discounting measures social cooperation. In real life, these two traits often do not go together. (Some people are Scrooge-like and abstemious and tend to save their money; they are generous to their future selves but not to others. Some people are generous spendthrifts who buy drinks for everyone at the bar, say, but are careless about their own future welfare.)

[Figure 20.7 plot: log-log scatterplot of ksocial (vertical axis) against kdelay (horizontal axis), with a fitted straight line.]

Figure 20.7. Delay and social discounting: ksocial as a function of kdelay for each of 33 participants. The solid line is the best-fit straight line using simple linear regression.

Even within the areas of self-control and social cooperativeness, subpatterns do not completely overlap. Not all people with alcoholism are addicted to drugs or bad at saving money or gamble recklessly. As Mischel (2004) has pointed out, a person may show a high degree of self-control in one context and be impulsive in another. Similarly, a person may act altruistically in one social setting and selfishly in another. Behavioral patterns of self-control and altruism may be learned and innate to different degrees. The contexts and contingencies of the evolutionary processes underlying both largely learned and largely innate patterns may differ within as well as between categories. Different aspects of the environment gain control over behavior at different times. Nevertheless, steepness of delay discounting does indeed correlate with many kinds of dysfunctional personal behavior (see Volume 2, Chapters 7 and 8, this handbook), and we expect steepness of social discounting to correlate with many kinds of dysfunctional social behavior.

Conclusion

Another title for this chapter might have been “The Nature and Uses of Delay and Social Discount Functions.” The empirical work examined indicates that delay and social discount functions are highly useful in predicting self-control and altruism in the laboratory and (for self-control, at least) in everyday life. However, this success should not lead researchers to enshrine discount functions as entities in themselves, having an existence as such either in organisms’ nervous systems or in internal cognitive structures. Rather, delay and social discount functions are convenient measures of the degree to which patterns of overt behavior extend in time or in social space. They enable prediction of behavior in situations that may differ from those in which they were originally measured. However, such predictability has limits from one situation to another, both within the realms of self-control and altruism and between those realms. Exploration of these applications and limitations is a direction for future work.

References

Ainslie, G. (1992). Picoeconomics: The strategic interaction of successive motivational states within the person. Cambridge, England: Cambridge University Press.

Ainslie, G. (2001). Breakdown of will. Cambridge, England: Cambridge University Press.

Ainslie, G., & Monterosso, J. (2003). Building blocks of self-control: Increased tolerance of delay with bundled rewards. Journal of the Experimental Analysis of Behavior, 79, 37–48. doi:10.1901/jeab.2003.79-37

Alessi, S. M., & Petry, N. M. (2003). Pathological gambling severity is associated with impulsivity in a delay discounting procedure. Behavioural Processes, 64, 345–354. doi:10.1016/S0376-6357(03)00150-5

Baker, F., Johnson, M. W., & Bickel, W. K. (2003). Delay discounting in current and never-before cigarette smokers: Similarities and differences across commodity, sign, and magnitude. Journal of Abnormal Psychology, 112, 382–392. doi:10.1037/0021-843X.112.3.382

Baum, W. B. (1994). Understanding behaviorism: Science, behavior, and culture. New York, NY: HarperCollins.

Becker, G. S., & Murphy, K. M. (1988). A theory of rational addiction. Journal of Political Economy, 96, 675–700. doi:10.1086/261558

Bickel, W. K., Odum, A. L., & Madden, G. J. (1999). Impulsivity and cigarette smoking: Delay discounting in current, never, and ex-smokers. Psychopharmacology, 146, 447–454. doi:10.1007/PL00005490

Boyd, R., Gintis, H., Bowles, S., & Richerson, P. (2005). The evolution of altruistic punishment. In H. Gintis, S. Bowles, R. Boyd, & E. Fehr (Eds.), Moral sentiments and material interests: The foundations of cooperative and economic life (pp. 215–228). Cambridge, MA: MIT Press.

Brown, J., & Rachlin, H. (1999). Self-control and social cooperation. Behavioural Processes, 47, 65–72. doi:10.1016/S0376-6357(99)00054-6

Camerer, C. F. (2003). Behavioral game theory: Experiments in strategic interaction. New York, NY: Russell Sage Foundation.

Cavett, D. (2009, May 31). Week in review. New York Times, p. 10.

Chapman, G. B., & Elstein, A. S. (1995). Valuing the future: Temporal discounting of health and money. Medical Decision Making, 15, 373–386. doi:10.1177/0272989X9501500408

Coffey, S. F., Gudleski, G. D., Saladin, M. E., & Brady, K. T. (2003). Impulsivity and rapid discounting of delayed hypothetical rewards in cocaine-dependent individuals. Experimental and Clinical Psychopharmacology, 11, 18–25. doi:10.1037/1064-1297.11.1.18

Fehr, E., & Fischbacher, U. (2003). The nature of human altruism. Nature, 425, 785–791. doi:10.1038/nature02043

Forzano, L. B., & Logue, A. W. (1994). Self-control in adult humans: Comparison of qualitatively different reinforcers. Learning and Motivation, 25, 65–82. doi:10.1006/lmot.1994.1004

Green, L., Fry, A. F., & Myerson, J. (1994). Discounting of delayed rewards: A life-span comparison. Psychological Science, 5, 33–36. doi:10.1111/j.1467-9280.1994.tb00610.x

Green, L., & Myerson, J. (2004). A discounting framework for choice with delayed and probabilistic rewards. Psychological Bulletin, 130, 769–792. doi:10.1037/0033-2909.130.5.769

Green, L., Myerson, J., Holt, D. D., Slevin, J. R., & Estle, S. J. (2004). Discounting of delayed food rewards in pigeons and rats: Is there a magnitude effect? Journal of the Experimental Analysis of Behavior, 81, 39–50. doi:10.1901/jeab.2004.81-39

Green, L., Myerson, J., & McFadden, E. (1997). Rate of temporal discounting decreases with amount of reward. Memory and Cognition, 25, 715–723. doi:10.3758/BF03211314

Green, L., Myerson, J., & Ostaszewski, P. (1999). Discounting of delayed rewards across the life span: Age differences in individual discount functions. Behavioural Processes, 46, 89–96. doi:10.1016/S0376-6357(99)00021-2

Green, L., & Rachlin, H. (1996). Commitment using punishment. Journal of the Experimental Analysis of Behavior, 65, 593–601. doi:10.1901/jeab.1996.65-593

Hamilton, W. D. (1964a). The genetical evolution of social behaviour. I. Journal of Theoretical Biology, 7, 1–16. doi:10.1016/0022-5193(64)90038-4

Hamilton, W. D. (1964b). The genetical evolution of social behaviour. II. Journal of Theoretical Biology, 7, 17–52. doi:10.1016/0022-5193(64)90039-6

Jones, B. A., & Rachlin, H. (2006). Social discounting. Psychological Science, 17, 283–286. doi:10.1111/j.1467-9280.2006.01699.x

Jones, B. A., & Rachlin, H. (2007). [A ratio scale for social discounting]. Unpublished raw data.

Jones, B. A., & Rachlin, H. (2009). Delay, probability, and social discounting in a public goods game. Journal of the Experimental Analysis of Behavior, 91, 61–73. doi:10.1901/jeab.2009.91-61

Kagel, J., Battalio, R., & Green, L. (1995). Economic choice theory: An experimental analysis of animal behavior. Cambridge, England: Cambridge University Press. doi:10.1017/CBO9780511664854

Kahneman, D. (1999). Objective happiness. In D. Kahneman, E. Diener, & N. Schwarz (Eds.), Well-being: The foundations of hedonic psychology (pp. 3–25). New York, NY: Russell Sage Foundation.

Kirby, K. N. (1997). Bidding on the future: Evidence against normative discounting of delayed rewards. Journal of Experimental Psychology: General, 126, 54–70. doi:10.1037/0096-3445.126.1.54

Kirby, K. N., & Guastello, B. (2001). Making choices in anticipation of future choices can increase self-control. Journal of Experimental Psychology: Applied, 7, 154–164. doi:10.1037/1076-898X.7.2.154

Kirby, K. N., Petry, N. M., & Bickel, W. K. (1999). Heroin addicts have higher discount rates for delayed rewards than non-drug-using controls. Journal of Experimental Psychology: General, 128, 78–87. doi:10.1037/0096-3445.128.1.78

Koffka, K. (1935). Principles of Gestalt psychology. New York, NY: Harcourt-Brace.

Kudadjie-Gyamfi, E., & Rachlin, H. (1996). Temporal patterning in choice among delayed outcomes. Organizational Behavior and Human Decision Processes, 65, 61–67. doi:10.1006/obhd.1996.0005

Locey, M. L., Jones, B. A., & Rachlin, H. (2011). Real and hypothetical rewards in self-control and social discounting. Judgment and Decision Making, 6, 552–564.

Locey, M. L., & Rachlin, H. (2008). [Altruism and anonymity]. Unpublished raw data.

Loewenstein, G., & Prelec, D. (1992). Anomalies in intertemporal choice: Evidence and an interpretation. In G. Loewenstein & J. Elster (Eds.), Choice over time (pp. 119–146). New York, NY: Russell Sage Foundation.

Madden, G. J., Begotka, A., Raiff, B. R., & Kastern, L. (2003). Delay discounting of real and hypothetical rewards. Experimental and Clinical Psychopharmacology, 11, 139–145. doi:10.1037/1064-1297.11.2.139

Madden, G. J., & Bickel, W. K. (Eds.). (2009). Impulsivity: The behavioral and neurological science of discounting. Washington, DC: American Psychological Association.

Madden, G. J., Bickel, W. K., & Jacobs, E. A. (1999). Discounting of delayed rewards in opioid-dependent outpatients: Exponential or hyperbolic discounting functions. Experimental and Clinical Psychopharmacology, 7, 284–293. doi:10.1037/1064-1297.7.3.284

Madden, G. J., Petry, N. M., Badger, G. J., & Bickel, W. K. (1997). Impulsive and self-control choices in opioid-dependent patients and non-drug-using control participants: Drug and monetary rewards. Experimental and Clinical Psychopharmacology, 5, 256–262. doi:10.1037/1064-1297.5.3.256

Mazur, J. E. (1986). Choice between single and multiple delayed reinforcers. Journal of the Experimental Analysis of Behavior, 46, 67–77. doi:10.1901/jeab.1986.46-67

Mazur, J. E. (1987). An adjusting procedure for studying delayed reinforcement. In M. L. Commons, J. E. Mazur, J. A. Nevin, & H. Rachlin (Eds.), Quantitative analyses of behavior: The effects of delay and of intervening events on reinforcement value (pp. 55–73). Hillsdale, NJ: Erlbaum.

Mischel, W. (2004). Toward an integrative science of the person. Annual Review of Psychology, 55, 1–22. doi:10.1146/annurev.psych.55.042902.130709

Mitchell, S. H. (1999). Measures of impulsivity in cigarette smokers and non-smokers. Psychopharmacology, 146, 455–464. doi:10.1007/PL00005491

Myerson, J., & Green, L. (1995). Discounting of delayed rewards: Models of individual choice. Journal of the Experimental Analysis of Behavior, 64, 263–276. doi:10.1901/jeab.1995.64-263

Ostaszewski, P., Green, L., & Myerson, J. (1998). Effects of inflation on the subjective value of delayed and probabilistic rewards. Psychonomic Bulletin and Review, 5, 324–333. doi:10.3758/BF03212959

Parfit, D. (1984). Reasons and persons. Oxford, England: Oxford University Press.

Petry, N. M. (2001). Delay discounting of money and alcohol in actively using alcoholics, currently abstinent alcoholics, and controls. Psychopharmacology, 154, 243–250. doi:10.1007/s002130000638

Rachlin, H. (1989). Judgement, decision, and choice: A cognitive/behavioral synthesis. New York, NY: Freeman.

Rachlin, H. (1992). Teleological behaviorism. American Psychologist, 47, 1371–1382. doi:10.1037/0003-066X.47.11.1371

Rachlin, H. (1994). Behavior and mind: The roots of modern psychology. New York, NY: Oxford University Press.

Rachlin, H. (1995a). Self control: Beyond commitment. Behavioral and Brain Sciences, 18, 109–159. doi:10.1017/S0140525X00037602

Rachlin, H. (1995b). The value of temporal patterns in behavior. Current Directions in Psychological Science, 4, 188–192. doi:10.1111/1467-8721.ep10772634

Rachlin, H. (2000). The science of self-control. Cambridge, MA: Harvard University Press.

Rachlin, H. (2006). Notes on discounting. Journal of the Experimental Analysis of Behavior, 85, 425–435. doi:10.1901/jeab.2006.85-05

Rachlin, H., & Green, L. (1972). Commitment, choice and self-control. Journal of the Experimental Analysis of Behavior, 17, 15–22. doi:10.1901/jeab.1972.17-15

Rachlin, H., & Jones, B. A. (2008a). Altruism among relatives and non-relatives. Behavioural Processes, 79, 120–123. doi:10.1016/j.beproc.2008.06.002

Rachlin, H., & Jones, B. A. (2008b). Social discounting and delay discounting. Journal of Behavioral Decision Making, 21, 29–43. doi:10.1002/bdm.567

Rachlin, H., & Jones, B. A. (2009). The extended self. In G. J. Madden & W. K. Bickel (Eds.), Impulsivity: The behavioral and neurological science of discounting (pp. 411–431). Washington, DC: American Psychological Association.

Rachlin, H., Raineri, A., & Cross, D. (1991). Subjective probability and delay. Journal of the Experimental Analysis of Behavior, 55, 233–244. doi:10.1901/jeab.1991.55-233

Raineri, A., & Rachlin, H. (1993). The effect of temporal constraints on the value of money and other commodities. Journal of Behavioral Decision Making, 6, 77–94. doi:10.1002/bdm.3960060202

Ryle, G. (1949). The concept of mind. Chicago, IL: University of Chicago Press.

Schmitt, D. R. (1998). Social behavior. In K. A. Lattal & M. Perone (Eds.), Handbook of research methods in human operant behavior (pp. 471–508). New York, NY: Plenum Press.

Siegel, E., & Rachlin, H. (1995). Soft commitment: Self-control achieved by response persistence. Journal of the Experimental Analysis of Behavior, 64, 117–128. doi:10.1901/jeab.1995.64-117

Simon, J. L. (1995). Interpersonal allocation continuous with intertemporal allocation: Binding commitments, pledges, and bequests. Rationality and Society, 7, 367–392. doi:10.1177/104346319500700402

Staddon, J. E. R., & Simmelhag, V. L. (1971). The “superstition” experiment: A reexamination of its implications for the principles of adaptive behavior. Psychological Review, 78, 3–43. doi:10.1037/h0030305

Stevens, S. S. (1946). On the theory of scales of measurement. Science, 103, 677–680. doi:10.1126/science.103.2684.677

Teitelbaum, P. (1977). Levels of integration of the operant. In W. K. Honig & J. E. R. Staddon (Eds.), Handbook of operant behavior (pp. 125–152). Englewood Cliffs, NJ: Prentice Hall.

Thorndike, E. L. (1911). Animal intelligence. New York, NY: Macmillan.

Trope, Y., & Liberman, N. (2003). Temporal construal. Psychological Review, 110, 403–421. doi:10.1037/0033-295X.110.3.403

Vuchinich, R. E., & Simpson, C. A. (1998). Hyperbolic temporal discounting in social drinkers and problem drinkers. Experimental and Clinical Psychopharmacology, 6, 292–305. doi:10.1037/1064-1297.6.3.292

Chapter 21

Behavior in Relation to Aversive Events: Punishment and Negative Reinforcement

Philip N. Hineline and Jesús Rosales-Ruiz

We are concerned in this chapter with some topics that generate ambivalence within society at large as well as within the community of behavior analysts, ambivalence that arises partly from multiple meanings of terms addressing those topics and partly from confusion between the study of a process and the advocacy of procedures and practices based on it. When such advocacy does occur, the ambivalence sometimes appears to arise through disregard of what has been learned from study of the relevant process. Ambivalence concerning research on these topics may arise from failure to recognize that these processes often occur outside the laboratory, for better or for worse, and without having been explicitly arranged and thus still require scientific understanding.

Consider the word punishment, which identifies a topic we address in detail later. In behavior analysis, this term has specific technical meanings: A punishment procedure is an arrangement whereby some specified behavior produces a consequence, resulting in a subsequent decrease in the occurrence of that behavior. The behavioral process of punishment is a temporally extended sequence that includes the occurrence of some particular behavior, a consequence that the behavior produces, and a subsequent decrease in that behavior. The process can occur whether the behavior–consequence relation has been arranged by design (a procedure), whether it is the product of unwitting interaction between people, or whether it simply arises in the way the world works.

These technical specifications correspond to only one of the ordinary-language meanings of the term: “Those football linemen are taking intense punishment, but they keep on coming” equates punishment with physical trauma, irrespective of its effect on behavior; it does not satisfy the behavioral definition. More problematic is the ordinary-language meaning that equates punishment with retribution, vengeance, or justice: If a person commits a crime, is apprehended, and then serves out a designated term of prison or community service, the person may be said to have “repaid a debt to society.” This usage suggests retribution, but it says nothing about what the person is likely to do next and is thus tangential to punishment as behavioral procedure or process. Nevertheless, legally imposed consequences of malfeasance are characterized forthrightly as punishment, presumably to decrease that malfeasance. Meanwhile, as a cultural practice people are discouraged from taking justice into their own hands, and yet some years ago, Governor Dukakis’s presidential campaign suffered serious damage when, on being asked how he would react in the hypothetical situation of someone assaulting his wife, he declined to make vengeful assertions. By implication, most citizens prefer a leader who would be inclined toward personal vengeance in at least some circumstances. Thus, no matter how carefully one writes or speaks of punishment as defined in behavior analysis, one cannot prevent people from reading or hearing it in other ways. Behavior analysts as well as general audiences have histories of listening, speaking, and writing as members of the community at large, and thus we find muddled disagreements and discussions concerning the term punishment.

DOI: 10.1037/13937-021 APA Handbook of Behavior Analysis: Vol. 1. Methods and Principles, G. J. Madden (Editor-in-Chief) Copyright © 2013 by the American Psychological Association. All rights reserved.


Similarly, the term aversive is a potential red flag; in the vernacular, it is understood as including stimuli or situations that are experienced as painful or unpleasant. Conceptualized behaviorally, aversive events or situations are identified as aversive if behavior that produces or increases them decreases or ceases (punishment) or if behavior that removes or prevents them is maintained or increases (negative reinforcement). The two types of effect are not exactly consistent when intensity of the stimuli is varied, however, so a particular stimulus could be functionally aversive by one definition and not by the other (Azrin, Hake, Holz, & Hutchinson, 1965, discussed in the section Change of Situation in the Escape Paradigm later in this chapter).

The use of aversive events is ubiquitous in human societies and in nature, but explicitly advocating that use can be controversial (e.g., Salzinger, 1991; Starin, 1991; Yates, 1991). Still, the controversy does not apply to all aversive procedures. Time-out procedures and prison sentences are widely accepted as forms of punishment, on the assumption that they are aversive consequences. Other aversive procedures are accepted or tolerated as normal practices at home, school, and work; temporary restraint, yanking a dog on a leash, penalties in football games, and, until recent decades, even spanking have been generally noncontroversial aversive stimuli. For some people, of course, even most of those procedures are unacceptable.

Finally (for the moment), negative is another loaded term that identifies two of our topics. Defined technically within behavior analysis, the term is algebraic in character, referring to events or situations that are removed or reduced, the removal or reduction being a consequence of specified behavior. We must emphasize that negative does not characterize the behavior itself. If the specified behavior subsequently increases, we are concerned with negative reinforcement, further subdivided as escape or avoidance. If the behavior subsequently decreases, we are concerned with negative punishment, sometimes identified as penalty, response cost, or time out (although time out can also function as negative reinforcement or even as positive reinforcement, depending on surrounding circumstances). These definitions of negative are, of course, each contrasted with a positive alternative, whereby the consequence, whether reinforcing or punishing, entails the addition, appearance, or increase in some event or stimulus as a consequence of specified behavior. There is one contrasting technical usage, whereby in distinguishing between stimuli that are simultaneously present, a stimulus correlated with reinforcement (whether positive or negative) is deemed the positive stimulus and a stimulus correlated with nonreinforcement is deemed the negative stimulus.

In ordinary language, positive and negative have more problematic and ill-defined connotations in which negative is roughly equated with undesirable or problematic; thus, one finds references to positive and negative behaviors and positive and negative experiences. The conceptual confusion is exacerbated by those vernacular usages being adopted apparently for marketing purposes, as in positive psychology (e.g., Seligman & Csikszentmihalyi, 2000) and positive behavior (or behavioral) support (e.g., Kern, Koegel, & Dunlap, 1996). Anyone teaching behavioral concepts at the introductory level has encountered the degree to which these vernacular intrusions persist: Despite having been introduced to the fact that much of criminal behavior is maintained by positive reinforcement and that negative reinforcement is the defining contingency of preventive medicine, many students continue to imply in their examination answers that positive is good and negative is bad. In addition to the resulting conceptual incoherence, this confusion obscures the fact that negative reinforcement can be fundamental to generating and maintaining behavior that people value highly. More generally, it should be evident in what follows that the susceptibility of people’s behavior to both punishment and negative reinforcement is an important aspect of their adaptability as a species, as individuals, and as a culture.
Precis

We begin by discussing punishment, both positive and negative, with a brief sketch of its historical background, leading to a systematic review of the features known to determine its effectiveness. We have selected examples with a translational

Behavior in Relation to Aversive Events

emphasis, illustrating relationships between the basic research that yielded a systematic categorization of the parameters of punishment and the applied research that has resulted. We begin our consideration of negative reinforcement with the escape–avoidance distinction and then emphasize some classic experiments introducing a rubric that articulates relationships extending well beyond the purview of traditional avoidance theory. We draw linkages between phenomena of negative and positive reinforcement (intermittent reinforcement, matching law, delay discounting) while also addressing the practical concerns of translational research.

Punishment

Punishment as a behavioral process is generally defined as an environmental change that produces a subsequent decrease in the occurrence of behavior that produced that change. The environmental change, or punishing consequence, is called a punisher and assumes one of two forms: (a) the presentation of a stimulus or situation (positive punishment) or (b) the removal of a stimulus or situation (negative punishment). In the case of stimulus presentation, for example, suppose a child’s receiving ripe cheese for dessert (considered a delicacy by the parents) is made contingent on good manners at the table. If good manners at the table decrease, the contingency would be considered a case of positive punishment and the dessert would be considered a punisher. In contrast, if good manners increase, then a positive reinforcement contingency would be indicated. In the case of stimulus removal, suppose the dessert is removed contingent on the child’s hitting her sister. If hitting decreases, the contingency would be considered a case of negative punishment. By contrast, if hitting increases, the removal of dessert would be considered either a negative reinforcement contingency (removal or prevention of an aversive stimulus) or perhaps extinction-induced aggression (a result of discontinuing positive reinforcement). 
A punisher, like a reinforcer, is defined functionally—in terms of a contingency and its subsequent effects on behavior.
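The functional definitions above amount to a two-by-two classification: the sign (positive or negative) comes from whether a stimulus is added or removed, and the process (reinforcement or punishment) comes from whether the behavior subsequently increases or decreases. The following is a minimal sketch of that classification, not a measurement procedure; the names `StimulusChange`, `BehaviorChange`, and `classify_contingency` are hypothetical and introduced only for illustration.

```python
from enum import Enum


class StimulusChange(Enum):
    """What the response produces (hypothetical labels for illustration)."""
    ADDED = "added"      # stimulus is presented, appears, or increases
    REMOVED = "removed"  # stimulus is removed, reduced, or prevented


class BehaviorChange(Enum):
    """What subsequently happens to the behavior that produced the change."""
    INCREASED = "increased"
    DECREASED = "decreased"


def classify_contingency(stimulus: StimulusChange, behavior: BehaviorChange) -> str:
    """Name a contingency from its observed effect, not from the stimulus itself."""
    sign = "positive" if stimulus is StimulusChange.ADDED else "negative"
    process = "reinforcement" if behavior is BehaviorChange.INCREASED else "punishment"
    return f"{sign} {process}"


# The dessert examples from the text:
# dessert presented contingent on good manners, and good manners decrease
print(classify_contingency(StimulusChange.ADDED, BehaviorChange.DECREASED))
# dessert removed contingent on hitting, and hitting decreases
print(classify_contingency(StimulusChange.REMOVED, BehaviorChange.DECREASED))
```

Note that the function takes the observed change in behavior as an input. This mirrors the point that the same dessert can be a punisher or a reinforcer depending on its effect. The sketch does not capture the alternative interpretation mentioned above, in which an increase after stimulus removal may reflect extinction-induced aggression rather than negative reinforcement.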

Historical Background

Early conceptions of punishment as a process symmetrical but opposite to reinforcement were formulated but later rejected by Thorndike (1932), who asserted that reinforcers (so-called “satisfiers”) strengthened behavior, but punishers (“annoyers”) did not weaken behavior. He said, “Annoyers do not act on learning” (p. 46). Skinner (1953) also minimized the role of punishment in learning. He said, “The effect of punishment was a temporary suppression of the behavior, not a reduction in the total number of responses” (p. 184). For Skinner, punishment seemed to displace behavior rather than weaken behavior. More important, however, punishment was seen as establishing the motivational operations necessary for negative reinforcement to work (somewhat akin to food deprivation in relation to positive reinforcement). If the behavior decreased, it was because other behavior increased through negative reinforcement. Thus, the effects of punishment on behavior were considered indirect and temporary, to be explained by the eliciting and motivational effects of stimuli. Expressing a similar view, Bolles (1967) concluded that

it does not seem possible to weaken the punished behavior itself, nor to produce by punishment competing behavior other than the particular response it elicits. Therefore punishment is not effective in altering behavior unless the reaction to punishment itself competes with the response we wish to punish. (p. 433)

Similarly, Estes (1969) proposed that the competition does not occur at the response level, but that shocks create a motivational operation that competes with the motivation of the punished behavior. For these authors, then, the reduction in behavior could be accounted for by the eliciting and motivational operations; they saw no need for punishment as a behavioral process separate from positive and negative reinforcement. That conception has not stood over time, however. 
On the basis of several converging lines of evidence (e.g., Azrin, 1956; Camp, Raymond, & Church, 1967), punishment came to be recognized

Hineline and Rosales-Ruiz

as a process analogous but opposite to that of reinforcement (see Baron, 1991; Fantino, 1973; Hineline, 1984a). Nevertheless, besides its direct effect of reducing behavior, a punishing stimulus might have additional effects: Its removal might negatively reinforce behavior, and its presentation might elicit aggression. Further complicating matters, if punishment is to be at issue, some process or processes must be maintaining the to-be-punished behavior. Behavior could be maintained by positive or negative reinforcement, elicited by a stimulus, or could be a first occurrence that came about through some environmental change. The effects of a single punishment contingency could differ as a result of these surrounding circumstances. For the operative characteristics of punishment procedures to be discerned, the to-be-punished behavior must first be stably maintained under well-specified conditions, with the punishing stimulus carefully calibrated and controlled. These conditions were first achieved in an experiment by Azrin (1956) that helped initiate an extensive research program summarized by Azrin and Holz (1966). This line of research is impressive at many levels and stands as one of the most significant contributions to the field of behavior analysis. It not only gave clarity and direction to the study of punishment but also provided a framework for that study and paved the way for an informed technology of behavior reduction. Theoretical, experimental, and applied treatments (including the present one) continue to borrow from the Azrin and Holz analysis (e.g., Axelrod & Apsche, 1983; Baron, 1991; Crosbie, 1998; Fantino, 1973; Johnston, 1972; Matson & DiLorenzo, 1984). More research is surely needed (cf. Lerman & Vorndran, 2002; Perone, Galizio, & Baron, 1988), but it is also clear that much is already known about punishment as a distinct behavioral process.

Punishment Procedures

In basic research, a variety of punishment procedures have been used with humans, including the use of loud noise, increased response effort, point loss, and time out from positive reinforcement as well as contingent electric shock. The effects have

been consistent with the effects seen with other animals when electric shock is used (see Crosbie, 1998). In applied behavior analysis, focusing on socially significant behavior problems, a more varied set of operations has been implemented. Positive punishers (which entail response-contingent stimulus presentation) rarely involve electric shock; they include presentation of nonharmful noxious substances, intense or distracting noise, response effort (overcorrection: restitutional and positive practice), blocking, physical restraint, reprimands, slaps, hair pulls, shaking, and even tickling. Negative punishers (which entail stimulus removal) entail a variety of procedures that can be grouped into two major categories: response cost and time out from positive reinforcement. Response cost consists of arrangements such as loss of tokens, money, privileges, free time, bonuses, and the like and terms such as fines and penalties. Time out consists of the contingent presentation of a brief extinction period. In applied behavior analysis, time out has been divided into two types: exclusionary time out and nonexclusionary time out, both of which entail the removal of access to positive reinforcers that would otherwise be available. The nonexclusionary type of time out consists of planned ignoring, contingent observation, and a time-out ribbon. Exclusionary time out entails special arrangements: a partition, a separate room, or a hallway. Clearly, the list and variety of specific punishment procedures is long. Commenting on the number and variety of procedures, Kazdin (2003) noted the irony of this, because in many venues punishment has been relegated to a category of procedures that should not be used at all or, if used, should be used only as a supplement to reinforcement programs. Even a cursory review covering all of these procedures would be beyond the scope of this chapter. 
Descriptions and evaluations of these procedures, addressed to an equally large variety of target behaviors—ranging from getting out of one’s seat during school to life-threatening behaviors such as chronic infant rumination—can be found in textbooks such as Applied Behavior Analysis by Cooper, Heron, and Heward (2007) and Behavior Modification in Applied Settings by Kazdin (2003). Two other books exclusively dedicated to the topic of

punishment are The Effects of Punishment on Human Behavior, edited by Axelrod and Apsche (1983), and Punishment and Its Alternatives, by Matson and DiLorenzo (1984).

Parameters Determining the Effects of Punishment on Behavior

By definition, punishment, as with reinforcement, always works, for both are functional concepts, defined in terms of how they change behavior. However, their effects on behavior are not fixed: Research has shown that punishment effects depend on the characteristics of the punisher, the immediacy, the program, the maintaining variables of the punished behavior, availability of alternative behavior, and alternative sources of reinforcement. The decreases in behavior are modulated by all of those factors, and punishment effects can be reliably predicted if one knows those conditions, which are especially important for ethical implementation of a behavioral technology. Azrin and Holz (1966) in their classic review listed 14 variables that modify the effectiveness of punishment. These variables have provided the basis for continued experimental analysis and have guided the development of behavior intervention programs to reduce unwanted behavior. Generally speaking, punishment applications should work fairly quickly. If they do not, one or more of the 14 variables needs adjustment. The adjustments can be grouped as having to do with the administration of the punisher, with the characteristics of the to-be-punished behavior, with the availability of alternative reinforced responses, and with stimulus control.

Administration of punishment. To be effective, a punisher should be delivered immediately, consistently, suddenly, at maximum value, briefly, and on every occurrence of the targeted behavior. Faults in any of those variables will affect the degree of behavior reduction. We should note that these variables are not unique to punishment: Immediacy, potency, and consistency are also relevant to positive reinforcement in the acquisition or enhancement of behavior. Azrin and Holz (1966) suggested that punishment and reinforcement are similar processes but

opposite in their effects, a proposal that has gained favor over time (see Baron, 1991; Crosbie, 1998; Fantino, 1973). To the extent that reinforcement and punishment are symmetrical, the proposal offers an economy of analysis, because understanding of each domain has implications for the other (see Hineline, 1984a). Immediacy. A punishment procedure may fail because the punisher is delayed. Camp et al. (1967) systematically studied several delays of punishment (0, 2, 7.5, and 30 seconds) and confirmed that immediate delivery is more effective than delayed punishment and that at long delays, the effect of punishment does not differ from response-independent delivery of the punisher. In the laboratory, immediate delivery of the punisher is often ensured by the use of automatic apparatus, but this is not the case when a human practitioner or parent applies a punishment procedure in the natural environment. The human observer may be late in detecting the response and slow to apply the punisher. Although the body of research on procedures to deal with delayed punishment is small, and more research is needed, a few methods have been shown to mitigate this problem (see Lerman & Vorndran, 2002). The problem of delay is mainly technological and can be solved by means of devices that detect and deliver the punishment in a timely manner and by means of procedures such as the use of conditioned punishers that link the response to the delayed punishment. This problem is not different for delayed positive reinforcement, so procedures that effectively link the response with the delayed reinforcement can also be applied to delayed punishment. Intensity. The punishing stimulus should be as intense as possible. 
Although the relation between intensity of punishment and its effects has been well established for stimuli whose physical parameters can be measured, such as electric shock, noise, air blast, and point or money loss, intensity is an imprecise term when applied to other stimuli. Louder reprimands do not necessarily translate into more intense punishers. However, larger penalties (e.g., money or point loss) are more effective than smaller ones (Crosbie, 1998; Kazdin, 1972). We should note that longer durations do not necessarily mean larger

intensities and do not necessarily translate into better punishers. Given these difficulties, practitioners are advised to begin with an intense but safe, practical, and accepted punisher and then to withhold further increases in intensity until the other variables are ruled out. In the case of time out from reinforcement, one way to look at the intensity of punishment focuses on the duration of the extinction period. Here again, however, longer time outs do not necessarily mean more effective punishers. Time-out periods as brief as 2 minutes have been shown to be effective in reducing disruptive and undesirable behavior among children (e.g., Bostow & Bailey, 1969). Furthermore, the effectiveness of time out depends on the interaction of the time-in and time-out environments (e.g., Solnick, Rincover, & Peterson, 1977; see also Hackenberg & DeFulio, 2007) and variables concerning the maintenance of behavior (Plummer, Baer, & Leblanc, 1977). If the behavior is maintained by positive reinforcement, time out from that reinforcement should work (given that the other parameters are fulfilled). If the behavior is maintained by negative reinforcement, however, time out might increase the behavior. The interplay of time in and time out was well summarized by Baer (2001):

Time out is probably a widely used and largely misused procedure in our society today. It is often prescribed as an acceptable way to reduce undesirable behavior, and the behavior therapies often package it with an attempt at cognitive reorganization. But behavior-analytic logic teaches that time out has no necessary or fixed function for behavior. If the reinforcement and punishment schedules of the time-out environment are worse than those of the time-in environment, time out will weaken the behavior on which it is systematically contingent, but only if the contingency is managed well. 
If the reinforcement and punishment schedules of the time-out environment are equal to those of the time-in environment, time out will not change the behavior on which it is systematically contingent,

other than continue to detract from the time available for good programming. If the reinforcement and punishment contingencies of the time-out environment are better than those of the time-in environment time-out will strengthen the behavior on which it is systematically contingent, but only if the contingency is managed well. (p. 259) Manner of introduction. On one hand, introducing punishers at a low intensity followed by gradual increases is likely to result in adaptation and render higher levels of intensity ineffective. On the other hand, the sudden presentation of an intense stimulus is known to disrupt behavior. The combination of a sudden high-intensity stimulus maximizes the stimulus change. Thus, counterintuitive as it may seem, the punishing stimulus should not be introduced gradually but should instead be introduced at maximum feasible intensity (Azrin, 1960). Schedule of punishment. In general, the greater the proportion of punished to unpunished behavior is, the greater the reduction of the punished behavior. Similarly, the more frequent the punishment is, the less frequent the punished behavior (e.g., H. B. Clark, Rowbury, Baer, & Baer, 1973). Thus, the probability of punishment should be as high as possible; ideally, every occurrence of the targeted response should result in the punishing stimulus. However, when punishment is discontinued, continuously punished behavior recovers faster than intermittently punished behavior, presumably because the absence of punishment is more discriminable. A likely strategy, then, is to begin with consistent, intense punishment and then to gradually introduce intermittency, just as one would do with positive reinforcement in maintaining targeted responses. In addition, increased intensity of the punisher can enhance the effectiveness of intermittent punishment. Consistency. Failure of punishment may be the result of uncontrolled variations of punishment intensity. 
Again, this is less of a problem when automated procedures are used but a greater problem when human activity is required for the punishment administration. However, even in the laboratory the

intensity of stimuli can also be affected by nontargeted behavior that reduces it. The classic example is behavior called breakfast in bed. The term was inspired by a rat that learned to avoid the grid shock by lying on its back while pressing the lever with its hind foot to produce food. In the natural environment, individuals can resist, plead, apologize, and show remorse either to reduce the intensity of the punisher or to avoid it altogether. Treatment integrity is the more contemporary term for the accuracy and consistency with which interventions are implemented (e.g., McIntyre, Gresham, DiGennaro, & Reed, 2007). Characteristics of the to-be-punished behavior. In addition to the administration of the punishment procedure, the effectiveness of punishment depends on current variables operating on the to-be-punished behavior as well as its history (cf. Lerman & Vorndran, 2002). Except for first occurrences of the to-be-punished behavior, the delivery of punishment is contingent on ongoing behavior that is maintained by positive reinforcement or negative reinforcement or is induced by other processes. Positive punishment of behavior maintained by positive reinforcement is by far the most studied case. However, studies of behavior maintained by negative reinforcement or behavior elicited by shock are generally in agreement with the results found with punishment of behavior maintained by positive reinforcement. Nonetheless, knowledge of the variables maintaining the to-be-punished behavior can guide the choice of procedures. The first choice, of course, is to manipulate the maintaining variables directly and thus obviate the need for punishment. For example, if the behavior is maintained by positive or negative reinforcement, shaping procedures could be used. However, sometimes those maintaining variables cannot be controlled, leaving a punishment procedure as the best course of action. 
Identification of the maintaining variables is especially crucial for the application of time out. If the to-be-punished behavior is maintained by negative reinforcement, response-contingent time out will increase that behavior rather than decreasing it, as mentioned earlier. Plummer et al. (1977) provided an illustrative example in which time out from

a known positive reinforcer (praise plus food) produced an increase in the to-be-punished behavior. In this case, the procedural time out functionally constituted negative reinforcement by reducing the demands of the time-in environment. A given punishment procedure will produce a greater reduction in responding if that responding is maintained under a lean reinforcement schedule rather than a rich one. Thus, to maximize the punishment effect, the frequency of positive reinforcement for the punished response should be reduced. Similarly, the degree of motivation to emit the punished response should be reduced because punishment effectiveness has been found to be inversely related to the levels of deprivation of reinforcers maintaining the to-be-punished behavior: The higher the level of deprivation, the less effective the punishment procedure (Azrin, 1960). However, behavior suppressed under low levels of deprivation does not recover by increasing the levels of deprivation afterward. Another way to minimize the effects of the schedule maintaining the to-be-punished behavior is to make available a concurrent alternative response that will not be punished but will produce the same or greater reinforcement as the punished response (Herman & Azrin, 1964). For example, punishment of criminal behavior can be expected to be more effective if noncriminal behavior that produces the same advantages as the criminal behavior is available.
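Baer's (2001) summary of time out, quoted earlier, and the Plummer et al. (1977) example both turn on a comparison between the time-in and time-out environments. The following is a rough sketch of that three-way comparison. It assumes, purely for illustration, that the net reinforcement and punishment schedules of each environment can be collapsed into a single number; the source gives no such measure, and the function name and parameters are hypothetical.

```python
def predicted_timeout_effect(
    time_in_value: float,
    time_out_value: float,
    contingency_managed_well: bool = True,
) -> str:
    """Predict the effect of response-contingent time out on the target behavior.

    time_in_value and time_out_value are hypothetical scalar stand-ins for the
    net reinforcement and punishment schedules of each environment (higher is
    better for the individual). They are an illustrative assumption only.
    """
    if time_out_value == time_in_value:
        # Equal environments: no change, whatever the quality of the contingency.
        return "no change"
    if not contingency_managed_well:
        # Baer's predictions of weakening or strengthening hold
        # "only if the contingency is managed well".
        return "no prediction"
    if time_out_value < time_in_value:
        return "weakens"      # time out functions as punishment
    return "strengthens"      # time out functions as reinforcement


# A rich time-in environment: time out removes access to reinforcers.
print(predicted_timeout_effect(time_in_value=5.0, time_out_value=1.0))
# A demanding time-in environment, as in the Plummer et al. example:
# time out reduces demands, so it is the better environment.
print(predicted_timeout_effect(time_in_value=1.0, time_out_value=3.0))
```

The point of the sketch is that the procedure called time out has no fixed function: the same procedural step yields opposite predictions depending on which environment is better for the individual.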

Stimulus Control and Punishment Effects

As we discussed in the previous section, providing for an alternative reinforced response can make punishment more effective. However, if no alternative response is available the subject should be given access to a different situation in which the same reinforcement is available without punishment. This recommendation basically amounts to discrimination training, and it is used when the behavior is appropriate only under certain conditions and not under others. However, the stimulus control of behavior can enhance the effects of mild punishment. Doughty, Anderson, Doughty, Williams, and Saunders (2007) provided a relevant example. In their study, dysfunctional stereotypic behavior occurred frequently in the presence of a stimulus

correlated with nonpunishment of that behavior. Presentation of a stimulus correlated with mild punishment of stereotypy produced immediate decreases, with stereotypy occurring rarely, if ever, in the presence of that stimulus. Another study, illustrating a similar effect with pica (ingestion of non-nutritive substances), is that of Piazza, Hanley, and Fisher (1996). One condition involved response-independent delivery of food, coupled with an instruction not to touch the cigarette butts present in the room; no consequences were provided for picking up or consuming the cigarette butts. Another condition involved taping a sheet of purple construction paper to the wall along with the initial instructions. In this condition, however, each time the subject touched a cigarette butt, the therapist entered the room and delivered the reprimand, “No butts.” If the subject did not comply with this reprimand, the therapist physically guided the subject to drop the butt. The two procedures were alternated in a multiple-element design, thus providing for the differential correlation of the purple piece of paper with the punishment contingency. The results show that response-independent presentation of food had virtually no effect, but the discrimination training produced dramatic effects on both cigarette butt pick-ups and cigarette butt pica. Similar effects were found by McKenzie, Smith, Simmons, and Soderlund (2008), who were able to completely eliminate chronic eye poking by correlating wristbands with the simple reprimand “Stop poking” in the punishment condition. Stimulus-control techniques for enhancing punishment effects have also been documented for the case of negative punishment. A variation on time-out procedures termed time-out ribbon, developed by Foxx and Shapiro (1978), is an example. 
During their baseline phase, a teacher continued to use reprimands (as had occurred before the experiment) in attempts to discourage the disruptive behavior of four elementary students with mental retardation. In the next phase, social praise and edibles were provided for good behavior, and the misbehavior was ignored while the child was wearing the ribbon. This was done to increase the density of reinforcement during the time in. Contingent on inappropriate

behavior, the ribbon was removed from the child, all forms of social interaction were terminated, and no edibles were delivered for 3 minutes. As a result, the disruptive behavior of all four children was suddenly and substantially reduced. Laraway, Snycerski, Michael, and Poling (2003) attributed the effects of this procedure not to punishment but to an establishing operation. They argued that “the [establishing operations] for the programmed events during the time-in also established the punishing effect of the ribbon loss (i.e., functioned as [establishing operations] for ribbon loss as a punishing event) and abated misbehaviors that resulted in ribbon loss” (p. 410). Alternatively, one could argue that the ribbon signaled periods of reinforcement, and absence of ribbon signaled periods of time out from reinforcement. Also notice that the differential reinforcement procedure before and after the introduction of time out did not have any effect on replacing the misbehavior with good behavior. Was it the case that during those phases the social praise and edibles were not potent? The effects of punishment were more likely enhanced by the stimulus control provided by the presence and the absence of the ribbon (see Azrin & Holz, 1966; Azrin, Holz, & Hake, 1963). DeFulio and Hackenberg’s (2007) study of the roles of added stimuli in avoidance time out showed similar stimulus-control effects. Another way to establish stimulus control in the absence of external stimuli is to alternate periods of reinforcement with periods of extinction plus punishment. This procedure establishes the delivery of punishment as a discriminative stimulus for extinction. Under these conditions, an otherwise ineffective punisher will produce large decreases in behavior during extinction periods. Thus, if possible, the delivery of the punishing stimulus should be made a signal or discriminative stimulus that a period of extinction is in progress. 
The principle of using enhanced stimulus control to increase the effects of punishment offers an exciting area for further applied and translational research. Azrin and Holz’s (1966) elegant and systematic strategies provide an excellent model for such work. Also, these results invite the reconsideration of

events such as don’t-do-it reprimands common in everyday situations despite their typical ineffectiveness. A great contribution of behavior analysis could be to link the determinants of effectiveness to such common procedures. Punishment can also be established as a discriminative stimulus for reinforcement. Thus, great care should be taken to see that the delivery of the punishment stimulus is not correlated with the delivery of reinforcement. Otherwise, the punishing stimulus may acquire conditioned reinforcing properties. Both parents and pet owners often unwittingly produce this effect by making their deliveries of punishment the occasions that are most reliably followed by expressions and gestures of affection. Holz and Azrin (1961) provided a compelling experimental analogue of this, demonstrating unambiguously that an effective punisher can lose its effectiveness by establishing it as discriminative for positive reinforcement. To establish shock as a discriminative stimulus, they alternated periods in which responding produced positive reinforcement on an intermittent schedule and punishment on a fixed-ratio 1 schedule with periods of extinction in which responding produced neither reinforcement nor punishment. Under these conditions, punishment increased responding instead of decreasing it. The results are shown in Figure 21.1. This type of stimulus control over behavior is considered a side effect of punishment because it could be established unintentionally during the administration of punishment.

Maintenance and Generalization of Punishment Effects

Two enduring rationales for favoring reinforcement over punishment procedures are the specificity of punishment and its transient nature. Although it has been pointed out repeatedly that the same is true for positive reinforcement, the argument has been resistant to change. Unfortunately, behavior analysts have perhaps inadvertently contributed to this misunderstanding by treating reinforcement and punishment as entirely separate domains, instead of pointing out that problems of maintenance and generalization are common to both.

Figure 21.1. Illustration of punishment as a discriminative stimulus for positive reinforcement: Cumulative responding during no punishment as a discriminative stimulus for positive reinforcement and punishment as a discriminative stimulus for positive reinforcement. From “Discriminative Properties of Punishment,” by W. C. Holz and N. H. Azrin, 1961, Journal of the Experimental Analysis of Behavior, 4, p. 228. Copyright 1961 by the Society for the Experimental Analysis of Behavior, Inc. Reprinted with permission.

For example, Matson and DiLorenzo (1984) and Miltenberger (2001) proposed using the types of generalization-promoting procedures that were suggested long ago by Stokes and Baer (1977). In a recent review of the applied literature, Lerman and Vorndran (2002) acknowledged the value of that suggestion but urged caution: “Current knowledge about punishment, however, is insufficient to guide the application of such strategies” (p. 452). As supporting evidence, they pointed to Stokes and Baer’s recommendation to use intermittent and delayed reinforcement as one way to program indiscriminable contingencies so as to maintain behavior for long periods of time. They rightly pointed out that these contingencies undermine the efficacy of punishment. Nevertheless, there are some promising studies along these lines. Lerman, Iwata, Shore, and DeLeon (1997) were able to maintain punishment effects by slowly changing from punishing every response to punishing as infrequently as once every 300 seconds. H. B. Clark et al. (1973) were able to maintain punishment effects by transitioning from

punishing every response to punishing every third or fourth response (on average, in variable schedules) without the need for prolonged gradual transitions. Clearly, more research is needed here; perhaps the intermittent punishment rule applies only to the initial decrease, as it does for reinforcement during the initial increase, or perhaps there are ways to accomplish intermittency of punishment without compromising the effects. Assuming that what is known about reinforcement is very likely to apply to punishment as well, we recommend exploring Stokes and Baer’s (1977) analysis to promote generalization of punishment effects. Within this general framework, programming common stimuli is a very promising approach, one that was adopted by Piazza et al. (1996) and by McKenzie et al. (2008). With appropriate adjustments, both punishing loosely and punishing sufficient exemplars should also be quite effective. Even more broadly, we propose that the effectiveness, generalization, and maintenance of punishment should be systematically programmed, as is commonly done in the applications of positive reinforcement.

Punishment as a motivational effect. The suppressive effects of punishment have sometimes been attributed to the development of avoidance behavior that competes with the punished response; this is called the avoidance hypothesis of punishment (see Dinsmoor, 1954, 1998). In this view, the suppressive effects of punishment are seen as secondary effects of negative reinforcement. That is, the decrease in behavior is accounted for as an increase in alternative behavior maintained by negative reinforcement (see also Skinner, 1953). On this account, the role of punishment is reduced to potentiating negative reinforcers. Although this view is not the dominant position in behavior analysis, it is still occasionally acknowledged as an alternative interpretation (e.g., Sidman, 2006; Spradlin, 2002). 
The continuing appeal of the avoidance hypothesis regarding punishment effects may be attributable to the difficulty of discerning, both conceptually and experimentally, the multiple effects of stimuli that are involved (e.g., Arbuckle & Lattal, 1987; Galbicka & Branch, 1981; Galbicka & Platt, 1984). An example provided by Dinsmoor (1998) about an
infant learning to walk illustrates this difficulty. Noting that the process will certainly involve falling, he asked, Do these aversive stimuli [falling] influence the infant’s learning by suppressing erroneous responses such as looking past obstacles, placing the feet too close together, or letting the body tilt too far to one side? Or, are the necessary adjustments such as scanning the oncoming terrain, spreading the feet, and promptly correcting minor deviations in balance learned because they avoid untoward consequences? (p. 194) Dinsmoor preferred the second alternative, but it is difficult to imagine how these competing alternative accounts can be empirically separated. Notice that the role of positive reinforcement is not mentioned. One could easily ask, does walking increase because one is avoiding falling or because one is getting somewhere? The conundrum is similar to the issue (discussed later) of ascertaining the roles of discriminative versus establishing stimuli in avoidance and the distinction between positive and negative reinforcers. To be sure, the presentation of aversive stimuli such as an electric shock can have motivational effects. That is, it establishes shock removal as a negative reinforcer (Michael, 1982, 1993). However, the reduction of behavior is only partially explained by the avoidance and escape behavior maintained by negative reinforcement. Azrin et al. (1965) showed (in an experiment we describe in the section Change of Situation in the Escape Paradigm later in this chapter; see Figure 21.4) that the motivational effects of shock presentation, as measured by escape or avoidance, are independent of their suppressive effects. Shock at low values had no apparent role; at intermediate values, it served as an establishing stimulus operation only; and at higher values, it could also serve as a punisher. 
A parallel effect of positive reinforcement would be that certain types of reinforcers are good enough to approach (consume) but ineffective for reinforcing other behaviors.

Negative Reinforcement

As we noted earlier, the negative of negative reinforcement denotes the reduction of some event as a consequence of behavior, resulting in a subsequent increase in the occurrence of that behavior. That reduction can entail any of several characteristics, including complete prevention, reduction in intensity, postponement, or reduction in the event’s frequency of occurrence. Thus, a person might take a medication that eliminates pain, follow a particular diet that reduces blood pressure, reduce the frequency of illness by arranging to get more sleep, or postpone the payment of income tax while continuing to pay it once per year. Although ancillary principles can be involved, each example comes under the heading of negatively reinforced behavior. We identify and describe experimental procedures that isolate and illustrate variations such as these.

Basic Terms and Distinctions Concerning Negative Reinforcement

A distinction consistently maintained in the literature is that between escape, whereby behavior results in removal of something already present, and avoidance, whereby behavior postpones or prevents something from occurring. However, ambiguity can arise despite these clear definitions because as Michael (1975) has argued, removal of one thing inherently implies the addition of its converse. Thus, putting on a coat can be characterized as accomplishing either the removal of cold or the facilitation of warmth; removal of a threat can be understood as the production of safety. Although there will always be borderline cases in which the specification of addition versus removal seems arbitrary, one of the two usually has the advantage of precision: The threat is likely to be specific, whereas safety is defined only indirectly by the absence not only of that threat, but of all threats. Thus, a limited range of responses is likely to be involved in removal of a threat, whereas a wide and largely unspecified range of responses is likely to be accompanied by safety. However, particular circumstances can exist in which safety is more clearly delineated than its corresponding range of threats, so the interpretive language should adjust accordingly.

A second resolution of the arbitrariness is based on the fact that if something is to be removed as a consequence of specified behavior, that behavior must occur in the presence of the to-be-removed stimulus or situation. Many such stimuli or situations will elicit or otherwise evoke behavior that competes with the to-be-reinforced behavior (Hineline, 1984a). Thus, if a person is chilled while in bed, shivering and curling up to conserve heat are likely to compete with getting up and finding a warm garment or blanket. Laboratory experiments on avoidance have frequently been complicated by the competing behavior (immobility, biting, and other stereotyped behavior patterns) that is elicited by to-be-avoided shocks (e.g., Bolles, 1970; Dinsmoor, 1977; Forgione, 1970; Hake & Campbell, 1972; Keehn, 1967). These eliciting and other response-inducing effects have generally been called side effects or by-products of aversive control (e.g., Hutchinson, 1977; Newsom, Favell, & Rincover, 1983; Ulrich, Dulaney, Kucera, & Colasacco, 1972). Such effects are not confined to stimuli that produce pain or discomfort, however, for competing, aggressive behavior can also be produced by events such as response-independent point loss, which, as we show, can also generate avoidance. A third consideration that distinguishes negative from positive reinforcement is more complex because it concerns the sequential characteristics and extended time scales of motivating, or establishing, operations, conditions, or stimuli. As Michael (1982) has shown, for situations involving positive reinforcement it can be a relatively straightforward matter to distinguish between discriminative and motivational aspects of stimuli or of the situations that those stimuli delineate. In contrast, and as we illustrate, discriminative and establishing stimuli are often conflated when negative reinforcement is at issue. 
An establishing condition or establishing stimulus is a condition or stimulus that potentiates some other event as a reinforcer. An establishing operation is the explicit manipulation of such a stimulus or the production of such a condition. Thus, a bartender potentiates beverages as positive reinforcers by freely supplying salted snack foods. Establishing conditions can also arise without having been explicitly arranged—as when a group of hikers
arrives at the pub after a long hike on a warm day. In the case of positive reinforcement, the most commonly recognized establishing conditions vary on time scales quite different from the behavior that affects the related reinforcers: Hunger increases or decreases over hours, or at least many minutes, whereas the behavior enhanced by its producing the potentiated reinforcer (food) is often analyzed moment-to-moment. The salted pretzels have their effect gradually, whereas the episode of procuring a drink transpires quickly. Discriminative stimuli, however, denote whether the behavior of concern can produce a given consequence, irrespective of whether the consequence is a potent reinforcer. For example, the presence of a bartender looking at you is a discriminative stimulus that occasions your requesting a beverage. In the past, beckoning to bartenders who are looking elsewhere has not been reinforced; thus, people come to beckon and request a beverage only when the bartender is looking at them. Discriminative stimuli commonly come and go moment to moment, thus varying on time scales commensurate with the behavior of concern. In situations involving negative reinforcement, however, both establishing and discriminative stimuli often change from moment to moment, often simultaneously, and sometimes with both roles even devolving onto a single stimulus, which conflates the establishing and discriminative functions. As a result, interpretations of avoidance have tended to emphasize the motivational aspect of relevant stimuli—denoting them as conditioned aversive stimuli, in the behavior-analytic tradition (e.g., Anger, 1963), or as elicitors of conditioned fear in contrasting interpretive traditions (e.g., Kamin, Brimer, & Black, 1963). Interpretations with this emphasis have given insufficient recognition to the discriminative aspects of those stimuli. 
As we show, basic behavior-analytic research on avoidance has revealed that in many cases, the function of a putatively motivational stimulus is mainly discriminative in character. Conflation of those two functions has tangled the study of negative reinforcement throughout the history of its psychological interpretation. On one hand, a traditional issue has been “the avoidance problem,” whereby avoidance, as noted earlier, is
defined by the prevention of some event or situation: How can nonoccurrence be a consequence of behavior? This problem appears to arise from a presumed necessity of consequences being contiguous with behavior if they are to reinforce it. Thus, avoidance theory has been viewed as a distinct domain, with interpretations focused on identifying a plausible process whereby some event can be construed as supplying a contiguous consequence (Bolles, 1970; Dinsmoor, 2001; Herrnstein, 1969; Hineline, 1976, 1984a). Most avoidance theories, which is to say, theories posited as solving the avoidance problem, have been anchored in procedures that include a warning stimulus—a stimulus that accompanies but occurs in advance of the to-be-avoided stimulus or situation. Indeed, in the history of avoidance studies, the warning stimulus was ubiquitous in experimental procedures and was initially viewed as essential to the behavior of avoiding, partly because it fits so comfortably with a theoretical account known as two-factor, or two-process, avoidance theory (for reviews, see Bolles, 1973; Herrnstein, 1969). The two putative processes are Pavlovian (or respondent) conditioning and operant conditioning. By that account, the warning stimulus begins as a neutral event, but through pairing with the primary aversive stimulus it becomes a conditioned aversive (or fear-eliciting) stimulus whose removal can serve to reinforce behavior. According to that account, people do not avoid rain directly; instead, they avoid rain by escaping from warning stimuli such as clouds, which in their past experience have accompanied rain. Although that concatenation of two processes can account for some of the behavior that is commonly characterized as avoidance, the theory is silent concerning many features of such behavior. 
In addition, a major contribution of behavior-analytic research has been to show not only that avoidance can occur without the aid of warning stimuli, but even more tellingly that warning stimuli often play a role different from that posited by two-process theory. Warning stimuli have often proved to be more discriminative than motivational. Thus, although many textbooks of conditioning and learning continue to feature two-process theory as the major explanation of avoidance, studies accumulated over
more than a half-century have shown that theory to be untenable as a comprehensive account (for reviews supporting this assertion, see Hineline, 1976, 1981). We describe representative studies of that kind here, along with an alternative conception of the behavior called avoidance.

Classic, and Still Relevant, Experiments

The first, and best-known, experiment questioning the importance of warning stimuli in avoidance was done long ago by Sidman (1953a), who devised a procedure that carries his name. It was first accomplished with laboratory rats but was soon replicated with other species, including humans (e.g., Ader & Tatum, 1961; Behrend & Bitterman, 1963; Black & Morse, 1961; F. C. Clark, 1961). The simplest version of Sidman’s procedure is arranged by means of a clock, a response lever, and a device that can deliver brief (uncomfortable but harmless) electric shocks; the relationships among them are presented schematically in Panel A of Figure 21.2. In the absence of the subject’s lever pressing, the timer delivers a brief shock when its specified time has elapsed, and then it starts over. If the subject fails to respond, the brief shocks will continue to occur at the specified intervals, typically 20 seconds. Any lever press resets the timer, starting the interval over and preventing a shock. This procedure readily established and maintained lever pressing. Sidman (1953b) also systematically varied the time between shocks when no responses intervened and, by adding a second response–shock timer, independently assessed the time by which a response could postpone a shock, as diagrammed in Panel B of Figure 21.2. Although these procedures did not supply exteroceptive warning stimuli, Anger (1963) offered a plausible account of how the temporal regularities of those procedures still allowed for two-process avoidance theory to account for the results, proposing that the passage of time functions in a manner comparable to stimuli such as tones and lights. A still stronger challenge to two-factor theory was provided by Herrnstein and Hineline (1966), who found that the lever-press responding of laboratory rats was readily established and maintained by changes in the frequency of probabilistic events,

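[Figure 21.2 diagram. Panel A: shock (S) and response (R) with a single timer, S–S interval = R–S interval. Panel B: shock (S) and response (R) with separate S–S and R–S intervals.]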

Figure 21.2. Schematic diagram showing two variants of Sidman’s (1953a) avoidance procedure, which does not include warning stimuli. In both diagrams, the passage of time (and the progress of the timer that can deliver an impending shock, S) progresses from left to right, as indicated by the solid arrowed line. Instantaneous transitions (initiated by responses that occur at any time, as indicated by the braces, or by the delivery of shocks, as indicated at the right) are indicated by the dashed lines. A: a version based on a single timer; B: a version with two timers, one of which controls the S–S interval (the time between shocks in which no response occurs) and the other the response–shock (R–S) interval, which is initiated by, and also reset by, a response. From Advances in Analysis of Behavior: Vol. 2. Predictability, Correlation, and Contiguity (p. 205), by P. Harzem and M. D. Zeiler (Eds.), 1981, Chichester, England: Wiley. Copyright 1981 by John Wiley & Sons. Adapted with permission.

again with no warning stimuli supplied. In the absence of responding, shocks were delivered at one probability, sampled every 2 seconds; a single response lowered the probability, still sampled every 2 seconds, where it stayed until a shock resulted, thus returning to the higher probability distribution. Herrnstein and Hineline argued, then, that shock-frequency reduction per se constituted the reinforcing consequence of behavior—an interpretation that Sidman (1962) had tentatively proposed on the basis of an experiment that had not been designed to directly manipulate the shock frequencies.
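The timer logic of the two procedures described above can be made concrete with a small simulation. The following is only a sketch under stated assumptions, not the original apparatus: the function names, the list-of-response-times input, and the rule that a shock always returns the probabilistic schedule to the higher probability are illustrative simplifications of the verbal description.

```python
import random


def sidman_shock_times(response_times, ss_interval, rs_interval, session_length):
    """Shock times (s) under Sidman-type postponement with no warning stimulus."""
    shocks = []
    responses = sorted(response_times)
    i = 0
    next_shock = ss_interval  # with no responding, the first shock is one S-S interval away
    while True:
        # Every response made before the pending shock restarts the R-S timer.
        while i < len(responses) and responses[i] < next_shock:
            next_shock = responses[i] + rs_interval
            i += 1
        if next_shock > session_length:
            return shocks
        shocks.append(next_shock)
        next_shock += ss_interval  # a delivered shock restarts the S-S timer


def probabilistic_shock_times(response_times, p_high, p_low, session_length,
                              rng, sample_every=2):
    """Shock times when a response lowers shock probability until the next shock."""
    shocks = []
    responses = sorted(response_times)
    i = 0
    lowered = False  # True while the post-response (lower) probability is in effect
    t = sample_every
    while t <= session_length:
        while i < len(responses) and responses[i] < t:
            lowered = True
            i += 1
        if rng.random() < (p_low if lowered else p_high):
            shocks.append(t)
            lowered = False  # assumption: any shock restores the higher probability
        t += sample_every
    return shocks


if __name__ == "__main__":
    print(sidman_shock_times([], 20, 20, 60))            # no responding
    print(sidman_shock_times([15, 30, 45], 20, 20, 60))  # steady responding
    print(probabilistic_shock_times([], 0.3, 0.1, 60, random.Random(1)))
```

Setting `ss_interval` equal to `rs_interval` corresponds to the single-timer variant in Panel A of Figure 21.2; unequal values correspond to the two-timer variant in Panel B.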

Addressing a converse interpretation, Hineline (1970) subsequently found that short-term postponement of shocks could also generate and maintain responding even when overall shock frequency was held constant. In that experiment, brief shocks were arranged to occur at the 10th second within repeating 20-second cycles and with a retractable response lever available during the first 8 seconds. A single response retracted the lever for the remainder of the cycle and postponed the shock from the 10th to the 18th second. Thus, with one shock per 20-second cycle, shock frequency was constant at three per minute and yet responding was established and maintained. In a second experiment, the response postponed the shock from the 10th second to the point 8 seconds after that response and with a new cycle beginning 2 seconds after the shock. This procedure did not maintain responding—a result attributable to the fact that responses shortened the cycle, thus increasing shock frequency. Warning stimuli are irrelevant here, making two-factor theory silent concerning the results, but the results suggest two distinguishable independent variables operating on different time scales—short-term, moment-to-moment postponement and changes in overall frequency of aversive events.

Ironically, some of the strongest evidence that challenges the two-process interpretation of warning stimuli came from the insertion of warning stimuli into the very procedures that had shown them to be unnecessary. Thus, Sidman (1955) superimposed the presentation of a light (in a procedure for rats) and a tone (for cats) during the 5-second period before a shock was due. Lever presses in advance of the warning stimulus reset the timer, thus postponing both the warning stimulus and the shock, whereas responses in the presence of the warning stimulus terminated that stimulus while also resetting the timer.
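The shock-frequency arithmetic behind the two Hineline (1970) experiments described above can be checked with a short worked sketch. The function names are hypothetical, and one detail is an assumption: the text does not state the length of an unresponded cycle in the second experiment, so it is taken here to remain 20 seconds.

```python
def shocks_per_minute(cycle_seconds):
    """One shock per cycle, so frequency is simply 60 / cycle length."""
    return 60.0 / cycle_seconds


def experiment1_cycle(response_time=None):
    """Return (shock time, cycle length) in seconds for the first experiment.

    The cycle is fixed at 20 s. A response in the first 8 s moves the shock
    from the 10th to the 18th second without changing the cycle length.
    """
    responded = response_time is not None and response_time < 8
    return (18 if responded else 10), 20


def experiment2_cycle(response_time=None):
    """Return (shock time, cycle length) in seconds for the second experiment.

    A response postpones the shock to 8 s after the response, and the next
    cycle starts 2 s after the shock, so responding shortens the cycle.
    """
    if response_time is not None and response_time < 8:
        shock_at = response_time + 8
        return shock_at, shock_at + 2
    return 10, 20  # assumption: an unresponded cycle still lasts 20 s


if __name__ == "__main__":
    for label, cycle in [("Exp. 1, no response", experiment1_cycle()),
                         ("Exp. 1, response at 3 s", experiment1_cycle(3)),
                         ("Exp. 2, no response", experiment2_cycle()),
                         ("Exp. 2, response at 3 s", experiment2_cycle(3))]:
        shock_at, length = cycle
        print(label, "shock at", shock_at, "s;",
              round(shocks_per_minute(length), 2), "shocks/min")
```

In the first procedure the response changes only when the shock arrives; in the second it also raises the shock rate above three per minute, which is the dissociation the two experiments rely on.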
One might have thought that the warning stimulus, having been paired with shock, would function as an aversive stimulus to be avoided. Instead, the animals typically waited for the warning stimulus to come on before responding. In the light of this, one might suggest that the warning stimulus was sufficiently aversive to maintain escape responding in its presence, but not sufficiently aversive to maintain behavior that would prevent its
onset. However, the animals were still capable of preventing the warning stimulus, as was shown by arranging for responses to be ineffective only during the warning stimulus (Sidman & Boren, 1957a). A still stronger case for a differing role of warning stimuli was presented by G. E. Field and Boren (1963), who built on Sidman’s (1962) ingenious escalator procedure, in which the subject’s responding was analogous to running up a down-escalator (Sidman, 1962). That is, with no responding, brief shocks occurred every 5 seconds. Each response postponed the next shock by an increment of 10 seconds, up to a maximum of 110 seconds, while the clock kept running, thus continually subtracting from the accumulated shock-free time. G. E. Field and Boren then superimposed warning stimuli that varied in synchrony with the current amount of response-produced postponement—a row of 11 lights that were illuminated successively or an auditory clicker that produced varying numbers of clicks per second. When lights alone were supplied, most responses occurred when shocks were 40 to 70 seconds away. When the clicks alone were supplied, the subjects allowed the shocks to come closer, with nearly half of the responses occurring when shocks were 30 to 40 seconds away. The most informative result occurred when the lights and clicks were subsequently presented in combination; the resulting response patterns were similar to those of clicks alone. By the logic of two-process theory, the fact that the added lights alone had maintained greater postponement than clicks alone would mean that lights were more aversive than the corresponding click rates. It would follow that adding clicks to light would have relatively little effect, perhaps resulting in slightly greater postponement. Instead, the opposite occurred: Adding the click to the light resulted in the animals maintaining a closer temporal distance to shock. 
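The escalator contingency described above (a 5-second baseline, 10-second increments per response, and a 110-second ceiling, with the clock always running) can be sketched as a one-second-step simulation. This is a minimal illustration, not the original apparatus: the function name, the whole-second time grid, and the rule that a response in the same second as a due shock takes precedence are all assumptions.

```python
def escalator_session(response_seconds, session_length, base=5, step=10, cap=110):
    """Simulate accumulated shock-free time, second by second.

    Returns (shock_times, trace), where trace[t - 1] is the number of
    seconds remaining until the next shock at the end of second t.
    """
    responses = set(response_seconds)
    remaining = base
    shocks, trace = [], []
    for t in range(1, session_length + 1):
        remaining -= 1  # the clock keeps running down the accumulated time
        if t in responses:
            remaining = min(remaining + step, cap)  # each response adds 10 s, up to the cap
        if remaining <= 0:
            shocks.append(t)
            remaining = base  # after a shock, the next one is 5 s away
        trace.append(remaining)
    return shocks, trace


if __name__ == "__main__":
    shocks, trace = escalator_session([1, 2, 3], 40)
    print("shocks at:", shocks)
    print("peak time to shock:", max(trace))
```

The `trace` value is the quantity that the lights and click rates in the Field and Boren study tracked, so plotting responses against it would show how far from shock an animal typically responded.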
It is also relevant that when neither the click nor the lights were supplied, the animals maintained the maximal, 110-second distance from shock. Thus, the role of the warning stimuli was primarily discriminative rather than motivational. Dinsmoor and Sears (1973) demonstrated that response-produced stimuli correlated with the absence of shock could also play a potent role in avoidance situations. They used pigeons as subjects
and a Sidman-type shock postponement procedure contingent on treadle pressing. In addition, during initial training each response produced a 5-second 1,000-Hz tone. During subsequent testing in every third session, the frequency of the response-produced tone was varied from session to session, from 250 to 4,000 Hz, including some sessions with no tones. This testing yielded distinct generalization gradients that were fairly symmetrical on a logarithmic scale, thus revealing control by these safety signals. These manipulations did not differentiate between discriminative and conditioned reinforcing roles of those stimuli, but they did indicate effects distinct from the motivational and energizing role commonly attributed to warning stimuli.
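Why a logarithmic scale is the natural axis for these gradients can be seen from the tone values themselves: the two endpoints of the test range lie equally far from the 1,000-Hz training tone in octaves, though not in hertz. The sketch below assumes a hypothetical set of test tones; only the 250-Hz and 4,000-Hz endpoints and the 1,000-Hz training tone come from the text.

```python
import math


def octaves_from_training(test_hz, training_hz=1000.0):
    """Signed distance from the training tone in octaves (log base 2)."""
    return math.log2(test_hz / training_hz)


# Endpoints are from the text; the intermediate values are hypothetical.
test_tones_hz = [250, 500, 1000, 2000, 4000]
distances = [octaves_from_training(f) for f in test_tones_hz]

if __name__ == "__main__":
    for hz, octaves in zip(test_tones_hz, distances):
        print(hz, "Hz:", octaves, "octaves;", hz - 1000, "Hz from training tone")
```

On a linear axis the 4,000-Hz tone is four times farther from the training tone than the 250-Hz tone, so a gradient that is symmetrical in octaves would look strongly skewed if plotted in hertz.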

Situational Change as Reinforcing Avoidance

Subsequent and increasingly complex experiments have made the two-process, primarily motivational account less tenable. To make sense of them, it is useful to begin by developing the concept of behavioral situation. A behavioral situation is a bounded period of time that can be functionally contrasted with other such periods. Its boundaries are typically delineated with accompanying stimuli—in experiments, by the temporary presence or absence of lights or tones or even of the specific chamber or room in which a specific procedure is implemented. Boundaries of the situation can also arise directly from events or stimuli that yield its functional properties. Thus, a rainstorm begins when droplets begin to arrive and ends when they cease, but if those transitions are gradual, the storm can be made more unitary if accompanied by abrupt changes from sunshine to cloudiness and back to sunshine. For the analysis of negative reinforcement, our main concern is with goings on within a situation that affect the aversiveness of that situation—aversiveness being revealed by whether response-contingent termination or prevention of the situation will serve to maintain behavior. One might think that the major determinant of this aversiveness would mainly be the frequency of primary aversive events within the situation. It turns out that several other and more interesting features contribute to that aversiveness.

A variety of experiments have revealed the roles of stimuli that delineate such situations. The first of these, by Sidman and Boren (1957b), entails yet another elaboration of Sidman’s shock-postponement procedures, using two timers and an added light, as diagrammed in Figure 21.3. The first timer specified a response–light (R-L) interval and thus controlled the onset of a light; lever presses could reset that timer, thus postponing the presentation of the light. If the light was allowed to come on, a different timer became operative that specified a response–shock (R-S) interval; now lever presses could reset this timer, thus postponing a shock, but the light stayed on. The light went off only if the subject (a laboratory rat) paused long enough for a shock to occur, accompanied by reactivation of the first timer and the R-L interval that it controlled. When the R-L and R-S intervals were equal at 10 seconds, nearly all responding occurred in the presence of the light, directly postponing shocks; on the few occasions when shocks occurred, the subjects typically waited out the R-L interval and then resumed the

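[Figure 21.3 diagram. Hatched bar: warning stimulus; S: shock; R1: response before the warning stimulus; R2: response during the warning stimulus.]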

Figure 21.3. Diagram of a procedure that illustrates the aversiveness of behavioral situations, as distinct from aversive events per se within those situations. The passage of time is indicated by the solid line, from left to right, and instantaneous changes are indicated by the dashed lines. The hatched area indicates the presence of a warning stimulus. Responses in advance of the warning light (R1 in the diagram) reset a timer controlling the response–light (R-L) interval, thus postponing the onset of that stimulus. If the warning stimulus was allowed to come on, responding (R2 in the diagram) could still postpone shock, but only by resetting the timer that controlled the response–shock interval (also the duration of the warning stimulus), which stayed on. Only by taking a shock (represented by S in the diagram) could the animal return to the situation in which R1 and the R-L interval were in effect. From Advances in Analysis of Behavior: Vol. 2. Predictability, Correlation, and Contiguity (p. 213), by P. Harzem and M. D. Zeiler (Eds.), 1981, Chichester, England: Wiley. Copyright 1981 by John Wiley & Sons. Adapted with permission.

responding that reset the R-S clock. To this point, this result was reminiscent of Sidman’s (1955) experiment, showing little that was new. The most important results occurred when the R-L interval was increased, with the R-S interval held constant at 10 seconds. As the R-L interval (the interval during the dark, whereby responding could postpone the light) was increased over blocks of experimental sessions, responding increased during that interval even though the animals could have responded more slowly without cost. Even more significant, during the light, which accompanied the constant 10-second R-S interval, response rates decreased from more than 12 responses per minute to fewer than three. That is, the largest effect on response rate occurred in a situation in which the response contingency was held constant. In addition, it was clear that the animals were waiting out the interval and taking a shock, with the result being the opportunity to more easily avoid other shocks. Krasnegor, Brady, and Findley (1971) demonstrated closely similar relationships with monkeys instead of rats as subjects and with ratio schedules under time limitations instead of shock-postponement procedures.
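The two-timer contingency of the Sidman and Boren (1957b) procedure diagrammed in Figure 21.3 amounts to a two-state machine, which can be sketched as follows. The function name and event labels are hypothetical, and the sketch assumes that the R-S timer starts at light onset and the R-L timer restarts at the shock, which the verbal description implies but does not state in those terms.

```python
def sidman_boren_session(response_times, rl_interval, rs_interval, session_length):
    """Return a list of (time, event) pairs, with event "light_on" or "shock".

    Dark state: responses reset the R-L timer; if it elapses, the light comes on.
    Light state: responses reset the R-S timer but leave the light on; if it
    elapses, a shock occurs, the light goes off, and the R-L timer restarts.
    """
    events = []
    responses = sorted(response_times)
    i = 0
    light_on = False
    deadline = rl_interval  # the session starts in the dark with the R-L timer running
    while True:
        # Responses before the current deadline reset whichever timer is operative.
        while i < len(responses) and responses[i] < deadline:
            deadline = responses[i] + (rs_interval if light_on else rl_interval)
            i += 1
        if deadline > session_length:
            return events
        if light_on:
            events.append((deadline, "shock"))
            light_on = False
            deadline += rl_interval  # back to the dark situation
        else:
            events.append((deadline, "light_on"))
            light_on = True
            deadline += rs_interval  # assumption: R-S timer starts at light onset


if __name__ == "__main__":
    print(sidman_boren_session([], 10, 10, 40))
    print(sidman_boren_session([5, 12, 18], 10, 10, 40))
```

Lengthening `rl_interval` while holding `rs_interval` at 10 seconds reproduces the manipulation described above, in which taking a single shock buys a return to the situation where shocks are easier to postpone.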

Change of Situation in the Escape Paradigm

Conceptualizing change of situation as a potentially reinforcing consequence of behavior enables us to tie some of these complex elaborations of avoidance back to the putatively simpler escape relations and then to elaborate the escape relation itself in terms of intermittent reinforcement contingencies that are analogous to the familiar schedules of positive reinforcement. First, to review, the escape contingency is one in which a response occurring in the presence of an aversive stimulus can directly eliminate that stimulus. Furthermore, in a pure escape procedure, additional responding, which occurs in the absence of the aversive stimulus, has no effect on the subsequent likelihood of that stimulus occurring; otherwise, it would be a combination of escape and avoidance contingencies. Removal of an aversive stimulus surely constitutes a highly discriminable consequence of the response. Thus, an escape procedure has properties of an Sd–SΔ arrangement, in
which Sd is a discriminative stimulus that sets the occasion for a particular response to be reinforced and SΔ is a stimulus that occasions nonreinforcement (extinction) of that response; responding is effective in the presence of the Sd and ineffective in its absence. However, not only is responding ineffective in the absence of the Sd, the establishing condition is absent as well. As noted earlier, this illustrates a major difference between positive and negative reinforcement; in the case of positive reinforcement, the discriminative and establishing functions typically occur out of synchrony and on differing time scales. Using basic schedules—variable intervals and fixed ratios operative in the presence of shock—with the scheduled reinforcement being termination of shock, Dinsmoor and Winograd (1958) and Winograd (1965) were able to identify schedule effects suggestive of those obtained with positive reinforcement but with limited ranges of schedule requirements. The limitations appeared to arise substantially from competition, with behavior patterns directly elicited by the shocks, thus illustrating the asymmetry between negative and positive reinforcement, which we have noted. It was a short step beyond escape from continuously available shock to arrange for escape from situations delineated by continuous tones or lights that accompany brief intermittent shocks, thus effectively aggregating those shocks plus delineating stimuli into functionally unitary situations (Hineline, 1984a). The salience of such delineating stimuli can be important, as shown by Baron, DeWaard, and Lipson (1977), who found that when an elevated shelf was made available during the time-out periods from an avoidance procedure, which enabled the rats to get off the grid floor, response rates on the variable-interval schedule whereby they could produce the time-out periods were elevated— even though no shocks were delivered if they stayed on the grid floor during the time outs. 
Interpreting this result as enhanced discriminability of safe periods was supported by an extensive series of studies by Badia and his colleagues, who examined animals’ preferences for situations in which brief shocks were preceded by signals of a few seconds’ duration in comparison with situations in which comparable
shocks were unsignaled. Some manipulations indicated that rats preferred situations in which shocks were signaled over situations in which the shocks were unsignaled, even when the signaled shocks were more intense or of longer duration (Badia, Culberson & Harsh, 1973) or occurred more frequently (Badia, Coker, & Harsh, 1973). Azrin et al. (1965) used the addition of a delineating stimulus when studying removal of a punishment contingency as the basis for negative reinforcement. That is, a pigeon’s pecks on one response key resulted in escape from a situation in which the food-reinforced responses on a second key produced shocks. The experimenters accomplished this by first maintaining one response with a schedule of positive reinforcement, then arranging for those responses to produce brief shocks, and finally arranging for a second response to produce a change of illumination, which accompanied the disabling of the punishment contingency until the next positive reinforcer had been produced. Responding that produced this change of situation was readily established and maintained. Variations in punishment intensity yielded a finding of great practical importance: As shown in Figure 21.4, when escape was possible, a much less intense punishment was required to suppress the positively reinforced behavior; with no escape available, the punished behavior was much more persistent. Arranging for discriminable shock-free periods thus delineated as intermittent consequences of responding, Dinsmoor (1962) reported stable

Figure 21.4. Frequency of punished responding under escape versus no-escape conditions. From “Motivational Aspects of Escape From Punishment,” by N. H. Azrin, D. F. Hake, W. C. Holz, and R. R. Hutchinson, 1965, Journal of the Experimental Analysis of Behavior, 8, p. 34. Copyright 1965 by the Society for the Experimental Analysis of Behavior, Inc. Reprinted with permission.

maintenance of responding on a variety of variable-interval schedules. In a tour de force demonstrating the potency of reinforcement schedules based on clearly discriminable time-out periods as reinforcers, Kelleher and Morse (1964; also see Hineline, 1976) showed that when surrounding parameters (shock intensity, potency of food reinforcers, etc.) were appropriately adjusted, they were able to obtain identical positively reinforced and negatively reinforced behavior patterns on fixed-interval 10-minute and fixed-ratio 30 schedules of reinforcement that were presented in alternation. Furthermore, when chlorpromazine and d-amphetamine were administered (in separate experiments) with systematic variation of doses, the two drugs had differing effects on behavior maintained by the differing schedules, but similar effects on a given schedule irrespective of whether that schedule implemented positive reinforcement (food delivery) or negative reinforcement (termination of the stimulus that accompanied intermittent brief shocks). This finding runs counter to the pervasive practice of using hedonic terms to characterize psychoactive drugs. Baum (1973) studied behavioral allocation as a function of the frequency of discriminable time-out periods in a situation that entailed brief response-independent shocks. The subjects in this experiment were pigeons in a chamber whose floor was divided into two platforms. Standing on one platform illuminated a red light; standing on the other yielded a green light. Depending on which platform a bird was standing on, one of two variable-interval timers could deliver a 2-minute shock-free period with the lights off. Systematic changes in the mean intervals generated by the two timers resulted in corresponding variations in the birds’ relative time spent on the two platforms—variations that on average corresponded to the matching law that is familiar in studies of positive reinforcement (e.g., Davison & McCarthy, 1988).
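In its simplest (strict) form, the matching law referred to above says that the proportion of time allocated to an alternative equals the proportion of reinforcers obtained there. The sketch below applies that form to time-out periods on two platforms; the function name and the counts are hypothetical illustrations, not data from Baum (1973).

```python
def matching_time_share(timeouts_platform_1, timeouts_platform_2):
    """Predicted share of time on platform 1 under strict matching.

    Strict matching: T1 / (T1 + T2) = R1 / (R1 + R2), where R1 and R2 are
    the numbers of time-out periods obtained on each platform.
    """
    total = timeouts_platform_1 + timeouts_platform_2
    if total == 0:
        raise ValueError("no time-outs obtained, so the prediction is undefined")
    return timeouts_platform_1 / total


if __name__ == "__main__":
    # Hypothetical session: 30 time-outs obtained on one platform, 10 on the other.
    print(matching_time_share(30, 10))
```

Comparing this predicted share with the observed proportion of session time spent on each platform is the kind of check that a matching analysis involves.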
Logue and de Villiers (1978) also obtained results consistent with the matching law, using a procedure that required more active responding than merely standing on a platform: responses could cancel shocks that were scheduled to occur at various frequencies. Reduction in the duration of a situation can function as negative reinforcement, even when the reinforced behavior does not immediately produce

Hineline and Rosales-Ruiz

that reduction, as illustrated by Mellitz, Hineline, Whitehouse, and Laurence (1983). In their experiment, responding on either of two adjacent and concurrently equivalent levers could postpone shock in Sidman’s basic procedure. Each rat proved to have an initially preferred lever when performance had become stable but did not respond exclusively on that lever. With this accomplished, the Sidman procedure remained in effect as before, but in addition, each response on the nonpreferred lever subtracted 1 minute from the daily 120-minute session. This contingency was discontinued when there were only 2 minutes remaining in the session, whereas the Sidman procedure continued in effect until the end. Systematically over time, all five animals switched to a preponderance of responding on what had been their nonpreferred lever. For four of the five animals, this effect was validated by reversal procedures, repeatedly shifting the preferences from one lever to the other and back. The clear message here is that reduction of situation duration can function as negative reinforcement, and the consequence need not be contiguous with the occurrence of the affected response. However, distribution of events within a situation can affect the degree to which transitions into that situation will be reinforcing. Gardner and Lewis (1976) examined this feature while clarifying and enhancing the dissociation of reinforcement via shock postponement versus reinforcement via shock-frequency reduction. The Gardner and Lewis procedures entailed an imposed situation in which brief shocks occurred unpredictably on the average of twice per minute. The animal’s lever press in this situation initiated a 3-minute alternate situation that was accompanied by light and auditory click. 
The six shocks that would have been distributed irregularly throughout the 3 minutes if the response had not occurred were now closely packed, one per second beginning 10, 88, or 165 seconds into the 3-minute alternate period, for different groups of animals. As shown in Figure 21.5, both the 88- and the 165-second delays resulted in the animals spending substantial percentages of time in the alternate situation, whereas those with the 10-second delays spent most of their time in the imposed condition. In a second experiment, the imposed situation was the same as before, but the 3-minute alternate

Figure 21.5. Percentage of session time spent in the 3-minute response-produced alternate situations for subjects in each of three groups, whose responses yielded transitions from two-per-minute irregularly spaced shocks to six closely spaced shocks with 10 seconds, 88 seconds, and 165 seconds delay, respectively. From “Negative Reinforcement With Shock-Frequency Increase,” by E. T. Gardner and P. Lewis, 1976, Journal of the Experimental Analysis of Behavior, 25, p. 5. Copyright 1976 by the Society for the Experimental Analysis of Behavior, Inc. Reprinted with permission.

situations entailed more shocks than the six that would be received in a comparable period of the imposed situation. For one group, a train of 9 one-per-second shocks began 161 seconds into the alternate period. For another group, a train of 12 shocks began 158 seconds into that period, and for the third group, a train of 18 shocks began 152 seconds into that period. In each case, after the last shock in the alternate period, 10 seconds elapsed before a return to the imposed period. All three rats in each of the first two groups responded sufficiently to spend most of their time in the alternate situation, thus receiving nine or 12 postponed shocks during periods in which they could have received six evenly distributed shocks. Of the third group, only one of the three animals responded appreciably, producing 18 postponed instead of six evenly distributed shocks. In a follow-up experiment, Lewis, Gardner, and Hutton (1976) controlled for several extraneous features while arranging that in the alternate condition, the first shock occurred at the same time as would have been delivered by the imposed condition if the animal had not responded. Again, the animals responded so as to shift the

Behavior in Relation to Aversive Events

preponderance of shocks toward the end of the alternate period, despite there being no reductions in overall frequency of shocks. More conventionally, change of situation arranged as an intermittent but immediate behavioral consequence (as in typical schedules of positive reinforcement) has provided a flexible paradigm serving to reveal increasingly multifaceted aspects of negative reinforcement. Thus, Perone and Galizio (1987) arranged for rats’ responses on one lever to postpone shocks according to Sidman’s procedure, whereas responses on a second, concurrently effective lever could occasionally (via conventional variable-interval schedules) produce discriminable 2-minute periods of time out from the postponement procedure. Responding was robustly and independently maintained on both levers. Galizio and Perone (1987) then used this preparation as a baseline for studying the effects of psychoactive drugs. Chlordiazepoxide had little effect on shock-postponement responding at doses that enhanced responding on the time-out lever. In contrast, morphine depressed time-out responding at doses that either increased or had no effect on shock-postponement responding. Galizio, Robinson, and Ordronneau (1994) subsequently replicated the dissociative effect of morphine, using variable-ratio schedules in place of variable-interval schedules. Interpretations that appeal to general emotional or motivational states, which are commonly invoked to categorize psychoactive drugs, have little to say about such differences between negatively reinforced repertoires. Courtney and Perone (1992) also used a similar arrangement, but with response-produced deletion of noncontingent shocks in place of shock postponement. This arrangement enabled them to evaluate the relative contributions of shock frequency and response effort to the effectiveness of time out from an aversive situation. 
They found that in some circumstances, response effort required to avoid shocks contributed more to the aversiveness of a situation than was contributed by relative frequencies of the shocks themselves.
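The arithmetic behind the Gardner and Lewis (1976) arrangements described earlier in this section can be tabulated with a short sketch. It is offered only as a bookkeeping aid under stated assumptions: the names are hypothetical, the stated onsets are treated as times within the 3-minute alternate period (an interpretation), and each one-per-second shock is treated as filling a 1-second slot.

```python
ALTERNATE_PERIOD_S = 180      # 3-minute response-produced alternate situation
IMPOSED_SHOCKS_PER_MIN = 2    # average shock rate in the imposed situation
RETURN_DELAY_S = 10           # second experiment: gap after the last shock

# (train onset in seconds, number of shocks) for each group, from the text.
EXPERIMENT_1 = [(10, 6), (88, 6), (165, 6)]
EXPERIMENT_2 = [(161, 9), (158, 12), (152, 18)]


def imposed_equivalent_shocks(period_s: int = ALTERNATE_PERIOD_S) -> float:
    """Shocks expected in the imposed situation over a period of this length."""
    return IMPOSED_SHOCKS_PER_MIN * period_s / 60


def summarize(onset_s: int, n_shocks: int) -> dict:
    """Describe one group's shock train relative to the imposed situation."""
    return {
        "onset_s": onset_s,
        "n_shocks": n_shocks,
        # Positive values mean the alternate situation delivers MORE shocks.
        "extra_shocks": n_shocks - imposed_equivalent_shocks(),
        # Assumes each shock occupies a 1-second slot starting at onset_s.
        "train_end_s": onset_s + n_shocks,
    }


for label, groups in (("Experiment 1", EXPERIMENT_1), ("Experiment 2", EXPERIMENT_2)):
    for onset, count in groups:
        print(label, summarize(onset, count))
```

Under these assumptions, the first experiment changes only when the six shocks occur, whereas the second adds 3, 6, or 12 shocks; in all three groups of the second experiment the train ends 170 seconds into the period, leaving the 10 seconds mentioned in the text.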

Avoidance of Time Out From Situations of Positive Reinforcement

Just as time out from situations containing positive reinforcement can be a basis for punishment contingencies, time out from positive reinforcement can also provide a basis for negative reinforcement; thus, prevention of the loss of situations in which food is available can constitute avoidance. Baer (1962) provided an early demonstration of such an effect with a procedure in which young children could postpone the interruption of animated cartoons. His first procedure was analogous to escape-avoidance in that any time the cartoon was off, it could be reinstated by a response. Additional responses could postpone a subsequent interruption of the cartoon. This procedure failed to establish postponement responding. In Baer’s second procedure, responses during the cartoon could accumulate time, as in Sidman’s (1962) escalator procedures. This version maintained responding that postponed the interruption of the cartoon. Baer (1960) found that preschool children’s responding on the time-accumulating procedure could also be maintained by preventing loss of access to conversation with a puppet. Baron and Kaufman (1966) implemented a somewhat similar procedure with college students as subjects, whereby money could be accumulated except during time-out periods that could be prevented by button pressing. Initial training entailed an escape-avoidance procedure supplemented by verbal instructions. Thereafter, a Sidman-type postponement contingency was implemented, yielding results similar to those Sidman reported with rats as subjects. Also using rats as subjects, D’Andrea (1971) arranged a procedure more closely analogous to the basic Sidman procedure in virtually all respects, with the to-be-postponed event being a 1-minute time out (lights off) from the situation in which food was delivered irrespective of behavior. Holding constant the periods separating time outs in the absence of responding, D’Andrea varied the amount by which a response could postpone the time out and obtained results analogous to those Sidman had obtained when manipulating the response–shock interval.
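The logic of a Sidman-type postponement contingency applied to time outs can be sketched as a small event simulation. This is a minimal sketch, not D’Andrea’s (1971) apparatus logic: the function and parameter names are hypothetical, the interval values in the usage lines are invented, and two simplifying assumptions are made, namely that responses during a time out have no effect and that the timer restarts when a time out ends.

```python
def timeout_onsets(response_times, tt_interval, rt_interval,
                   session_length, timeout_duration=0.0):
    """Return the times at which time outs begin.

    tt_interval: time between time outs when no response occurs.
    rt_interval: how far each response postpones the next time out.
    """
    onsets = []
    next_timeout = tt_interval
    blocked_until = 0.0  # responses before this moment fall inside a time out

    for t in sorted(response_times):
        # Deliver every time out that was due before this response.
        while next_timeout <= t and next_timeout < session_length:
            onsets.append(next_timeout)
            blocked_until = next_timeout + timeout_duration
            next_timeout = blocked_until + tt_interval
        if t >= session_length:
            break
        if t >= blocked_until:
            # An effective response resets the timer: postponement.
            next_timeout = t + rt_interval

    # After the last response, time outs recur until the session ends.
    while next_timeout < session_length:
        onsets.append(next_timeout)
        next_timeout = next_timeout + timeout_duration + tt_interval
    return onsets


# Invented values: no responding versus steady responding every 10 s.
print(timeout_onsets([], tt_interval=20, rt_interval=20, session_length=100))
print(timeout_onsets(range(10, 100, 10), tt_interval=20, rt_interval=20,
                     session_length=100))
```

In the sketch, responding at intervals shorter than the postponement value prevents every time out, which is the sense in which such responding is described as avoidance; varying rt_interval while holding tt_interval constant corresponds to the manipulation described in the text.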
Galbicka and Branch (1983) replicated the basic procedure, but with pigeons as subjects. The results were analogous provided that one took into account some elicitation of the birds’ pecking that occurred with transitions from time-out to time-in periods. Again with rats, Richardson and Baron (2008) extended the range of a similar procedure, systematically varying the frequency with which the response-independent food was delivered during time in, and found that rates of postponement varied in a way indicating that time outs from high food-delivery frequencies were more aversive, producing effects analogous to higher shock intensities previously studied with analogous procedures.

More subtle features of negative reinforcement contingencies and their effects, originally demonstrated with electric shock as the to-be-avoided stimulus, have been replicated in experiments whereby responding could avoid or postpone time out from situations of positive reinforcement. For example, DeFulio and Hackenberg (2007) arranged for concurrent contingencies of food production and postponement of time outs from that contingency of food production. In addition, they supplied accompanying stimuli that were distinctively correlated with the passage of time within the R–time out (Sidman-type postponement) interval. When the stimuli and corresponding intervals were manipulated, the results were similar to those we described earlier, concerning discriminative stimuli superimposed on shock-postponement procedures. That is, in general, with the addition of such stimuli, avoidance responding shifted toward the situation closest to the to-be-avoided event (e.g., G. E. Field & Boren, 1963; Sidman, 1955), and when the relative efficiency of responses was varied differentially, responding gravitated toward the situations in which it was most efficient (Krasnegor et al., 1971; Sidman & Boren, 1957a, 1957b).

Also using pigeons as subjects, Pietras and Hackenberg (2000) played postponement-of-time-out contingencies off against the resulting frequencies of food delivery that would result from the time outs, in a manner similar to the much earlier experiments by Hineline (1970), Gardner and Lewis (1976), and Lewis et al. (1976), who had played shock postponement off against shock frequency.
The background condition of Pietras and Hackenberg’s discrete-trial procedure was a random-interval schedule of response-produced food, effective throughout 125-second trials except during time-out periods. Each trial began with a 5-second opportunity during which a bird’s peck on a separate response key could postpone by 45 seconds an immediately impending 60-second time-out period to the latter half of the 125-second trial but could not eliminate it. The birds reliably postponed those time-out periods. Similar responding was maintained in a second experiment when the postponement was accomplished at the cost of the postponed time out being longer than the immediate time out that would occur if the bird had not responded on the postponement key. Additional manipulations of the length of postponement intervals verified this to be a robust effect, and potential artifacts, including adventitious positive reinforcement of the postponement response, were ruled out.

Summarizing Principles

A reader new to the topics and the approach represented in this chapter may well have found the novel and complex procedures and relationships to be confusing. To mitigate this, Hineline (1984a) proposed a set of summarizing principles that have remained useful even as the procedures and research questions have become still more complex:

1. Negative reinforcement is to be understood in terms of transitions between situations as well as by postponement or prevention of events within a situation. In some cases, a continuously present aversive stimulus defines such a situation; in other cases, the situations will be partly defined by additional, delineating stimuli and by operant contingencies that are in effect only during the situation.
2. Relative aversiveness of a situation (the degree to which transitions away from it will reinforce behavior) depends only partly on primary aversive stimuli that occur within the situation. Even when those stimuli do contribute to aversiveness, a relevant feature is the relation between their short-term versus longer term distributions over time.
3. Relative aversiveness of a situation depends substantially on contingencies (work requirements) within that situation, in comparison to contingencies in alternative situations.
4. Most important, the role of the alternative situation or situations depends on contingencies regarding change of situation—that is, on what is involved in getting from one situation to the other.
5. All things being equal, performance tends to allow persistence of the situation closer to primary aversive events. (p. 505)

This rubric encompasses the classical procedures that were derived from and addressed to traditional avoidance theory. It also provides a systematic way to approach the additional procedures and phenomena that have been described here, which are clearly relevant to behavior termed avoidance and about which traditional avoidance theories are silent.

Applied and translational research involving negative reinforcement. Even the most basic, conceptually driven behavior-analytic research will often have direct practical implications. Thus, Iwata (1987) identified several ways in which negative reinforcement is inextricably involved in practical situations. He distinguished among (a) undesirable behavior acquired and maintained through negative reinforcement, (b) treatment of negatively reinforced behavior, and (c) negative reinforcement as therapy. Examples in the first category included tantrums, aggressive behavior, and self-injurious behavior maintained by the termination of instructional demands. Promising examples of the second category entailed differential reinforcement of alternative, even incompatible behavior that could be maintained by the same consequence and thus still be negatively reinforced. Termination of moderately intense tones was included in the third category, providing negative reinforcement of behaviors such as improved posture, taking pills at appropriate times, and eliminating nocturnal enuresis. In recent years, negative reinforcement, as in escape from demands, has become a standard component of functional analyses (e.g., Iwata, Dorsey, Slifer, Bauman, & Richman, 1994; Repp & Horner, 1999) and descriptive analyses (e.g., Pence, Roscoe, Bourret, & Ahearn, 2009), both of which are now in common use for identifying the sources of problematic behavior.
When such behavior has thus been identified as being maintained by negative reinforcement, as in a child’s tantrums yielding escape from academic tasks (e.g., Iwata et al., 1994), an effective strategy uses the same negative reinforcer to shape

and maintain alternative, more acceptable behavior (such as asking for help). That strategy has not been common practice when addressing problematic behavior of pets, however. Instead, aggressive and other problematic behavior is more likely to be addressed with reprimands and other attempts at punishment, which are for the most part ineffective as well as stressful (Merck Veterinary Manual, 2006; Schilder & van der Borg, 2004). A notable, somewhat exotic alternative was suggested by Pryor (1999), who provided an anecdotal account of her work with timid llamas; more recently, Snider (2007) provided a systematic, data-based report of a similar technique with aggressive dogs. The dogs all had long-standing repertoires of problematic aggressive behavior. The behavior of five of the six was dramatically changed within minutes, or at most a few hours, of intervention; the sixth served as an unplanned control, whereby household circumstances resulted in extended exposure to the baseline, preintervention condition. The technique is predicated on the assumption that the dogs’ aggressive behavior is maintained by the termination of threat—from an unfamiliar person, a specific person, another dog, and so forth— characterized as a decoy in the treatment. A threshold distance is empirically identified as the maximal distance that yields signs of aggressive behavior when the decoy approaches the potential aggressor. Then, in repeated trials, the decoy is advanced steadily from a safe distance to just short of the threshold distance, pausing there until the dog exhibits relaxed or other distinctly nonaggressive behavior. At that point, the decoy quietly moves back to the safe area. With repeated trials, the alternative behavior, having thus been negatively reinforced, comes to occur more quickly and reliably, at which point the decoy is advanced slightly closer. Again, only when the specified alternative behavior occurs does the decoy return to a safe area. 
Once the relaxed, nonaggressive behavior occurs quickly and reliably with the decoy in close proximity, positive reinforcement is brought into play. Although the initial intervention was consistently and quickly effective, it was also necessary to train for generalization, repeating it in various settings (except in a case in which the aggressive behavior was very

situation specific) or with various decoys, except when the target of the aggressive behavior was a particular person. Superficially, the whole arrangement could be mistaken for one based on habituation or systematic desensitization, but as Snider (2007) carefully delineated, crucial details are different. Thus, the procedure entails differential reinforcement and shaping of incompatible behavior, combined with stimulus fading and initially using negative instead of positive reinforcement. A possible source of its effectiveness is the resemblance of this procedure to the fading procedures known as errorless discrimination techniques, which have previously been based only on positive reinforcement (Terrace, 1963; see also Etzel, 1997). Translational research—research that explicitly builds on basic research to address issues of practical and social importance—is a current priority of the national scientific agenda. The best of such work advances both the applied and the basic agendas; it is work with clear practical relevance that informs a basic conceptual issue. Bruzek, Thompson, and Peters (2009) used termination of the (recorded) sounds of infant crying in a study that nicely illustrates that strategy in the domain of negative reinforcement. The basic issue concerned the phenomenon of resurgence—the recurrence of a previously, but not currently, reinforced response during extinction of a more recently reinforced response. Bruzek et al.’s practical issue was the behavior of adults in caring for young children. The experimental arrangement entailed a life-sized baby doll, various toys, and audio recordings of an infant’s crying, which could be played by means of an audio speaker under the crib. With no special instructions, the undergraduate subjects readily acquired first one and then another response—engaging the baby doll with each of two toys—when one or the other response could terminate or prevent the recorded cries. 
After both of those responses had been extinguished, a third response was reinforced and then extinguished; during that final phase, the initially reinforced response recurred. In general, the results indicated that a response with a longer history of reinforcement was more likely to resurge than a more recently but briefly reinforced response. As the authors pointed out, this finding has implications not only for relatively benign methods for dealing with an infant’s crying, but also

for more problematic behavior such as rough handling that is known to occur in such situations.

Studies of humans’ reactions to potential money loss have provided another paradigm for translational research based on negative reinforcement. For example, Magoon and Critchfield (2008) used the generalized matching law (Baum, 1974), which has proved to be broadly applicable in basic behavior-analytic research (e.g., Davison & McCarthy, 1988), as a basis for assessing the degree of asymmetry between positive and negative reinforcement. When quantitatively equated positive and negative reinforcement contingencies of intermittent reinforcement were concurrently in effect, no consistent bias was evident. This arrangement, which provided tangible consequences for actual choices, contrasts with studies that merely ask people what they would do in hypothetical situations. Whether the lack of bias commonly reported in these studies hinges on the difference between actual versus hypothetical choices is an important matter for further study.

Building on experiments by Baron and Galizio (1976) that had demonstrated reinforcement of observing responses—responses maintained by the production of discriminative stimuli—that functioned analogously in both positive and negative reinforcement situations, Galizio (1979) used avoidance of monetary loss as the basis for studying the origins of instructions as discriminative stimuli. Although, on one hand, accurate instructions were seen to facilitate a person’s initially behaving in concert with the contingencies of loss prevention, subsequent exposure to consequences of following inaccurate instructions resulted in the instructions being ignored. Still later exposure to accurate instructions resulted in presentation of the instructional stimuli themselves having the effects of reinforcers for observing responses.
This sequence of experiments documented a role of reinforcement history in determining the ways in which and the degrees to which instruction following is maintained.
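For reference, the generalized matching law (Baum, 1974) invoked earlier in this section is usually written in the logarithmic form below. The symbols are the conventional ones rather than notation taken from this chapter: B_1 and B_2 denote behavior (responses or time) allocated to two alternatives, r_1 and r_2 the obtained rates of reinforcement, a the sensitivity parameter, and b the bias parameter.

```latex
\log\frac{B_1}{B_2} = a\,\log\frac{r_1}{r_2} + \log b
```

In this form, strict matching corresponds to a = 1 and b = 1, and an asymmetry between quantitatively equated positive and negative reinforcement would appear as a consistent departure of log b from zero.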

Indirect Origins of Stimulus Relations in Avoidance

We have identified various functions of stimuli in situations that involve negative reinforcement—discriminative, as related to contingencies; delineating,

as affecting aggregation of dispersed events or defining the boundaries of distinct situations; and negatively reinforcing, in various time relations. Appropriately, those functions have been demonstrated as being generated directly by their placement within those situations, and indeed, that genesis has often been the focus of basic research. In applied settings, however, one often does not have sufficient access to an individual’s history to identify the origins of particular stimulus functions, and a particular stimulus with an evident role in aversively maintained behavior may not even have a plausible basis for participating in a history that produced that function. In such cases, ad hoc appeals to stimulus generalization do not satisfy. What is needed is a principle whereby purely arbitrary stimulus relations can arise and whereby one stimulus can stand in place of another despite their lacking any physical resemblance—a scientifically grounded basis for symbolic relations. That grounding was initially provided by Sidman and his colleagues in their experimental demonstrations of equivalence relations that entail symmetrical and transitive relations between arbitrarily chosen stimuli (e.g., Sidman, 1986, 1994; Sidman et al., 1982; Sidman, Kirk, & Willson-Morris, 1985). Augustson and Dougher (1997) then built on this conception by showing that the function of a warning stimulus in an avoidance procedure could transfer to a physically dissimilar, arbitrarily chosen stimulus solely by virtue of independently established equivalence relations. Equivalent or symbolic relations, whereby one entity stands in place of another, are transparent to the extent that all interrelated stimuli are interchangeable (see Chapter 16, this volume). These can be viewed as special cases of arbitrary relational responding, conceptualized as higher order operant classes (Catania, 2007, pp. 
120, 155–158) or in terms of relational frame theory (e.g., Dymond & Rehfeldt, 2000; Hayes, 1991; Steele & Hayes, 1991). Most of these sets of arbitrary relations might be characterized as making up structured rather than transparent networks. Thus, the network of relations characterized as different from entails symmetry (if B is different from A, then A is different from B) but not transitivity (if B is different from A, and C

is different from B, C may or may not be different from A). Intermediate to these are opposites (if B is the opposite of A, and C is the opposite of B, then C = A). Using human subjects, Dymond, Roche, Forsyth, Whelan, and Rhoden (2007, 2008) built on this conception to demonstrate avoidance based on such relations with aversive images and sounds as the to-be-avoided stimuli. Prior training that established sets of opposites, then, yielded the predicted transfers or nontransfers of avoidance reactions to other stimuli when avoidance was taught with respect to a particular member of the set. Dymond and Roche (2009) have proposed a pragmatic approach to clinical anxiety by using relational frame theory when devising strategies for the extinction of discriminative aspects of anxiety-producing stimuli.

Summing Up: Controversies and Cautions

Punishment and negative reinforcement contingencies abound in nature and in social communities and thus constitute important domains for study. In each domain, we have offered a rubric for systematic understanding. In one domain, our focus has been on the variables whereby punishment is effective in reducing behavior, including immediacy and consistency, intensity, availability of alternative unpunished repertoires, and the separate variables tending to maintain the punished behavior. In the other domain, our focus has been on the varied time scales whereby negative reinforcement is operative: short-term postponement of, immediate but sometimes intermittent escape from, and reduction in frequency of aversive events. Sometimes two or more of these occur in alternative or nested relationships whereby situational change is the best way to characterize the consequence of behavior. Further complicating matters, the aversive event can be time out from positive reinforcement. In both domains, our basic understanding translates readily into practical applications, especially with close attention to discriminative effects.
Despite this understanding and the effective technology that it yields, some controversies endure. Behavior analysis, in general, has sometimes

encountered resistance arising from its distinctive prose patterns (Hineline, 1980, 1990) and its emphasis on interpretive concepts anchored directly in behavior–environment interactions rather than in dispositional characteristics of the behaving organism (D. P. Field & Hineline, 2008; Hineline, 1984b, 1992; Hineline & Wanchisen, 1989). In addition, as we noted earlier, the topics of this chapter have been the focus of their own particular controversies. Less parochial and more volatile controversies continue to swirl around punishment—the term, with its various meanings, and the practices that are taken as corresponding to it but that do not always match up with those meanings. On one hand, in many contexts and forums, aversive control is not viewed as controversial. In discussions at the philosophical and cultural levels that focus on what is deemed good or bad for the individual and society, there seems to be agreement that mild and temporary pain, loss of ability, loss of freedom, or loss of pleasure are acceptable if justly applied. Unjust procedures, besides “cruel and unusual,” would include procedures that are thought to humiliate, dehumanize, or inflict extreme pain (TASH, 2011). Less mildly, as we noted earlier, justice is understood as retribution—a sanitized term for vengeance, which can be more baldly stated (while biblically justified) as “an eye for an eye and a tooth for a tooth.” Punishment as explicitly and contingently applied seems to be controversial in ways that punishment as haphazard practice may not be. Thus, whether it is implemented by design appears to be important, which is ironic, for implemented as procedures explicitly based on behavioral principles, punishment is defined on the basis of its producing a decrease in behavior, not on its characteristics related to pain or trauma. 
Still, it is mainly the characteristics of the punishing consequence, rather than the contingencies and effects of that consequence, that gain the most attention and generate the most heat in the controversies, with opponents viewing the latter as ends that do not justify the means. For these opponents, the consequent stimuli, irrespective of the effects of their use, violate human rights or conflict with values or moral standards (Singer, Gert, & Koegel, 1999; Van Houten et al., 1988).

From a scientific perspective, which we have attempted to represent here, one could wish that these controversies could be put aside, recognizing the processes of punishment and negative reinforcement as important to both humans’ and other organisms’ adaptiveness. Those who study those processes take no pleasure in exposing experimental subjects to pain, discomfort, or even inconvenience; instead, they recognize behavior in relation to aversive events as inevitably involved in people’s lives, making those processes important to understand, that they might be minimized when alternative techniques are not feasible and used effectively when that use is deemed important and appropriate. Nevertheless, scientific study does legitimize some concerns that are embedded in these controversies. An example concerns people’s understanding of the behavior of punishing as well as of the behavior targeted for punishment, and the importance of discerning whether the former is a vengeful action rather than an action predicated on a decrease in the targeted behavior. First, basic research with both human and nonhuman subjects has shown that aggressive behavior can be a by-product of extinction (e.g. Flory, Smith, & Ellis, 1977; Kelly & Hake, 1970) or of intermittent positive reinforcement (May & Kennedy, 2009). That is, when an individual with a history of consistently reinforced behavior encounters a period or situation of nonreinforcement, that individual has an increased likelihood of attacking whatever or whoever is available— conspecifics or even inanimate objects. In addition, aggressive behavior can also be generated by painful stimulation (Hutchinson & Emley, 1977), by injury suffered at the hands of another person, or even by circumstances in which a team of people is working to prevent monetary loss (Emurian, Emurian, & Brady, 1985; Hake & Campbell, 1972). 
One might argue that humans have transcended this sort of thing, but the nasty little fact crops up when, for example, a computer or a soft-drink machine suddenly ceases to yield its usual reinforcers or someone violates a person’s home, injures a person’s friend, or steals a person’s property. Second, the circumstances in which a punishment procedure might be proposed concern situations in which an individual’s behavior is aversive to other people or situations in which reinforcement-based interventions have been unsuccessful; that is, the behavior of caregivers, teachers, and perhaps consultants is under extinction. It follows, then, that punishment procedures should be more tightly regulated than other procedures to ensure that punishment is being implemented purely as a best feasible alternative for decreasing the behavior of concern and does not constitute vengeful or aggressive behavior on the part of those implementing the procedure. This illustrates a feature of behavior analysis that is often overlooked by those seeing it from other viewpoints as well as occasionally by behavior analysts themselves: Behavior analysts properly view the principles that they study as applicable to their own behavior even as they address the behavior of others.

References

Azrin, N. H., & Holz, W. C. (1966). Punishment. In W. K. Honig (Ed.), Operant behavior: Areas of research and application (pp. 380–447). Englewood Cliffs, NJ: Prentice-Hall.

Baer, D. M. (1962). A technique of social reinforcement for the study of child behavior: Behavior avoiding reinforcement withdrawal. Child Development, 33, 847–858.

Ader, R., & Tatum, R. (1961). Free-operant avoidance conditioning in human subjects. Journal of the Experimental Analysis of Behavior, 4, 275–276. doi:10.1901/jeab.1961.4-275

Anger, D. (1963). The role of temporal discriminations in the reinforcement of Sidman avoidance behavior. Journal of the Experimental Analysis of Behavior, 6(3, Pt. 2, Suppl.), 477–506. doi:10.1901/jeab.1963.6-s477

Azrin, N. H., Holz, W. C., & Hake, D. F. (1963). Fixed-ratio punishment. Journal of the Experimental Analysis of Behavior, 6, 141–148. doi:10.1901/jeab.1963.6-141

Badia, P., Coker, C., & Harsh, J. (1973). Choice of higher density signaled shock over lower density unsignalled shock. Journal of the Experimental Analysis of Behavior, 20, 47–55. doi:10.1901/jeab.1973.20-47

Badia, P., Culberson, S., & Harsh, J. (1973). Choice of longer or stronger signaled shock over shorter or weaker unsignalled shock. Journal of the Experimental Analysis of Behavior, 19, 25–32. doi:10.1901/jeab.1973.19-25

Baer, D. M. (1960). Escape and avoidance response of preschool children to two schedules of reinforcement withdrawal. Journal of the Experimental Analysis of Behavior, 3, 155–159. doi:10.1901/jeab.1960.3-155

Baer, D. M. (2001). A small matter of proof. In W. T. O’Donohue, D. A. Henderson, S. C. Hayes, J. E. Fisher, & L. J. Hayes (Eds.), A history of the behavioral therapies: Founders’ personal histories (pp. 253–265). Reno, NV: Context Press.

Arbuckle, J. L., & Lattal, K. A. (1987). A role for negative reinforcement of response omission in punishment? Journal of the Experimental Analysis of Behavior, 48, 407–416. doi:10.1901/jeab.1987.48-407

Baron, A. (1991). Avoidance and punishment. In I. H. Iversen & K. A. Lattal (Eds.), Experimental analysis of behavior: Part 1. Techniques in the behavioral and neural sciences (pp. 173–217). Amsterdam, the Netherlands: Elsevier Science.

Augustson, E. M., & Dougher, M. J. (1997). The transfer of avoidance evoking functions through stimulus equivalence classes. Journal of Behavior Therapy and Experimental Psychiatry, 28, 181–191. doi:10.1016/S0005-7916(97)00008-6

Baron, A., DeWaard, R. J., & Lipson, J. (1977). Increased reinforcement when time-out from avoidance includes access to a safe place. Journal of the Experimental Analysis of Behavior, 27, 479–494. doi:10.1901/jeab.1977.27-479

Axelrod, S., & Apsche, J. (Eds.). (1983). The effects of punishment on human behavior. New York, NY: Academic Press.

Baron, A., & Galizio, M. (1976). Clock control of human performance on avoidance and fixed-interval schedules. Journal of the Experimental Analysis of Behavior, 26, 165–180. doi:10.1901/jeab.1976.26-165

Azrin, N. H. (1956). Some effects of two intermittent schedules of immediate and non-immediate punishment. Journal of Psychology, 42, 3–21. doi:10.1080/00223980.1956.9713020

Azrin, N. H. (1960). Effects of punishment intensity during variable interval reinforcement. Journal of the Experimental Analysis of Behavior, 3, 123–142. doi:10.1901/jeab.1960.3-123

Azrin, N. H., Hake, D. F., Holz, W. C., & Hutchinson, R. R. (1965). Motivational aspects of escape from punishment. Journal of the Experimental Analysis of Behavior, 8, 31–44. doi:10.1901/jeab.1965.8-31

Baron, A., & Kaufman, A. (1966). Human, free-operant avoidance of “time out” from monetary reinforcement. Journal of the Experimental Analysis of Behavior, 9, 557–565. doi:10.1901/jeab.1966.9-557

Baum, W. M. (1973). Time allocation and negative reinforcement. Journal of the Experimental Analysis of Behavior, 20, 313–322. doi:10.1901/jeab.1973.20-313

Baum, W. M. (1974). On two types of deviation from the matching law: Bias and undermatching. Journal of the Experimental Analysis of Behavior, 22, 231–242. doi:10.1901/jeab.1974.22-231

Hineline and Rosales-Ruiz

Behrend, E. R., & Bitterman, M. E. (1963). Sidman avoidance in the fish. Journal of the Experimental Analysis of Behavior, 6, 47–52. doi:10.1901/jeab.1963.6-47

Black, A. H., & Morse, P. (1961). Avoidance learning in dogs without a warning stimulus. Journal of the Experimental Analysis of Behavior, 4, 17–23. doi:10.1901/jeab.1961.4-17

Bolles, R. C. (1967). Theory of motivation. New York, NY: Harper & Row.

Bolles, R. C. (1970). Species-specific defense reactions and avoidance learning. Psychological Review, 77, 32–48. doi:10.1037/h0028589

Bolles, R. C. (1973). The avoidance learning problem. In G. H. Bower (Ed.), The psychology of learning and motivation (Vol. 6, pp. 97–145). New York, NY: Academic Press.

Bostow, D. E., & Bailey, J. B. (1969). Modification of severe disruptive and aggressive behavior using brief time-out and reinforcement procedures. Journal of Applied Behavior Analysis, 2, 31–37. doi:10.1901/jaba.1969.2-31

Bruzek, J. L., Thompson, R. H., & Peters, L. C. (2009). Resurgence of infant caregiving responses. Journal of the Experimental Analysis of Behavior, 92, 327–343. doi:10.1901/jeab.2009-92-327

Camp, D. S., Raymond, G. A., & Church, R. M. (1967). Temporal relationship between response and punishment. Journal of Experimental Psychology, 74, 114–123. doi:10.1037/h0024518

Catania, A. C. (2007). Learning (Interim [4th] ed.). Cornwall-on-Hudson, NY: Sloan.

Clark, F. C. (1961). Avoidance conditioning in the chimpanzee. Journal of the Experimental Analysis of Behavior, 4, 393–395. doi:10.1901/jeab.1961.4-393

Clark, H. B., Rowbury, T., Baer, A. M., & Baer, D. M. (1973). Time-out as a punishing stimulus in continuous and intermittent schedules. Journal of Applied Behavior Analysis, 6, 443–455. doi:10.1901/jaba.1973.6-443

Cooper, J. O., Heron, T. E., & Heward, W. L. (2007). Applied behavior analysis. Columbus, OH: Pearson.

Courtney, K., & Perone, M. (1992). Reductions in shock frequency and response effort as factors in reinforcement by time-out from avoidance. Journal of the Experimental Analysis of Behavior, 58, 485–496. doi:10.1901/jeab.1992.58-485

Crosbie, J. (1998). Negative reinforcement and punishment. In K. A. Lattal & M. Perone (Eds.), Handbook of research methods in human operant behavior (pp. 163–189). New York, NY: Plenum Press.

D’Andrea, T. (1971). Avoidance of time-out from response-independent reinforcement. Journal of the Experimental Analysis of Behavior, 15, 319–325. doi:10.1901/jeab.1971.15-319

Davison, M., & McCarthy, D. (1988). The matching law: A research review. Hillsdale, NJ: Erlbaum.

DeFulio, A., & Hackenberg, T. D. (2007). Discriminated time-out avoidance in pigeons: The roles of added stimuli. Journal of the Experimental Analysis of Behavior, 88, 51–71. doi:10.1901/jeab.2007.59-06

Dinsmoor, J. A. (1954). Punishment: I. The avoidance hypothesis. Psychological Review, 61, 34–46. doi:10.1037/h0062725

Dinsmoor, J. A. (1962). Variable-interval escape from stimuli accompanied by shocks. Journal of the Experimental Analysis of Behavior, 5, 41–47. doi:10.1901/jeab.1962.5-41

Dinsmoor, J. A. (1977). Escape, avoidance, punishment: Where do we stand? Journal of the Experimental Analysis of Behavior, 28, 83–95. doi:10.1901/jeab.1977.28-83

Dinsmoor, J. A. (1998). Punishment. In W. O’Donohue (Ed.), Learning and behavior therapy (pp. 188–204). Boston, MA: Allyn & Bacon.

Dinsmoor, J. A. (2001). Stimuli inevitably generated by behavior that avoids electric shock are inherently reinforcing. Journal of the Experimental Analysis of Behavior, 75, 311–333. doi:10.1901/jeab.2001.75-311

Dinsmoor, J. A., & Sears, G. W. (1973). Control of avoidance by a response-produced stimulus. Learning and Motivation, 4, 284–293. doi:10.1016/0023-9690(73)90018-0

Dinsmoor, J. A., & Winograd, E. (1958). Shock intensity in variable-interval escape schedules. Journal of the Experimental Analysis of Behavior, 1, 145–148. doi:10.1901/jeab.1958.1-145

Doughty, S. S., Anderson, C. M., Doughty, A. H., Williams, D. C., & Saunders, K. J. (2007). Discriminative control of punished stereotyped behavior in humans. Journal of the Experimental Analysis of Behavior, 87, 325–336. doi:10.1901/jeab.2007.39-05

Dymond, S., & Rehfeldt, R. (2000). Understanding complex behavior: The transformation of stimulus functions. Behavior Analyst, 23, 239–254.

Dymond, S., & Roche, B. (2009). A contemporary behavior analysis of anxiety and avoidance. Behavior Analyst, 32, 7–27.

Dymond, S., Roche, B., Forsyth, J. P., Whelan, R., & Rhoden, J. (2007). Transformation of avoidance response functions in accordance with same and opposite relational frames. Journal of the Experimental Analysis of Behavior, 88, 249–262. doi:10.1901/jeab.2007.22-07

Dymond, S., Roche, B., Forsyth, J. P., Whelan, R., & Rhoden, J. (2008). Derived avoidance learning: Transformation of avoidance response functions in accordance with same and opposite relational frames. Psychological Record, 58, 269–286.

Emurian, H. H., Emurian, C. S., & Brady, J. V. (1985). Positive and negative reinforcement effects on behavior in a three-person microsociety. Journal of the Experimental Analysis of Behavior, 44, 157–174. doi:10.1901/jeab.1985.44-157

Galizio, M., & Perone, M. (1987). Variable-interval schedules of time-out from avoidance: Effects of chlordiazepoxide, CGS 8216, morphine, and naltrexone. Journal of the Experimental Analysis of Behavior, 47, 115–126. doi:10.1901/jeab.1987.47-115

Estes, W. K. (1969). Outline of a theory of punishment. In B. A. Campbell & R. M. Church (Eds.), Punishment and aversive behavior (pp. 57–82). New York, NY: Appleton-Century-Crofts.

Galizio, M., Robinson, E. G., & Ordronneau, C. (1994). Opioid drugs and time-out from avoidance. Behavioural Pharmacology, 5, 125–130. doi:10.1097/00008877-199404000-00003

Etzel, B. (1997). Environmental approaches to the development of conceptual behavior. In D. M. Baer & E. M. Pinkston (Eds.), Environment and behavior (pp. 52–79). Boulder, CO: Westview Press.

Gardner, E. T., & Lewis, P. (1976). Negative reinforcement with shock-frequency increase. Journal of the Experimental Analysis of Behavior, 25, 3–14. doi:10.1901/jeab.1976.25-3

Fantino, E. (1973). Aversive control. In J. A. Nevin & G. S. Reynolds (Eds.), The study of behavior: Learning, motivation, emotion, and instinct (pp. 239–279). Glenview, IL: Scott, Foresman.

Hackenberg, T. D., & DeFulio, A. (2007). Time-out from reinforcement: Restoring the balance between analysis and application. Revista Mexicana de Análisis de la Conducta, 33, 37–44.

Field, D. P., & Hineline, P. N. (2008). Dispositioning and the obscured roles of time in psychological explanation. Behavior and Philosophy, 36, 5–69.

Hake, D. F., & Campbell, R. L. (1972). Characteristics and response-displacement effects of shock-generated responding during negative reinforcement procedures: Pre-shock responding and post-shock aggressive responding. Journal of the Experimental Analysis of Behavior, 17, 303–323. doi:10.1901/jeab.1972.17-303

Field, G. E., & Boren, J. J. (1963). An adjusting avoidance procedure with multiple auditory and visual warning stimuli. Journal of the Experimental Analysis of Behavior, 6, 537–543. doi:10.1901/jeab.1963.6-537

Flory, R. K., Smith, E. L., & Ellis, B. B. (1977). The effects of two response-elimination procedures on reinforced and induced aggression. Journal of the Experimental Analysis of Behavior, 27, 5–15. doi:10.1901/jeab.1977.27-5

Forgione, A. G. (1970). The elimination of interfering response patterns in lever-press avoidance situations. Journal of the Experimental Analysis of Behavior, 13, 51–56. doi:10.1901/jeab.1970.13-51

Foxx, R. M., & Shapiro, S. T. (1978). The time-out ribbon: A nonexclusionary time-out procedure. Journal of Applied Behavior Analysis, 11, 125–136. doi:10.1901/jaba.1978.11-125

Hayes, S. C. (1991). A relational control theory of stimulus equivalence. In L. J. Hayes & P. N. Chase (Eds.), Dialogues on verbal behavior: The First International Institute on Verbal Relations (pp. 19–40). Reno, NV: Context Press.

Herman, R. L., & Azrin, N. H. (1964). Punishment by noise in an alternative response situation. Journal of the Experimental Analysis of Behavior, 7, 185–188. doi:10.1901/jeab.1964.7-185

Herrnstein, R. J. (1969). Method and theory in the study of avoidance. Psychological Review, 76, 49–69. doi:10.1037/h0026786

Herrnstein, R. J., & Hineline, P. N. (1966). Negative reinforcement as shock-frequency reduction. Journal of the Experimental Analysis of Behavior, 9, 421–430. doi:10.1901/jeab.1966.9-421

Galbicka, G., & Branch, M. N. (1981). Selective punishment of interresponse times. Journal of the Experimental Analysis of Behavior, 35, 311–322. doi:10.1901/jeab.1981.35-311

Hineline, P. N. (1970). Negative reinforcement without shock reduction. Journal of the Experimental Analysis of Behavior, 14, 259–268. doi:10.1901/jeab.1970.14-259

Galbicka, G., & Branch, M. N. (1983). Stimulus-food relations and free-operant postponement of time-out. Journal of the Experimental Analysis of Behavior, 40, 153–163. doi:10.1901/jeab.1983.40-153

Hineline, P. N. (1976). Negative reinforcement and avoidance. In W. K. Honig & J. E. R. Staddon (Eds.), Handbook of operant behavior (pp. 364–414). Englewood Cliffs, NJ: Prentice-Hall.

Galbicka, G., & Platt, J. R. (1984). Interresponse-time punishment: A basis for shock-maintained behavior. Journal of the Experimental Analysis of Behavior, 41, 291–308. doi:10.1901/jeab.1984.41-291

Hineline, P. N. (1980). The language of behavior analysis: Its community, its function, and its limitations. Behaviorism, 8, 67–86.

Galizio, M. (1979). Contingency-shaped and rule-governed behavior: Instructional control of human loss avoidance. Journal of the Experimental Analysis of Behavior, 31, 53–70. doi:10.1901/jeab.1979.31-53

Hineline, P. N. (1981). The several roles of stimuli in negative reinforcement. In P. Harzem & M. D. Zeiler (Eds.), Advances in analysis of behavior: Vol. 2. Predictability, correlation, and contiguity (pp. 203–246). Chichester, England: Wiley.

Hineline, P. N. (1984a). Aversive control: A separate domain? Journal of the Experimental Analysis of Behavior, 42, 495–509. doi:10.1901/jeab.1984.42-495

Hineline, P. N. (1984b). What, then, is Skinner’s operationism? [Commentary]. Behavioral and Brain Sciences, 7, 560.

Hineline, P. N. (1990). The origins of environment-based psychological theory. Journal of the Experimental Analysis of Behavior, 53, 305–320. doi:10.1901/jeab.1990.53-305

Hineline, P. N. (1992). A self-interpretive behavior analysis. American Psychologist, 47, 1274–1286. doi:10.1037/0003-066X.47.11.1274

Hineline, P. N., & Wanchisen, B. A. (1989). Correlated hypothesizing, and the distinction between contingency-shaped and rule-governed behavior. In S. C. Hayes (Ed.), Rule-governed behavior: Cognition, contingencies, and instructional control (pp. 221–268). New York, NY: Plenum Press.

Holz, W. C., & Azrin, N. H. (1961). Discriminative properties of punishment. Journal of the Experimental Analysis of Behavior, 4, 225–232. doi:10.1901/jeab.1961.4-225

Hutchinson, R. R. (1977). By-products of aversive control. In W. K. Honig & J. E. R. Staddon (Eds.), Handbook of operant behavior (pp. 415–431). Englewood Cliffs, NJ: Prentice-Hall.

Hutchinson, R. R., & Emley, G. S. (1977). Electric shock produced drinking in the squirrel monkey. Journal of the Experimental Analysis of Behavior, 28, 1–12. doi:10.1901/jeab.1977.28-1

Iwata, B. A. (1987). Negative reinforcement in applied behavior analysis: An emerging technology. Journal of Applied Behavior Analysis, 20, 361–378. doi:10.1901/jaba.1987.20-361

Iwata, B. A., Dorsey, M. F., Slifer, K. J., Bauman, K. E., & Richman, G. S. (1994). Toward a functional analysis of self-injury. Journal of Applied Behavior Analysis, 27, 197–209. doi:10.1901/jaba.1994.27-197

Johnston, J. M. (1972). Punishment of human behavior. American Psychologist, 27, 1033–1054.

Kamin, L. J., Brimer, C. J., & Black, A. H. (1963). Conditioned suppression as a monitor of fear of the CS in the course of avoidance training. Journal of Comparative and Physiological Psychology, 56, 497–501. doi:10.1037/h0047966

Kazdin, A. E. (1972). Response cost: The removal of conditioned reinforcers for therapeutic change. Behavior Therapy, 3, 533–546. doi:10.1016/S0005-7894(72)80001-7

Kazdin, A. E. (2003). Behavior modification in applied settings. Belmont, CA: Wadsworth.

Keehn, J. D. (1967). Bar-holding with negative reinforcement: Preparatory or perseverative? Journal of the Experimental Analysis of Behavior, 10, 461–465. doi:10.1901/jeab.1967.10-461

Kelleher, R. T., & Morse, W. H. (1964). Escape behavior and punishment behavior. Federation Proceedings, 23, 808–817.

Kelly, J. F., & Hake, D. F. (1970). An extinction-induced increase in an aggressive response with humans. Journal of the Experimental Analysis of Behavior, 14, 153–164. doi:10.1901/jeab.1970.14-153

Kern, L., Koegel, R. L., & Dunlap, G. (Eds.). (1996). Positive behavioral support: Including people with difficult behavior in the community. Baltimore, MD: Brookes.

Krasnegor, N. A., Brady, J. V., & Findley, J. D. (1971). Second-order optional avoidance as a function of fixed-ratio requirements. Journal of the Experimental Analysis of Behavior, 15, 181–187. doi:10.1901/jeab.1971.15-181

Laraway, S., Snycerski, S., Michael, J., & Poling, A. (2003). Motivating operations and terms to describe them: Some further refinements. Journal of Applied Behavior Analysis, 36, 407–414. doi:10.1901/jaba.2003.36-407

Lerman, D. C., Iwata, B. A., Shore, B. A., & DeLeon, I. G. (1997). Effects of intermittent punishment on self-injurious behavior: An evaluation of schedule thinning. Journal of Applied Behavior Analysis, 30, 187–201. doi:10.1901/jaba.1997.30-187

Lerman, D. C., & Vorndran, C. M. (2002). On the status of knowledge for using punishment: Implications for treating behavior disorders. Journal of Applied Behavior Analysis, 35, 431–464. doi:10.1901/jaba.2002.35-431

Lewis, P., Gardner, E. T., & Hutton, L. (1976). Integrated delays to shock as negative reinforcement. Journal of the Experimental Analysis of Behavior, 26, 379–386. doi:10.1901/jeab.1976.26-379

Logue, A. W., & De Villiers, P. A. (1978). Matching in concurrent variable-interval avoidance schedules. Journal of the Experimental Analysis of Behavior, 29, 61–66. doi:10.1901/jeab.1978.29-61

Magoon, M. A., & Critchfield, T. S. (2008). Concurrent schedules of positive and negative reinforcement: Differential-impact and differential-outcomes hypotheses. Journal of the Experimental Analysis of Behavior, 90, 1–22. doi:10.1901/jeab.2008.90-1

Matson, J. L., & DiLorenzo, T. M. (1984). Punishment and its alternatives: New perspectives for behavior modification. New York, NY: Springer.

May, M. E., & Kennedy, C. H. (2009). Aggression as positive reinforcement in mice under various ratio- and time-based reinforcement schedules. Journal of the Experimental Analysis of Behavior, 91, 185–196. doi:10.1901/jeab.2009.91-185

McIntyre, L. L., Gresham, F. M., DiGennaro, F. D., & Reed, D. D. (2007). Treatment integrity of school-based interventions with children in the Journal of Applied Behavior Analysis 1991–2005. Journal of Applied Behavior Analysis, 40, 659–672.

McKenzie, S. D., Smith, R. G., Simmons, J. N., & Soderlund, M. J. (2008). Using a stimulus correlated with reprimands to suppress automatically maintained eye poking. Journal of Applied Behavior Analysis, 41, 255–259. doi:10.1901/jaba.2008.41-255

Mellitz, M., Hineline, P. N., Whitehouse, W. G., & Laurence, M. T. (1983). Duration-reduction of avoidance sessions as negative reinforcement. Journal of the Experimental Analysis of Behavior, 40, 57–67. doi:10.1901/jeab.1983.40-57

Merck veterinary manual. (9th ed.). (2006). Whitehouse Station, NJ: Marial Ltd.

Michael, J. (1975). Positive and negative reinforcement, a distinction that is no longer necessary; or a better way to talk about bad things. Behaviorism, 3, 33–44.

Michael, J. (1982). Distinguishing between discriminative and motivating functions of stimuli. Journal of the Experimental Analysis of Behavior, 37, 149–155. doi:10.1901/jeab.1982.37-149

Michael, J. (1993). Establishing operations. Behavior Analyst, 16, 191–206.

Miltenberger, R. G. (2001). Behavior modification: Principles and procedures (2nd ed.). Belmont, CA: Wadsworth.

Newsom, C., Favell, J. E., & Rincover, A. (1983). Side effects of punishment. In S. Axelrod & J. Apsche (Eds.), The effects of punishment on human behavior (pp. 285–316). New York, NY: Academic Press.

Pence, S. T., Roscoe, E. M., Bourret, J. C., & Ahearn, W. H. (2009). Relative contributions of three descriptive methods: Implications for behavioral assessment. Journal of Applied Behavior Analysis, 42, 425–446. doi:10.1901/jaba.2009.42-425

Perone, M., & Galizio, M. (1987). Variable-interval schedules of time-out from avoidance. Journal of the Experimental Analysis of Behavior, 47, 97–113. doi:10.1901/jeab.1987.47-97

Perone, M., Galizio, M., & Baron, A. (1988). The relevance of animal-based principles in the laboratory study of human operant conditioning. In G. Davey & C. Cullen (Eds.), Human operant conditioning and behavior modification (pp. 59–85). New York, NY: Wiley.

Piazza, C. C., Hanley, G. P., & Fisher, W. W. (1996). Functional analysis and treatment of cigarette pica. Journal of Applied Behavior Analysis, 29, 437–450. doi:10.1901/jaba.1996.29-437

Pietras, C. J., & Hackenberg, T. D. (2000). Time-out postponement without increased reinforcement frequency. Journal of the Experimental Analysis of Behavior, 74, 147–164. doi:10.1901/jeab.2000.74-147

Plummer, S., Baer, D. M., & Leblanc, J. (1977). Functional considerations in the use of procedural time-out and an effective alternative. Journal of Applied Behavior Analysis, 10, 689–705. doi:10.1901/jaba.1977.10-689

Pryor, K. (1999). Don’t shoot the dog! The new art of teaching and training (Rev. ed.). New York, NY: Bantam Books.

Repp, A. C., & Horner, R. H. (1999). Functional analysis of problem behavior. Belmont, CA: Wadsworth.

Richardson, J. V., & Baron, A. (2008). Avoidance of time-out from response-independent food: Effects of delivery rate and quality. Journal of the Experimental Analysis of Behavior, 89, 169–181. doi:10.1901/jeab.2008.89-169

Salzinger, K. (1991). Definitions and usage, or a rose by any other name smells as sweet. Behavior Analyst, 14, 213.

Schilder, M. B. H., & van der Borg, J. A. M. (2004). Training dogs with the help of the shock collar: Short and long term behavioural effects. Applied Animal Behaviour Science, 85, 319–334. doi:10.1016/j.applanim.2003.10.004

Seligman, M. E. P., & Csikszentmihalyi, M. (2000). Positive psychology: An introduction. American Psychologist, 55, 5–14. doi:10.1037/0003-066X.55.1.5

Sidman, M. (1953a). Avoidance conditioning with brief shock and no exteroceptive warning signal. Science, 118, 157–158. doi:10.1126/science.118.3058.157

Sidman, M. (1953b). Two temporal parameters in the maintenance of avoidance behavior by the white rat. Journal of Comparative and Physiological Psychology, 46, 253–261. doi:10.1037/h0060730

Sidman, M. (1955). Some properties of the warning stimulus in avoidance behavior. Journal of Comparative and Physiological Psychology, 48, 444–450. doi:10.1037/h0047481

Sidman, M. (1962). Reduction of shock frequency as reinforcement for avoidance behavior. Journal of the Experimental Analysis of Behavior, 5, 247–257. doi:10.1901/jeab.1962.5-247

Sidman, M. (1986). Functional analysis of emergent verbal classes. In T. Thompson & M. D. Zeiler (Eds.), Analysis and integration of behavioral units (pp. 213–245). Hillsdale, NJ: Erlbaum.

Sidman, M. (1994). Equivalence relations and behavior: A research story. Boston, MA: Authors Cooperative.

Sidman, M. (2006). The distinction between positive and negative reinforcement: Some additional considerations. Behavior Analyst, 29, 135–139.

Sidman, M., & Boren, J. J. (1957a). A comparison of two types of warning stimulus in an avoidance situation. Journal of Comparative and Physiological Psychology, 50, 282–287. doi:10.1037/h0046474

Starin, S. (1991). “Nonaversive” behavior management: A misnomer. Behavior Analyst, 14, 207–209.

Sidman, M., & Boren, J. J. (1957b). The relative aversiveness of signal and shock in an avoidance situation. Journal of Abnormal and Social Psychology, 55, 339–344. doi:10.1037/h0043237

Steele, D., & Hayes, S. C. (1991). Stimulus equivalence and arbitrarily applicable relational responding. Journal of the Experimental Analysis of Behavior, 56, 519–555. doi:10.1901/jeab.1991.56-519

Sidman, M., Kirk, B., & Willson-Morris, M. (1985). Six-member stimulus classes generated by conditional-discrimination procedures. Journal of the Experimental Analysis of Behavior, 43, 21–42. doi:10.1901/jeab.1985.43-21

Stokes, T. F., & Baer, D. M. (1977). An implicit technology of generalization. Journal of Applied Behavior Analysis, 10, 349–367. doi:10.1901/jaba.1977.10-349

Sidman, M., Rauzin, R., Lazar, R., Cunningham, S., Tailby, W., & Carrigan, P. (1982). A search for symmetry in the conditional discriminations of rhesus monkeys, baboons, and children. Journal of the Experimental Analysis of Behavior, 37, 23–44. doi:10.1901/jeab.1982.37-23

Singer, G. H., Gert, B., & Koegel, R. L. (1999). A moral framework for analyzing the controversy over aversive behavioral interventions for people with severe mental retardation. Journal of Positive Behavior Interventions, 1, 88–100. doi:10.1177/109830079900100203

TASH. (2011). TASH letter to Massachusetts Department of Developmental Services. Retrieved from http://tash.org/tash-letter-to-massachusetts-department-of-developmental-services

Terrace, H. S. (1963). Discrimination learning with and without “errors.” Journal of the Experimental Analysis of Behavior, 6, 1–27. doi:10.1901/jeab.1963.6-1

Thorndike, E. L. (1932). Fundamentals of learning. New York, NY: Columbia University, Teachers College. doi:10.1037/10976-000

Skinner, B. F. (1953). Science and human behavior. New York, NY: Macmillan.

Ulrich, R. E., Dulaney, S., Kucera, T., & Colasacco, A. (1972). Side effects of aversive control. In R. M. Gilbert & J. D. Keehn (Eds.), Schedule effects: Drugs, drinking, and aggression (pp. 203–242). Toronto, Ontario, Canada: University of Toronto Press.

Snider, K. S. (2007). A constructional canine aggression treatment: Using a negative reinforcement shaping procedure with dogs in home and community settings. Unpublished master’s thesis, University of North Texas, Denton.

Van Houten, R., Axelrod, S., Bailey, J. S., Favell, J. E., Foxx, R., Iwata, B., & Lovaas, O. I. (1988). The right to effective behavioral treatment. Journal of Applied Behavior Analysis, 21, 381–384. doi:10.1901/jaba.1988.21-381

Solnick, J. V., Rincover, A., & Peterson, C. R. (1977). Some determinants of the reinforcing and punishing effects of time-out. Journal of Applied Behavior Analysis, 10, 415–424. doi:10.1901/jaba.1977.10-415

Winograd, E. (1965). Escape behavior under different fixed ratios and shock intensities. Journal of the Experimental Analysis of Behavior, 8, 117–124. doi:10.1901/jeab.1965.8-117

Spradlin, J. E. (2002). Punishment: A primary process? Journal of Applied Behavior Analysis, 35, 475–477. doi:10.1901/jaba.2002.35-475

Yates, C. (1991). A response to nonaversive behavior management and “default” technologies. Behavior Analyst, 14, 217–218.

Chapter 22

Operant Variability

Allen Neuringer and Greg Jensen

During almost the entirety of a documentary film featuring Pablo Picasso (Clouzot, 1956), the camera is focused on the rear of a large glass screen that serves as a canvas. As viewers watch, each paint stroke appears and a theme emerges, only to be transformed in surprising ways. In many of the paintings, what started out as the subject is modified, sometimes many times. Erasures; new colors; alterations of size; and, indeed, the very subject of the painting flow into one another. Each painting can readily be identified as a Picasso, but the process seems to be filled with unplanned and unpredictable turns.

Uncertainty and surprise characterize other arts as well. A fugue may be instantly recognizable as a composition by J. S. Bach, and yet the transitions within the fugue may astound the listener, even after many hearings. Leonard Bernstein wrote of the importance of establishing expectancies in musical compositions and then surprising the listener. Fiction writers describe their desires to complete novels to find out what the characters will do because the authors can be as uncertain about their creations as are their readers.

Everyday behaviors may similarly be described in terms of generativity and unpredictability. Listen carefully to the meanderings of a conversation. Watch as an individual walks back and forth in his or her office, the tilt of the head, the slight changes in pace or direction. Monitor the seemingly unpredictable transitions in one’s daydreams or images.

Science is generally thought to be a search for predictable relationships (if A, then B), but throughout history, some have argued that to understand the world, including mind and behavior, scientists must appreciate the reality of unpredictability. From Epicurus in the 3rd century BC, who hypothesized random swerves of atoms, to contemporary quantum physicists, some have posited that nature contains within it unpredictable aspects that cannot be explained by if-A-then-B causal relationships, no matter how complex those relationships might be.

In this chapter, we discuss the unpredictability of behavior and focus on one aspect of it. When reinforcers are contingent on variability, or more precisely on a level of operant–response variability (with levels ranging from easily predictable responding to randomlike), the specified level will be generated and maintained. Stated differently, response unpredictability can be reinforced. Stated yet another way, variability is an operant dimension of behavior.

Operant dimension implies a bidirectional relationship between behavior and reinforcer. Responses influence (or cause) the reinforcers, and reinforcers influence (or cause) reoccurrence of the responses. The same bidirectional relationship is sometimes true of response dimensions as well. For example, when food pellets are contingent on rats’ lever presses, a minimum force must be exerted in a particular direction at a particular location. Force, direction, and location are response dimensions that are involved in the control of reinforcers and come to be controlled by the reinforcers. Variability is related to reinforcement in the same way. We refer to this capacity as the operant nature of variability or by the shorthand operant variability.

DOI: 10.1037/13937-022 APA Handbook of Behavior Analysis: Vol. 1. Methods and Principles, G. J. Madden (Editor-in-Chief) Copyright © 2013 by the American Psychological Association. All rights reserved.


Neuringer and Jensen

The very idea of operant variability is surprising to many and, at first blush, seems counterintuitive. Does variability not indicate noise? How can noise be reinforced? In fact, does reinforcement not constrain and organize responses, by definition, and is that definition not confirmed by observation? As we show, the answers are sometimes yes, but not always. Operant variability provides an important exception, one that may be a factor in the emission of voluntary operant responses generally. The chapter is organized broadly as follows. We discuss

■ Experimental evidence showing that reinforcers and discriminative stimuli control behavioral variability;
■ Relationships between reinforcement of variability and other influences;
■ Explanations: When variability is reinforced, what in fact is being reinforced?
■ How operant variability applies in such areas as creativity, problem solving, and psychopathology; and
■ How reinforced variability helps to explain the voluntary nature of operant behavior generally.

Reinforcement of Variability

As a way to describe the phenomenon, we begin with descriptions of some of the methods that have been successfully used to reinforce variability.

Recency-Based Methods

Imagine that a response earns a reinforcer only if it has not been emitted recently. Page and Neuringer (1985) applied this recency method, based on Schwartz (1980, 1988), to pigeons’ response sequences across two illuminated keys, left (L) and right (R). Each trial consisted of eight responses, yielding 256 (or 2⁸) different possible patterns of L and R, for example, LLRLRRRR. In the initial variability-reinforcing (or VAR) phase of the experiment, a pattern was reinforced only if it had not occurred for some number of trials, referred to as the lag. A trial terminated with food only if the sequence of eight L and R responses in that trial differed from those in each of the previous 50 trials (as evaluated across a moving window). This contingency was referred to as lag 50. If the current sequence repeated any one (or more) of the previous 50, then a brief time out (darkening of all lights) resulted, and food was withheld. After food or time out, the keylights were again illuminated, and another trial initiated. Sequences during the first 50 trials of a session were checked against the trials at the end of the previous session, that is, performance was evaluated continuously across sessions. Approximately 25 sessions were provided under these VAR contingencies. Let us consider some possible outcomes. One would be that the birds stopped responding (responding extinguished) because the lag 50 requirement was too demanding. At the other end of the possibility spectrum, the birds cycled across at least 51 patterns and by so doing were reinforced 100% of the time, with each sequence being different from every one of the previous 50. Although unlikely for pigeons, one way to solve lag contingencies is to count in binary, with L = 0 and R = 1, then LLLLLLLL followed by LLLLLLLR, followed by LLLLLLRL, then LLLLLLRR, and so on. Lag procedures have also been used with human participants, and therefore such sophisticated counting behavior must be considered. A third possible result would be alternations between a few preferred sequences (these would not be reinforced because of their high frequencies but would fill the lag window) and an occasional “do something else” (leading to reinforcement). The fourth possibility would be that L and R responses were generated in randomlike fashion, or stochastically (throughout this chapter, we use stochastic and random interchangeably), as if the birds were flipping a coin. This last alternative best describes the results. One piece of evidence was that reinforcement occurred on approximately 70% of the trials (with the other 30% leading to time-outs; Page & Neuringer, 1985).
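The lag contingency can be sketched in a few lines of Python. This is an illustrative sketch only, not the software used by Page and Neuringer (1985): the function names, the number of simulated trials, and the coin-flipping responder are assumptions introduced here. For a responder that chooses L and R independently and with equal probability, the chance that an eight-response sequence differs from each of the previous 50 is (255/256)^50, or about .82.

```python
import random
from collections import deque

def meets_lag(sequence, recent):
    """True if the sequence differs from every sequence in the lag window."""
    return sequence not in recent

def simulate_random_responder(n_trials=20000, seq_len=8, lag=50, seed=0):
    """Proportion of trials reinforced for a hypothetical coin-flipping responder."""
    rng = random.Random(seed)
    recent = deque(maxlen=lag)          # moving window of the last `lag` sequences
    reinforced = scored = 0
    for _ in range(n_trials):
        sequence = "".join(rng.choice("LR") for _ in range(seq_len))
        if len(recent) == lag:          # score only once the window is full
            scored += 1
            reinforced += meets_lag(sequence, recent)
        recent.append(sequence)
    return reinforced / scored

rate = simulate_random_responder()
expected = (1 - 1 / 2**8) ** 50         # chance of avoiding all 50 predecessors
print(f"simulated {rate:.3f}, expected {expected:.3f}")
```

A responder that cycled through 51 or more distinct sequences (for example, by counting in binary) would satisfy `meets_lag` on every trial, which is why such strategies must be considered with human participants.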
The pigeons’ performances were compared with the results of a simulated model in which a computer-based random-number generator produced L and R responses under exactly the same

Operant Variability

reinforcement contingencies experienced by the pigeons. The simulation showed that the model was reinforced on 80% of trials because, by chance, response sequences were repeated within the window of the lag 50 approximately 20% of the time. Thus, the pigeons’ performances were similar to, although not quite as good as, that of a random response model. A second source of support for approximations to random generation was provided by statistical analyses of the sequences, namely by the U statistic (Page & Neuringer, 1985). That statistic is a measure of uncertainty or entropy and is calculated from the relative frequencies of a set of items using the equation

\[ U = -\sum_{i=1}^{n} \frac{RF_i \cdot \log(RF_i)}{\log(n)} \qquad (1) \]

Here, RFi refers to the relative frequency of element i, out of n total elements. As a convention, every RFi = 0.0 is considered to contribute a value of 0.0 to the sum, without an attempt to resolve log(RFi). When all elements occur with equal frequency, U is maximal with a value of 1.0; if any single element has a frequency of 1.0 (and all others are 0.0), then U is minimal with a value of 0.0. In the Page and Neuringer (1985) study, three levels of U value were analyzed at the end of each session. U value was calculated for the relative frequencies of L and R; for relative frequencies of dyads (namely LL, LR, RR, and RL); and for triads (e.g., LLL, LLR, LRL . . .). The birds’ U values were compared with those from the random model. As expected, the random model produced U values close to 1.0 at each level of analysis, and the pigeons’ U values also approached 1.0, although not quite as closely as the random model. Thus, in this case, the results can best be described as randomlike but discriminably different from a true random source. Rather than responding equiprobably, the birds demonstrated biases, for example, favoring one key over another or favoring repetition over switching. We return later to a more detailed discussion of whether operant responses can be generated stochastically when the reinforcement contingencies are more demanding.
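As a hedged illustration of Equation 1, the following Python sketch computes U for single responses, dyads, and triads. The function names and the example response string are hypothetical, and the use of overlapping windows to form dyads and triads is an assumption made here rather than a detail taken from Page and Neuringer (1985).

```python
import math
from collections import Counter

def u_value(items, n_possible):
    """Normalized entropy (Equation 1) of items drawn from n_possible categories."""
    counts = Counter(items)
    total = sum(counts.values())
    u = 0.0
    for count in counts.values():       # categories with RF = 0 never appear, so add 0
        rf = count / total
        u -= rf * math.log(rf) / math.log(n_possible)
    return u

def ngrams(responses, k):
    """Overlapping runs of k successive responses (an assumption of this sketch)."""
    return [responses[i:i + k] for i in range(len(responses) - k + 1)]

responses = "LLRLRRRR" * 4              # hypothetical record of 32 responses
u1 = u_value(ngrams(responses, 1), 2)   # L versus R
u2 = u_value(ngrams(responses, 2), 4)   # LL, LR, RL, RR
u3 = u_value(ngrams(responses, 3), 8)   # eight possible triads
print(u1, u2, u3)
```

Because the ratio of logarithms does not depend on the base, natural logarithms are used throughout.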

If Page and Neuringer’s (1985) experiment had stopped at this point, there would be uncertainty as to why responses varied. The issue is tricky because it involves more than whether the lag procedure resulted in response variability (which it clearly did). Rather, was variability directly reinforced, or could the results be explained differently? Variability could have resulted from extrinsic sources (noise in the environment) or intrinsic sources (within the organism), or it could have been caused by experimental error, an insufficient flow of reinforcers, or any number of other things. To show that variability depended on the “if vary, then reinforce” contingency, a control procedure provided reinforcers after some eight-response trials and time outs after others, just as in the VAR condition, but these reinforcers were now unrelated to the pigeon’s sequence variations. Under this control, reinforcers and time outs were yoked to those delivered during the VAR phase. In other words, the yoke condition was identical to the VAR condition (eight responses per trial, and trials were followed by food or time out at exactly the same rates as during VAR), except that the pigeon received food and time outs whether or not the variability contingency had been met. Each pigeon’s terminal six sessions under the lag 50 VAR contingencies provided the trial-by-trial schedule for its reinforcements and time outs under the yoke condition. The yoke procedure produced large and consistent effects. Levels of variability fell rapidly and remained low under the yoke condition. U values that had approached the 1.0 of a random model in the VAR condition dropped to values closer to 0.50, indicating substantially more sequence repetition. Other statistics confirmed the increased repetitiveness and predictability of responding. 
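The logic of the yoke control can be sketched as follows. This is a simplified illustration, not the original procedure: the function names are hypothetical, and the early trials of the VAR phase are checked only against the trials available so far rather than against a previous session.

```python
def run_var_phase(sequences, lag):
    """Outcome of each trial (True = food, False = time out) under a lag contingency."""
    outcomes = []
    for i, sequence in enumerate(sequences):
        recent = sequences[max(0, i - lag):i]   # up to `lag` preceding trials
        outcomes.append(sequence not in recent)
    return outcomes

def run_yoke_phase(sequences, yoked_outcomes):
    """Replay recorded outcomes trial by trial, ignoring what the subject emits."""
    return [yoked_outcomes[i % len(yoked_outcomes)] for i in range(len(sequences))]

var_outcomes = run_var_phase(["LR", "RL", "LR", "LL"], lag=2)
yoke_outcomes = run_yoke_phase(["LL", "LL", "LL", "LL"], var_outcomes)
print(var_outcomes, yoke_outcomes)
```

In the yoke phase the repeated sequence earns the same outcomes that the varied sequences earned in the VAR phase, so any difference in behavior between phases cannot be attributed to the rate or distribution of reinforcers.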
These effects were replicated with an A-B-A-B design (VAR–yoke–VAR–yoke), yielding a conclusion that direct reinforcement of variability was responsible for the high variability (see Figure 22.1). Lag contingencies have been used with other species, including humans, monkeys, rats, budgerigars, and fish (see Neuringer, 2002, for a review). With rats, four-response trials across L and R levers are often used, in which case 16 (or 2⁴) different sequences are possible. Here, too, high sequence variability is observed. In an example of a human procedure, Stokes and Harrison (2002) presented on a computer screen a triangle consisting of one location at the top, two locations in the next row down, three in the third, and so on until the sixth row, which contained six locations. A trial involved moving from the top row to the bottom, thereby requiring five responses, with 32 possible patterns. These five-response sequences were reinforced under lag contingencies, and high levels of variability often resulted. In this procedure, however, as well as others with humans, some (albeit rarely observed) participants use a different strategy of cycling through a subset of sequences, such as a binary counting strategy. Another problem with lag procedures is that they never reinforce repetitions, and a random generator sometimes repeats (e.g., if 16 sequences are possible, then for a random generator, the probability of two identical back-to-back sequences is .0625). An alternative method that bases reinforcement on response frequencies (rather than recency) provides a partial solution, and we describe it next.

Figure 22.1. Three levels of the U statistic, an index of behavioral variability (U1 based on number of left [L] and right [R] responses; U2 on LL, LR, RR, and RL dyads; and U3 based on triads), during the lag 50 reinforcement-of-variability phases and yoked-VR phases. VR = variable ratio; F = first session in each phase; L = last session in each phase. Adapted from “Variability Is an Operant,” by S. Page and A. Neuringer, 1985, Journal of Experimental Psychology: Animal Behavior Processes, 11, p. 445. Copyright 1985 by the American Psychological Association.

Frequency-Based Methods

In these procedures, reinforcement is contingent on low overall relative frequencies. As one example, rats’ responses were reinforced on the basis of four-response trials (across L and R levers), with, as we indicated, 16 different possible sequences (LLLL, LLLR, LLRL, LLRR, etc.; Denney & Neuringer, 1998). Frequencies of these sequences were updated throughout each session in 16 separate counters, or bins, and a sequence was reinforced only if its relative frequency (the number of times that it was emitted divided by the total number of sequences) was less than some designated threshold value. Reinforcement of low relative frequency sequences has the advantage of permitting occasional reinforcement of repetitions. This procedure has several technical aspects. For example, after each trial, all bins are multiplied by an exponent, for example, 0.95, which results in recent sequences having more weight than those emitted in the past, a kind of memory decay (see Denney & Neuringer, 1998, for details). The important point is that, as with lag, highly variable responding was generated. The procedure has been used in many experiments with yoke serving as the control (Neuringer, 2002). In an interesting variant of the procedure, Donald Blough (1966) reinforced variable interresponse times (IRTs). Blough’s goal was a difficult one, namely, to see whether pigeons could learn to behave like an emitter of atomic particles, the most random of physical phenomena (or, put another way, to respond as would a Geiger counter). For a random-in-time responder, the likelihood of a response is independent of whether a previous response had occurred recently. To accomplish this, Blough created a series of IRT bins, such that a random responder would be expected to have an equal number of IRTs in each bin. A moving window of 150 responses was analyzed in real time, with each response allocated into one of 16 bins, depending on its IRT. Blough then only reinforced an IRT falling in the bin with the lowest current relative frequency, that is, he reinforced only for the least frequent IRT in a given window (as such, this contingency is called a least frequent contingency). The procedure resulted in the pigeons learning to approximate the IRT distributions of a truly random generator, that is, to distribute pecks (with some minor exceptions resulting from double pecks) much as a Geiger counter would respond.
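A threshold contingency with decaying counters might be sketched as follows. The class name, the threshold of .0625, and the order of operations (score the trial against the current bins, then add the new sequence, then apply the 0.95 multiplier) are assumptions made for illustration; see Denney and Neuringer (1998) for the actual procedure.

```python
from itertools import product

class ThresholdContingency:
    """Reinforce a sequence only if its weighted relative frequency is below threshold."""

    def __init__(self, threshold=0.0625, decay=0.95, seq_len=4):
        self.threshold = threshold      # hypothetical value, chosen for illustration
        self.decay = decay
        self.bins = {"".join(p): 0.0 for p in product("LR", repeat=seq_len)}

    def trial(self, sequence):
        total = sum(self.bins.values())
        rel_freq = self.bins[sequence] / total if total > 0 else 0.0
        reinforced = rel_freq < self.threshold
        self.bins[sequence] += 1.0      # record the sequence just emitted
        for key in self.bins:           # then let every counter decay
            self.bins[key] *= self.decay
        return reinforced

def outcomes(sequences):
    contingency = ThresholdContingency()
    return [contingency.trial(s) for s in sequences]

all_sequences = ["".join(p) for p in product("LR", repeat=4)]
repeater = outcomes(["LLLL"] * 64)      # one sequence emitted on every trial
cycler = outcomes(all_sequences * 4)    # all 16 sequences in a fixed rotation
print(sum(repeater), sum(cycler))
```

Under these assumptions the repeater is reinforced only on its first trial, whereas the cycler is reinforced on every trial, which illustrates why frequency-based contingencies, like lag contingencies, can be satisfied by systematic strategies.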

Statistical Feedback Methods

If the goal is to test whether animals and people can respond truly randomly, then both recency and frequency methods have a potential weakness. As indicated earlier, systematic strategies, such as binary counting, can provide higher frequencies of reinforcement than responding stochastically. Although sometimes present in animals, this type of strategy is most commonly observed in human participants. Note that exploiting weaknesses in the variability-reinforcing contingencies, by responding in a systematic way that maximizes reinforcement, is not a sign of insensitivity to the schedule requirements. If anything, it is precisely the opposite. An additional problem is that all statistical tests of randomness (and therefore reinforcement contingencies based on randomlike responding) have certain blind spots that result in false positives. Therefore, researchers sought an alternative procedure, one that was better at detecting, and therefore not reinforcing, strategic or patterned responding. It seemed reasonable to hypothesize that if a reinforcement contingency was based on a multiplicity of different measures of variability, it might be less likely to reward exploitative strategies and potentially lead more reliably to approximations to random outputs, especially in human participants. Before we describe the relevant experiment, note that the attempt to reinforce randomness flies in the face of more than 50 years of research in which people were asked to generate random sequences (e.g., “Pretend you are flipping a coin”). The consistent conclusion from this large body of studies was that people do not respond randomly when so requested (Brugger, 1997), and indeed, some researchers concluded that people cannot respond randomly. (The literature on human randomness rarely references nonhuman animal studies.) This conclusion is fundamentally important because randomness implies absence of identifiable causes and independence from determination. Most psychologists assume that all behaviors are strictly determined (by inheritance, experiences, stimuli, responses, and the like) and therefore that random behavior is not possible, certainly not when voluntarily attempted. However, none of the earlier studies tried to reinforce randomlike behavior directly. This approach was accomplished with a procedure that required students to enter tens of thousands of responses at a computer terminal (Neuringer, 1986). A trial consisted of 100 responses across two keys (which we refer to as 1 and 2) with feedback, based on common statistical tests of randomness, presented at the end of each trial. At first, satisfying one statistical test was reinforced, then two tests had to be satisfied, and then three, and so on (with graphical feedback showing each statistic relative to that expected from a true random source) until participants were passing 10 evaluations of random sequences. The challenge confronting the participants was even greater than just described. Participants were required to generate a distribution of statistical values that would be expected from a random source (Neuringer, 1986). To take a simple example, across many trials of 100 responses each, random generation of 1s and 2s shows a distribution of proportions of 1s and 2s. The most likely outcome would be approximately equal numbers of 1s and 2s, or 50% each, but some trials would occur in which, for example, there were 40% 1s and 60% 2s, or vice versa, and fewer trials of, say, 30% 1s (or 2s). The participants’ task, therefore, was not simply to match the average of a random distribution but more precisely to approximate the randomly generated distributions. This task required many weeks of training, but all participants learned to approximate the random model according to 10 simultaneously applied statistics, and some participants (but not all) were able to pass additional tests as well. How should this research be interpreted?
Because so much training was necessary, it would seem that the ability to respond unpredictably is unnatural, but that would be a misinterpretation. If you, the reader, were to call out 100 instances of heads and tails and try to do so unpredictably, it is unlikely that an observer of your behaviors, even with access to sophisticated statistical analyses, could predict your responses with a high degree of accuracy. You can, without training, respond quite unpredictably. The requirement to pass 10 statistical tests, however, demands equivalence, over the long run, of instances, dyads, triads, and the like and absence of all biases. To use a rat’s operant response as an analogy, it is quite easy to train a rat to press a lever for food pellets. Indeed, that can often be accomplished in an hour-long session. To train a rat to respond precisely at some given force, or with precise interresponse intervals, may take weeks or months of training. In similar fashion, approximating true randomness is difficult to attain, but responding variably, and to large extent unpredictably, is readily achieved. In rats, for example, highly variable operant responding is obtained within a few sessions (McElroy & Neuringer, 1990).
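The distribution of proportions that a random source would be expected to produce in 100-response trials can be worked out from the binomial distribution. The following Python sketch is offered only as an illustration of the 50%, 40%, and 30% example given earlier in this section; the function name is hypothetical, and the calculation assumes independent responses with the two keys equally likely.

```python
from math import comb

def prob_ones(k, n=100):
    """Probability of exactly k 1s in n independent, equiprobable binary responses."""
    return comb(n, k) / 2**n

for k in (50, 40, 30):
    print(f"{k}% 1s: {prob_ones(k):.6f}")
```

An exact 50–50 split is the single most likely outcome but still occurs on fewer than 1 trial in 10, so a participant who produced exactly 50 1s on every trial would match the mean of the random distribution while failing to match its spread.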

Novelty-Based Methods

To evaluate variability, each of the methods discussed to this point requires that a set of possibilities be explicitly defined—responses, sequences, paths, or times. Mathematical definitions of randomness, statistical analyses, and reinforcement contingencies depend on such specification. However, in many outside-of-lab situations, the set may not be known, and an alternative method allocates reinforcers for novel, or not previously emitted (e.g., within a session or ever) responses. Reinforcement of novel responses was first used by Pryor, Haag, and O’Reilly (1969) in research with porpoises. At the beginning of each session, Pryor et al. waited until they observed some behavior not previously emitted by the porpoise and then selected the new behavior for consistent reinforcement during the session. This procedure resulted in the porpoise emitting an unprecedented range of behaviors, including aerial flips, gliding with the tail out of the water, and “skidding” on the tank floor, some of which were as complex as responses normally produced by shaping techniques, and many of which were quite unlike anything seen in . . . any other porpoise. (p. 653) One problem with long-term use of this procedure, however, is that, over time, it becomes increasingly difficult for the subject to produce never-before-seen behaviors and increasingly difficult for the observer to discriminate among the various behaviors being emitted. At least over the short run, however, reinforcing novel responses led to an exceedingly high level of unpredictable behaviors. The Pryor et al. (1969) study was followed by an analogous one with preschool children (Goetz & Baer, 1973). The children were rewarded for block constructions that differed from any that had previously been observed during the session. As training proceeded, the children built increasingly varied forms, including ones never before made by the child. Similar results were obtained with the drawing of color pictures as the target behavior (Holman, Goetz, & Baer, 1977). The evidence from many methods has therefore shown control over response variability by directly contingent reinforcers (see also Hachiga & Sakagami, 2010; Machado, 1989, 1992, 1997). Variability is highest when reinforcers follow high variability. In the next section, we show that reinforcers exert even more precise control than that: Levels of variability can be specified, levels that span the range from response repetitions (or stereotypy) to response unpredictability. As such, variability parallels other operant dimensions in which reinforcers influence exactly how fast to respond or when, with what force, or at which location.
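A within-session novelty criterion of the kind described for the block-building study can be sketched as follows. The function name and the response labels are hypothetical, and the sketch assumes that an observer has already classified each response into a discrete category, which, as noted above, becomes harder as behaviors multiply.

```python
def novelty_outcomes(responses):
    """True for each response not already emitted earlier in the same session."""
    seen = set()
    outcomes = []
    for response in responses:
        outcomes.append(response not in seen)   # novel only on first occurrence
        seen.add(response)
    return outcomes

session = ["flip", "glide", "flip", "skid"]     # hypothetical observer codes
print(novelty_outcomes(session))
```

Unlike the lag and threshold contingencies, this criterion requires no predefined set of possible responses; the set grows as the session proceeds.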

Levels of Variability

Other experiments in the Page and Neuringer (1985) article described earlier applied different lag values in different phases, from lag 1 (the current sequence of eight responses had to differ from the single previous sequence) to lag 50 (the current sequence had to differ from each of the previous 50 sequences). As the lag increased, requiring that sequences be withheld for an increasing number of trials, responses generally became increasingly unpredictable (as assessed by U values, number of different sequences per session, and other statistics; see also Machado, 1989). Frequency-based methods show similar control over levels. For example, Grunow and Neuringer (2002) used a different threshold reinforcement criterion with each of four groups of rats: one that required rats to distribute three-response sequences (across three different operanda) in a way that paralleled a random generator (high variability), another that required medium-high variability, another that required medium-low variability, and the last that permitted frequent repetitions. Levels of variability were again controlled by these specific requirements, as shown by the leftmost points in Figure 22.2 (the other points in the figure are discussed later). Several additional studies have demonstrated reinforcement control over precise levels of variability in pigeons (Neuringer, 1992) and people (G. Jensen, Miller, & Neuringer, 2006).

Figure 22.2. U value as a function of reinforcement frequencies. Each line represents a different group: .037 = very high variability (var) required for reinforcement; .37 = very low variability required; .055 and .074 = intermediate levels required. CRF = continuous reinforcement, or reinforcement every time variability contingencies were met; VI 1 = variable-interval reinforcement for meeting variability contingencies no more than once per minute, on average; VI 5 = variable-interval reinforcement no more than once every 5 minutes. From “Learning to Vary and Varying to Learn,” by A. Grunow and A. Neuringer, 2002, Psychonomic Bulletin and Review, 9, p. 252. Copyright 2002 by the Psychonomic Society, Inc. Adapted with permission.

Precisely controlled levels of behavioral (un)predictability can be observed in many natural situations. Variable behaviors are used to attract attention, as when male songbirds increase the variability of their songs in the presence of a receptive female (Catchpole & Slater, 1995). During play and games, animals and people modulate levels of variability as a function of the reactions of their playmates. When entertaining a child, the actions of an adult sometimes include surprises, such as tickling, as well as repetitions, such as repeatedly bouncing a child on one’s lap, and the child’s reactions influence the action’s (un)predictability. Similarly, in conversations, the speaker is (often) sensitive to the reaction of the listener with variations in topic as well as in prosody, loudness, and speed. Unpredictability is particularly important in competitive situations. Consider the example of table tennis: When a skilled player (S) plays with a beginner (B), S will often return the ball in a way that B can easily predict, but as B becomes increasingly capable, S will vary ball placement and speed until a high level of unpredictability is (sometimes) manifest. Precise control of levels of unpredictability plays a substantial role in game theory, under the rubric of mixed strategies (see Glimcher, 2003; Smith, 1982). These examples are only a few of the commonplace variations in levels of response (un)predictability that characterize many real-world operant behaviors, variations that are controlled by consequences. We discuss additional real-world applications later.

Orthogonal Dimensions

As indicated in the introduction, reinforcement often depends on a combination of many aspects of a response. For example, a child may receive a reinforcer for saying “thank you” but only when the child (a) speaks slowly and (b) makes eye contact. Because responses can vary across many dimensions independently from one another, one can readily imagine circumstances in which it might be functional to vary some dimensions of behavior while keeping others highly predictable. A demonstration of the independent reinforcement of variability and predictability along independent dimensions was provided by Ross and Neuringer (2002). They instructed college students to earn points in a video game involving drawing rectangles on a computer screen. Three dimensions of the rectangles were evaluated: area (the number of pixels enclosed by the rectangle), location (the position of its center point), and shape (its height-to-width ratio). To be reinforced, the rectangles had to vary along two of these dimensions while repeating along the third. The participants were told nothing about these criteria, and the only instructions were to gain points by drawing rectangles. Participants were randomly assigned to one of three groups, with rewards delivered in one group when the areas of the drawn rectangles were approximately the same, trial after trial, but locations and shapes varied. The other two groups had analogous contingencies, but for one, locations had to repeat, and for the other, shapes had to repeat. All participants learned to meet their respective three-part contingencies, varying and repeating as required (Figure 22.3). Thus, binary feedback (reinforcement or not) influenced variability and repetitions along three orthogonal dimensions and did so independently, thereby highlighting the precise, multifaceted way in which reinforcers control variability.

Figure 22.3. U values for each of three dimensions of rectangles drawn by participants in three separate groups. One group was required to repeat the areas of their rectangles while varying shapes and locations (left set of bars), a second group was required to repeat shape while varying areas and locations (middle set of bars), and a third group was required to repeat location while varying areas and shapes (right set of bars). Error bars indicate standard errors. From “Reinforcement of Variations and Repetitions Along Three Independent Response Dimensions,” by C. Ross and A. Neuringer, 2002, Behavioural Processes, 57, p. 206. Copyright 2002 by Elsevier B.V. Adapted with permission.

As we show in the next section, reinforcers exert other simultaneous influences: They select the set or class of responses from which instances emerge and, simultaneously, the required level of variation.

Response Sets and Variations

Whenever variability is reinforced, a set of appropriate responses is also strengthened. Reinforcers select the set from which variations emerge. Mook and Neuringer (1994) provided experimental evidence for this point. In the first phase, rats’ variable four-response sequences across L and R levers (lag schedule) were reinforced. In the second phase, only sequences that began with two right responses, RR, were reinforced. Thus, now only RRLL, RRLR, RRRL, and RRRR patterns were effective. In the first phase, all 16 possible sequences were emitted, whereas in the second phase, most sequences began with two right responses, RR. Thus, the reinforcement contingency generated behaviors that satisfied the appropriate set definition while simultaneously producing a required level of variability within that set. In another experimental example (Neuringer, Kornell, & Olufs, 2001), rats responded in chambers containing five operanda: left lever, right lever, left key, center key, and right key. In one phase, reinforcers were contingent on variations across only three of the operanda (left and right levers and center key), and the rats learned to respond variably across only those three. A binary event (reinforce or not) can function simultaneously to define a response class and levels of (un)predictability along multiple dimensions of the within-class instances. This result shows an extraordinary diversity of control by simple reinforcement operations.

Discriminative Stimuli

Operant responses are generally influenced by discriminative stimuli, that is, cues that indicate reinforcer availability. If pigeon pecks are intermittently reinforced when the keylight is red but not when it is green, the birds learn to peck almost exclusively when the keylight is red. Discriminative stimuli control levels of variability as well. For example, Page and Neuringer (1985, Experiment 6) reinforced repetitions of a single sequence of key pecks, LRRLL, in the presence of blue keylights, whereas variable sequences were reinforced in the presence of red keylights (lag schedule). Blue and red alternated after every 10 reinforcers under what is referred to as a multiple schedule (two different reinforcement contingencies presented successively, each correlated with a distinct stimulus). The birds learned to repeat in the presence of blue and to vary in the presence of red, and when the stimulus relationships were reversed, the birds varied in the presence of blue while repeating in the presence of red. In another experiment, rats learned to emit variable four-response sequences across L and R levers in the presence of one set of lights and tones and repeated a single pattern, LLRR, in the presence of different stimuli (Cohen, Neuringer, & Rhodes, 1990). In an even more stringent test by Denney and Neuringer (1998), rats’ variable sequences were reinforced in one stimulus, whereas in a yoke stimulus, reinforcers were delivered at exactly the same rate and distribution but independent of variability. The cues came to exert strong differential control, and when variability was required, the animals varied; when variability was not required but permitted in yoke, response sequences became more repetitive and predictable. These results indicate that an individual may behave in a habitual and predictable manner in one context, whereas in a different context, perhaps occurring only a few moments later, the same individual will respond unpredictably or in novel ways. The results further indicate (along with other VAR–yoke comparisons described earlier) that to engender highly variable behaviors, it may be necessary to reinforce variability explicitly rather than, as in laissez-faire environments, simply permit individuals the freedom to vary. To the extent that individual freedom depends on the possibility of variation, reinforcement plays an important role (a topic to which we return in the final sections of this chapter).

Endogenous Stimulus Control

The discriminative stimuli described in the previous section were external to the organism and publicly observable. Another form of discriminative control depends on the interactions of an organism with a reinforcement schedule. An example of such endogenous stimulus control is seen in the pauses that follow reinforcers under fixed-interval schedules. The reinforcers serve as indicators that for some period of time, reinforcement is not possible. Hopson, Burt, and Neuringer (2002) showed that response–reinforcer relationships exert discriminative control over levels of variability as well (see also Neuringer, 2002). Rats’ responses were reinforced under a schedule in which two periods alternated, VAR and repetition (REP), but these periods were not cued by external stimuli (technically, a mixed schedule). In the VAR period, four-response sequences of L and R lever presses were reinforced if they met a threshold variability contingency; in the REP period, only repetitions of LLLL were reinforced. (Probabilities of reinforcement were equalized in the two periods by intermittent reinforcement of LLLL in REP.) After the schedule transitioned into the VAR component, responding began to vary within a few trials, and variations continued until the schedule transitioned into REP, with responding soon reverting to LLLL. These results indicate that the variability produced when reinforcement is withheld for short periods, as when a new response is being shaped, may partly be discriminatively controlled despite absence of external cues; that is, animals and people may learn when it is functional to vary, and some of the cues may come from response–outcome relationships.

Noncontingent Effects

Until this point, we have focused on contingencies that directly relate reinforcers to variability. All operant responses are also influenced by events that are not directly contingent on the responses, sometimes referred to as eliciting or inducing influences, respondents, or establishing operations. For example, levels of deprivation, injections of drugs, and ambient room temperature can all influence learning and maintenance of operant responses. Even noncontingent aspects of the reinforcement operation itself may have important effects, for example, attributes such as the quality and quantity of food (referring here to when these do not change as a function of behavior). Thus, to understand operant responding, including operant variability, these other influences must be considered. We turn to a discussion of effects of noncontingent events on operant variability. As we describe at the end of this section, noncontingent influences often interact in important ways with variability-contingent reinforcers.

Random Events

Many behavioral trajectories are initiated by the accidental confluence of the organism with one or more environmental happenings. Hurricanes, earthquakes, and wars change behaviors in ways that cannot readily be anticipated. Winning a lottery is a happier example. Another might be happening to sit next to a particular individual on a cross-country flight, which leads to a long-term romantic relationship (see Bandura, 1982; Taleb, 2007). These events, although randomly related to the individual’s behavior, have important long-term influences. Random events have been purposively used throughout history to guide behaviors, for example, throws of dice, randomly selected sticks, cards, bones, or organs. Today, a referee flips a coin at the beginning of a football game to decide which team can choose whether to kick the ball; a computer’s random-number generator assists scientists with avoiding biases in assigning subjects to experimental groups; and aleatoric events are used in modern art, music, and literature. The Dice Man by Rhinehart (1998) provides a fictional example of intentional use of random artifacts. The protagonist, bored with life, writes a number of possible actions on slips of paper and then periodically selects one blindly and behaves accordingly. These examples show that random events that are independent of an individual’s actions may be used to avoid biases, engender unlikely responses, and break out of behavioral ruts.

Evolved Responses

Modes of unpredictable behavior have evolved that permit organisms to escape from or avoid predators or aggressors. These behaviors have been referred to as protean behaviors that are “sufficiently unsystematic in appearance to prevent a reactor predicting in detail the position or actions of the actor” (Driver & Humphries, 1988, p. 36). Examples include the random zigzags of butterflies, stickleback fish, rabbits, and antelopes when being attacked. One consequence of evolved protean behavior is that it interferes with a predator species’ evolving a response to a specific escape or avoidance pattern. In brief,

Operant Variability

protean behaviors demonstrate evolved randomlike responses to eliciting stimuli.

Schedules of Reinforcement and Expectancy

Both in the laboratory and in the real world, it is common for responses to be intermittently (or occasionally) reinforced. Much operant conditioning research is devoted to documenting the effects of such schedules. To take one example, under a fixed-ratio schedule of reinforcement, a fixed number of responses (say, 30) is required to gain access to a pellet of food. After receipt of each pellet, it is impossible to obtain another immediately, because 30 additional responses are required. As was the case for the fixed-interval schedules mentioned earlier, pauses, or lower rates of responding, are generally observed after reinforcement, as compared with later in the ratio, when access to reinforcement is possible. In addition to these effects on response rate, response variability is also found to change under similar reinforcement schedules. In the cases discussed here, variability plays no role in the contingency, that is, reinforcers do not depend on response variations. However, responding tends to become increasingly repetitive and predictable as an anticipated reinforcer is approached in time or number. This tendency was shown for variability across two levers when a fixed sequence of responses was the operant (Cherot, Jones, & Neuringer, 1996), for variability of lever-press durations also under ratio schedules (Gharib, Gade, & Roberts, 2004), and for variability of movements across a chamber space when access to a potential sexual reinforcer is approached (Atkins, Domjan, & Gutierrez, 1994). In each of these cases, variability is relatively high when reinforcers are distant with respect to effort, time, or space, and responding becomes more predictable as reinforcers are neared (see also Craig, 1918). These changes in response predictability are said to be induced by the schedule of reinforcement.

Another variable shown to induce differences in response variability is reinforcement frequency. In general, response variability is high when reinforcers are infrequent and low under high-density reinforcement (Lee, Sturmey, & Fields, 2007). One interpretation of these effects is that low expectation (or anticipation) of reinforcers induces variability (Gharib et al., 2004). Whatever the explanation, it is important to be able to identify whether variability is selected by reinforcers or pushed by states of the body (endogenous inducers) or environmental events, including noncontingent effects of reinforcers. Discriminating between selection and induction will facilitate the modification of variability when that is desirable.

Experience

Thorndike (1911) and Guthrie and Horton (1946) described the responses of cats that had been confined in a puzzle box and that received food contingent on escape. Response topographies were highly variable at first but, over trials and rewards, became increasingly predictable and stereotyped. Antonitis (1951) studied nose pokes by rats along a long horizontal slit. When pokes produced access to food, location variability decreased across trials. Notterman and Mintz (1965) measured the force exerted by rats on a response lever and found that across training, force decreased, approaching the minimum level necessary to operate the lever, with force variability decreasing as well. Brener and Mitchell (1989) extended these results to the total energy expended by a rat in an operant conditioning chamber. A last example comes from Vogel and Annau (1973), who reinforced pecking three times on a left key and three times on a right key, in any order. Across sessions, a marked increase occurred in the predictability (stereotypy) of the pigeons’ patterns of response. A general consensus has therefore emerged: Variability of operant behavior decreases with experience. This conclusion, however, may apply mainly to situations in which every response or sequence leads to a reinforcing consequence and to situations in which high variability is not differentially reinforced.

Extinction

After long-term experience in which responses produce reinforcers, suddenly withholding reinforcers (referred to as extinction of responding) increases variability. In the experiment by Antonitis (1951) noted earlier, after the rats were accustomed to


producing food reinforcers by poking their noses anywhere along a horizontal opening, food was withheld, which caused an increase in location variability. Extinction-induced variability has been seen along many other response dimensions: location (Eckerman & Lanson, 1969), force (Notterman & Mintz, 1965), topography (Stokes, 1995), and number (Mechner, 1958). One contrary result is often cited, namely a study by Herrnstein (1961) in which variability of the location of pigeon pecks along a continuous strip was reported to decrease during a period of extinction. However, the extinction in that study followed a phase in which every response was reinforced (continuous reinforcement) in an A-B design (first a reinforcement phase, then extinction, without return to the first phase). Experience may therefore have confounded the results. In general, extinction causes variability to increase. The variations induced by extinction generally emerge from the class of responses established during original learning. For example, if lever pressing produced food pellets, a rat may vary the ways in which it presses when food is withheld, but much of the behavior will be directed toward the lever (e.g., Stokes, 1995). Neuringer et al. (2001) quantified the bounded nature of extinction-induced variability that was observed after rats had been rewarded for repeating a single sequence across two levers and a key: left lever, key, right lever (LKR), in that order. The top panel of Figure 22.4 shows the distribution of the relative frequencies of each of the possible sequences (proportions of occurrences) during the conditioning, or reinforcement, phases (filled circles) and during extinction (open circles). The LKR sequence was, of course, most frequent during the reinforcement phase, with other, somewhat similar sequences falling off in terms of their frequencies. 
The LKR sequence was also most frequent throughout the extinction phase (during which time response rates fell to low levels), with the two curves being quite similar. (Note that these curves show relative frequencies. Absolute rates of response were much lower during extinction than during the reinforcement phase.) Also shown at the bottom of the figure are the ratios of response proportions during the reinforcement and extinction phases (i.e., the ratio of the two curves in the upper graph). The take-home message is that the basic form of the behavior was maintained during extinction, and variability increased because of the generation of unusual or highly unlikely sequences (for related findings, see Bouton, 1994). Extinction was therefore characterized as resulting in a “combination of generally doing what worked before but occasionally doing something very different. . . . [This] may maximize the possibility of reinforcement from a previously bountiful source while providing necessary variations for new learning” (Neuringer et al., 2001, p. 79). Knowledge of these effects can be applied to one’s own behavior as well as to others. When in a rut, or unproductive or dissatisfied, avoiding those reinforcers that had been produced by habitual behaviors may help.

Figure 22.4. The top graph shows the proportion (or probability) of occurrences of the three-response patterns shown along the x-axis during a period when a single sequence (left lever, key, right lever) was being reinforced (filled circles) and during a period of extinction, when reinforcers were withheld completely (open circles). The bottom graph shows the ratio of responding during extinction (EXT) to responding during reinforcement (REIN; i.e., the ratio of the two curves in the upper graph). Together, the graphs show that patterns of responding during extinction were similar to those during reinforcement, but high-frequency sequences decreased and low-frequency sequences increased during the extinction phase. Adapted from “Stability and Variability in Extinction,” by A. Neuringer, N. Kornell, and M. Olufs, 2001, Journal of Experimental Psychology: Animal Behavior Processes, 27, p. 89. Copyright 2001 by the American Psychological Association.

Interactions

Noncontingent inducers often interact with variability-contingent reinforcers to control levels of response variability. Additional phases in the Grunow and Neuringer (2002) experiment described in the Levels of Variability section provide one example. In the first phase of that experiment, recall that high, medium-high, medium-low, and low levels of response-sequence variability were reinforced across groups of rats, resulting in different levels of response variability across the four groups. Two additional phases followed in which, although the four different variability criteria were unchanged, overall frequencies of reinforcement were systematically lowered by providing reinforcement only intermittently. In particular, a variable-interval (VI) schedule of reinforcement was superimposed on the variability contingency: first a VI 1 minute (such that food pellets were limited to an average of once per minute, with unpredictable gaps of time between food deliveries) and then VI 5 minute (limiting food pellets to no more than once, on average, every 5 minutes). Under the VI schedules, after an interval elapsed, the first trial to meet the variability contingency ended with a reinforcer. All other trials ended with a brief time out (whether or not the variability requirement had been satisfied). As reinforcement frequencies were lowered, response rates fell in all four groups and did so equally, that is, all groups responded much more slowly when varying sequences were reinforced on average once every 5 minutes than when they were reinforced each time that they met the contingencies. However, different results were obtained for

variability, indicated by the U values in Figure 22.2. The individual curves represent the four variability thresholds, and the x-axis represents frequencies of reinforcement. The four thresholds exerted primary control, that is, the groups differed in variability throughout the experiment. Effects of reinforcement frequency were more subtle and depended on the threshold requirements, an interaction effect. When the contingencies were lenient and low levels of variability sufficed for reinforcement, variability increased as reinforcement rates fell (from continuous reinforcement to VI 1 to VI 5). When the contingencies were demanding and high levels of variability were reinforced, the opposite occurred, that is, variability decreased with decreasing reinforcements. The intermediate groups showed intermediate effects. A similar interaction was obtained when delays were introduced between the end of a varying sequence and reinforcement (Wagner & Neuringer, 2006). Thus, when reinforcers are contingent on variability, the contingency exerts a strong—and often primary—effect, but that effect is modified by noncontingent influences, including reinforcement rates and delays. Levels of response variability depend on both contingent and noncontingent influences. Interactions between variability-contingent and variability-noncontingent reinforcement may help to explain effects seen outside of the lab. Repetitive behaviors are required for many workers (e.g., factory workers, mail carriers, fare collectors), but for others, variable (and unpredictable) behaviors are the norm (e.g., inventors, fashion designers, artists). Lowering pay or withholding positive feedback may affect behaviors differently in these two cases. Thus, to predict effects on behavioral variability, one must know both contingent and noncontingent relationships. Cherot et al. (1996) described a different interaction that may also help to illuminate real-world effects. 
In that experiment, repeated response sequences across two levers were reinforced in one group of rats (REP) and sequence variability was reinforced in another group (VAR). Not every sequence that met the VAR or REP contingency gained reinforcement, however; rather, a superordinate fixed-ratio 4 also had to be satisfied. That is, the REP group had to successfully repeat a sequence


four times to get a single reinforcer, and the VAR group had to successfully vary a sequence the same number of times. For example, in the REP group, one animal may have emitted LLLR in the first trial after a reinforcer, but that correct sequence caused only a signal indicating correct. No food was given. If the next trial also contained LLLR, the correct signal was again provided. If the following trial was LLLL, then a brief time out was given. This process continued until the fourth correct LLLR sequence produced the signal plus a food pellet. Exactly the same procedure was in place for the VAR animals, except they were correct only when they met a lag variability contingency. As shown in Figure 22.5 (bottom), the main result was that the VAR animals responded much more variably overall than did the REP animals, again indicating primary control by the variability and repetition contingencies. However, as reinforcement was approached (i.e., as the last of the four successful sequences was neared), levels of variability decreased for both VAR and REP groups. Recall the expectancy-of-reinforcement effects described in the section Schedules of Reinforcement and Expectancy earlier in this chapter. In this case as well, variability decreased as reinforcers were approached, thereby facilitating correct responding in the REP group but interfering with it in the VAR group (Figure 22.5, top panel). Let us pause for a moment to consider this surprising finding. Despite the fact that variability was being reinforced in the VAR group, as the reinforcer was neared, the likelihood of varying decreased. It is important to note again that reinforcement of variability generated much higher levels of variation overall than did reinforcement of repetitions, a variability-contingent effect, but superimposed was an expectancy-inducing decrease in variability. 
Similar interactions may help to explain effects of reinforcers on other types of behavior, including creative behaviors, a topic that we discuss in the Applications section.

Memorial and Stochastic Explanations

We have described types of events that result in variable behaviors. Now, we examine two commonly

Figure 22.5. The top graph shows percentages of sequences that met variability (VAR) or repetition (REP) contingencies as a function of location within a fixed-ratio 4 (FR 4) schedule. The lines connect means for groups of rats, and the error bars indicate standard deviations. The lower graph shows U values, an index of sequence variability, for the two groups across the FR schedule. Adapted from “Reinforced Variability Decreases With Approach to Reinforcers,” by C. Cherot, A. Jones, and A. Neuringer, 1996, Journal of Experimental Psychology: Animal Behavior Processes, 22, p. 500. Copyright 1996 by the American Psychological Association.

discussed explanations of operant variability, namely memorial and stochastic processes. According to the memorial explanation, each response can be related to or predicted from prior stimuli or responses. According to the stochastic-generator hypothesis,


individual responses are unpredictable because of the nature of a random process. That is, individual responses do not have identifiable individual causes, a hypothesis that many consider problematic. We consider each of these explanations and argue that the evidence for both of these hypotheses is good and therefore that behaving (more or less) unpredictably derives from multiple sources.

Memory-Based Variability

Memory is a shorthand way to refer to the influence of past events that are separated in time from a current response. The term is not intended to connote conscious awareness (although that might be involved) but rather potentially identifiable influences (or causes). To the extent that memorial processes are responsible for variability generation, prediction of individual responses is possible, even when the overall output is variable; thus, each member of a variable sequence could be said to be determined by prior events. At the outset of this chapter, we indicated that under lag 50 schedules, in which the current response sequence must differ from each of the previous 50 sequences, responding was highly variable and, indeed, approached that expected of a stochastic generator. However, behaviors are often quite different under lag 1 or 2 schedules. In these cases, the current sequence must differ from only the previous one or two, and memory-based response strategies frequently emerge: Animals and people sometimes cycle repeatedly through two or three sequences, apparently basing a current response sequence on the just-emitted sequences. The cycling strategy produces reinforcement for every sequence, which is a better rate of return than responding in a stochastic-like manner (see Footnote 3). The advantage is, however, only conferred when the memory demands are within the subject’s capacity.

In a demonstration of the latter point, Machado (1993) studied pigeons pecking L and R keys under a frequency-dependent variability contingency. Under this schedule, if the sequence is composed of just one response, then pecking the key that had been pecked least frequently in the past will be reinforced. By alternating, LRLRLR . . . , every response is reinforced, and birds developed just such an alternation strategy. When the sequence consisted of two responses, the birds again developed memory-based sequences, for example, repeating RRLLRRLL. However, when the sequence was increased to three responses, such that reinforcement was given for responses in the least frequent three-response bin, the birds apparently could not develop the optimal fixed pattern of RRRLRLLL . . . but instead reverted to randomlike behavior (Machado, 1993, p. 103). Thus, a memory-based strategy was used when that was possible, but when the memory demands became too high, stochastic responding emerged. A similar pattern was seen with songs generated by a songbird, a budgerigar, under lag contingencies of reinforcement (Manabe, Staddon, & Cleaveland, 1997). Under lag 1, the birds tended to alternate between two songs; under lag 2, they cycled among three songs. When the lag was increased to 3, however, song diversity and variability increased appreciably. Thus, under recency- and frequency-based methods of variability reinforcement, variable responses are generated via memorial processes when possible, but reversion to stochastic-like emission is seen when memory requirements exceed the organism’s capacity.
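The payoff difference between cycling and stochastic-like responding under a lag contingency can be checked with a short simulation. The following is a minimal sketch in Python, not the procedure used in any of the cited experiments. It assumes that single responses (rather than multi-response sequences) are the unit, that the first few trials are compared against whatever shorter history is available, and that responses are chosen uniformly in the random case. The helper names `reinforced_fraction`, `cycle`, and `random_responder` are hypothetical.

```python
import random


def reinforced_fraction(responses, lag):
    """Return the fraction of responses that satisfy a lag contingency.

    A response counts as reinforced when it differs from each of the
    previous `lag` responses (fewer are available at the start of a session).
    """
    reinforced = 0
    for i, response in enumerate(responses):
        recent = responses[max(0, i - lag):i]
        if response not in recent:
            reinforced += 1
    return reinforced / len(responses)


def cycle(alternatives, n):
    """Memory-based strategy: step through the alternatives in a fixed order."""
    return [alternatives[i % len(alternatives)] for i in range(n)]


def random_responder(alternatives, n, seed=0):
    """Stochastic-like strategy: choose each response independently and uniformly."""
    rng = random.Random(seed)  # fixed seed only so the sketch is reproducible
    return [rng.choice(alternatives) for _ in range(n)]


if __name__ == "__main__":
    n = 10_000
    # Lag 1, two alternatives: alternation always differs from the last response.
    print(reinforced_fraction(cycle(["L", "R"], n), lag=1))
    # Independent random choices repeat the last response about half the time.
    print(reinforced_fraction(random_responder(["L", "R"], n), lag=1))
    # Lag 2, three alternatives: cycling through all three is again always correct.
    print(reinforced_fraction(cycle(["A", "B", "C"], n), lag=2))
```

Under these assumptions, the cycling strategies are reinforced on every trial, whereas independent random choice between two alternatives under lag 1 is reinforced on roughly half of the trials, in line with the comparison given in Footnote 3.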

Chaotic Responding

Memory-based strategies can be used in other ways as well. For example, chaotic processes generate outputs that are so noisy that it is exceedingly difficult to distinguish them from stochastically generated ones. “Chaos [is] a technical term . . . refer[ring] to the irregular, unpredictable behavior of deterministic, nonlinear systems” (R. V. Jensen, 1987, p. 168). Chaotic behavior is both highly variable and precisely controlled by prior events (Hoyert, 1992; Mosekilde, Larsen, & Sterman, 1991; Townsend, 1992). A study by Neuringer and Voss (1993) asked whether people could learn to generate chaotic-like responses. They used one example of a chaotic function, the logistic difference function:

\[ R_n = t \cdot R_{n-1} \cdot (1 - R_{n-1}) \tag{2} \]

Footnote 3. For example, stochastic responding on two alternatives under a lag 1 contingency earns 50% reinforcement, whereas alternating earns 100%.


Here, $R_n$ refers to the nth iteration in a series, each $R$ is a value between 0.0 and 1.0, and $t$ is a constant between 1.0 and 4.0. The current value of the logistic difference function ($R_n$) is based on the previously generated value ($R_{n-1}$). The process begins with an arbitrary seed value for $R_0$ between 0 and 1, which is used to calculate $R_1$. Apart from the initial seed value, the function is completely self-generated, with each value determined by the just-prior value, together with the constant parameters. Chaotic outputs have two identifying characteristics. First, given a constant value for $t$ that approaches 4, for example, 3.98, the generated sequence approximates a random one, that is, it passes many tests for randomness. Outputs are noisy and apparently unpredictable. However, second, if the current value of $R_n$ is plotted as a function of the just-prior value, $R_{n-1}$, a predictive structure can be identified. In the particular case of the logistic difference function, the form of this autocorrelated relationship is a parabola (different chaotic functions show different types of internal structures). Thus, and this is the identifying attribute of chaotic processes, a deterministic mathematical function can generate randomlike outputs with prediction of each value of the function possible given precise knowledge of parameters and prior values. The outputs are extremely noisy and, at the same time, identifiably determined. In the Neuringer and Voss (1993) study, college students were shown, after each response, the difference between their responses and that of the iterated logistic difference model. With training, the students became increasingly adept at responding in chaotic-like fashion (their responses closely matched the iterations of the logistic function), and their autocorrelations increasingly approximated a parabola (Figure 22.6).
Because each iteration in the logistic difference sequence is based on the prior output, Neuringer and Voss hypothesized that the human subjects also remembered prior responses. Put simply, the subjects may have learned (or memorized) a long series of “if the previous response was value A, then the current response must be value B” pairs, a suggestion made by Metzger (1994) and by Ward and West (1994). To test this memory hypothesis, Neuringer and Voss (2002) interposed pauses (interresponse times, or IRTs) between each

Figure 22.6. Responses in trial n as a function of responses in trial n − 1 during the first 120 responses of the experiment (left column) and final 120 responses (right column). Each row of graphs represents data from a single participant. The drawn lines show the best-fitting parabolas. From “Approximating Chaotic Behavior,” by A. Neuringer and C. Voss, 1993, Psychological Science, 4, p. 115. Copyright 1993 by the Association for Psychological Science. Adapted with permission.


response, that is, they slowed responding (see also Neuringer, 2002). As IRT durations increased, the difference between the subjects’ sequences and the model’s chaotic output increased, and the underlying parabolic structure was disrupted, providing evidence that the highly variable responding was memory based.
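The two identifying characteristics of the logistic difference function in Equation 2 can be illustrated numerically. The following Python sketch iterates the function with t = 3.98, the example value from the text. The seed of 0.3 is an arbitrary choice, and the function names (`logistic_series`, `successive_pairs`, `max_parabola_error`) are hypothetical helpers rather than anything taken from the original study.

```python
def logistic_series(t, seed, n):
    """Iterate R_n = t * R_{n-1} * (1 - R_{n-1}) for n steps, starting from R_0 = seed."""
    if not 0.0 < seed < 1.0:
        raise ValueError("seed must lie strictly between 0 and 1")
    values = [seed]
    for _ in range(n):
        previous = values[-1]
        values.append(t * previous * (1.0 - previous))
    return values


def successive_pairs(values):
    """Pair each value with its predecessor: (R_{n-1}, R_n)."""
    return list(zip(values[:-1], values[1:]))


def max_parabola_error(pairs, t):
    """Largest gap between R_n and the parabola t * R_{n-1} * (1 - R_{n-1})."""
    return max(abs(current - t * previous * (1.0 - previous))
               for previous, current in pairs)


if __name__ == "__main__":
    t = 3.98                                       # example constant from the text
    series = logistic_series(t, seed=0.3, n=1000)  # seed value is arbitrary
    print(series[:5])                              # irregular-looking values
    print(min(series), max(series))                # all values stay between 0 and 1
    # Every (R_{n-1}, R_n) pair lies on the parabola, so the error is zero.
    print(max_parabola_error(successive_pairs(series), t))
```

Plotting the pairs returned by `successive_pairs` would trace out the parabola described above, even though the series itself looks irregular when read in order.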

Stochastic Generation

Stochastic generation has been hypothesized at numerous points in the chapter. Here we discuss in more detail what the stochastic hypothesis involves and possible ways to test it. The issue is complex, difficult, and important. If variable operant responses are generated stochastically, then it may not be possible to predict individual responses at greater than chance levels. Stochastic generation may also be relevant to operant responses generally and to explanations of their voluntary nature, as we discuss later. A researcher confronts many problems, however, in attempting to decide whether a particular response stream is random or not and confronts additional difficulties when trying to determine whether it has been generated by a random process (see Nickerson, 2002).

To get an intuitive sense of what random implies, imagine an urn filled with 1,000 colored balls. The urn is well shaken, and one ball is blindly selected. After selection, the ball’s color is noted, the ball is returned to the urn, and the selection process is repeated. If the urn contains an equal number of blue and red balls, then prediction of each ball’s color will be no better than chance; that is, the probability of a correct prediction would be .50. The repeated selections represent a random process (see Footnote 4), with the resulting output being a random sequence. Note that predictions can be better than 50% for random processes, as shown by the following: If the urn was filled with unequal numbers of differently colored balls, prediction could become increasingly accurate. For example, if the urn contained 900 red balls and 100 blue balls, then prediction accuracy would rise to .90 (if one always predicted red). However, the process and output are still referred to as stochastic. Thus, stochastic outputs are more or less predictable depending on the relative frequencies of the items (the two colors, in our example). It is also true that the greater the number of different item classes, for example, different colors, the less predictable any given instance will be. If the urn contained equal numbers of 20 different colors, for example, then the chance level of prediction would be .05 (rather than .50 in the two-color case). Discussion of these concepts in historical context can be found in Gigerenzer et al. (1989).

When trying to ascertain whether a finite sequence of outputs was randomly generated, the best one can do is to estimate the probability that a random process is involved. For example, if 100 selected balls were all blue, it would be unlikely but not impossible that the balls were selected randomly from an urn containing an equal number of red and blue balls. Given a random generating process, any subsequence of any length is possible, and every particular sequence of outcomes of a given length is exactly as likely as any other (see Lopes, 1982). These considerations indicate the impossibility of proving that a particular finite sequence deviates from random: The observed sequence may have been selected from an infinite random series (see Chaitin, 1975). However, the probability of 100 blue balls is extremely low in our example, and the probability is much higher for sequences that contain approximately 50% red and 50% blue. Thus, one can evaluate the probability that a given output was generated by a stochastic process having particular parameters.

A second problem is that as demonstrated by chaos theory, seemingly random outputs may be generated by nonrandom processes. Another example is given by iteration of the digits of pi. Someone could memorize the first 100 digits of pi and use those to generate a randomlike sequence. Thus, behavioral outputs can be highly variable but (given sufficient knowledge by an observer) predictable.
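The urn arithmetic above can be worked through exactly. The sketch below, in Python, is offered only as an illustration of the numbers in the text. It assumes draws with replacement (as in Footnote 4), uses exact fractions to avoid rounding, and introduces the hypothetical helper names `best_guess_accuracy`, `sequence_probability`, and `probability_of_k_blue`.

```python
from fractions import Fraction
from math import comb


def best_guess_accuracy(counts):
    """Accuracy of always predicting the most common color (draws with replacement)."""
    return Fraction(max(counts.values()), sum(counts.values()))


def sequence_probability(sequence, counts):
    """Probability of drawing one particular ordered sequence of colors."""
    total = sum(counts.values())
    probability = Fraction(1)
    for color in sequence:
        probability *= Fraction(counts[color], total)
    return probability


def probability_of_k_blue(k, n, p_blue):
    """Binomial probability of exactly k blue balls in n draws, in any order."""
    return comb(n, k) * p_blue**k * (1 - p_blue)**(n - k)


if __name__ == "__main__":
    fair = {"red": 500, "blue": 500}
    skewed = {"red": 900, "blue": 100}
    twenty = {f"color{i}": 50 for i in range(20)}

    print(best_guess_accuracy(fair))     # chance level with two equally common colors
    print(best_guess_accuracy(skewed))   # always predicting red
    print(best_guess_accuracy(twenty))   # twenty equally common colors

    # Any one specific sequence of 100 draws from the fair urn is equally likely,
    # whether it is all blue or a strict alternation of red and blue.
    print(sequence_probability(["blue"] * 100, fair))
    print(sequence_probability(["red", "blue"] * 50, fair))

    # Many distinct sequences contain exactly 50 blue balls, so that outcome
    # class is far more probable than the single all-blue sequence.
    print(float(probability_of_k_blue(50, 100, Fraction(1, 2))))
```

The contrast between the last two computations captures the point made in the text: each specific sequence is exactly as likely as any other, but sequences with roughly equal proportions of red and blue are collectively much more probable than an all-blue run.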
How, then, can one test whether highly variable operant responses derive from a stochastic process? The test must involve a comparison (which of two hypotheses is most likely to account for the data?), and the most likely alternative is the one already discussed, namely, memory-based processes under which each response can be predicted from

Footnote 4. Specifically, this process is “random with replacement” on account of the act of returning the ball to the urn. All discussions of randomness in this chapter refer to this type of randomness.


knowledge of prior stimuli or responses. A powerful tool for making this comparison is memory interference, that is, the degrading of control by prior events. Availability of memory-interfering procedures leads to the following reasoning: When approximation to a stochastic output is reinforced, if a memory-interfering event degrades performance, it provides evidence against a stochastic generation process. Absence of memory-interfering effects provides evidence consistent with stochastic generation. We have already seen evidence for stochastic-like responding when demands on memory were high (Machado, 1993; Manabe et al., 1997) and turn next to additional tests of the stochastic hypothesis.

Neuringer (1991) compared the effects of memory interference on responding by two groups of rats. One group obtained reinforcers by repeating a single pattern, LLRR. Once that pattern was well learned, blackouts were introduced between each response, the durations ranging from 0.1 second to 20 seconds across different phases of the experiment. Responses were ineffective during the blackout periods. As blackout durations increased, errors increased and reinforcement rates fell. Neuringer hypothesized that the interposed blackouts degraded performance because each response in the LLRR sequence depended in part on memory for the just-prior response. Effects of the same blackouts were assessed in a second group of rats that obtained reinforcers for varying four-response sequences under lag contingencies. Neuringer (1991) reasoned that if variable responses were generated by a memory-based process, then performances would be degraded as blackout durations increased, as was the case for the LLRR group. In fact, performances by the variability group actually improved with increasing blackout durations, resulting in higher rates of reinforcement. Some have suggested that absence of memory for prior events is necessary for random responding (Weiss, 1965), implying that memory interferes with random generation. In any event, the results were clearly inconsistent with the memory hypothesis.

In a related study, alcohol was administered to a single group of rats that had learned to respond variably when one stimulus was present and repeat LLRR sequences given a second stimulus (Cohen et al., 1990). The two stimuli alternated throughout each session under a multiple schedule. As alcohol doses increased, performance of the LLRR sequence was seriously impaired, whereas varying under the lag contingency was unaffected (Figure 22.7; see

Figure 22.7. Percentages of reinforced, or correct, sequences as a function of ethanol dosage for each of five rats. Varying sequences were reinforced under one stimulus condition (left graph), and repetitive LLRR sequences were reinforced under another stimulus (right graph). The lines connect averages of the five subjects (BG, BR, RG, B, and P). From “Effects of Ethanol on Reinforced Variations and Repetitions by Rats Under a Multiple Schedule,” by L. Cohen, A. Neuringer, and D. Rhodes, 1990, Journal of the Experimental Analysis of Behavior, 54, p. 5. Copyright 1990 by Society for the Experimental Analysis of Behavior, Inc. Adapted with permission.

Operant Variability

also Doughty & Lattal, 2001). Thus, within a single session, the drunk rats failed to repeat accurately but were highly proficient when required to vary. Both interposed time delays and alcohol, two ways to affect memory for prior responses, degraded performances of fixed-pattern sequences and either improved operant variability or left it unaffected. Additional evidence for the difference between memory-based and stochastic responding was provided by Neuringer and Voss (2002; see also Neuringer, 2002). College students learned to generate chaotic-like sequences (according to the logistic difference chaos function described in Equation 2) as well as to generate stochastic-like sequences (given feedback from eight statistical tests, as in Neuringer, 1986). These two ways of responding variably were alternated, under stimulus control, throughout each session. Memory interference was later introduced in a rather complex way. In the chaos phase of the experiment, subjects were required to generate four separate chaotic functions, each differing from the others. In the stochastic phase, four uncorrelated response sequences were required. In essence, chaotic responses were used to interfere with one another, and stochastic responses were used similarly. Results showed that performances were significantly degraded during the four-segment chaotic phases: Chaotic responses interfered with one another. A different result was obtained from the stochastic portion of the experiment. For one subject, seven of eight statistics were closer to a random model during the interference phase than at the end of the original training with a single sequence; for the second subject, all eight statistics were closer. These results are consistent with the hypothesis that a memory-based process controls chaotic variability and that a stochastic process, not dependent on memory, controls stochastic variability. (For additional evidence, see G. Jensen et al., 2006; Page & Neuringer, 1985.)
The importance of this experiment is that it demonstrated the two modes of variability generation, one memory based, the other stochastic, under a procedure that controlled for extraneous variables.

Applications

We have described experiments on the reinforcement of predictable and unpredictable responding

and the underlying processes. Results from these laboratory experiments may help to explain unpredictable operant behaviors in many nonlaboratory cases in which variability contingencies occur naturally. In this section, we continue to describe laboratory-based studies but ones with direct relevance to real-world conditions.

Training New Responses

Skinner (1981) hypothesized a parallel between evolutionary processes and selection by reinforcers of operant responses from a substrate of varying behaviors (see also Baum, 1994; Hull, Langman, & Glenn, 2001; Staddon & Simmelhag, 1971). As described in the Reinforcement of Variability section earlier in this chapter, variable behaviors can be generated by reinforcers whose delivery is directly contingent on that variability, something not anticipated by Skinner. One question that has potential importance for applications is whether reinforced variability facilitates acquisition of new responses, especially difficult-to-learn ones. Neuringer, Deiss, and Olson (2000) reinforced variable five-response sequences across L and R levers while concurrently reinforcing a target sequence that rats find difficult to learn, namely RLLRL. Reinforcement for varying was limited to once per minute, whereas the target sequence RLLRL was reinforced whenever it occurred. Thus, if the rats learned to emit the target, reinforcement was much more frequent than if they only varied. The question was whether concurrent reinforcement of variations would facilitate acquisition of the target sequence, and the answer was obtained through a comparison with two other groups of rats. In one group, the same RLLRL target sequence was reinforced whenever it occurred, but varying was never reinforced. These target-only animals stopped responding altogether (responses extinguished because reinforcer frequencies were low or zero), and they did not learn the target (shown by the control [CON] data in Figure 22.8, top panel). In a second control group, the RLLRL target was also reinforced whenever it occurred. In addition, five-response sequences were reinforced at a rate yoked to that obtained by the experimental animals; note that these reinforcers could follow any sequence and

Neuringer and Jensen

Figure 22.8. Rates of emission of a difficult-to-learn target sequence (RLLRL on top and LLRRL on bottom) for three groups of rats as a function of blocks of sessions (each session block shows the average of five sessions). In all groups, the target sequence was reinforced whenever it occurred. For one group, reinforcement was additionally arranged for varying sequences (VAR); for a second group, the additional reinforcers occurred at the same rate as in VAR but independent of variability (ANY); a third group did not receive additional reinforcement for any sequence other than the target sequences (CON). Adapted from “Reinforced Variability and Operant Learning,” by A. Neuringer, C. Deiss, and G. Olson, 2000, Journal of Experimental Psychology: Animal Behavior Processes, 26, p. 107. Copyright 2000 by the American Psychological Association.

did not depend on variations. These animals continued to respond throughout (the yoked reinforcers maintained high response strength), but as shown by the independent-of-variability (ANY) data in the top

panel of Figure 22.8, these rats too did not learn the target. Only the experimental animals who concurrently received reinforcers for variable responding (VAR) learned to emit the RLLRL sequence at high rates. Thus, it appeared that concurrent reinforcement of variations facilitated acquisition of a difficult-to-learn sequence, a potentially important finding. The experiment was replicated with a second difficult sequence with the same results (bottom panel of Figure 22.8) and in a separate study with rats as well (Neuringer, 1993). However, attempts in two laboratories to replicate these effects with human participants failed (Bizo & Doolan, 2008; Maes & van der Goot, 2006). In both cases, the target-only group (with no additional reinforcers presented) learned most rapidly. Several possible explanations have been suggested, including differences in relative frequencies of reinforcements for target responses versus variations, differences in levels of motivation in the animal versus human studies, and the “figure out what’s going on” type of instructions provided to the human participants, but why or when concurrent reinforcement of variations facilitates versus interferes with learning of new responses is not yet clear (see Neuringer, 2009).

Problem Solving

Arnesen (2000; see also Neuringer, 2004) studied whether a history of explicit reinforcement of variations would facilitate later problem solving. Using a rat model, she provided food pellets to rats in an experimental group for varying their responses to arbitrarily selected objects. For example, a soup can was placed in the chamber, and responding to it in a variety of ways was reinforced. Each session provided a different object, with response variability being reinforced throughout. Members of a yoked control group experienced the same objects but received food pellets independent of their interactions. A second control group was simply handled for a period each day. After training, each rat was placed alone in a problem space, a room approximately 6 feet by 8 feet, on the floor of which were 30 objects (for example, a toy truck, metal plumbing pipes, a hair brush, a doll’s chest of drawers), arbitrarily chosen but different from those used


during the training phase. Hidden in each object was a small piece of food, and the hungry rats were permitted to explore freely for 20 minutes. The question was how many food pellets would be discovered and consumed. The experimental animals found significantly more pellets than either of the control groups, which did not differ from one another. Furthermore, the experimental rats explored more—they seemed bolder—and interacted more with the objects than did the control rats, many of whom showed signs of fear. Thus, prior reinforcement of response variations transferred to a novel environment and facilitated exploration of novel objects and discovery of reinforcers. The advantages incurred by variations are discussed in the human literature (e.g., brainstorming), but tests of direct reinforcement-of-variability procedures for problem solving more generally have been few.

Creativity

Although creative production requires more than variation, Donald Campbell (1960) argued that variations, and indeed random variations, are necessary. If so, then operant variability may make important contributions to creativity. Support comes from studies in which creativity was directly reinforced (e.g., Eisenberger & Armeli, 1997; Holman et al., 1977; Pryor et al., 1969; see also Stokes, 2001). Other studies, however, have indicated that reinforcement interferes with, or degrades, creative output (e.g., Amabile, 1983). This literature is deeply controversial and has been reviewed in several articles (e.g., Cameron & Pierce, 1994; Deci, Koestner, & Ryan, 1999; Lepper & Henderlong, 2000), but the research listed earlier may contribute to a resolution. As shown by Cherot et al. (1996) and others (Wagner & Neuringer, 2006), reinforcement of variations has two effects. As a reinforcer is approached, variability declines. Thus, situations that potentiate the anticipation of consequences on the basis of completion may interfere with creative activities. The contingencies may at the same time, however, maintain high overall levels of creativity. Consideration of both induced effects (anticipation of reinforcement) and contingency effects (reinforced variability and creativity) may help explain reinforcement’s contribution to creativity (see Neuringer, 2003).

Psychopathology

Behavioral and psychological disabilities are sometimes associated with reduced control of variability. In autism and depression, for example, behaviors tend to be repetitive or stereotyped even when variations are desirable. In attention-deficit/hyperactivity disorder (ADHD), the opposite is true, with abnormally high variability observed when focused and repetitive responses are adaptive. All three of these disorders share a common characteristic, however: an apparent inability to move from one end or the other of the variability continuum. One question is whether reinforcement contingencies can modify abnormal levels of variability. The answer to this question may differ with respect to depression and autism, on the one hand, and ADHD, on the other.

Depression. Hopkinson and Neuringer (2003) asked whether the low behavioral variability associated with depression (Channon & Baker, 1996; Horne, Evans, & Orne, 1982; Lapp, Marinier, & Pihl, 1982) could be increased by direct reinforcement. College students were separated into mildly depressed and nondepressed groups on the basis of Center for Epidemiological Studies Depression Scale scores (Radloff, 1991). Each participant played a computer game in which sequences of responses were first reinforced independently of variability or probabilistically (PROB), as in the yoke procedures we have described, after which variable sequences were directly reinforced (VAR). Figure 22.9 shows that under PROB, the depressed students’ variability (U values) was significantly lower than that of the nondepressed students. When variability was explicitly reinforced, however, levels of variability increased in both groups and to the same high levels. This result, if general, is important because it indicates that variability can be explicitly reinforced in people manifesting mild depression (see also Beck, 1976).

Autism. In an experiment conducted by Miller and Neuringer (2000), five individuals diagnosed with autism and nine control subjects received reinforcers independent of variability in a baseline phase (PROB), followed by a phase in which sequence variations were directly reinforced. Subjects with autism behaved less variably than the control subjects in both phases; however, variability increased


Figure 22.9. Levels of variability (indicated by U values) for depressed and nondepressed college students when reinforcers were provided independent of response variability (PROB phase) versus when variations were required (VAR phase). Standard errors are shown by the error bars. From “Modifying Behavioral Variability in Moderately Depressed Students,” by J. Hopkinson and A. Neuringer, 2003, Behavior Modification, 27, p. 260. Copyright 2003 by Sage Publications, Inc. Adapted with permission.

significantly in both groups when it was reinforced. Thus, individuals with autism, although relatively repetitive in their responding, acquired high levels of operant varying. Ronald Lee and coworkers (Lee, McComas, & Jawor, 2002; Lee & Sturmey, 2006) extended this work. Under a lag schedule, individuals with autism received reinforcers for varying verbal responses to questions, and two of three participants in each of two experiments learned to respond appropriately and nonrepetitively. Thus, the experimental evidence, although not extensive, has indicated that the behavior of individuals with autism can benefit from reinforcers contingent on variability. Stated differently, the abnormally low levels of variability characteristic of individuals with autism may at least in part be under the influence of operant contingencies.

Attention-deficit/hyperactivity disorder. Things may differ for individuals diagnosed with ADHD. Here, the abnormal levels of variability are at the opposite end of the continuum, with high variability a defining characteristic (Castellanos et al., 2005;

Rubia, Smith, Brammer, & Taylor, 2007). A second common identifier is lack of inhibitory control (Nigg, 2001). Can such behavior be influenced by direct reinforcement? The evidence has indicated that unlike the case for autism, variability may result mainly from noncontingent (i.e., inducing) influences. One example is provided by the beneficial effects of drugs such as methylphenidate (Ritalin). Another is the fact that variability in individuals with ADHD is higher than in control subjects when reinforcement is infrequent, but not when it is frequent (Aase & Sagvolden, 2006). Methylphenidate reduces variability. Low reinforcement frequencies induce high variability, and the effects on those with ADHD may be independent of direct reinforcement-of-variability contingencies. Similarly, when reinforcement is delayed, the responses of subjects with ADHD weaken more than those of control subjects, possibly because of induced increases in variability (Wagner & Neuringer, 2006). Thus, variability may be induced in individuals diagnosed with ADHD by different attributes of reinforcement, but to date little evidence has indicated sensitivity to variability-reinforcing contingencies.

Operant Variability and the Emitted Operant

Reinforced variability may help to explain some unique attributes of operant behavior. Operants are often compared with Pavlovian reflexes, and the two can readily be distinguished at the level of the procedures used to establish them. In Pavlovian conditioning, a conditional relationship exists between a previously neutral stimulus, such as a bell, and an unconditioned stimulus, such as food. The result is that the neutral stimulus becomes a conditioned stimulus that elicits a conditioned response. One view is that operant responses differ in that they depend on a conditional relationship between response and reinforcer.
Thus in one case, a conditional relationship exists between two stimuli (if conditioned stimulus, then unconditioned stimulus), whereas in the other, the relationship is between response and reinforcer. However, according to Thorndike, Guthrie, and others, when a response is made in the presence of a


particular stimulus and the response is reinforced, then over trials, the stimulus takes on the power of an elicitor (Bower & Hilgard, 1981). This finding led some researchers to conclude that both operant and Pavlovian responses were elicited by prior stimuli. That is, in both cases stimulus–response relationships were critical to predicting and explaining the observed behaviors. Skinner (1935/1959) offered a radically different view of the operant. Skinner’s position is difficult to grasp, partly because at times he assumed the point of view of an environmental determinist, whereas at other times he proposed probabilistic (and possibly indeterministic) outcomes. According to Skinner, eliciting stimuli could not be identified for the operant. Although discriminative stimuli signaled the opportunity for reinforcement, no discrete environmental event could be identified to predict the exact time, topography, or occurrence of the response. Skinner described operants as emitted to distinguish them from elicited Pavlovian reflexes. But how is one to understand emission? The term is ambiguous, derived from the Latin emittere, meaning “to send out.” To be sent out might imply being caused to leave, but there is a sense of emergence, rather than one-to-one causation, as in the emission of radioactive particles. More important, the term captures, for Skinner and others, the manifest variability of all operant behaviors. Skinner interpreted that variability as follows. An individual operant response is a member of a class C of instances, a generic class, made up of functionally similar (although not necessarily physically similar) actions (Skinner, 1935/1959). An example may help to explain this point. Jackie, a young child, desires a toy from a shelf that is too high for her to reach. 
Jackie might ask her mom to get the toy, jump to try to reach it, push a chair next to the shelf to climb up to the toy, take a broom from the closet and try to pull the toy from the shelf, or cry. Each of these acts, although differing in physical details, is a member of the same operant class because each potentially serves the same functional relationship between the discriminative stimulus (out-of-reach toy) and the goal (toy in hand). Some responses may be more functional than other members of the class, and cues may indicate which

of these responses is most likely to be reinforced. For example, if Jackie’s mother is nearby, the “Mommie, get my toy” response might be most likely. Alternatively, if the toy is just beyond reach, the child might be most likely to jump to get it. In many cases, however, the behavior appears to be selected with equal probabilities, and prediction of the instance becomes difficult. As just suggested, members of a particular class of behaviors may be divided into subclasses, and even here variability may characterize aspects of the response. For example, if “ask for the toy” is the activated subclass, the exact moment of a verbal request, the particular words used, or the rhythm or loudness may all be difficult to predict. Similarly, when a rat is pressing a lever to gain food pellets, the characteristics of the press (one paw vs. both, with short or long latency, with high or low force, etc.) are sometimes predictable, but often are not. Thus, according to a Skinnerian model, functionally equivalent instances emerge unpredictably from within a class or subclass, as though generated by a stochastic process (Skinner, 1938; see also Moxley, 1997). To state this differently, there is variance within the operant, manifested as the emission of instances from a set made up of functionally related but often physically dissimilar behaviors. Behavioral variability occurs for many reasons, as we have discussed. It decreases with training and experience. It is low when reinforcers are frequent and higher under intermittent schedules of reinforcement. It decreases with expectancy of and proximity to reinforcement. However, consequence-controlled variability may play a special role in explaining the emitted nature of the operant. To see why, we next turn to experiments on volition. The operant is often referred to as the voluntary operant, in contrast to the Pavlovian reflex. The question is what about the operant indicates (and helps to explain) volition.
Operant Variability and Voluntary Behavior

Attempts to explain volition have been ongoing for more than 2,000 years, and heated debates continue to this day in philosophy (Kane, 2002), psychology


(Maasen, Prinz, & Roth, 2003; Sebanz & Prinz, 2006; Wegner, 2002), and physiology (Glimcher, 2005; Libet, Freeman, & Sutherland, 1999). These debates often concern the reality of volitional behavior or lack thereof and, if real, how to characterize it. Research on operant variability has suggested that the descriptive term voluntary can be usefully applied; that is, voluntary behaviors can be distinguished from accidental reactions, such as stumbles; from elicited responses, such as reflexes, both unconditioned and Pavlovian; from induced ones, such as those caused by drinking alcohol or anticipating a reinforcer; and many other cases. The research has also indicated important ways in which voluntary actions differ from these others. In large part, the difficulty surrounding attempts to explain voluntary behavior comes from an apparent incompatibility between two often-noted characteristics. On the one hand, voluntary acts are said to be intentional, purposeful, goal directed, rational, or adaptive. These characteristics indicate the functionality of voluntary behaviors, and we use that term as a summarizing descriptor. On the other hand, voluntary actions are described as internally motivated and autonomously controlled. Unpredictability, demonstrated or potential, is offered as empirical evidence for such hypothesized autonomous control. Thus, unpredictability is thought to separate voluntary acts from other functional behaviors (e.g., reflexes) and to separate their explanation from Newtonian causes and effects. Proposed explanations of the unpredictability run the gamut from a soul or a mind that can function apart from physical causes to quantum-mechanical random events, but they are all ultimately motivated by the presumed inability of a knowledgeable (perhaps even supremely knowledgeable) observer to anticipate the particulars of a voluntary act. How can unpredictability (perhaps even unpredictability in principle) be combined with functionality? 
That is the critical question facing those of us who would argue that voluntary is a useful classification. The problem derives from the (erroneous) assumption that functionality necessarily implies potential predictability. That assumption goes something like this: If an observer knows what an individual is striving for, or attempting to accomplish,

then together with knowledge of the individual’s past experiences and current circumstances, at least somewhat accurate predictions can be made about the individual’s future goal-directed actions. Thus, because functionality is thought to require an orderly relationship to environmental variables, predictions must be (at least theoretically) possible. Again, though, voluntary acts are often characterized by their unpredictability, with this serving as a sign of autonomous control. An added complication is that unpredictability alone does not characterize voluntary actions. Researchers do not attribute volition to random events, such as the throw of dice or emission of atomic particles (Dennett, 2003; Popper & Eccles, 1977), and truly random responding would often be maladaptive. Yet another problem is that voluntary behaviors are not always unpredictable; they are quite predictable some of the time and, indeed, exist across the range of predictability. For example, when the traffic light turns red, a driver is likely to step on the brake. When you are asked for your name, you generally answer veridically, and so on. But even in cases of predictable behaviors, if voluntary, these responses can be, and sometimes are, emitted in more or less unpredictable fashion. The red light can cause speeding up, slowing down, or cursing. The name offered might be made up so as to fool the questioner, for example, during a game. In brief, voluntary responses have the potential to move along a variability continuum from highly predictable to unpredictable. A characteristic of all voluntary behaviors is real or potential variations in levels of variability. Operant variability helps to explain volition by combining functionality with variations in levels of variability. Operant responses are goal directed and functional, and the same holds for voluntary behaviors.
(In some cases, researchers say that the voluntary response—and the operant—is intended to be functional because it is governed by previous experiences and because in a variable or uncertain environment, what was once functional may no longer be so.) Operant responses are more or less variable, depending on discriminative stimuli and reinforcement contingencies, and the same is true for voluntary behaviors. Thus, for both operant and voluntary


behaviors, the ability of a knowledgeable observer to predict future occurrences will depend on the circumstances. Voluntary behavior is behavior that is functional (or intended to be so) and sometimes highly predictable, other times unpredictable, with predictability governed by the same functionality requirement as other attributes of operant behavior. We have just summarized a theory of volition referred to as the operant variability and voluntary action (OVVA) theory (Neuringer & Jensen, 2010). In the following sections, we provide experimental evidence consistent with OVVA theory. We begin with a discussion of choices under conditions of uncertainty, partly because choices are generally thought to be voluntary and partly because concurrent schedules of reinforcement, a method used to study choice, provided the means to test OVVA theory.

Choice Under Uncertainty

In some choice situations, one (and only one) of many options provides reinforcement (e.g., the third key from the left in a row of eight keys), and both people and other animals learn to choose correctly and to do so repeatedly. In other cases, a particular pattern of choices is required (e.g., LLRR in a two-lever chamber), and that pattern is learned and repeated. Individual choices in these situations are readily predicted. In many situations, though, fixed choices and patterns are not reinforced, and reinforcer availability is uncertain, both in time and place. As we discuss, these conditions often result in stochastic responding. Choices under conditions of reinforcement uncertainty have commonly been studied in behavioral laboratories with concurrent schedules of reinforcement. Reinforcers are independently programmed for two (or sometimes more) options, and subjects choose freely among them. Consider the example of concurrent VI schedules. In a VI 1 minute–VI 3 minute procedure, each schedule is applied to one of two response alternatives, left and right. Under this procedure, a reinforcer becomes available (or “sets up”) on average once per minute for responses on the left and independently on average every 3 minutes for choices of the option on the right. Once a reinforcer has set up, it is delivered on the next response to that alternative. Because time to

reinforcement is unpredictable, and the two alternatives are independent of one another, every response has the possibility of producing a reinforcer. However, in general, the left alternative is three times more likely to have reinforcement waiting than the right alternative. The VI values (or average times between reinforcer setups) generally differ across phases of an experiment. For example, a 1:3 ratio of setup time left to right in one phase might be followed by a 3:1 ratio in another, and a third might use a 2:2 ratio. When the ratios across these alternatives are systematically manipulated, an often observed finding is that the overall ratios of left-to-right choices are functionally related to ratios of left-to-right obtained reinforcers, a relationship commonly described as a power function and referred to as the generalized matching law (Baum, 1974):

\[
\frac{C_X}{C_Y} = \frac{k_X}{k_Y} \cdot \left( \frac{R_X}{R_Y} \right)^{s}. \tag{3}
\]

In Equation 3, C_X refers to observed choices of alternative X, and R_X corresponds to delivered reinforcers (C_Y and R_Y correspond to alternative Y, accordingly). The parameter k_X refers to bias for X, with biases (because of side preferences, differences in the operanda, or any number of variables) not thought to be influenced by the reinforcer ratios. The s parameter refers to the sensitivity of choice ratios to reinforcement ratios. When s = 1.0, choice ratios exactly match (or equal) reinforcement ratios. With s parameter values less than 1.0, choice ratios are not as extreme as the ratio of reinforcers, with the opposite for s more than 1.0 (see the Psychophysical Test section later in this chapter). To the extent that the generalized matching law provides an accurate description (and there is much support for it), it permits predictions of the molar distribution of choice allocation; that is, overall ratios of choices can accurately be described as a function of obtained reinforcer ratios (Davison & McCarthy, 1988). Another observation from studies of concurrent VI schedules, however, is that individual choices are difficult to predict. Even when they conform to Equation 3, they often appear to be emitted stochastically (Glimcher, 2003, 2005; G. Jensen & Neuringer,


2008; Nevin, 1969; see also Silberberg, Hamilton, Ziriax, & Casey, 1978, for an alternative view). In the VI 1 minute–VI 3 minute example given earlier, an observer might accurately predict that the left option will be chosen three times more frequently than the right but be unable to accurately predict any given choice. A recent example of such stochasticity was observed when pigeons’ choices were allocated across three concurrently available sources of reinforcement (G. Jensen & Neuringer, 2008). Figure 22.10 shows that run lengths—defined as the average number of choices on one key before switching to a different key—approximated those expected from a stochastic process.5 Thus, at the same time that overall choice proportions can readily be predicted, individual choices cannot. This combination of functionally related choice proportions and stochastic emission provided the means to assess the relationship between operant variability and volition. In particular, we asked whether

Figure 22.10. Mean run lengths by pigeons on each of three response keys as a function of the proportion of responses to that key. The drawn line is the expected function if responses were emitted stochastically. Adapted from “Choice as a Function of Reinforcer ‘Hold’: From Probability Learning to Concurrent Reinforcement,” by G. Jensen and A. Neuringer, 2008, Journal of Experimental Psychology: Animal Behavior Processes, 34, p. 44. Copyright 2008 by the American Psychological Association.

functionally varying behaviors yielded a perception of voluntary action.
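As a rough illustration of the comparison drawn in Figure 22.10, the following Python sketch computes the mean run length that would be expected if each choice were made independently, with a fixed probability p of selecting a given key. Under that assumption (made here for illustration; the source does not state the formula behind the drawn line), run lengths are geometrically distributed with mean 1/(1 − p). The function names, the example choice proportions, and the simulation itself are hypothetical and are not taken from Jensen and Neuringer (2008).

```python
import random


def expected_run_length(p):
    """Mean run length on a key chosen with probability p,
    assuming every choice is independent of the previous ones."""
    return 1.0 / (1.0 - p)


def simulated_run_lengths(probs, n_choices=100_000, seed=0):
    """Simulate independent choices among keys and return the mean
    run length observed on each key (hypothetical helper)."""
    rng = random.Random(seed)
    n_keys = len(probs)
    choices = rng.choices(range(n_keys), weights=probs, k=n_choices)

    totals = [0] * n_keys  # summed run lengths per key
    runs = [0] * n_keys    # number of completed runs per key

    run_key, run_len = choices[0], 1
    for c in choices[1:]:
        if c == run_key:
            run_len += 1
        else:
            # A switch ends the current run.
            totals[run_key] += run_len
            runs[run_key] += 1
            run_key, run_len = c, 1
    # Count the final (possibly truncated) run as well.
    totals[run_key] += run_len
    runs[run_key] += 1

    return [t / r if r else float("nan") for t, r in zip(totals, runs)]


if __name__ == "__main__":
    proportions = [0.5, 0.3, 0.2]  # illustrative values only
    observed = simulated_run_lengths(proportions)
    for p, m in zip(proportions, observed):
        print(f"p={p:.1f}  expected={expected_run_length(p):.3f}  simulated={m:.3f}")
```

With choice proportions of .5, .3, and .2, the expected mean run lengths under independence are 2.0, about 1.43, and 1.25. Systematic departures of observed run lengths from such values would suggest sequential dependencies among choices.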

Psychophysical Test

OVVA theory predicts that responses will appear to be voluntary when levels of (un)predictability vary functionally (purposefully, adaptively). Choices under concurrent schedules of reinforcement provided a way to test this claim. Neuringer, Jensen, and Piff (2007) had human participants observe six different virtual actors (hereinafter called agents) as each agent made thousands of choices. The agents differed in how they went about choosing among the available options (the strategies are described in a subsequent paragraph). Each agent’s choices were shown on a separate computer, with the six computers located close to one another on small desks in a laboratory. The participants were free to walk among the computers to compare the agents’ choice strategies. To minimize extraneous cues, such as whether the agent resembled a human figure, choices were represented in a simple manner, namely as dots moving around the screens. Participants were told that the agents were choosing among three alternative gambles, similar to slot machine gambles, with each choice of a gamble represented by the dot’s movement in one of three directions. Whenever a choice led to reinforcement (i.e., the agent won that gamble), the dot’s color changed as a sign of success. Thus, participants could observe how choices were made in relationship to the reinforcers received. Participants were asked to judge how well the choices made by the agents represented voluntary choices made by a real human player.

Unknown to the participants, the agents’ choices were controlled by iterating the generalized matching power function (Equation 3), extended to a three-alternative situation (G. Jensen & Neuringer, 2008). Thus, the agents chose probabilistically among the three options on the basis of the proportions of reinforcers that they had received from the three alternatives. These calculations were done in real time, with current choice probabilities depending on previously obtained reinforcers.
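The real-time character of such an agent can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used by Neuringer, Jensen, and Piff (2007): the function name, the starting counts of 1 for each option, and the fixed per-choice payoff probabilities are all hypothetical simplifications. The only feature taken from the text is that each choice is drawn probabilistically, with weights that depend on the reinforcers obtained so far raised to the sensitivity exponent s.

```python
import random


def simulate_agent(payoff_probs, s=1.0, n_choices=1000, seed=0):
    """Simulate one agent choosing among len(payoff_probs) options.

    Returns (choices, reinforcer_counts). All names and parameter
    values are hypothetical and for illustration only.
    """
    rng = random.Random(seed)
    n_options = len(payoff_probs)

    # Assumption: each count starts at 1 so that every option can be
    # chosen before any reinforcer has been obtained.
    counts = [1] * n_options
    choices = []

    for _ in range(n_choices):
        # Current weights depend on previously obtained reinforcers;
        # random.choices normalizes the weights into probabilities.
        weights = [c ** s for c in counts]
        pick = rng.choices(range(n_options), weights=weights)[0]
        choices.append(pick)

        # Assumption: a choice is reinforced with a fixed probability.
        # This simplifies the scheduling used in the actual study.
        if rng.random() < payoff_probs[pick]:
            counts[pick] += 1

    return choices, counts


if __name__ == "__main__":
    choices, counts = simulate_agent([0.3, 0.2, 0.1], s=1.0)
    print("reinforcers obtained per option:", counts)
    print("choices per option:", [choices.count(k) for k in range(3)])
```

Because each choice probability is recomputed from the running reinforcer counts, the agent's preferences shift as its history of reinforcement changes, which is the property the passage describes.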

⁵ See Jensen and Neuringer (2008) for discussion of these findings, including the small divergence of data from the theoretical curve.


Choices by the six agents differed only with respect to the s exponent of the power functions governing the choices: Some agents had high values for their sensitivity parameters, and others had low values. Participants were told only that the dot movements represented choices of gambles and that their objective was to rate how closely those movements resembled the voluntary choices of real human players. Next, we describe how reinforcer availability was programmed and the effects of s values on the generated choices.

Reinforcers were set up probabilistically for each of the three gambles and, as in concurrent VI schedules, remained available until collected. There were six different combinations of set-up rates, which participants were told constituted six different games. Thus, in some games, the agent’s X choices were most frequently reinforced; in other games, Y choices were most frequently reinforced; in others, the reinforcers were more equally distributed; and so on. Participants were free to observe each agent playing each of the six games for as long as needed to make their evaluation. After observing the choices in all games, the participants judged the degree to which each agent’s responses appeared to be those of a human player who was voluntarily choosing among the options. The key question was whether the agents’ different choice strategies, caused by differences in the s exponents, generated systematic differences in participants’ judgments of volition.

The s values, and their effects on the agents’ choice allocations, were as follows: For one agent, s equaled 1.0, and choice proportions therefore strictly matched proportions of received reinforcers. Assume, for example, that this agent had gained a total of 100 reinforcers at some point in the game: 50 reinforcers for Option X, 30 for Option Y, and 20 for Option Z. The probability of the agent’s next X choice would therefore equal 0.5 (50/100); a Y choice, 0.3 (30/100); and a Z choice, 0.2 (20/100). The s = 1.0 agent therefore distributed its choices probabilistically in exact proportion to its received reinforcers.

Another agent was assigned an s value of 0.4, the consequence of which was that it tended to choose among the three options with more equal probabilities than indicated by the reinforcement ratios throughout the six games. In the preceding example, this agent would choose X with a probability of .399 (rather than .5 for the exact matcher), choose Y with a probability of .325 (rather than .3), and choose Z with a probability of .276 (rather than .2). In general, algorithms with s values less than 1.0 are referred to as undermatchers: They distribute choices more equally, and therefore more unpredictably, across the available options than the exact matcher. The opposite was the case for agents with s values greater than 1.0, whose preferences were more extreme than indicated by the reinforcer ratios and which were referred to as overmatchers. Over the course of several experiments, a wide range of s values was presented: from 0.0 (extreme undermatcher) to 6.0 (extreme overmatcher) in one experiment, from 0.1 to 2.0 in another, and from 0.1 to 1.9 in a third.

Results were consistent and clear: The strict matcher (s = 1.0) was judged to best represent volitional choices. Figure 22.11 shows data from two of the experiments. In one experiment, participants were informed in advance that all of the agents’ choices were generated by computer algorithms, and they were asked to rate the algorithms in terms of volitional appearance. In the second, participants were told that some agents’ choices were based on computer algorithms, that others depicted voluntary choices of real humans, and that their task was to identify the humans.⁶ As s values approached 1.0, the agents were rated as providing increasingly good representations of voluntary human choice, suggesting a continuum of more or less apparent volition.
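The choice probabilities in the worked example above (50, 30, and 20 reinforcers for Options X, Y, and Z) are consistent with a rule in which each option's probability is proportional to its reinforcer count raised to the power s. The short Python sketch below makes that arithmetic explicit. It assumes equal bias across options (the bias parameters are omitted), and the function name is hypothetical; it is offered as one reading of the three-alternative extension, not as the authors' implementation.

```python
def choice_probabilities(reinforcers, s):
    """Return choice probabilities proportional to reinforcers ** s.

    Assumes no bias among options and at least one positive count.
    """
    weights = [r ** s for r in reinforcers]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    obtained = [50, 30, 20]  # reinforcers for Options X, Y, and Z

    for s in (0.4, 1.0, 2.0):
        probs = [round(p, 3) for p in choice_probabilities(obtained, s)]
        print(f"s = {s}: {probs}")
    # s = 0.4: [0.399, 0.325, 0.276]   (undermatcher; values given in the text)
    # s = 1.0: [0.5, 0.3, 0.2]         (strict matcher; values given in the text)
    # s = 2.0: [0.658, 0.237, 0.105]   (an illustrative overmatcher)
```

Values of s below 1.0 pull the probabilities toward equality, and values above 1.0 push them toward the most frequently reinforced option, which is the undermatching and overmatching pattern described in the text.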
From the perspective of the participants, the s = 1.0 strict matcher sometimes responded unpredictably (when reinforcers were equally allocated across the three alternatives), at other times highly predictably (when most reinforcers were obtained from one alternative), and at yet other times at intermediate levels. In each case, however, the agent’s choices seemed to be governed by the reinforcement distribution in a particular game environment, an indicator of functional changes in behavior. The undermatchers tended to respond less predictably throughout, as we indicated earlier, and the overmatchers more predictably. Thus, the undermatchers demonstrated that unpredictability alone was not sufficient for apparent volition: It was necessary that agents display functional variations in levels of (un)predictability to receive the highest volitional ratings.

A series of control experiments evaluated alternative explanations. For example, rates of reinforcement were overall slightly higher (across games) for the s = 1.0 matcher than for any of the other agents, and one control showed that differences in reinforcement rate were not responsible for the volitional judgments. In that experiment, agents who cheated (i.e., those who appeared to know where to respond for reinforcers) were compared with the strict, probabilistically choosing matcher, and the matcher was evaluated as substantially more volitional in appearance, despite obtaining fewer reinforcers than the cheaters. An observer might appreciate the individual who gains more reinforcement than another, but that fact alone will not convince the observer that the individual is choosing in a voluntary manner.

Figure 22.11. Judgments of how closely agents’ responses approximated voluntary human choices (on left y-axis) and probabilities (prob.) of identifying agents as a voluntarily choosing human player (on right y-axis) as functions of the agents’ s-value exponents. From “Stochastic Matching and the Voluntary Nature of Choice,” by A. Neuringer, G. Jensen, and P. Piff, 2007, Journal of the Experimental Analysis of Behavior, 88, pp. 7, 13. Copyright 2007 by Society for the Experimental Analysis of Behavior, Inc. Adapted with permission.

⁶ This task was inspired by the Turing test, considered by many to be the gold standard of artificial intelligence.

Another control experiment tested whether matching alone implied volition (Neuringer et al., 2007). The question was whether the more or less (un)predictable responding contributed at all to the judgments. Stated differently, did matching or predictability or both generate the volitional judgments? Participants were therefore asked to compare two agents, both of which exactly matched choice proportions to reinforcer proportions; however, one agent matched by stochastically allocating its choices (as was done in all of the experiments described to this point), whereas the other agent allocated its choices in an easily predictable fashion. For example, if the stochastic matcher had received reinforcers in a ratio of 5:3:2, it responded to the left alternative with a .5 probability, to the center with a .3 probability, and to the right with a .2 probability. Because they were emitted stochastically, individual choices could not be predicted above chance levels. By contrast, the patterned matcher also matched exactly but did so in a patterned and therefore readily predictable way. In the example just given, it would respond LLLLLCCCRR, again and again cycling through the same 5:3:2 strings of responding until there was a change in obtained reinforcer proportions, at which point it would adjust the length of its strings accordingly. Because both agents matched, both received identical rates of reinforcement. The participants judged the stochastic matcher to significantly better represent a voluntary human player than the patterned one, showing that both functionality (matching, in this case) and stochasticity were jointly necessary for the highest ratings of volition.

The combination of choice distributions (matching) and choice variability (more or less predictability) provided evidence for voluntary behavior. Choice distributions alone did not lead responses to be evaluated as highly voluntary, nor did choice unpredictability alone. Choices were most voluntary in appearance when probabilities and distributions of stochastic responses changed with distributions of reinforcers. According to OVVA theory, functionally changing variable behaviors are voluntary behaviors. Stated differently, voluntary behaviors are members of a class characterized by the ability to vary levels of response (un)predictability in a functional manner. The psychophysical evidence just reviewed is consistent with OVVA theory.

To review, the facts of operant variability show that levels, or degrees, of behavioral (un)predictability are guided by environmental consequences. A theory of volition, OVVA, proposes that exactly the same is true for voluntary actions. Voluntary behaviors are sometimes readily predictable, sometimes less predictable, and sometimes quite unpredictable. In all cases, the reasons for the predictability can be identified (given sufficient knowledge), but the precise behaviors may still remain unpredictable. For example, under some circumstances, the response to “How are you?” can readily be predicted for a given acquaintance. Even when the situation warrants unpredictable responses, as when responders wish to conceal their feelings, some veridical predictions can be made: that the response will be verbal, that it will contain particular parts of speech, and so on. The functionality of variability implies a degree of predictability in the resulting behaviors that is related to the activated class of possibilities from which the response emerges. The class can often be predicted on the basis of knowledge of the organism and environmental conditions. However, the instance may be difficult or impossible to predict, especially for large response classes.

Unpredictability, real or potential, is emphasized in many discussions of volition. Indeed, the size of the active set can be exceedingly large, and functionally so, because if someone were attempting to prove that he or she is a free agent, the set of possibilities might consist of all responses in that person’s repertoire (see Scriven, 1965). We return to the fact, though, that voluntary behaviors can be predictable as well as unpredictable. The most important characteristic is functionality of variability, or the ability to change levels of predictability in response to environmental demands. This is equally an identifying characteristic of operant behavior, in which responses are functional and stochastically emitted. Thus, with Skinner, we combine voluntary and operant in a single phrase, but research has now shown why that is appropriate. Operant responses are voluntary because they combine functionality with (un)predictability.

Conclusion

Aristotle anticipated what many have referred to as the most influential law in psychology (Murray, 1988). When two events co-occur, presentation of one will cause recollection or generation of the other. Although he and many others were wrong in the details, laws of association have been the foundation of theories of mind and behavior throughout the history of Western thought, and the science of psychology and behavior has been well served by the search for them. From the British Associationists, to Pavlov, to Hebb and Rescorla, theoreticians and researchers have documented laws of the form “if A, then B” that help to explain thoughts and behaviors. Evolutionary theory offered a distinctly different type of behavioral law, involving selection from variations, laws that were developed by Skinner (1981) and others (Hull et al., 2001). In this chapter, we provided evidence of how selection interacts with variation: Parameters of variation are selected (via reinforcement of variability), and selections emerge from variations (via stochastic emission). This interaction, of equal importance to that of association, must be deciphered if researchers are to explain, at long last, voluntary behavior.

References

Aase, H., & Sagvolden, T. (2006). Infrequent, but not frequent, reinforcers produce more variable responding and deficient sustained attention in young children with attention-deficit/hyperactivity disorder (ADHD). Journal of Child Psychology and Psychiatry, 47, 457–471. doi:10.1111/j.1469-7610.2005.01468.x
Akins, C. K., Domjan, M., & Gutierrez, G. (1994). Topography of sexually conditioned behavior in male Japanese quail (Coturnix japonica) depends on the CS–US interval. Journal of Experimental Psychology: Animal Behavior Processes, 20, 199–209. doi:10.1037/0097-7403.20.2.199
Amabile, T. M. (1983). The social psychology of creativity. New York, NY: Springer-Verlag.
Antonitis, J. J. (1951). Response variability in the white rat during conditioning, extinction, and reconditioning. Journal of Experimental Psychology, 42, 273–281. doi:10.1037/h0060407
Arnesen, E. M. (2000). Reinforcement of object manipulation increases discovery. Unpublished undergraduate thesis, Reed College, Portland, OR.
Bandura, A. (1982). The psychology of chance encounters and life paths. American Psychologist, 37, 747–755. doi:10.1037/0003-066X.37.7.747
Baum, W. M. (1974). On two types of deviation from the matching law: Bias and undermatching. Journal of the Experimental Analysis of Behavior, 22, 231–242. doi:10.1901/jeab.1974.22-231
Baum, W. M. (1994). Understanding behaviorism. New York, NY: HarperCollins.
Beck, A. T. (1976). Cognitive therapy and the emotional disorders. New York, NY: International Universities Press.
Bizo, L. A., & Doolan, K. (2008). Reinforced behavioural variability. Paper presented at the meeting of the Association for Behavior Analysis, Chicago, IL.
Blough, D. S. (1966). The reinforcement of least frequent interresponse times. Journal of the Experimental Analysis of Behavior, 9, 581–591. doi:10.1901/jeab.1966.9-581
Bouton, M. (1994). Context, ambiguity, and classical conditioning. Current Directions in Psychological Science, 3, 49–53. doi:10.1111/1467-8721.ep10769943
Bower, G. H., & Hilgard, E. R. (1981). Theories of learning (5th ed.). Englewood Cliffs, NJ: Prentice-Hall.
Brener, J., & Mitchell, S. (1989). Changes in energy expenditure and work during response acquisition in rats. Journal of Experimental Psychology: Animal Behavior Processes, 15, 166–175. doi:10.1037/0097-7403.15.2.166
Brugger, P. (1997). Variables that influence the generation of random sequences: An update. Perceptual and Motor Skills, 84, 627–661. doi:10.2466/pms.1997.84.2.627
Cameron, J., & Pierce, W. D. (1994). Reinforcement, reward, and intrinsic motivation: A meta-analysis. Review of Educational Research, 64, 363–423. doi:10.3102/00346543064003363
Campbell, D. T. (1960). Blind variation and selective retention in creative thought as in other knowledge processes. Psychological Review, 67, 380–400. doi:10.1037/h0040373
Castellanos, F. X., Sonuga-Barke, E. J. S., Scheres, A., Di Martino, A., Hyde, C., & Walters, J. R. (2005). Varieties of attention-deficit/hyperactivity disorder-related intraindividual variability. Biological Psychiatry, 57, 1416–1423. doi:10.1016/j.biopsych.2004.12.005
Catchpole, C. K., & Slater, P. J. (1995). Bird song: Biological themes and variations. Cambridge, England: Cambridge University Press.
Chaitin, G. J. (1975). Randomness and mathematical proof. Scientific American, 232, 47–52. doi:10.1038/scientificamerican0575-47
Channon, S., & Baker, J. E. (1996). Depression and problem-solving performance on a fault-diagnosis task. Applied Cognitive Psychology, 10, 327–336. doi:10.1002/(SICI)1099-0720(199608)10:43.0.CO;2-O
Cherot, C., Jones, A., & Neuringer, A. (1996). Reinforced variability decreases with approach to reinforcers. Journal of Experimental Psychology: Animal Behavior Processes, 22, 497–508. doi:10.1037/0097-7403.22.4.497
Clouzot, H.-G. (Producer & Director). (1956). Le mystère Picasso [The mystery of Picasso; Motion picture]. France: Filmsonor.
Cohen, L., Neuringer, A., & Rhodes, D. (1990). Effects of ethanol on reinforced variations and repetitions by rats under a multiple schedule. Journal of the Experimental Analysis of Behavior, 54, 1–12. doi:10.1901/jeab.1990.54-1
Craig, W. (1918). Appetites and aversions as constituents of instincts. Biological Bulletin, 34, 91–107. doi:10.2307/1536346
Davison, M., & McCarthy, D. (1988). The matching law: A research review. Hillsdale, NJ: Erlbaum.
Deci, E. L., Koestner, R., & Ryan, R. M. (1999). A meta-analytic review of experiments examining the effects of extrinsic rewards on intrinsic motivation. Psychological Bulletin, 125, 627–668. doi:10.1037/0033-2909.125.6.627
Dennett, D. (2003). Freedom evolves. New York, NY: Viking Adult.
Denney, J., & Neuringer, A. (1998). Behavioral variability is controlled by discriminative stimuli. Animal Learning and Behavior, 26, 154–162. doi:10.3758/BF03199208

Doughty, A. H., & Lattal, K. A. (2001). Resistance to change of operant variation and repetition. Journal of the Experimental Analysis of Behavior, 76, 195–215. doi:10.1901/jeab.2001.76-195
Driver, P. M., & Humphries, D. A. (1988). Protean behavior: The biology of unpredictability. Oxford, England: Oxford University Press.
Eckerman, D. A., & Lanson, R. N. (1969). Variability of response location for pigeons responding under continuous reinforcement, intermittent reinforcement and extinction. Journal of the Experimental Analysis of Behavior, 12, 73–80. doi:10.1901/jeab.1969.12-73
Eisenberger, R., & Armeli, S. (1997). Can salient reward increase creative performance without reducing intrinsic creative interest? Journal of Personality and Social Psychology, 72, 652–663. doi:10.1037/0022-3514.72.3.652
Gharib, A., Gade, C., & Roberts, S. (2004). Control of variation by reward probability. Journal of Experimental Psychology: Animal Behavior Processes, 30, 271–282. doi:10.1037/0097-7403.30.4.271
Gigerenzer, G., Swijtink, Z., Porter, T., Daston, L., Beatty, J., & Kruger, L. (1989). The empire of chance: How probability changed science and everyday life. Cambridge, England: Cambridge University Press.
Glimcher, P. W. (2003). Decisions, uncertainty, and the brain. Cambridge, MA: MIT Press.
Glimcher, P. W. (2005). Indeterminacy in brain and behavior. Annual Review of Psychology, 56, 25–56. doi:10.1146/annurev.psych.55.090902.141429
Goetz, E. M., & Baer, D. M. (1973). Social control of form diversity and emergence of new forms in children’s blockbuilding. Journal of Applied Behavior Analysis, 6, 209–217. doi:10.1901/jaba.1973.6-209
Grunow, A., & Neuringer, A. (2002). Learning to vary and varying to learn. Psychonomic Bulletin and Review, 9, 250–258. doi:10.3758/BF03196279
Guthrie, E. R., & Horton, G. P. (1946). Cats in a puzzle box. New York, NY: Rinehart.
Hachiga, Y., & Sakagami, T. (2010). A runs-test algorithm: Contingent reinforcement and response run structures. Journal of the Experimental Analysis of Behavior, 93, 61–80. doi:10.1901/jeab.2010.93-61
Herrnstein, R. J. (1961). Stereotypy and intermittent reinforcement. Science, 133, 2067–2069. doi:10.1126/science.133.3470.2067-a
Holman, J., Goetz, E. M., & Baer, D. M. (1977). The training of creativity as an operant and an examination of its generalization characteristics. In B. Etzel, J. LeBland, & D. Baer (Eds.), New development in behavior research: Theory, method and application (pp. 441–471). Hillsdale, NJ: Erlbaum.
Hopkinson, J., & Neuringer, A. (2003). Modifying behavioral variability in moderately depressed students. Behavior Modification, 27, 251–264. doi:10.1177/0145445503251605
Hopson, J., Burt, D., & Neuringer, A. (2002). Variability and repetition under a multiple schedule. Unpublished manuscript.
Horne, R. L., Evans, F. J., & Orne, M. T. (1982). Random number generation, psychopathology, and therapeutic change. Archives of General Psychiatry, 39, 680–683. doi:10.1001/archpsyc.1982.04290060042008
Hoyert, M. S. (1992). Order and chaos in fixed-interval schedules of reinforcement. Journal of the Experimental Analysis of Behavior, 57, 339–363. doi:10.1901/jeab.1992.57-339
Hull, D. L., Langman, R. E., & Glenn, S. S. (2001). A general account of selection: Biology, immunology and behavior. Behavioral and Brain Sciences, 24, 511–528. doi:10.1017/S0146525X0156416X
Jensen, G., Miller, C., & Neuringer, A. (2006). Truly random operant responding: Results and reasons. In E. A. Wasserman & T. R. Zentall (Eds.), Comparative cognition: Experimental explorations of animal intelligence (pp. 459–480). Oxford, England: Oxford University Press.
Jensen, G., & Neuringer, A. (2008). Choice as a function of reinforcer “hold”: From probability learning to concurrent reinforcement. Journal of Experimental Psychology: Animal Behavior Processes, 34, 437–460. doi:10.1037/0097-7403.34.4.437
Jensen, R. V. (1987). Classical chaos. American Scientist, 75, 168–181.
Kane, R. (Ed.). (2002). The Oxford handbook of free will. Oxford, England: Oxford University Press.
Lapp, J. E., Marinier, R., & Pihl, R. O. (1982). Correlates of psychotropic drug use in women: Interpersonal personal problem solving and depression. Women and Health, 7, 5–16. doi:10.1300/J013v07n02_02
Lee, R., McComas, J. J., & Jawor, J. (2002). The effects of differential reinforcement on varied verbal responding by individuals with autism to social questions. Journal of Applied Behavior Analysis, 35, 391–402. doi:10.1901/jaba.2002.35-391
Lee, R., & Sturmey, P. (2006). The effects of lag schedules and preferred materials on variable responding in students with autism. Journal of Autism and Developmental Disorders, 36, 421–428. doi:10.1007/s10803-006-0080-7
Lee, R., Sturmey, P., & Fields, L. (2007). Schedule-induced and operant mechanisms that influence response variability: A review and implications for future investigations. Psychological Record, 57, 429–455.
Lepper, M. R., & Henderlong, J. (2000). Turning “play” into “work” and “work” into “play”: 25 years of research on intrinsic versus extrinsic motivation. In C. Sansone & J. M. Harackiewicz (Eds.), Intrinsic and extrinsic motivation: The search for optimal motivation and performance (pp. 257–307). San Diego, CA: Academic Press. doi:10.1016/B978-012619070-0/50032-5
Libet, B., Freeman, A., & Sutherland, K. (Eds.). (1999). The volitional brain: Towards a neuroscience of free will. Thorverton, England: Imprint Academic.
Lopes, L. L. (1982). Doing the impossible: A note on induction and the experience of randomness. Journal of Experimental Psychology: Learning, Memory, and Cognition, 8, 626–636. doi:10.1037/0278-7393.8.6.626
Maasen, S., Prinz, W., & Roth, G. (Eds.). (2003). Voluntary action: Brains, minds, and sociality. New York, NY: Oxford University Press.
Machado, A. (1989). Operant conditioning of behavioral variability using a percentile reinforcement schedule. Journal of the Experimental Analysis of Behavior, 52, 155–166. doi:10.1901/jeab.1989.52-155
Machado, A. (1992). Behavioral variability and frequency-dependent selection. Journal of the Experimental Analysis of Behavior, 58, 241–263. doi:10.1901/jeab.1992.58-241
Machado, A. (1993). Learning variable and stereotypical sequences of responses: Some data and a new model. Behavioural Processes, 30, 103–129. doi:10.1016/0376-6357(93)90002-9
Machado, A. (1997). Increasing the variability of response sequences in pigeons by adjusting the frequency of switching between two keys. Journal of the Experimental Analysis of Behavior, 68, 1–25. doi:10.1901/jeab.1997.68-1
Maes, J. H. R., & van der Goot, M. (2006). Human operant learning under concurrent reinforcement of response variability. Learning and Motivation, 37, 79–92. doi:10.1016/j.lmot.2005.03.003
Manabe, K., Staddon, J. E. R., & Cleaveland, J. M. (1997). Control of vocal repertoire by reward in budgerigars (Melopsittacus undulatus). Journal of Comparative Psychology, 111, 50–62. doi:10.1037/0735-7036.111.1.50
McElroy, E., & Neuringer, A. (1990). Effects of alcohol on reinforced repetitions and reinforced variations in rats. Psychopharmacology, 102, 49–55. doi:10.1007/BF02245743
Mechner, F. (1958). Sequential dependencies of the lengths of consecutive response runs. Journal of the Experimental Analysis of Behavior, 1, 229–233. doi:10.1901/jeab.1958.1-229
Metzger, M. A. (1994). Have subjects been shown to generate chaotic numbers? Commentary on Neuringer and Voss. Psychological Science, 5, 111–114. doi:10.1111/j.1467-9280.1994.tb00641.x
Miller, N., & Neuringer, A. (2000). Reinforcing variability in adolescents with autism. Journal of Applied Behavior Analysis, 33, 151–165. doi:10.1901/jaba.2000.33-151
Mook, D. M., & Neuringer, A. (1994). Different effects of amphetamine on reinforced variations versus repetitions in spontaneously hypertensive rats (SHR). Physiology and Behavior, 56, 939–944. doi:10.1016/0031-9384(94)90327-1
Mosekilde, E., Larsen, E., & Sterman, J. (1991). Coping with complexity: Deterministic chaos in human decision making behavior. In J. L. Casti & A. Karlqvist (Eds.), Beyond belief: Randomness, prediction and explanation in science (pp. 199–229). Boca Raton, FL: CRC Press.
Moxley, R. A. (1997). Skinner: From determinism to random variation. Behavior and Philosophy, 25, 3–28.
Murray, D. J. (1988). A history of Western psychology (2nd ed.). Englewood Cliffs, NJ: Prentice Hall.
Neuringer, A. (1986). Can people behave “randomly?”: The role of feedback. Journal of Experimental Psychology: General, 115, 62–75. doi:10.1037/0096-3445.115.1.62
Neuringer, A. (1991). Operant variability and repetition as functions of interresponse time. Journal of Experimental Psychology: Animal Behavior Processes, 17, 3–12. doi:10.1037/0097-7403.17.1.3
Neuringer, A. (1992). Choosing to vary and repeat. Psychological Science, 3, 246–250. doi:10.1111/j.1467-9280.1992.tb00037.x
Neuringer, A. (1993). Reinforced variation and selection. Animal Learning and Behavior, 21, 83–91. doi:10.3758/BF03213386
Neuringer, A. (2002). Operant variability: Evidence, functions, and theory. Psychonomic Bulletin and Review, 9, 672–705. doi:10.3758/BF03196324
Neuringer, A. (2003). Reinforced variability and creativity. In K. A. Lattal & P. N. Chase (Eds.), Behavior theory and philosophy (pp. 323–338). New York, NY: Kluwer Academic/Plenum.
Neuringer, A. (2004). Reinforced variability in animals and people. American Psychologist, 59, 891–906. doi:10.1037/0003-066X.59.9.891
Neuringer, A. (2009). Operant variability and the power of reinforcement. Behavior Analyst Today, 10, 319–343. Retrieved from http://www.baojournal.com/BAT%20Journal/VOL-10/BAT%2010-2.pdf
Neuringer, A., Deiss, C., & Olson, G. (2000). Reinforced variability and operant learning. Journal of Experimental Psychology: Animal Behavior Processes, 26, 98–111. doi:10.1037/0097-7403.26.1.98
Neuringer, A., & Jensen, G. (2010). Operant variability and voluntary action. Psychological Review, 117, 972–993. doi:10.1037/a0019499
Neuringer, A., Jensen, G., & Piff, P. (2007). Stochastic matching and the voluntary nature of choice. Journal of the Experimental Analysis of Behavior, 88, 1–28. doi:10.1901/jeab.2007.65-06
Neuringer, A., Kornell, N., & Olufs, M. (2001). Stability and variability in extinction. Journal of Experimental Psychology: Animal Behavior Processes, 27, 79–94. doi:10.1037/0097-7403.27.1.79
Neuringer, A., & Voss, C. (1993). Approximating chaotic behavior. Psychological Science, 4, 113–119. doi:10.1111/j.1467-9280.1993.tb00471.x
Neuringer, A., & Voss, C. (2002). Approximations to chaotic responding depends on interresponse time. Unpublished manuscript.
Nevin, J. A. (1969). Interval reinforcement of choice behavior in discrete trials. Journal of the Experimental Analysis of Behavior, 12, 875–885. doi:10.1901/jeab.1969.12-875
Nickerson, R. S. (2002). The production and perception of randomness. Psychological Review, 109, 330–357. doi:10.1037/0033-295X.109.2.330
Nigg, J. T. (2001). Is ADHD a disinhibitory disorder? Psychological Bulletin, 127, 571–598. doi:10.1037/0033-2909.127.5.571
Notterman, J. M., & Mintz, D. E. (1965). Dynamics of response. New York, NY: Wiley.
Page, S., & Neuringer, A. (1985). Variability is an operant. Journal of Experimental Psychology: Animal Behavior Processes, 11, 429–452. doi:10.1037/0097-7403.11.3.429
Popper, K. R., & Eccles, J. C. (1977). The self and its brain. New York, NY: Springer-Verlag.
Pryor, K. W., Haag, R., & O’Reilly, J. (1969). The creative porpoise: Training for novel behavior. Journal of the Experimental Analysis of Behavior, 12, 653–661. doi:10.1901/jeab.1969.12-653
Radloff, L. S. (1991). The use of the Center for Epidemiological Studies Depression Scale in adolescents and young adults. Journal of Youth and Adolescence, 20, 149–166. doi:10.1007/BF01537606
Rhinehart, L. (1998). The dice man. Woodstock, NY: Overlook Press.
Ross, C., & Neuringer, A. (2002). Reinforcement of variations and repetitions along three independent response dimensions. Behavioural Processes, 57, 199–209. doi:10.1016/S0376-6357(02)00014-1
Rubia, K., Smith, A. B., Brammer, M. J., & Taylor, E. (2007). Temporal lobe dysfunction in medication-naïve boys with attention-deficit/hyperactivity disorder during attention allocation and its relation to response variability. Biological Psychiatry, 62, 999–1006. doi:10.1016/j.biopsych.2007.02.024
Schwartz, B. (1980). Development of complex stereotyped behavior in pigeons. Journal of the Experimental Analysis of Behavior, 33, 153–166. doi:10.1901/jeab.1980.33-153

Schwartz, B. (1988). The experimental synthesis of behavior: Reinforcement, behavioral stereotypy and problem solving. In G. H. Bower (Ed.), The psychology of learning and motivation (Vol. 22, pp. 93–138). New York, NY: Academic Press.
Scriven, M. (1965). An essential unpredictability in human behavior. In B. Wolman (Ed.), Scientific psychology (pp. 411–425). New York, NY: Basic Books.
Sebanz, N., & Prinz, W. (Eds.). (2006). Disorders of volition. Cambridge, MA: MIT Press.
Silberberg, A., Hamilton, B., Ziriax, J. M., & Casey, J. (1978). The structure of choice. Journal of Experimental Psychology: Animal Behavior Processes, 4, 368–398. doi:10.1037/0097-7403.4.4.368
Skinner, B. F. (1938). The behavior of organisms. New York, NY: Appleton-Century.
Skinner, B. F. (1959). Cumulative record. New York, NY: Appleton-Century-Crofts. (Original work published 1935)
Skinner, B. F. (1981). Selection by consequences. Science, 213, 501–504. doi:10.1126/science.7244649
Smith, J. M. (1982). Evolution and the theory of games. Cambridge, England: Cambridge University Press.
Staddon, J. E. R., & Simmelhag, V. L. (1971). The “superstition” experiment: A reexamination of its implications for the principles of adaptive behavior. Psychological Review, 78, 3–43. doi:10.1037/h0030305
Stokes, P. D. (1995). Learned variability. Animal Learning and Behavior, 23, 164–176. doi:10.3758/BF03199931
Stokes, P. D. (2001). Variability, constraints, and creativity: Shedding light on Claude Monet. American Psychologist, 56, 355–359. doi:10.1037/0003-066X.56.4.355
Stokes, P. D., & Harrison, H. M. (2002). Constraints have different concurrent effects and aftereffects on variability. Journal of Experimental Psychology: General, 131, 552–566. doi:10.1037/0096-3445.131.4.552
Taleb, N. N. (2007). The black swan. New York, NY: Random House.
Thorndike, E. L. (1911). Animal intelligence. New York, NY: Macmillan.
Townsend, J. T. (1992). Chaos theory: A brief tutorial and discussion. In A. F. Healy, S. M. Kosslyn, & R. M. Shiffrin (Eds.), From learning theory to connectionist theory: Essays in honor of William K. Estes (Vol. 1, pp. 65–96). Hillsdale, NJ: Erlbaum.
Vogel, R., & Annau, Z. (1973). An operant discrimination task allowing variability of reinforced response patterning. Journal of the Experimental Analysis of Behavior, 20, 1–6. doi:10.1901/jeab.1973.20-1

Neuringer and Jensen

Wagner, K., & Neuringer, A. (2006). Operant variability when reinforcement is delayed. Learning and Behavior, 34, 111–123. doi:10.3758/BF03193187 Ward, L. M., & West, R. L. (1994). On chaotic behavior. Psychological Science, 5, 232–236. doi:10.1111/j.1467-9280.1994.tb00506.x

546

Wegner, D. M. (2002). The illusion of conscious will. Cambridge, MA: MIT Press. Weiss, R. L. (1965). “Variables that influence randomgeneration”: An alternative hypothesis. Perceptual and Motor Skills, 20, 307–310. doi:10.2466/ pms.1965.20.1.307

Chapter 23

Behavioral Pharmacology

Gail Winger and James H. Woods

Writing of this chapter was supported substantially by National Institute on Drug Abuse Grant DA 015449 to Gail Winger. DOI: 10.1037/13937-023 APA Handbook of Behavior Analysis: Vol. 1. Methods and Principles, G. J. Madden (Editor-in-Chief) Copyright © 2013 by the American Psychological Association. All rights reserved.

In 1968, Roger Kelleher and William Morse wrote a review of behavioral pharmacology, describing the field as “a relatively new branch of pharmacology concerned with the effects of drugs on behavior” and an area that “has been stimulated by the discoveries of new drugs” (p. 2) between the mid-1950s and the mid-1960s. A comparison of the 1955 edition with the 1965 edition of a medical pharmacology text (Goodman & Gilman, 1955, 1965) makes this latter point convincingly. In the 1955 edition, the number of drugs that acted on the central nervous system was limited. Many of these are no longer used today, but the list of those that are in current use includes barbiturates as hypnotics, cannabis as a miscellaneous agent, morphine and codeine as narcotic analgesics, and methadone as a treatment for narcotic withdrawal. Nalorphine was included as a narcotic antagonist. Cocaine was described as a local anesthetic, its only approved use, and amphetamine was included as a sympathomimetic, although its central nervous system stimulant actions were acknowledged as the primary therapeutic use.

By Goodman and Gilman’s 1965 edition, central nervous system drug use had been transformed. Two new chapters were added that dealt exclusively with psychoactive drugs. One, “Drugs Used in the Treatment of Psychiatric Disorders,” described the antipsychotic chlorpromazine, which had been introduced to Western medicine in 1954, as “among the most widely used drugs in the practice of medicine today” (Jarvik, 1965, p. 163). Benzodiazepines, which were first given in clinical trials in 1961 and became among the most widely used drugs in the practice of medicine in the 1970s, were discussed for the treatment of anxiety. Early but still useful treatments for depression were described in the form of monoamine oxidase inhibitors and tricyclic antidepressants. A section in this chapter on psychotogenic drugs described the behavioral and subjective effects of LSD and mescaline.

In an update of the chapter “Narcotic Analgesics” (Jaffe, 1965b), the number of opiate drugs had increased to 20, although a relatively pure opioid antagonist (naloxone) would not be mentioned in this text for another 5 years, and then only briefly (Jaffe, 1970). The second new chapter, “Drug Abuse and Drug Addiction” (Jaffe, 1965a), emphasized the behavioral aspects of the use of psychoactive drugs for recreational purposes.

Both of these chapters included a description of some of the effects the drugs of interest had on intact, behaving animals. In other words, presentation of the effects of these psychoactive drugs included descriptions of their behavioral pharmacology and how they related to clinical use of the drugs. For example, although chlorpromazine and the tricyclic antidepressant imipramine have very different clinical uses, they both have sedative effects in humans. Animal studies of these drugs would be of limited utility if they showed only the common, sedative effects of these drugs. However, an important experiment conducted by Peter Dews (1962, as cited in Jarvik, 1965) demonstrated that chlorpromazine and imipramine had markedly different effects on the behavior of pigeons under specific experimental conditions. Dews’s findings indicated that behavioral pharmacology experiments using animal subjects could reveal much of relevance about the burgeoning number of new drugs to treat psychiatric symptoms in humans. They also suggested that a science that explores drug–environment–behavior interactions can help determine the conditions under which relevant behavioral effects of drugs can be observed and can increase researchers’ understanding of how pharmacological and contextual variables combine to affect behavior.

Two additional factors contributed to the emergence and development of behavioral pharmacology. First, the 1960s saw an increase in the illicit use of psychoactive drugs. The health and cultural problems associated with these illicit activities motivated further scientific study of the behavioral effects of these drugs. Second, laboratory procedures developed within the experimental analysis of behavior were proving useful in identifying and categorizing psychoactive drugs, both in general and in the context of their therapeutic use. The combination of these factors made the birth of behavioral pharmacology an easy and natural one.

Our goals in this chapter are first to define the field of behavioral pharmacology and place it in the context of other areas of research that involve studies of behavior–drug interactions. Our second goal is to provide the reader with some of the pharmacological concepts that are important to behavioral pharmacology, with a few examples of how behavioral pharmacology has benefitted from and contributed to the science of pharmacology. Finally, we describe the conditioning approaches that have contributed most to the field of behavioral pharmacology, with an emphasis on the consideration of drugs as stimuli that can interact with, modify, and be modified by behavior.

What Is Behavioral Pharmacology?
What are the aspects, limits, and range of study of this branch of pharmacology concerned with the effects of drugs on behavior? Behavior is the ongoing, typically motoric behavior of an intact, living organism, made in response to a stimulus that is either internally derived or externally applied. Drugs in this case are nearly always chemicals that enter and act primarily on the central nervous system to modify ongoing behavior, although they may have prominent action on peripheral neurons as well.

As described by Kelleher and Morse (1968), behavioral pharmacologists are typically involved in one of two tasks. By far the more frequent task is to use well-studied behavioral preparations to assist in explicating the effects of interesting, perhaps novel drugs. This task is the more frequent one in large part because the number of established, validated behavioral preparations is, or should be, fairly limited, whereas the number of drugs is almost limitless. Less often, behavioral pharmacologists use a thoroughly studied drug to better understand a novel behavioral preparation. Occasionally, a scientist is ignorant of the actions of both the drug of interest and the behavior that seems best to reflect the drug’s actions. This undertaking is much riskier, but it can be quite useful if done carefully. It is also often the case that behavioral pharmacologists undertake a study in which both the behavior and the drug have been thoroughly investigated separately, but interesting questions remain as to how they interact.

Distinguishing Behavioral Pharmacology From Psychopharmacology

Psychopharmacology is focused on the investigation of drugs that have or might have effects in treating psychiatric or neurological diseases such as schizophrenia or movement disorders. Psychopharmacologists are primarily interested in describing how these drugs modify human psychopathological behavior. Psychopharmacologists are likely to use large groups of patients, all having a particular psychiatric disease. Only one or two doses of the drug in question are typically given. If more than one dose is given, each dose is typically given to different groups of patients, and a separate group will be given a placebo or other control preparation. Psychopharmacologists generally want to know whether the disease gets better as a function of drug administration, and they use a large number of subjects so that even if the therapeutic effect is small, they will be able to detect it.


Behavioral pharmacologists can and do study the same drugs as psychopharmacologists, but they are likely to do so by giving the drugs to normal animals and observing how the animal’s behavior is changed. The animal’s behavior might be rendered abnormal by placing it in a swim tank (an animal model of depression) or in an elevated plus maze (an animal model of anxiety) or by administering a central stimulant (an animal model of psychosis). In these ways, the behavioral pharmacologist might be interested in some of the same questions that interest the psychopharmacologist; that is, how does this drug modify behavior that reflects a psychopathology? The scientific difference may be primarily in experimental methodology or in the type of inferences drawn from the research. In contrast to the psychopharmacologist, the behavioral pharmacologist typically uses a relatively small number of subjects, establishes stable predrug behavior patterns, and evaluates the effect of a number of doses of the test drug, including a dose that produces no effect (a placebo or nondrug control). Although this comparison suggests that behavioral pharmacologists study animal behavior and psychopharmacologists study the effects of drugs in humans, behavioral pharmacologists can study human behavior as well. Human behavioral pharmacologists, however, have the same fundamental approach to their investigation as do animal behavioral pharmacologists. They establish rigorous measures of behavior in a relatively small number of human subjects, administer several doses of the drug under study, carefully observe as many changes in the behavior as possible, and reestablish the non–drug-influenced behavior between drug administrations. 
Each subject is likely to receive each dose of the drug and placebo or a no-effect drug dose during the course of the study; the subjects may or may not have a psychiatric disease; and the outcomes may or may not be related directly to a potential therapeutic effect of the drug.
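The within-subject approach described above can be illustrated with a small scheduling sketch. It is only an illustration under stated assumptions: the subject labels, the dose values, the randomized dose order, the single baseline session before each test, and the function name `within_subject_schedule` are all hypothetical and are not taken from any particular study.

```python
import random


def within_subject_schedule(subjects, doses, seed=0):
    """Build a hypothetical session plan in which every subject receives every dose.

    Each test session is preceded by a baseline (no-drug) session so that
    non-drug-influenced behavior can be reestablished between administrations.
    The dose list is assumed to include 0.0 as the placebo (vehicle) condition.
    """
    rng = random.Random(seed)  # seeded so the illustrative plan is reproducible
    plan = {}
    for subject in subjects:
        order = list(doses)
        rng.shuffle(order)  # assumption: dose order is randomized per subject
        sessions = []
        for dose in order:
            sessions.append(("baseline", None))
            sessions.append(("test", dose))
        plan[subject] = sessions
    return plan


if __name__ == "__main__":
    example = within_subject_schedule(["S1", "S2", "S3"], [0.0, 0.1, 0.3, 1.0])
    for subject, sessions in example.items():
        print(subject, sessions)
```

A between-groups design of the kind attributed to psychopharmacology would instead assign each subject a single dose, which is why it typically calls for many more subjects.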

Distinguishing Behavioral Pharmacology From Neuroscience

The overlap between some aspects of behavioral pharmacology and the neuroscience of drug action is sufficient to comment on how these two areas differ. The overlap is primarily in the drugs that are studied. For example, central stimulants, excitatory amino acid agonists and antagonists, opioids, cannabinoids, and depressant drugs are used extensively by both behavioral pharmacologists and neuroscientists. Investigators in both areas may use similar behavioral tools to study these drugs. The difference lies primarily in where the two types of scientists look for explanatory principles (see Chapter 15, this volume). Neuroscientists are usually searching for an underlying brain mechanism that is related to a drug’s action. Their search often leads them to look for neural changes or other changes in the brain that develop as a consequence of drug administration. Even if they do not actually study neural changes that accompany drug effects, neuroscientists are quite willing to suggest what these changes might be. By contrast, behavioral pharmacologists typically do not have a strong investment in describing neural activity or neural changes that accompany drug action. They focus on how the drug alters ongoing behavior and find sufficient explanatory value in that behavioral change.

Pharmacological Aspects of Behavioral Pharmacology

Pharmacology has been defined as the science of the source, properties, physiological action, absorption, fate, excretion, and therapeutic uses of drugs (Gilman, 2001). There is nothing obvious in this definition, used since 1941, that is of direct relevance to behavioral pharmacology as we have defined it thus far. Nevertheless, some of the most interesting and important aspects of behavioral pharmacology have come from the study of drug action on isolated organs or on molecular events.

Pharmacodynamics

Most of the drugs used to study behavioral or physiological systems produce effects by acting directly or indirectly on receptors. The receptors with which the drugs interact normally transduce stimuli provided by endogenous neurotransmitters, neuromodulators, or hormones and are usually located in the area in which synaptic connections between neurons occur. This area is where the various neurochemicals are released, have their actions, and are quickly metabolized or removed. The endogenous chemicals either increase or decrease the chances that the downstream neuron will fire in response to an incoming stimulus, or occasionally even in the absence of such a stimulus. Drugs typically modify the probability of neuronal firing by acting directly on a specific receptor (e.g., opioid drugs), by altering the rate of removal of a specific endogenous neurotransmitter (e.g., cocaine and many other stimulant drugs), by changing the shape of a particular receptor and thereby modifying its sensitivity to the endogenous agent (e.g., benzodiazepines and barbiturates), or through one or more other chemical or physical interactions.

Receptors are proteins located on cellular membranes. As suggested by their name, receptors have active sites that are sensitive to specific neurochemicals. They alter their structure when the active site is bound by the appropriate neurochemicals, which subsequently leads to a modification of the electrical or biochemical sensitivity of the cell. There are two fundamentally different types of receptors. One type is an ion pore in the cell membrane (an ionotropic receptor; Figure 23.1). As shown in Figure 23.1, ionotropic receptors are made up of several different subunits, and these subunits may have a slightly different protein structure. Subunits are typically given Greek labels. Thus, the nicotinic acetylcholine receptor is an ion pore that has a total of five subunits that may consist of alpha, beta, delta, and gamma subunits; the alpha and beta subunits themselves may exist in slightly different forms, which are distinguished numerically. For example, the nicotinic acetylcholine receptor that appears to be responsible for the reinforcing effects of nicotine is constituted with two alpha4 and three beta2 subunits and is often referred to as the alpha4beta2 nicotine receptor. These distinctions in receptor form are important because receptors with specific subunit composition are located in distinct areas of the brain and likely subserve slightly different functions, even though they may be activated by the same endogenous agent. Some drugs have been synthesized to interact more or less selectively with receptors of different type or subtype composition. The collaboration between chemists and behavioral pharmacologists to design, synthesize, and evaluate these types of receptor-subtype–selective drugs has tremendous potential in furthering understanding of the specific behavioral function of that particular receptor type or subtype.

Figure 23.1. A drawing of an ionotropic receptor, located in the membrane of a postsynaptic neuron. This receptor has five sections, some of which are likely to have a different protein structure. Agonist interaction causes the receptor to modify its structure in such a way as to allow an ion to pass through to the inside of the neuron, which alters the likelihood that the neuron will be activated.

The action of the neurochemical can result in the ion pore being open more or less frequently. When it is open, the pore allows ions to pass in or out of the cell, changing the neuron’s likelihood of firing. Action at ion channel receptors generally results in a fairly localized change in neuronal response through a brief change in the influx of charged ions through the receptor. The behavioral effects of drugs that act on specific ionotropic receptors often give important clues as to the function of that receptor class.
The dissociative anesthetics phencyclidine and ketamine, for example, produce their effects by entering and blocking the ion pore of a specific type of ionotropic receptor that responds to actions of the major excitatory neurotransmitter, glutamate. Benzodiazepines and barbiturates, however, modify the shape of ionotropic GABA receptors so that they bind more readily to the primary endogenous inhibitory neurotransmitter, GABA.

The second type of receptor is the G-protein-coupled receptor (GPCR), also called the metabotropic receptor. The GPCR has seven areas that pass through the cell membrane (seven transmembrane domains), forming a circular structure somewhat like the ion pore (see Figure 23.2). GPCRs do not have distinct subunits; rather, they differ from each other in the amino acid arrangement of the receptor protein. The action of the drug or endogenous transmitter on this receptor initiates a biochemical cascade that differs depending on the particular receptor type. This cascade, in which each change in protein structure causes a change in another, downstream protein structure until a final modification of neuronal activity is produced, permits more amplification and modulation in outcome than does the ionotropic receptor. This presents the possibility of a much more nuanced and widespread outcome of the drug effect, which is especially important in relation to the heterogeneous distribution of GPCRs in the brain and spinal cord. All four types of opioid receptors (mu, kappa, delta, and nociceptin receptors) are GPCRs, as are receptors for dopamine and most receptors for serotonin. As with ionotropic receptors, there are frequently several subtypes of GPCRs, and evaluating drugs that have been designed to interact specifically with certain receptor subtypes has proven to be one of the most valuable contributions of behavioral pharmacology.

Figure 23.2. A drawing of a G-protein-coupled receptor, located in the membrane of a postsynaptic neuron. Also known as a metabotropic receptor, it contains seven transmembrane domains, connected by intracellular and extracellular amino acid loops. Agonist interaction at a specific extracellular site causes a change in protein structure, which causes an intracellular G-protein to detach from the receptor, activate a second messenger, and modify cell signaling processes.

The receptors that drugs act on are present in membranes to carry out the normal processes of the cell. In many cases, the endogenous chemical that acts on these drug receptors has been identified. Frequently, the chemical was known before the receptor was discovered. For example, many small-molecule neurotransmitters such as norepinephrine, dopamine, and serotonin were identified as neurotransmitters before their various receptors and receptor subtypes were discovered. These particular neurotransmitters are especially interesting to behavioral pharmacologists because the behavioral effects of many centrally acting drugs, including central nervous system stimulants (e.g., cocaine and amphetamine), hallucinogens (e.g., LSD and mescaline), and antidepressants (e.g., fluoxetine and sertraline), involve either direct actions on one or more of these neurotransmitters’ receptors or blockade of the reuptake and removal of one or more of these neurotransmitters. In other cases, the endogenous chemical was searched for and located after a receptor was found for a drug action. For example, researchers began looking for endogenous chemicals that interacted with the opioid receptor after these receptors were identified as the locus of action of morphine and morphine-like drugs. Identification of the various endogenous opioid peptides, endorphins, and enkephalins resulted from this investigation (Hughes, 1975). In still other instances, the drug receptor has been reasonably well described, but the endogenous ligand remains elusive. Endogenous ligands for the benzodiazepine receptor, for example, have not yet been definitively identified and may not exist.
Description of the actions of drugs on receptors is the cornerstone of the development of receptor theory, which in turn is a cornerstone of the science of pharmacology. The description takes the form of dose–response functions wherein the effects of a range of doses of a particular drug are evaluated in a specific assay (Kenakin, 1997). In most in vitro assay systems, such as smooth muscle preparations, the more drug that is given, the greater the effect, up to a certain maximum that usually reflects the limits of the assay’s response. When the dose is plotted logarithmically and the effect is plotted linearly, the resulting dose–response curve is sigmoidally shaped (Figure 23.3). When behavioral measures are used, dose–response curves of this shape are also frequently obtained. With many behavioral assays, however, across a substantial range of doses, target behavior may increase and then decrease, and the resulting dose–response curve has a bell shape or inverted-V shape (Figure 23.4). These curves may reflect several aspects of drug–behavior interaction. Action at one receptor may be responsible for the initial increasing function, and action at another receptor may account for the decreasing function. Alternatively, action at a single receptor may produce the increasing function, and continued action at this receptor may produce other behavior that is incompatible with the target behavior, which then decreases. The fact that behavioral dose–response curves may not precisely mimic the shape of the curves that have been used to establish and elaborate receptor theory should not detract from the necessity of obtaining complete dose–response curves for behavioral measures. Just as these curves are essential to understanding drug actions in nonbehavioral systems, permitting explication of the nature of the effect, allowing comparison of one drug’s effects to another drug’s effects, and confirming the receptor bases of the outcome through concurrent administration of agonist and antagonist compounds, dose–response curves of whatever shape are critical in understanding precisely the same actions and interactions using behavioral preparations.

Figure 23.3. A sigmoidally shaped dose–response function. Note the linear-log plot.

Figure 23.4. Increasing doses of the dopamine receptor agonist ropinirole produce an increase in yawning in rats. Further increases in dose yield a decrease in yawns, resulting in an inverted-V-shaped dose–response curve. These previously unpublished data are courtesy of Greg Collins. Statistical significance of results versus vehicle controls according to a Dunnett’s test is indicated as * p < .05, *** p < .01.

Drug–receptor interactions. Receptor theory emphasizes three general parameters of drug–receptor interactions: affinity, or the tendency to bind to the receptor; efficacy, or the ability to produce a response once binding has occurred; and a complex interaction between affinity and efficacy called potency, which translates roughly into how much drug is required to produce an effect. Although affinity and efficacy can be compared in a meaningful way only among drugs that act at the same receptor, useful potency comparisons can be made among all drugs that produce the same effect. As an example, mu and kappa opioid agonists produce analgesia by actions on their separate receptors. Therefore, it is proper to compare the efficacies of the individual mu agonists by observing the amount of analgesia each produces (greater peak analgesia may indicate greater efficacy). However, comparisons between mu and kappa agonist efficacies cannot be made using this or any other assay. The potencies of these drugs, however, can be compared. If the dose–response function of a kappa agonist is located to the left of the function generated by a mu agonist (i.e., the same responses are produced by smaller doses of the kappa agonist), a correct conclusion is that this particular kappa agonist is more potent than this particular mu agonist.

Most drug–receptor interactions are quite temporary, with the drug attaching to and then rapidly coming off the receptor. Affinity, therefore, does not reflect how tightly or how long the drug attaches to the receptor but rather the likelihood that the drug will attach to the nearby receptor. Affinity is simple to measure in binding assays, in which the ability of a test compound to displace a radioactive ligand from the receptor is determined. Affinity is usually established by determining the drug’s ability to displace a labeled standard, and it is expressed as the Ki. The smaller this number is, the fewer drug molecules are necessary to inhibit binding of the standard, and the higher the affinity of the drug is. Table 23.1 displays the affinity of a group of opioids as reflected in their relative ability to displace radiolabeled DAMGO, a mu-selective agonist (Emmerson, Liu, Woods, & Medzihradsky, 1994). The agonist etonitazene and the antagonist quadazocine have the highest affinity among these drugs, whereas morphine has the lowest.

Table 23.1
Affinity of Several Opioid Agonists and Antagonists at the Mu Opioid Receptor, as Determined by Their Ability to Displace the Tritiated Mu Ligand DAMGO

Compound        [3H]DAMGO μ Ki (nM)
DAMGO           1.23
Etonitazene     0.02
Fentanyl        1.48
Levorphanol     1.23
Morphine        2.66
Nalmefene       0.13
Naloxone        0.62 (±0.04)
Naltrexone      0.11 (±0.01)
Quadazocine     0.03
Sufentanil      0.19 (±0.01)

Note. From “Binding Affinity and Selectivity of Opioids at Mu, Delta, and Kappa Receptors in Monkey Brain Membranes,” by P. J. Emmerson, M.-R. Liu, J. H. Woods, and F. Medzihradsky, 1994, Journal of Pharmacology and Experimental Therapeutics, 271, p. 1634. Copyright 1994 by the American Society for Pharmacology and Experimental Therapeutics. Used with permission.

Agonists are drugs that have both affinity and efficacy at their receptors; antagonists are drugs that have affinity but no efficacy at their receptors. Antagonists bind to the receptor in such a way as to reduce the likelihood of binding of both endogenous and exogenous ligands, but they may not cause a response themselves. There are three mechanisms by which apparent behavioral responses can be produced by established antagonists. First, a behavioral response may result if there is constitutive activity at the receptor (i.e., receptor activity in the absence of a stimulating agonist). If this constitutive activity is prevented by the antagonist, it could result in a behavioral response. Second, an antagonist response could result if tone is produced by the endogenous agonist at the receptor. An antagonist drug with no efficacy itself at this site could cause a response if it prevents full access to the receptor by the endogenous ligand (i.e., a reduction in tone). Finally, a drug that appears to be an efficacy-free ligand could produce a response if it acts as an inverse agonist at the site of action. An inverse agonist is a drug that produces the opposite effect to the agonist or endogenous ligand.

Antagonists come in a variety of types. Competitive antagonists, as do most agonists, bind to their receptors only briefly, and through the process of repeatedly attaching to and disengaging from the receptor, they prevent the agonist from having as much access to that receptor. However, agonists administered at large enough doses can produce their maximal effect by overwhelming the effects of the competitive antagonist, which is illustrated in Figure 23.5. Here, a single dose of the competitive mu opioid receptor antagonist nalmefene is given before administration of several intrathecal doses of the mu opioid receptor agonist morphine.
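The surmountable character of competitive antagonism can be sketched numerically. The example below is a minimal illustration and not the analysis behind Figure 23.5: it assumes a simple Hill (logistic) model for the agonist dose–response curve, which is sigmoidal on a log-dose axis, and the standard Gaddum relation, in which a competitive antagonist at concentration B with equilibrium dissociation constant K_B multiplies the agonist’s apparent EC50 by (1 + B/K_B). The function names and all parameter values are hypothetical.

```python
def hill_effect(dose, ec50, emax=1.0, slope=1.0):
    """Effect of an agonist dose under a simple Hill model (assumed, not from the chapter)."""
    return emax * dose**slope / (ec50**slope + dose**slope)


def apparent_ec50(ec50, antagonist_conc, kb):
    """Agonist EC50 in the presence of a competitive antagonist (Gaddum relation)."""
    return ec50 * (1.0 + antagonist_conc / kb)


if __name__ == "__main__":
    ec50 = 1.0        # hypothetical agonist EC50, arbitrary dose units
    kb = 0.1          # hypothetical antagonist dissociation constant
    antagonist = 0.9  # hypothetical antagonist concentration (about a 10-fold shift)
    shifted = apparent_ec50(ec50, antagonist, kb)
    for dose in (0.1, 1.0, 10.0, 100.0, 1000.0):
        alone = hill_effect(dose, ec50)
        blocked = hill_effect(dose, shifted)
        print(f"dose={dose:7.1f}  agonist alone={alone:.3f}  with antagonist={blocked:.3f}")
```

Because the antagonist changes only the apparent EC50 in this sketch, the curve moves to the right without a change in its maximum: a large enough agonist dose still approaches the full effect, which is the pattern the text describes for competitive antagonists.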
Intrathecal morphine elicits scratching in the monkey, and nalmefene produces a systematic decrease in the potency of morphine in eliciting this response. Noncompetitive, or uncompetitive, antagonists bind to the receptor more tenaciously and thereby restrict access to agonists in a more absolute fashion than a competitive antagonist. Uncompetitive antagonism can also occur if the antagonist reduces the 553

Winger and Woods

Figure 23.5. Intrathecal morphine produces increased scratching in rhesus monkeys, which can be antagonized by administration of the opioid receptor antagonist nalmefene, which is shown by a rightward shift in the morphine dose– response curve. These previously unpublished data are courtesy of Mei-Chuan Ko.

effect of the agonist by distorting the shape or function of the receptor, as occurs with some N-methyl D-aspartate (NMDA) antagonists that enter and physically block a glutamate ionotropic receptor channel. Typically, the interaction between an agonist and its uncompetitive antagonist is described by a curve in which the maximal effect of the agonist can no longer be produced, even after administration of very large doses. Just as drugs can vary in their affinity for their receptor, they can also vary in their efficacy. Efficacy refers to a drug-based ability to activate a response once binding has occurred. A drug that produces the maximum possible response in all the systems in which it has been tested is usually called a full agonist. This label is slightly tenuous because the possibility always exists that a new drug will be synthesized that produces an even larger response than any previous drug and will become the new definition of a full agonist. If there is a maximum response that a given preparation is capable of, there may be several putative full agonists in this preparation, each producing the maximum response. To rank these drugs on the basis of their efficacy, another preparation is needed that is less sensitive to the effects of agonists at this receptor. The agonist with the highest efficacy is the one that produces the largest response in the least sensitive assay. This description should convey the notion that efficacy is a relative matter and involves comparisons 554

among drugs that act on the same receptor to produce their effects. Highly efficacious agonists (possible full agonists) produce larger effects than less efficacious agonists, and a group of drugs that act as agonists at the same receptor can be ranked with respect to their efficacy. An important rule is that the rank order of efficacy for a group of agonists acting at a receptor will not change across different preparations that measure an effect mediated through transduction of that particular receptor’s actions. An example follows: There are many mu opioid agonists, and the rank order of efficacy for three of them is first fentanyl, then morphine, then nalbuphine. Thus, fentanyl will produce a response greater than or equal to that of morphine, which will have a response greater than or equal to nalbuphine in all behavioral, physiological, or biochemical systems that reflect mu opioid receptor activity. If a system were found that ranked these drugs in anything other than this order, the drugs would almost certainly be acting on something other than or in addition to the mu opioid receptor alone. Test system sensitivity. Why, one might ask, would nalbuphine ever produce a response equal to that of fentanyl if fentanyl is an agonist with higher efficacy? The answer to this question raises another important point in drug–receptor pharmacology. The nature of the test system, behavioral or not,

plays a critical role in assessing efficacy. A very sensitive test system may reveal that one drug is more effective than another, whereas a less sensitive system may suggest that both drugs have full efficacy. For example, in analgesia evaluations using rhesus monkeys, fentanyl increases the latency of tail removal from warm water (55 °C). At a sufficiently large dose of fentanyl, latencies reach the maximum allowable time of 20 seconds. Nalbuphine also increases tail removal latencies but does not produce the maximum analgesic response when the water is 55 °C. After administration of substantial doses of nalbuphine, the monkey leaves its tail in very warm water for only about 1 second. This difference reflects the lower efficacy of nalbuphine relative to fentanyl. However, if the tests are done with water that is 48 °C, both drugs produce the full 20-second analgesic response. One cannot therefore determine any difference in the efficacy of nalbuphine and fentanyl using a 48 °C water analgesia assay (Gerak, Butelman, Woods, & France, 1994; Walker, Butelman, DeCosta, & Woods, 1993).

What might sensitivity mean mechanistically? One way of thinking about this involves the concept of receptor reserve. Most drugs do not have a one-to-one correspondence between their effect and binding to their receptor. Drugs with high efficacy can produce a full effect in an assay by binding to a fraction of the total available receptors. For these drugs, there is a surplus of receptors, or a receptor reserve. Drugs with less efficacy can produce a full effect by binding to a greater proportion of the total available receptors, leaving fewer receptors in reserve. Partial agonists, however, may not be able to produce a full effect even when they bind to all of the receptors. This concept applies to behavioral assays. Assume there are 1,000 mu opioid receptors involved in producing opioid-induced analgesia.
When the water temperature is 55 °C, a high-efficacy agonist such as fentanyl may need to bind to 900 of these receptors to produce full analgesia. The most effective dose of nalbuphine in this assay may bind to all 1,000 of the receptors, but because it does not have fentanyl’s efficacy, it is unable to produce full analgesia. When the water temperature is decreased to 48 °C, full analgesia can be produced with less drug stimulus. Therefore, fentanyl may produce full

analgesia by binding to 50% of the available receptor population, and nalbuphine can produce full analgesia by binding to 90% of the available receptors. In this assay, at this water temperature, there is an increase in receptor reserve and an increase in the apparent relative efficacy of nalbuphine. Because each of these drugs is binding to the same receptor, a drug with less efficacy is able to antagonize the effects of drugs with greater intrinsic efficacy. This is the case only in assays in which the partial agonist is unable to produce as much response as a full agonist. If both agonists produce a full effect, the less efficacious agonists will not block the effect of the agonist with greater efficacy. When a partial agonist produces a partial effect, it can reduce the activity of a full agonist only to the level of that shown by the partial agonist. In the example given earlier of tail withdrawal analgesia, nalbuphine should be able to reduce the effect of fentanyl to that produced by nalbuphine alone when 55 °C water temperature is used as the noxious stimulus. Drug–receptor interactions have an additional complication: the occasional existence of drugs that act at a common receptor and produce effects that are opposite to those produced by the agonist. These drugs are termed inverse agonists. They have been well described primarily in the benzodiazepine system, although inverse agonist properties may account for some of the actions of nominal dopamine antagonists as well. Suppose the receptor is activated naturally only under certain conditions. For example, suppose natural ligands for the opioid receptor are released and bind to the receptor only after a painful stimulus or that natural ligands for the benzodiazepine receptor activate this receptor only when the individual is anxious. For these receptors, administration of the antagonist should have little effect as long as the organism is not in pain or anxious. 
Contrast this to receptors that are activated by their natural ligands on a regular basis. Dopamine receptors in the striatum, for example, are activated in the process of motor movements. In this case, an antagonist might have activity in all but sleeping or paralyzed organisms by blocking the naturally active dopamine. Administration of many dopamine antagonists has profound effects on the behavior of the organism.

These antagonist effects can be explained in at least two ways. One is that the antagonists block the naturally occurring tone in their particular system, and the resulting behavior change is the result of a reduced endogenous receptor occupation. The other possibility is that these drugs are not simply antagonists but are actually inverse agonists, producing effects that are opposite to those of the naturally occurring agonist. Separating these two possibilities can be a bit complicated in some systems and less difficult in others. Answering some of these questions about the mechanism of action of dopamine antagonists is an ongoing, important challenge, in part because of their importance in basic behavioral pharmacology and their increasing importance in clinical populations.
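The receptor-reserve reasoning in the tail-withdrawal example above can be made concrete with a deliberately simple sketch. It assumes, purely for illustration, that the stimulus a drug produces is its relative efficacy multiplied by the number of receptors it occupies, and that each assay requires a fixed amount of stimulus for the full response. The efficacy values and stimulus requirements below are hypothetical numbers chosen only so that the toy model reproduces the receptor counts used in the text (900 of 1,000 receptors for fentanyl at 55 °C; 50% and 90% of receptors for fentanyl and nalbuphine at 48 °C). It is not a fitted pharmacological model.

```python
from fractions import Fraction

TOTAL_RECEPTORS = 1000  # receptor pool size used in the text's example

# Hypothetical relative efficacies: stimulus produced per occupied receptor.
# Values are chosen only so the toy model reproduces the text's numbers.
EFFICACY = {"fentanyl": Fraction(1), "nalbuphine": Fraction(5, 9)}

# Hypothetical stimulus each assay requires for the full (20-second) response.
REQUIRED_STIMULUS = {"55C": 900, "48C": 500}


def receptors_needed(drug, assay):
    """Receptors that must be occupied for a full effect, or None if the
    whole pool is not enough (the drug acts as a partial agonist here)."""
    needed = REQUIRED_STIMULUS[assay] / EFFICACY[drug]
    return needed if needed <= TOTAL_RECEPTORS else None


def max_effect(drug, assay):
    """Largest achievable fraction of the full effect (all receptors bound)."""
    stimulus = EFFICACY[drug] * TOTAL_RECEPTORS
    return min(Fraction(1), stimulus / REQUIRED_STIMULUS[assay])


def receptor_reserve(drug, assay):
    """Receptors left unoccupied when the full effect is first reached."""
    needed = receptors_needed(drug, assay)
    return None if needed is None else TOTAL_RECEPTORS - needed


if __name__ == "__main__":
    for assay in ("55C", "48C"):
        for drug in ("fentanyl", "nalbuphine"):
            print(assay, drug,
                  "needed:", receptors_needed(drug, assay),
                  "reserve:", receptor_reserve(drug, assay),
                  "max effect:", float(max_effect(drug, assay)))
```

Under these assumptions, lowering the stimulus requirement (the 48 °C assay) increases the reserve for both drugs and lets the lower efficacy drug reach the full response, which is why the less demanding assay cannot distinguish the two drugs by efficacy.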

Pharmacokinetics

Drugs reach their receptors only after passing through various membranes that exist in the gastrointestinal tract (if the drug is taken orally), that are located between the blood and subcutaneous or muscle tissue (if the drug is given subcutaneously or intramuscularly), and that line blood vessels (if the drug is given by any route other than direct injection into the central nervous system). In addition, because centrally active drugs must access the brain, they must have the capacity to cross the extratight membrane (blood–brain barrier) that lines blood vessels throughout most of the central nervous system. Some drugs are more adept at crossing membranes than others, and the principles that govern crossing one membrane generally apply to all membranes. These principles are small size, fat solubility, and lack of an electrical charge. Because membranes have a central lipid core, fat-soluble drugs are more permeable than drugs that are largely water soluble. Membranes contain small pores (fenestrations) that are more resistant to large-molecule passage than small-molecule passage. In addition, membranes tend to reject the passage of ionized or charged drugs more than that of unionized drugs. The amount of ionization of a particular drug depends on whether it is in an acidic or basic environment, making passage of a drug through the highly acidic environment of the stomach different

from what it would be in the more basic environment of the blood. The blood–brain barrier is simply the membrane that surrounds the blood vessels that are located in the central nervous system. This membrane has relatively fewer fenestrations and is supported by additional cells that resist the passage of molecules. Therefore, for a drug to enter the brain from the blood, it must be even smaller or more fat soluble than drugs that pass through blood vessels in the rest of the body. The blood–brain barrier is far from absolute, and a number of active transport systems are designed to ensure that large, charged, or fat-insoluble compounds that the brain requires are given rapid access to the central nervous system. These compounds aside, this barrier restricts access of many compounds that are quite large or highly charged (e.g., peptides and protein-based drugs).

In addition to studying how drugs pass through membranes, pharmacokinetics also involves studies of how drugs are absorbed, sequestered, and eliminated from the body. Behavioral pharmacologists frequently evaluate the duration of the behavioral effect they are studying, and their procedures are well suited to this task. Sometimes there are concerns that the duration of drug action does not last long enough for the behavior of interest to be measured in an experimental session. Under these conditions, time–action studies are appropriate. In these studies, behavior is plotted much like a dose–response curve but with time rather than dose on the x-axis. The duration of action of a drug is a critical aspect of its behavioral effect. Drugs with therapeutic actions (an antipsychotic or analgesic, for example) will be more useful if they last more than 24 hours and can therefore be taken once per day. Drugs used to treat drug abuse will have a distinct advantage if they have extremely long durations of action, and this determination is one that behavioral pharmacologists can easily make.
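The dependence of ionization on the acidity of the environment, noted earlier in this section, is conventionally described by the Henderson–Hasselbalch relation, which the chapter does not state explicitly. The sketch below applies that relation to estimate the uncharged fraction of a drug, the fraction expected to cross membranes most readily. The pKa values and the pH values for stomach (2.0) and blood (7.4) are illustrative assumptions, not data from the chapter.

```python
def fraction_unionized(pka, ph, kind):
    """Fraction of drug in the uncharged form at a given pH.

    kind is "acid" for a weak acid or "base" for a weak base.
    Uses the Henderson-Hasselbalch relation.
    """
    if kind == "acid":
        exponent = ph - pka   # [A-]/[HA] = 10**(pH - pKa)
    elif kind == "base":
        exponent = pka - ph   # [BH+]/[B] = 10**(pKa - pH)
    else:
        raise ValueError("kind must be 'acid' or 'base'")
    return 1.0 / (1.0 + 10.0 ** exponent)


# Assumed, illustrative pH values and a hypothetical weak base (pKa 8.0)
STOMACH_PH = 2.0
BLOOD_PH = 7.4

if __name__ == "__main__":
    for site, ph in (("stomach", STOMACH_PH), ("blood", BLOOD_PH)):
        print(site, "weak base, pKa 8.0:", fraction_unionized(8.0, ph, "base"))
        print(site, "weak acid, pKa 3.5:", fraction_unionized(3.5, ph, "acid"))
```

With these assumed values, the hypothetical weak base is almost entirely charged in the stomach and substantially less so in blood, whereas the hypothetical weak acid shows the reverse pattern.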
Behavioral Aspects of Behavioral Pharmacology

Drugs are capable of modifying virtually any type of behavior of an intact organism. Useful categorization of drug effects has been obtained from studies

that simply give drugs to animals and observe changes in whatever ongoing behavior is occurring (Koek, Woods, & Ornstein, 1987). This approach is a reasonable place to start with a drug whose behavioral effects are completely unknown, before a specific behavior (e.g., a particular form of elicited behavior) is selected for more intensive study. More typically, the effects of drugs are studied on one of two types of conditioned behavior.

Drug Effects on Classically Conditioned Behavior

Classically (Pavlovian) conditioned behavior, also known as respondent behavior or stimulus–stimulus learning, is a rich source of behavioral pharmacological study (see Chapter 13, this volume). When a previously neutral stimulus has been paired with another stimulus, an unconditioned stimulus (US), with positive or negative valence (e.g., food or electric shock), behavior similar to that elicited by the US is now elicited by the neutral stimulus (now termed a conditioned stimulus, or CS). Pavlov’s original example remains instructive: Dogs that originally salivated when food was presented but not when a tone was presented came to salivate to the tone after it had been paired several times with the delivery of food. Those of us who salivate while standing in front of a food-vending machine are demonstrating conditioning of exactly this type. Studies of the interaction of drugs with classically conditioned behavior often focus on the drugs’ ability to modify learning of the CS–US association. Studies of the interaction of various serotonin agonists on rabbits’ learning classically conditioned responses have been particularly interesting. There are several distinct serotonin (5HT) receptors, and many of these have receptor subtypes. The hallucinogen LSD is an agonist at the 5HT2A receptor, for example. LSD and other 5HT2A agonists enhance acquisition of several classically conditioned responses: an eyeblink response to a tone previously paired with an airpuff or a shock or jaw movement responses to a tone that was paired with water delivery (Gormezano, Harvey, & Aycock, 1980; Harvey, 2003). It is interesting that the drug enhances rates of learning when the time between the onset of the

CS and the delivery of the US is either very short or quite long (two conditions that typically produce slow learning; Harvey, 2003). Harvey (2003) showed that this effect of LSD was mediated through 5HT2A receptors by evaluating the ability of 5HT2A antagonists to block the LSD-induced enhancement of the acquisition of the conditioned response. Some of these antagonists had no effects of their own (so-called neutral antagonists), whereas others produced a decrease in the rate of acquisition of the conditioned response (relative to a no-LSD baseline), suggesting that they were inverse agonists. As predicted by receptor theory, the effects of both the agonist and the inverse agonists were blocked by a neutral antagonist, whereas the inverse agonist was able to block the effect of the agonist. Clearly, these drugs produce their effects on learning through the same receptor, and their categorization as agonists, inverse agonists, and antagonists was nicely made using a behavioral preparation. This is an example of using a well-established behavioral preparation to evaluate the effects of a particular drug whose effects on learning have not been thoroughly documented. It is also an example of the use of selective antagonists to confirm the receptor-selective nature of the effect of the agonist. These are fundamental demonstrations of these important procedures.
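The logic used above to sort drugs into agonists, inverse agonists, and neutral antagonists can be written as a small decision rule. The sketch below is only an illustration of that reasoning: the function name, the idea of a single numeric acquisition score (higher meaning faster learning), and the tolerance value are all hypothetical, and the rule assumes the reference agonist raises the score above the no-drug baseline.

```python
def classify_ligand(baseline, drug_alone, agonist_alone, agonist_plus_drug, tol=5.0):
    """Label a test drug from four acquisition scores (higher = faster learning).

    Assumes the reference agonist raises the score above baseline.
    Differences smaller than tol are treated as no effect.
    """
    own_effect = drug_alone - baseline
    if own_effect > tol:
        return "agonist"
    if own_effect < -tol:
        return "inverse agonist"
    # No effect alone: a neutral antagonist should still reduce the agonist's effect.
    agonist_effect = agonist_alone - baseline
    remaining_effect = agonist_plus_drug - baseline
    if agonist_effect - remaining_effect > tol:
        return "neutral antagonist"
    return "inactive"


if __name__ == "__main__":
    # Hypothetical scores, for illustration only
    print(classify_ligand(50, 50, 80, 52))  # no effect alone, blocks the agonist
    print(classify_ligand(50, 30, 80, 45))  # slows acquisition on its own
```

A fuller treatment would compare whole dose–response curves rather than single scores, but the categorical logic is the same.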

Drug Effects on Operant Behavior

A second type of conditioned behavior is operant behavior, or behavior affected by its consequences. In contrast to studies of drug effects on the acquisition of classically conditioned behavior, studies of the effects of drugs on operant behavior usually evaluate the interaction of the drug with stable behavior. The early studies in behavioral pharmacology stressed the interactions of drugs with well-established, carefully controlled operant behavior, which continues to be the primary emphasis of a great deal of behavioral pharmacological research. The review mentioned at the beginning of this chapter (Kelleher & Morse, 1968) is a classic overview of the interaction of drugs and operant behavior and continues to instruct on this interaction. Kelleher and Morse (1968) began this review by making the important point that it is a mistake to

draw generalizations about drugs’ behavioral mechanisms of action on the basis of common notions about why behavior is maintained. As an example, consider a preparation used to examine the effects of drugs on avoidance behavior. In this preparation, a specific stimulus is followed by brief shock. A response in the presence of the shock turns it off (escape), and a response in the presence of the preshock stimulus turns off the stimulus and prevents the shock (avoidance). Drugs such as chlorpromazine, a major tranquilizer, and morphine, an analgesic, both reduce avoidance (behavior in the presence of the stimulus) but do not prevent the animal from making the same response to escape shock (Cook & Catania, 1964). A natural tendency might be to interpret this drug effect on the basis of the common notion that avoidance behavior is maintained by the termination of fear, anxiety, or both; that is, the drugs decrease these emotions, which reduces the motivation to engage in fear- or anxiety-terminating behavior. If this were the case, then chlorpromazine and morphine should also increase responding suppressed by the response-contingent presentation of electric shock (punishment). But neither of these drugs attenuates punished responding under most conditions (Geller, Bachman, & Seifter, 1963; Geller & Seifter, 1960; McMillan, 1973a, 1973b; McMillan & Leander, 1975; Morse, 1964), suggesting that their behavioral effects are not the result of a reduction in fear or anxiety (or of a reduction in the pain experienced with each shock). There are drugs that do increase rates of punished responding. Benzodiazepines, barbiturates, and to a lesser extent ethanol can produce increases in behavior suppressed by response-contingent shock or other aversive stimuli (e.g., Mansbach et al., 1988). 
However, benzodiazepines do not decrease shock avoidance in the same manner as chlorpromazine and morphine: The dose of a benzodiazepine required to suppress escape behavior is the same dose necessary to suppress avoidance behavior. In these situations, shock per se does not determine the effect of specific drugs but rather the way in which the shock is related to behavior. Whether an aversive stimulus serves as a negative reinforcer, increasing behavior that prevents it, or serves as a punisher, decreasing behavior that

produces it, makes a tremendous difference in determining the effects of various classes of drugs (Kelleher & Morse, 1964). A similar caution should be applied when considering the effects of drugs on behavior maintained by positive reinforcers such as food. Common notions about drug effects often do not explain, and should not be used to explain, their behavioral effects. Amphetamine, for example, increases locomotor activity across a limited dose range and is therefore considered a central stimulant (e.g., Lehr, Morse, & Dews, 1985). However, amphetamine usually decreases food consumption in a free-feeding situation (Siegel & Sterling, 1959), and stimulants have been used for weight reduction because of this effect. A common-notion account of these suppressing effects of amphetamine involves its ability to produce anorexia, or a decreased motivation for food. Extensive evaluation of the behavioral effects of amphetamine, however, has shown that its effects depend on the rate at which behavior is occurring before administration of the drug. If the rate is low, amphetamine tends to increase responding; if the rate is high, amphetamine tends to decrease responding (e.g., Dews, 1958). This rate-dependent effect holds true regardless of whether a positive reinforcer such as food or a negative reinforcer such as shock termination is maintaining the behavior (Davis, Kensler, & Dews, 1973). Under temporally defined schedules such as fixed-interval schedules, in which the first response after a specified time interval produces reinforcement, rates of responding are typically low at the beginning of each interval and increase as the time to reinforcer delivery approaches. At appropriate doses, amphetamine increases the low initial rates of responding and decreases the higher terminal rates of responding (Dews, 1958). Not all low rates of responding are increased by amphetamine, however.
Rates of responding suppressed by punishment are not enhanced by doses of amphetamine that increase low fixed-interval rates of responding (Kelleher & Morse, 1968). Neither does amphetamine increase rates that are low because of extinction or because they have never resulted in reinforcer delivery (Kelleher & Morse, 1968). As noted by Kelleher and Morse (1968), the sedative barbiturates also have rate-dependent effects,

but these effects are somewhat different from those of amphetamine. Across a fairly narrow dose range, barbiturates can increase even rapid response rates, such as those that occur during the terminal parts of fixed intervals or during analogous segments of fixed-ratio schedules (in which reinforcement depends on a specified number of responses; Kelleher & Morse, 1968). An even more characteristic effect of barbiturates, however, and especially the sedative benzodiazepine agonists, with similar but more nuanced sedative effects, is their ability to increase response rates suppressed by punishment (Kelleher & Morse, 1968). This antipunishment effect of benzodiazepines is sufficiently profound that it is an effective method of evaluating the potential of a drug to produce anxiolytic effects in a clinical population (Witkin, Morrow, & Li, 2004). This is an example of using a drug to establish the diagnostic effectiveness of a procedure. Even a very mild, nonbenzodiazepine antianxiety compound, buspirone, has the ability to attenuate punished responding under fairly restricted conditions (Wojnicki & Barrett, 1993).
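The rate-dependent effects of amphetamine described in this section are often summarized by plotting the drug's effect, expressed as the ratio of the rate under drug to the control rate, against the control rate on logarithmic axes; a negative slope indicates that low rates are increased and high rates decreased. The following sketch fits such a line by ordinary least squares. The function name and the rates (responses per minute in successive segments of a fixed interval) are hypothetical and were constructed to give a clean slope; they are not data from any study cited here.

```python
import math


def rate_dependency_fit(control_rates, drug_rates):
    """Least-squares line through (log10 control rate, log10 drug/control ratio).

    Returns (slope, intercept). All rates must be positive.
    """
    xs = [math.log10(c) for c in control_rates]
    ys = [math.log10(d / c) for c, d in zip(control_rates, drug_rates)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical rates: low early-interval rates rise and the high terminal rate falls
control = [1.0, 4.0, 16.0, 64.0]
drug = [4.0, 8.0, 16.0, 32.0]
slope, intercept = rate_dependency_fit(control, drug)

if __name__ == "__main__":
    print("slope:", slope, "intercept:", intercept)
```

As the text cautions, such a summary applies only to the behavior sampled; rates that are low because of punishment or extinction would not be expected to fall on the same line.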

Drugs as Stimuli

Another approach to the interaction of drugs and behavior is to consider psychoactive drugs as stimuli that occasion and reinforce behavior. Drugs may be conceived of as stimuli in much the same way as other environmental events, more specifically as discriminative stimuli and reinforcing stimuli.

Drugs as discriminative stimuli. The stimulus functions of drugs have been widely studied (e.g., Colpaert & Rosecrans, 1978). In fact, the psychoactive drugs that have not been established as discriminative stimuli and whose discriminative stimulus functions have not been compared across a range of doses and drug classes are very few. A database has been established and maintained that nicely catalogs studies of drugs as discriminative stimuli (http://www.dd-database.org). In these studies, administration of a psychoactive drug is often used to direct behavior to one of two response options. When the drug has been given, behavior toward one option is reinforced; when the drug has not been given, behavior toward the other option is reinforced.

These interoceptive stimuli are similar in function to an external stimulus, such as a light or tone, that may be used to control operant behavior. These drug discrimination procedures have proven useful in the pharmacological classification of drugs. Because behaviorally active drugs usually produce a monotonic dose–response curve rather than a biphasic curve in a drug discrimination assay, the ability of a receptor-selective competitive antagonist to produce rightward shifts in an agonist drug discrimination curve can be used to determine the selective nature of the drug effect. Figure 23.6 provides an example of the ability of the selective nicotine antagonist, dihydro-β-erythroidine (DHβE), to competitively antagonize the discriminative stimulus effects of nicotine in rats. Using drug discrimination procedures to classify drugs was particularly helpful in early studies of various opioid agonists. Even before mu and kappa opioid receptors had been differentiated through receptor-binding studies, studies of the differences in the discriminative stimulus effects of these drugs left little room to doubt that these were distinct receptors. The drugs ethylketocyclazocine and morphine were designated as opioids because the behavioral effects of both could be reversed by the opioid antagonist naloxone. However, ethylketocyclazocine

Figure 23.6. The discriminative stimulus effects of nicotine are shifted in a parallel fashion to the right by increasing doses of the nicotine competitive antagonist dihydro-β-erythroidine (DHβE). Ordinate: Percentage of responding on the nicotine-appropriate lever. Abscissa: Dose of nicotine, given cumulatively, in milligrams per kilogram. These previously unpublished data are courtesy of Emily Jutkiewicz.

and morphine did not have the same interoceptive stimulus effects in monkeys (Hein, Young, Herling, & Woods, 1981). Both drugs were found to function as discriminative stimuli (demonstrating agonist action), but one drug would not substitute for the other. Therefore, the difference between these drugs was unlikely to be simply one of different efficacies at a single receptor. A thorough study of the discriminative stimulus effect of a variety of opioids with suggested mu and kappa selectivity was effective in confirming the distinction (Negus, Picker, & Dykstra, 1990). Among the important experiments were those involving administration of mu agonists and kappa agonists in combination. Several of the drugs that were discriminated as kappa agonists did not modify the discrimination produced by the mu agonist morphine, demonstrating that the kappa agonists were not antagonists or partial agonists at the mu receptor at which morphine acts. This finding confirmed that the effects of the two drug classes were mediated through distinct receptors and made the case that drug discrimination studies provide useful data to supplement and to stimulate analyses at more molecular levels (Negus et al., 1990). Although these studies on the differential discriminative stimulus effects of drugs acting on either mu or kappa opioid receptors supported the concept that these drugs were acting on distinct receptors, keep in mind that a new receptor should not be the first option to consider when two related drugs have different discriminative stimulus effects. Thorough assessment of potential differences in efficacy or affinity is required before reaching this conclusion. An example of more parsimonious options is found in studies of the behavioral effects of some drugs that bind to the benzodiazepine receptor.
Benzodiazepines, including diazepam, chlordiazepoxide, oxazepam, and midazolam, among many others, have important uses as antianxiety or hypnotic agents, but with side effects of some concern, including abuse liability. These drugs produce their behavioral effects by binding to a site on the GABAA receptor and increasing the potency of GABA, the primary inhibitory neurotransmitter in the central nervous system. The ionotropic GABAA receptor on which benzodiazepines have a binding site consists of five subunits, each with several different

subtypes; there are six different alpha subunits, for example. The experimental drug L-838,417 was found to have selective affinity for GABAA receptors with certain alpha subunit types (McKernan et al., 2000). Animal studies revealed that L-838,417 did not have benzodiazepine-like discriminative stimulus effects, although it did have sedative and antianxiety effects (Rowlett, Platt, Lelas, Atack, & Dawson, 2005). Not only did these results suggest that it might be possible to separate the discriminative stimulus effects of benzodiazepines from the sedative effects, they also indicated that the discriminative stimulus effects of benzodiazepines might be mediated through actions at distinct subtypes of the GABAA receptor (Rowlett et al., 2005). As noted, however, other interpretations must be ruled out before action at a distinct receptor is demonstrated. McMahon and France (2006) replicated the findings that L-838,417 did not produce benzodiazepine-like discriminative stimulus effects. They then determined, however, that this drug was a competitive inhibitor of the discriminative stimulus effects of the benzodiazepine midazolam. This determination provided evidence that L-838,417 was not acting on a distinct GABAA receptor but was a low-efficacy agonist at the same site at which typical benzodiazepines act. The unusual pattern of effects of this drug (sedative effects, but no benzodiazepine-like discriminative stimulus effects) may have arisen because sedative effects may be a less sensitive behavioral outcome that can develop after administration of lower efficacy compounds, similar to the ability of the lower efficacy opioids described earlier to produce full analgesia in lower water temperature. There are several lessons here. One is that studies of the discriminative stimulus effects of drugs can provide a wealth of information about the pharmacological profiles of drugs.
Another is that careful attention should be paid to the possibility that different discriminative stimulus effects of drugs that bind to the same class of receptors may be the result of their greater or lesser efficacies rather than of actions on unique receptors. The interactions between these drugs can be studied easily by administering them together and observing quantitative changes, or lack

thereof, that one drug produces on the effect of another. If a drug with reduced or no effect in a specific assay is able to antagonize the effect of the other drug in a competitive fashion, it is virtually certain that the two drugs are acting on the same receptor, albeit with important differences in efficacy at different subtypes. This was the case with L-838,417. Drugs as reinforcing stimuli. Whereas the discriminative stimulus effects of a drug refer to stimuli that precede responding (i.e., antecedent stimuli), the reinforcer effects of a drug refer to stimuli that follow a response (i.e., consequences). The reinforcing stimulus effects of drugs, particularly as models of human drug taking and abuse, have also been widely and usefully studied. The technique is typically referred to as a drug self-administration preparation and often involves implanting an intravenous catheter and allowing the animals to make an operant response, such as a lever press, that causes a small dose of the drug to be infused. This opportunity is repeatedly presented over daily sessions in the same way that the reinforcing effects of small portions of food or small amounts of water are studied in animals. Most drugs that are used extensively by humans for recreational purposes serve as reinforcing stimuli in rats or monkeys (Griffiths, Brady, & Bradford, 1979), and pharmacotherapies that reduce drug use in humans may also reduce the reinforcing stimulus functions of drugs in experimental animals (Winger & Woods, 1996). Thus, the drug self-administration model has potential in developing treatments for drug abuse, insofar as these treatments aim to reduce the reinforcing effects of drugs of abuse (Mello & Negus, 1996). Data collected in drug self-administration studies are typically not as useful for categorizing psychoactive drugs as data on discriminative stimulus effects of drugs. 
Whereas drug discrimination studies can group drugs into pharmacological classes (e.g., mu or kappa opioids, dopamine reuptake inhibitors, benzodiazepines, serotonin 2A agonists, or nicotinic agonists), self-administration studies usually have only one of two outcomes: The drug in question is either a reinforcer or it is not. Often, drugs that have clearly distinct discriminative stimulus and receptor-based effects (e.g., cocaine and alfentanil) serve

equally well as reinforcing stimuli. This outcome might be modified depending on the drug self-administration procedure used (e.g., drug history or fixed-ratio size), but generally there are only two categorical outcomes. Drug self-administration procedures have proven useful in evaluating agonist–antagonist interactions. Pretreatment with increasing doses of an opioid antagonist such as quadazocine or naltrexone will produce parallel rightward shifts in the reinforcing stimulus effects of alfentanil (Bertalmio & Woods, 1989), just as it will produce quantitatively identical rightward shifts in the analgesic and discriminative stimulus effects of opioid agonists (Bertalmio & Woods, 1987; Ko, Butelman, Traynor, & Woods, 1998). Thus, the same receptor type is responsible for each of these effects. Some interesting observations have been made regarding drug interactions while the effects of potential pharmacotherapies on drug self-administration have been studied. As indicated throughout this chapter, mu opioid antagonists can reduce the potency of mu opioid agonists in virtually all behavioral assays that have been studied, which conforms with what is known about the sites of actions of these drugs and with established receptor theory. As described in the previous paragraph, this interaction is observed with mu opioid receptor antagonism of the reinforcing effects of mu opioid receptor agonists, and it is also the case that mu opioid receptor antagonists provide exceptional treatment for abuse of heroin in humans, as long as full compliance with antagonist administration can be obtained. As a further example, buprenorphine, a partial mu agonist, is very effective in reducing heroin abuse.
As a partial agonist, buprenorphine has both agonist and antagonist effects, and because both opioid agonists and opioid antagonists are able to reduce heroin abuse, discussion has been ongoing regarding which of these actions of buprenorphine is responsible for its therapeutic effects. Although the answer has not been clearly determined for human heroin abusers, animal studies make it quite clear that buprenorphine reduces the reinforcing effects of mu opioid agonists through its actions as an antagonist. Presession administration of buprenorphine produces dose-related rightward shifts in the dose–effect

curves described by opioid self-administration (Winger & Woods, 1996). This clear interaction between opioid receptor agonists and antagonists does not prevail when studying cocaine and dopamine antagonists, and nowhere is this clearer than in studies of these drugs in self-administration assays. One of the most likely mechanisms of cocaine’s behavioral effects, including its reinforcing effect, is blocking transporter reuptake of the neurotransmitter dopamine, thereby increasing the concentration of dopamine in the synaptic cleft. Dopamine acts on five different receptors, and these receptors are located in distinct areas and in different densities in the central nervous system. Theoretically, administration of selective antagonists at each of these receptors to animals that are self-administering cocaine would provide an answer to the question of which receptor is most critical to the reinforcing effects of cocaine and also put forward potential pharmacotherapies for cocaine abuse. Agonists at this particular receptor should also serve as reinforcing stimuli. Unfortunately, the interaction between cocaine and dopamine antagonists has not been as clear cut as those between opioid agonists and antagonists. As reviewed by Witkin (1994), antagonists at the D1-like (D1 and D5) and D2-like (D2, D3, and D4) receptors modified some actions of cocaine, including apparent antagonism of its reinforcing effects. However, these findings had two consistent drawbacks. First, the effects were not robust, in that dose–effect curve shifts were small and limited; second, the drug doses that suppressed cocaine-maintained responding also frequently suppressed food-maintained responding. More recent studies have not been much more promising, despite the synthesis of newer, more selective drugs.
A D3 partial agonist, CJB 090, produced parallel shifts to the right in cocaine’s discriminative stimulus effect and suppressed self-administration of a single dose of cocaine (no cocaine dose–response curves were obtained). However, this drug also suppressed food-maintained responding at doses that suppressed cocaine-maintained responding, indicating that its rate-depressing effects may not have been selective (Martelle et al., 2007). A relatively selective D3
antagonist, NGB 2904, was unable to attenuate either the reinforcing or the discriminative stimulus effects of cocaine in rhesus monkeys (Martelle et al., 2007). The reasons why it is difficult to block cocaine’s behavioral effects selectively with dopamine receptor antagonists are unknown, but these findings are sufficiently consistent to have sent those seriously interested in cocaine abuse pharmacotherapies to dramatically different targets, including modifying the pharmacokinetics of cocaine (Collins et al., 2009).

Other stimulus effects of drugs. When considering drugs as stimuli, their ability to serve as discriminative and reinforcing stimuli has received by far the most attention, and an extremely large literature has developed for both of these functions. However, these situations are certainly not the only ones in which drugs have stimulus properties. We describe five of these next; many more have been overlooked in the interests of time and space.

First, drugs can act as stimuli that elicit specific behavior patterns. We described several examples earlier, usually in the form of drug-produced effects. Other examples include mu opioid receptor agonists, which elicit analgesia and scratching (Ko & Naughton, 2000); dopamine D3–preferring agonists, which elicit yawning (Depoortère et al., 2009); D2 receptor agonists, which elicit decreases in body temperature (Collins et al., 2007); amphetamine, which elicits increases in locomotor activity (e.g., Lehr, Morse, & Dews, 1985); and d-amphetamine, which elicits focused stereotyped head movements in rats (Fowler, Pinkston, & Vorontsova, 2007). Thinking of these effects as drug stimulus–elicited effects rather than as drug-produced effects is a subtle modification, but one that can influence the general approach to behavioral pharmacology.
If drugs are considered to act as stimuli, investigators and students are more likely to be challenged to consider their effects as subject to modification and conditioning, for example, rather than as immutable outcomes.

Second, drugs can act as punishing stimuli to reduce behavior. Most studies of drug-contingent behavior have focused on drugs as positive reinforcers. As mentioned earlier, this work has tended to
categorize drugs as either reinforcers or nonreinforcers. Very little attention has been paid to a third possibility: that a drug may be aversive. This possibility can be evaluated in studies to determine whether a specific drug will serve as a punishing stimulus and, when delivered contingently on a response, reduce the rate of behavior maintained by another reinforcer. For example, response-contingent delivery of intravenous histamine suppressed behavior maintained by food (Goldberg, 1980). Many drugs that are not self-administered by laboratory animals, including kappa opioid receptor agonists, many antipsychotic drugs, and hallucinogens such as LSD, should be evaluated in punishment paradigms so that their behavioral actions can be more thoroughly characterized.

Third, drugs can modify the function of environmental contexts. When a specific environment (e.g., the inside of an experimental chamber) is correlated with the interoceptive stimulus effects of the drug, then the behavioral effects of the drug may be affected by the presence or absence of the context. For example, several effects of chronic drug administration are more robust in the environmental context in which they were produced. Cocaine produces an increase in locomotor activity that is facilitated when the drug is given over a period of days. This facilitation (or sensitization) is greater in the environment in which the drug has been given in the past; if cocaine is given in a novel environment, the locomotor facilitation effect is reduced (e.g., Post, Lockfeld, Squillace, & Contel, 1981). Similar outcomes have been reported with some measures of tolerance to the effects of morphine. With repeated administration of morphine, the analgesic effects of the drug are reduced, suggesting tolerance (the opposite of sensitization).
However, tolerance occurs to a much greater extent when analgesia is tested repeatedly in the same environment and is less pronounced when analgesia is tested in a novel environment (Tiffany, Drobes, & Cepeda-Benito, 1992).

Fourth, and related to the third, drugs themselves can function as the environmental context that modifies the action of other drugs. Haloperidol, for example, produces catalepsy the first time it is given, and this cataleptic response is facilitated
when the drug is given repeatedly (sensitization). After haloperidol sensitization, if haloperidol is tested in the presence of the NMDA antagonist MK-801, it reverts to producing the amount of catalepsy that occurred on the first administration. This response was originally interpreted as an ability of MK-801 to block sensitization to haloperidol. However, another interpretation is that MK-801 functions as a novel environmental context, and as previously noted, sensitization effects are reduced in novel contexts. Subsequent research showed that if MK-801 is given on every occasion on which haloperidol is given (making MK-801 the environmental context reliably correlated with haloperidol administration), sensitization in the form of increasing catalepsy occurs just as it would if haloperidol were given alone. If haloperidol is then given without MK-801 (a novel context), the cataleptic response is reduced to the initial response (Schmidt, Tzschentke, & Kretschmer, 1999). Clearly, sensitization to haloperidol is observed only under the environmental conditions in which it developed.

Finally, drugs may serve as unconditioned stimuli in a classical conditioning context. The phenomenon of conditioned taste avoidance has received a large amount of research and thought. In conditioned taste avoidance, animals, rodents in particular, given any one of many drug types after exposure to a novel-tasting fluid, will subsequently decrease consumption of that fluid (Garcia & Koelling, 1966). Although there are many theories about the mechanism of this phenomenon, there is considerable, but not complete, consensus that the reduced intake of the novel-tasting fluid is the result of pairing it with the drug, that is, a classical conditioning effect. The drug is the unconditioned stimulus (US), and the novel fluid is the conditioned stimulus (CS). There are several arguments for and against these designations as well as for and against consideration of the conditioned taste avoidance paradigm as classical conditioning.
This topic is far too complex to include in this discussion of behavioral pharmacology, but one side issue is of sufficient interest to include. That side issue is a modification of the typical conditioned taste avoidance situation to include an operant aspect. Here, animals are deprived of and respond for a consummatory stimulus such as flavored water and are given a drug after the session.
Two studies by the same investigators (D’Mello & Stolerman, 1978; Stolerman & D’Mello, 1978) suggested an interesting process underlying the findings, and it warrants more thorough investigation. In one of these studies (Stolerman & D’Mello, 1978), water-restricted rats were trained to respond on a fixed-ratio schedule, in which a specific number of responses produced water. In the second study (D’Mello & Stolerman, 1978), the conditions were the same, but a fixed-interval schedule was used. The water reinforcer was then replaced on occasion with a distinctively flavored solution, either salty or sour. For one group, when a salty solution replaced the water, the session was followed by administration of 1.0 milligram per kilogram of amphetamine; when the sour solution replaced the water, the session was followed by administration of saline. When water was the reinforcer, no injections were given. Only a temporary decrease was observed in responding maintained by the sour solution paired with saline injections. However, responding maintained by the salty solution paired with amphetamine administration was disrupted after the first paired amphetamine injection and declined further with additional pairings. The pattern of the decrease was that responding early in the session was unchanged, but response rates decreased over the course of the session. With repeated amphetamine–tastant pairings, responding decreased earlier and earlier in the session. As with studies of conditioned taste avoidance in general, arguments can be made about whether this is a demonstration of a drug serving as a US in a Pavlovian sense to suppress responding in an operant paradigm. Whatever the theoretical interpretation, the findings are provocative and clearly show the power of drugs to act as stimuli. Many possibilities exist for further study of the interaction of conditioned responses, their contingencies, and drug administration.
Conclusions and Summary

Behavioral pharmacology is a relatively new field, born of the need to evaluate the burgeoning number of drugs used to treat behavioral disorders and sustained both by increasingly powerful behavioral
techniques for evaluating drug–behavior interactions and by chemical synthesis programs that provide more and more selective drugs for study using these techniques. The ultimate goal of behavioral pharmacology is to understand and treat human disease. With its distinct approach, it can provide information about drug–behavior interactions that are of considerable value in and of themselves (e.g., identification of drugs that can reduce the reinforcing effects of drugs of abuse) as well as procedures and information that can benefit neuroscience, medicinal chemistry, and human psychopharmacology. Behavioral pharmacology has shown itself to be a supportive member of the pharmacology community; rules and theories that were developed for other branches of pharmacology often apply well to studies of drugs and behavior. When they do not, investigations into why behavioral actions of drugs differ from actions measured at more molecular levels will certainly provide a deeper understanding of the drugs and the diseases they were designed to treat. There remains a lack of underlying principles to define and focus research on drug–behavior interactions. Various attempts over the years to develop organizing principles (Barrett, 2002; Branch, 2006; Thompson & Schuster, 1968) have provoked serious thought but are ultimately lacking. We hope that organizing principles may develop as behavioral pharmacology continues to grow and contribute to a deeper understanding of how the central nervous system functions in health and disease.

References

Barrett, J. E. (2002). The emergence of behavioral pharmacology. Molecular Interventions, 2, 470–475. doi:10.1124/mi.2.8.470
Bertalmio, A. J., & Woods, J. H. (1987). Differentiation between mu and kappa receptor-mediated effects in opioid drug discrimination: Apparent pA2 analysis. Journal of Pharmacology and Experimental Therapeutics, 243, 591–597.
Bertalmio, A. J., & Woods, J. H. (1989). Reinforcing effect of alfentanil is mediated by mu opioid receptors: Apparent pA2 analysis. Journal of Pharmacology and Experimental Therapeutics, 251, 455–460.
Branch, M. N. (2006). How research in behavioral pharmacology informs behavioral science. Journal of the Experimental Analysis of Behavior, 85, 407–423. doi:10.1901/jeab.2006.130-04
Collins, G. T., Brim, R. L., Narasimhan, D., Ko, M.-C., Sunahara, R. K., Zhan, C.-G., & Woods, J. H. (2009). Cocaine esterase prevents cocaine-induced toxicity and the ongoing intravenous self-administration of cocaine in rats. Journal of Pharmacology and Experimental Therapeutics, 331, 445–455. doi:10.1124/jpet.108.150029
Collins, G. T., Newman, A. H., Grundt, P., Rice, K. C., Husbands, S. M., Chauvignac, C., . . . Woods, J. H. (2007). Yawning and hypothermia in rats: Effects of dopamine D3 and D2 agonists and antagonists. Psychopharmacology, 193, 159–170. doi:10.1007/s00213-007-0766-3
Colpaert, F. C., & Rosecrans, J. A. (1978). Stimulus properties of drugs: Ten years of progress. Amsterdam, the Netherlands: Elsevier/North Holland Biomedical Press.
Cook, L., & Catania, A. C. (1964). Effects of drugs on avoidance and escape behavior. Federation Proceedings, 23, 818–835.
Davis, T. R. A., Kensler, C. J., & Dews, P. B. (1973). Comparison of behavioral effects of nicotine, d-amphetamine, caffeine, and dimethylheptyl tetrahydrocannabinol in squirrel monkeys. Psychopharmacology, 32, 51–65. doi:10.1007/BF00421707
Depoortère, R., Bardin, L., Rodrigues, M., Abrial, E., Aliaga, M., & Newman-Tancredi, A. (2009). Penile erection and yawning induced by dopamine D2-like receptor agonists in rats: Influence of strain and contribution of dopamine D2, but not D3 and D4 receptors. Behavioural Pharmacology, 20, 303–311. doi:10.1097/FBP.0b013e32832ec5aa
Dews, P. B. (1958). Studies on behavior: IV. Stimulant actions of methamphetamine. Journal of Pharmacology and Experimental Therapeutics, 122, 137–147.
D’Mello, G. D., & Stolerman, I. P. (1978). Suppression of fixed-interval responding by flavor-amphetamine pairings in rats. Pharmacology, Biochemistry and Behavior, 9, 395–398. doi:10.1016/0091-3057(78)90304-0
Emmerson, P. J., Liu, M.-R., Woods, J. H., & Medzihradsky, F. (1994). Binding affinity and selectivity of opioids at mu, delta and kappa receptors in monkey brain membranes. Journal of Pharmacology and Experimental Therapeutics, 271, 1630–1637.
Fowler, S. C., Pinkston, J. W., & Vorontsova, E. (2007). Clozapine and prazosin slow the rhythm of head movements during focused stereotypy induced by d-amphetamine in rats. Psychopharmacology, 192, 219–230. doi:10.1007/s00213-007-0705-3
Garcia, J., & Koelling, R. A. (1966). Relation of cue to consequence in avoidance learning. Psychonomic Science, 4, 123–124.
Geller, I., Bachman, E., & Seifter, J. (1963). Effects of reserpine and morphine on behavior suppressed by punishment. Life Sciences, 2, 226–231. doi:10.1016/0024-3205(63)90002-X
Geller, I., & Seifter, J. (1960). The effects of meprobamate, barbiturates, d-amphetamine and promazine on experimentally induced conflict in the rat. Psychopharmacology, 1, 482–492. doi:10.1007/BF00429273
Gerak, L. R., Butelman, E. R., Woods, J. H., & France, C. P. (1994). Antinociceptive and respiratory effects of nalbuphine in rhesus monkeys. Journal of Pharmacology and Experimental Therapeutics, 271, 993–999.
Gilman, A. G. (2001). Introduction. In J. G. Hardman, L. E. Limbird, & A. G. Gilman (Eds.), Goodman and Gilman’s the pharmacological basis of therapeutics (10th ed., pp. 1–2). New York, NY: McGraw-Hill.
Goldberg, S. R. (1980). Histamine as a punisher in squirrel monkeys: Effects of pentobarbital, chlordiazepoxide and H1- and H2-receptor antagonists on behavior and cardiovascular responses. Journal of Pharmacology and Experimental Therapeutics, 214, 726–736.
Goodman, L. S., & Gilman, A. (1955). The pharmacological basis of therapeutics (2nd ed.). New York, NY: Macmillan.
Goodman, L. S., & Gilman, A. (Eds.). (1965). Goodman and Gilman’s the pharmacological basis of therapeutics (3rd ed.). New York, NY: Macmillan.
Gormezano, I., Harvey, J. A., & Aycock, E. (1980). Sensory and associative effects of LSD on classical appetitive conditioning of the rabbit jaw movement response. Psychopharmacology, 70, 137–143. doi:10.1007/BF00435304
Griffiths, R. R., Brady, J., & Bradford, L. (1979). Predicting the abuse liability of drugs with animal self-administration procedure: Psychomotor stimulants and hallucinogens. In T. Thompson & P. B. Dews (Eds.), Advances in behavioral pharmacology (pp. 163–208). New York, NY: Academic Press.
Harvey, J. A. (2003). Role of the serotonin 5-HT2A receptor in learning. Learning and Memory, 10, 355–362. doi:10.1101/lm.60803
Hein, D. W., Young, A. M., Herling, S., & Woods, J. H. (1981). Pharmacological analysis of the discriminative stimulus characteristics of ethylketazocine in the rhesus monkey. Journal of Pharmacology and Experimental Therapeutics, 218, 7–15.
Hughes, J. (1975). Isolation of an endogenous compound from the brain with pharmacological properties similar to morphine. Brain Research, 88, 295–308. doi:10.1016/0006-8993(75)90391-1
Jaffe, J. H. (1965a). Drug addiction and drug abuse. In L. S. Goodman & A. Gilman (Eds.), The pharmacological basis of therapeutics (3rd ed., pp. 285–311). New York, NY: Macmillan.
Jaffe, J. H. (1965b). Narcotic analgesics. In L. S. Goodman & A. Gilman (Eds.), The pharmacological basis of therapeutics (3rd ed., pp. 247–284). New York, NY: Macmillan.
Jaffe, J. H. (1970). Narcotic analgesics. In L. S. Goodman & A. Gilman (Eds.), The pharmacological basis of therapeutics (4th ed., pp. 276–313). New York, NY: Macmillan.
Jarvik, M. E. (1965). Drugs used in the treatment of psychiatric disorders. In L. S. Goodman & A. Gilman (Eds.), The pharmacological basis of therapeutics (3rd ed., pp. 159–214). New York, NY: Macmillan.
Kelleher, R. T., & Morse, W. H. (1964). Escape behavior and punished behavior. Federation Proceedings, 23, 808–817.
Kelleher, R. T., & Morse, W. H. (1968). Determinants of the specificity of the behavioral effects of drugs. Ergebnisse der Physiologie, Biologischen Chemie und Experimentellen Pharmakologie, 60, 1–56.
Kenakin, T. P. (1997). Pharmacological analysis of drug-receptor interaction. Philadelphia, PA: Lippincott-Raven.
Ko, M.-C., Butelman, E. R., Traynor, J. R., & Woods, J. H. (1998). Differentiation of kappa opioid agonist-induced antinociception by naltrexone: Apparent pA2 analysis in rhesus monkeys. Journal of Pharmacology and Experimental Therapeutics, 285, 518–526.
Ko, M.-C., & Naughton, N. N. (2000). An experimental itch model in monkeys. Anesthesiology, 92, 795–805. doi:10.1097/00000542-200003000-00023
Koek, W., Woods, J. H., & Ornstein, P. (1987). A simple and rapid method for assessing similarities among directly observable behavioral effects of drugs: PCP-like effects of 2-amino-5-phosphonovalerate in rats. Psychopharmacology, 91, 297–304. doi:10.1007/BF00518181
Lehr, E., Morse, W. H., & Dews, P. B. (1985). Effects of drugs on schedule-controlled running of mice in a circular runway. Arzneimittelforschung, 34, 432–434.
Mansbach, R. S., Harrod, C., Hoffman, S. M., Nader, M., Lei, Z., Witkin, J. M., & Barrett, J. E. (1988). Behavioral studies with anxiolytic drugs: V. Behavior and in vivo neurochemical analysis in pigeons of drugs that increase punished responding. Journal of Pharmacology and Experimental Therapeutics, 246, 114–120.
Martelle, J. L., Claytor, R., Ross, J. T., Reboussin, B. A., Newman, A. H., & Nader, M. A. (2007). Effects of two novel D3-selective compounds, NGB 2904 [N-(4-(4-(2,3-dichlorophenyl)piperazin-1-yl)butyl)-9H-fluorene-2-carboxamide] and CJB 090 [N-(4-(4-(2,3-dichlorophenyl)piperazin-1-yl)butyl)-4-(pyridin-2-yl)benzamide], on the reinforcing and discriminative stimulus effects of cocaine in rhesus monkeys. Journal of Pharmacology and Experimental Therapeutics, 321, 573–582. doi:10.1124/jpet.106.113571
McKernan, R. M., Rosahl, T. W., Reynolds, D. S., Sur, C., Wafford, K. A., Atack, J. R., . . . Whiting, P. J. (2000). Sedative but not anxiolytic properties of benzodiazepines are mediated by the GABAA receptor α1 subtype. Nature Neuroscience, 3, 587–592.
McMahon, L. R., & France, C. P. (2006). Differential behavioral effects of low efficacy positive GABAA modulators in combination with benzodiazepines and a neuroactive steroid in rhesus monkeys. British Journal of Pharmacology, 147, 260–268. doi:10.1038/sj.bjp.0706550
McMillan, D. E. (1973a). Drugs and punished responding: I. Rate-dependent effects under multiple schedules. Journal of the Experimental Analysis of Behavior, 19, 133–145. doi:10.1901/jeab.1973.19-133
McMillan, D. E. (1973b). Drugs and punished responding: III. Punishment intensity as a determinant of drug effect. Psychopharmacology, 30, 61–74. doi:10.1007/BF00422794
McMillan, D. E., & Leander, J. D. (1975). Drugs and punished responding: IV. Effects of drugs on responding suppressed by response-dependent and response-independent electric shock. Archives Internationales de Pharmacodynamie et de Therapie, 213, 22–27.
Mello, N. K., & Negus, S. S. (1996). Preclinical evaluation of pharmacotherapies for treatment of cocaine and opioid abuse using drug self-administration procedures. Neuropsychopharmacology, 14, 375–424. doi:10.1016/0893-133X(95)00274-H
Morse, W. H. (1964). Effects of amobarbital and chlorpromazine on punished behavior in the pigeon. Psychopharmacology, 6, 286–294. doi:10.1007/BF00413158
Negus, S. S., Picker, M. J., & Dykstra, L. A. (1990). Interactions between mu and kappa opioid agonists in the rat drug discrimination procedure. Psychopharmacology, 102, 465–473. doi:10.1007/BF02247126
Post, R. M., Lockfeld, A., Squillace, K. M., & Contel, N. R. (1981). Drug-environment interaction: Context dependency of cocaine-induced behavioral sensitization. Life Sciences, 28, 755–760. doi:10.1016/0024-3205(81)90157-0
Rowlett, J. K., Platt, D. M., Lelas, S., Atack, J. R., & Dawson, G. R. (2005). Different GABAA receptor subtypes mediate the anxiolytic, abuse-related, and motor effects of benzodiazepine-like drugs in primates. Proceedings of the National Academy of Sciences of the United States of America, 102, 915–920. doi:10.1073/pnas.0405621102
Schmidt, W. J., Tzschentke, T. M., & Kretschmer, B. D. (1999). State-dependent blockade of haloperidol-induced sensitization of catalepsy by MK-801. European Journal of Neuroscience, 11, 3365–3368. doi:10.1046/j.1460-9568.1999.00794.x
Siegel, P. S., & Sterling, T. D. (1959). The anorexigenic action of dextro-amphetamine sulphate upon feeding responses of differing strength. Journal of Comparative and Physiological Psychology, 52, 179–182. doi:10.1037/h0045670
Stolerman, I. P., & D’Mello, G. D. (1978). Amphetamine-induced taste aversion demonstrated with operant behavior. Pharmacology, Biochemistry and Behavior, 8, 107–111. doi:10.1016/0091-3057(78)90324-6
Thompson, T., & Schuster, C. R. (1968). Behavioral pharmacology. Englewood Cliffs, NJ: Prentice-Hall.
Tiffany, S. T., Drobes, D. J., & Cepeda-Benito, A. (1992). Contribution of associative and nonassociative processes to the development of morphine tolerance. Psychopharmacology, 109, 185–190. doi:10.1007/BF02245498
Walker, E. A., Butelman, E. R., DeCosta, B. R., & Woods, J. H. (1993). Opioid thermal antinociception in rhesus monkeys: Receptor mechanisms and temperature dependency. Journal of Pharmacology and Experimental Therapeutics, 267, 280–286.
Winger, G., & Woods, J. H. (1996). Effects of buprenorphine on behavior maintained by heroin and alfentanil in rhesus monkeys. Behavioural Pharmacology, 7, 155–159. doi:10.1097/00008877-199603000-00006
Witkin, J. M. (1994). Pharmacotherapy of cocaine abuse: Preclinical development. Neuroscience and Biobehavioral Reviews, 18, 121–142. doi:10.1016/0149-7634(94)90042-6
Witkin, J. M., Morrow, D., & Li, X. (2004). A rapid punishment procedure for detection of anxiolytic compounds in mice. Psychopharmacology, 172, 52–57. doi:10.1007/s00213-003-1618-4
Wojnicki, F. H. E., & Barrett, J. E. (1993). Anticonflict effects of buspirone and chlordiazepoxide in pigeons under a concurrent schedule with punishment and a changeover response. Psychopharmacology, 112, 26–33. doi:10.1007/BF02247360