[1st Edition] 9780128121696, 9780128121184

Psychology of Learning and Motivation, Volume 66, the latest release in this longstanding series publishes empirical and

Pages 320 [307] Year 2017

Table of contents:
Series Editor
  Page ii
Copyright
  Page iv
Contributors
  Pages ix-x
Chapter One - Cracking the Problem of Inert Knowledge: Portable Strategies to Access Distant Analogs From Memory
  Original Research Article, Pages 1-41
  Máximo Trench, Ricardo A. Minervino
Chapter Two - The Complexities of Learning Categories Through Comparisons
  Original Research Article, Pages 43-77
  Erin Jones Higgins
Chapter Three - Progress in Modeling Through Distributed Collaboration: Concepts, Tools and Category-Learning Examples
  Original Research Article, Pages 79-115
  Andy J. Wills, Garret O'Connell, Charlotte E.R. Edmunds, Angus B. Inkster
Chapter Four - Replicability, Response Bias, and Judgments, Oh My! A New Checklist for Evaluating the Perceptual Nature of Action-Specific Effects
  Original Research Article, Pages 117-165
  Jessica K. Witt
Chapter Five - The Two Faces of Selective Memory Retrieval—Cognitive, Developmental, and Social Processes
  Original Research Article, Pages 167-209
  Karl-Heinz T. Bäuml, Alp Aslan, Magdalena Abel
Chapter Six - Prospective Memory in Context
  Original Research Article, Pages 211-249
  Rebekah E. Smith
Chapter Seven - What Makes Everyday Scientific Reasoning So Challenging?
  Original Research Article, Pages 251-299
  Priti Shah, Audrey Michal, Amira Ibrahim, Rebecca Rhodes, Fernando Rodriguez
Contents of Previous Volumes
  Pages 301-310

Series Editor

BRIAN H. ROSS Beckman Institute and Department of Psychology University of Illinois, Urbana, Illinois

Academic Press is an imprint of Elsevier 50 Hampshire Street, 5th Floor, Cambridge, MA 02139, United States 525 B Street, Suite 1800, San Diego, CA 92101-4495, United States 125 London Wall, London EC2Y 5AS, United Kingdom The Boulevard, Langford Lane, Kidlington, Oxford OX5 1GB, United Kingdom First edition 2017 Copyright © 2017 Elsevier Inc. All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on how to seek permission, further information about the Publisher’s permissions policies and our arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions. This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted herein).

Notices

Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods, professional practices, or medical treatment may become necessary. Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information, methods, compounds, or experiments described herein. In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility. To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any liability for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein. ISBN: 978-0-12-812118-4 ISSN: 0079-7421 For information on all Academic Press publications visit our website at https://www.elsevier.com

Publisher: Zoe Kruze Acquisition Editor: Kirsten Shankland Editorial Project Manager: Hannah Colford Production Project Manager: Magesh Mahalingam Designer: Mark Rogers Typeset by TNQ Books and Journals

CONTRIBUTORS

Magdalena Abel
  Regensburg University, Regensburg, Germany
Alp Aslan
  Martin Luther University Halle-Wittenberg, Halle, Germany
Karl-Heinz T. Bäuml
  Regensburg University, Regensburg, Germany
Charlotte E.R. Edmunds
  Plymouth University, Plymouth, United Kingdom
Erin Jones Higgins
  Institute of Education Sciences, U.S. Department of Education, Washington, DC, United States
Amira Ibrahim
  University of Michigan, Ann Arbor, MI, United States
Angus B. Inkster
  Plymouth University, Plymouth, United Kingdom
Audrey Michal
  Northwestern University, Evanston, IL, United States
Ricardo A. Minervino
  University of Comahue, Cipolletti, Rio Negro, Argentina; National Council for Scientific and Technical Research (IPEHCS/CONICET-UNCo)
Garret O’Connell
  Plymouth University, Plymouth, United Kingdom
Rebecca Rhodes
  Johns Hopkins University Applied Physics Laboratory, Laurel, MD, United States
Fernando Rodriguez
  University of California, Irvine, Irvine, CA, United States
Priti Shah
  University of Michigan, Ann Arbor, MI, United States
Rebekah E. Smith
  The University of Mississippi, Oxford, MS, United States

Máximo Trench
  University of Comahue, Bariloche, Rio Negro, Argentina; National Council for Scientific and Technical Research (IPEHCS/CONICET-UNCo)
Andy J. Wills
  Plymouth University, Plymouth, United Kingdom
Jessica K. Witt
  Colorado State University, Fort Collins, CO, United States

CHAPTER ONE

Cracking the Problem of Inert Knowledge: Portable Strategies to Access Distant Analogs From Memory

Máximo Trench*,§,1 and Ricardo A. Minervino§,¶
*University of Comahue, Bariloche, Rio Negro, Argentina
§National Council for Scientific and Technical Research (IPEHCS/CONICET-UNCo)
¶University of Comahue, Cipolletti, Rio Negro, Argentina
1 Corresponding author: E-mail: [email protected]

Contents

1. Introduction
2. Diagnosing Our Ability to Retrieve Analogous Situations
  2.1 Experimental Studies
  2.2 Naturalistic Studies
  2.3 Early Accounts of the Inconsistency Between Experimental and Naturalistic Results
  2.4 A Hybrid Paradigm That Retains the Strengths of Experimental and Naturalistic Studies
    2.4.1 Our Studies With Culturally Shared Base Analogs
    2.4.2 Retrieval of Autobiographical Base Analogs
  2.5 Alternative Means for Generating Distant Analogies
  2.6 The Surface Bias of Analogical Retrieval: Friend and Foe
3. Overcoming Human Limitations for Retrieving Distant Analogs
  3.1 Spontaneous Versus Voluntary Retrieval
  3.2 The Late Abstraction Principle
  3.3 Deriving “Portable” Cognitive Strategies Based on the Late Abstraction Principle
    3.3.1 Comparing Disanalogous Targets
    3.3.2 Inventing Analogous Targets
    3.3.3 Constructing Idealized Representations of the Target
  3.4 Can Late Abstraction Revive Inert Knowledge?
4. Conclusions
References

Psychology of Learning and Motivation, Volume 66 ISSN 0079-7421 http://dx.doi.org/10.1016/bs.plm.2016.11.001

© 2017 Elsevier Inc. All rights reserved.

Abstract

The retrieval of distant analogs lies at the core of the problem of inert knowledge. While experimental studies have documented our tendency to rely on surface similarities, naturalistic studies started to show a more balanced proportion of near and far retrievals, casting doubts on the validity of traditional experiments and on the adequacy of their associated computer simulations. By using a hybrid paradigm that retained both the ecological validity of naturalistic studies and the methodological control of the transfer paradigm, a series of studies carried out in our lab confirmed that surface similarity governs retrieval both in natural and experimental settings. After discussing these results, the present chapter reviews our successful and failed attempts to help learners overcome these competence limitations. Building upon Gentner et al.’s finding that distant retrieval can be boosted by comparing the target to a second analogous problem, our recent results demonstrate that the retrieval advantage of target elaborations can also be obtained without providing participants with target-specific information. We end by providing several plausibility arguments for the hypothesis that an abstraction of the target analog can enable access to distant analogs whose initial encoding had not highlighted their structural features.

1. INTRODUCTION

Analogy lies at the core of human cognition (Hofstadter, 2001; Holland, Holyoak, Nisbett, & Thagard, 1986). Across activities as relevant as problem solving, hypothesis generation, argumentation or instruction, analogical comparisons enable the inductive projection of knowledge from a well-understood situation (the base analog or source analog) to a less known situation (the target analog). For this transfer to occur, the analogizer must have previously realized that the elements of the compared situations are linked by systems of relations that can be considered to be identical at a nontrivial level of abstraction (Gentner, 1983; Gentner & Markman, 2006; Holyoak & Thagard, 1995). As an example, the philosopher Cornelius Castoriadis compared the consumption of natural resources by capitalist societies with a segment of Hansel and Gretel’s tale by the Grimm brothers: just as Hansel and Gretel ate the chocolate walls without knowing that they were destroying their house, capitalist nations are devastating the forests without realizing that they are disturbing the ecosystem. Whereas in distant analogies like the above the corresponding objects and relations rarely maintain semantic similarity (e.g., Hansel ↔ nations, eat ↔ devastate, chocolate walls ↔ forests), in near analogies the corresponding elements tend to be semantically similar. As an example of this second type of analogy, the environmental movie “Home” (Arthus-Bertrand, 2009) compared the devastation of forests by
current governments to the exhaustion of palm trees by the original inhabitants of Easter Island, who had to leave the island after running out of that resource. As opposed to the Hansel and Gretel analogy, in this case structural similarity comes together with a higher degree of semantic similarity (e.g., inhabitants ↔ governments, exhaust ↔ devastate, palm trees ↔ forests). These semantic similarities have been called superficial similarities, to reflect the fact that they become negligible when more abstract identities are considered. In what follows we will refer to these two types of comparisons as superficially dissimilar and superficially similar analogies, respectively. Researchers of analogical reasoning agree that the process of understanding an analogy is not heavily dependent on surface similarities (Gentner, Rattermann, & Forbus, 1993; but see Ross, 1987, for cases where “crossmapped” surface similarity can be detrimental). As Castoriadis’ analogy makes clear, the absence of surface similarity between the compared situations does not complicate the process of finding the right analogical correspondences. In contrast to the role of surface similarities during mapping, the extent to which they determine the retrieval of base analogs from long-term memory (LTM) has been the subject of some debate. Whereas a wealth of experimental studies demonstrates that surface similarity exerts a powerful effect on retrieval (e.g., Gentner et al., 1993; Keane, 1987; Ross, 1989), a more recent generation of naturalistic studies suggests that the retrieval of participants’ own sources during real-world tasks does not require surface similarities (e.g., Blanchette & Dunbar, 2000, 2001). Besides its obvious relevance to the basic question about the nature of human intelligence, the empirical inconsistency between the experimental and the naturalistic traditions has important implications for the design of educational interventions.
If the naturalistic tradition is right, then only by further inspecting natural environments will educators identify the secret ingredients behind successful transfer. If, on the contrary, it is experimental studies that got it right, then natural environments no longer represent a privileged window into successful transfer, and theory driven intuitions should retain their status as sources of inspiration for educational interventions. In the first part of the present chapter, we will flesh out in greater detail the counterpoint between the experimental and the naturalistic approaches to analogical retrieval and present data allegedly capable of settling the debate around human competence for distant retrieval. Based on this diagnosis, in the second part we review what we consider to be the most realistic avenues for honing in on a set of applicable interventions to make distant analogizing more likely.

2. DIAGNOSING OUR ABILITY TO RETRIEVE ANALOGOUS SITUATIONS

Many years ago, educators coined the expression “the problem of inert knowledge” to refer to their observation that students seldom apply their learning to novel situations with differing content. However, only in the last four decades have the magnitude of this alleged difficulty and the reasons behind it been empirically scrutinized in more detail. Educators’ observations were followed first by highly controlled laboratory experiments and only later by more exploratory observational studies. In what follows, we present the findings of both traditions in a manner faithful to their chronological succession.

2.1 Experimental Studies

The most widely used paradigm for investigating analogical retrieval comprises two phases. During the learning phase, participants receive the base analog, usually framed between several distracters. During the transfer phase, participants receive the target analog embedded in cognitive tasks for which retrieving the critical base analog can be potentially useful, and experimenters assess whether the processing of the target triggers the retrieval of the base analog. To assess the weight of surface similarities in analogical retrieval, Keane (1987, Experiment 1) presented two groups of participants with Duncker’s (1945) Radiation problem, which began stating that a certain doctor was urged to save a patient who had an inoperable tumor in his or her stomach. The doctor had a type of ray that could destroy the tumor if applied with the required intensity, but at such intensity it would also destroy healthy tissues that needed to be preserved. Participants of the study were asked to suggest possible ways of using such rays to destroy the tumor without harming the surrounding tissues. During a prior session, both groups of participants had read a base story that was analogous to the tumor problem, and whose solution was potentially useful for solving the Radiation problem. Participants in the low surface similarity condition received Gick and Holyoak’s (1980) story The General, which told about a small country that was ruled by a dictator from a fortress that could only be captured by a large army. The story has it that there was a rebel General who had a sufficiently large army, but who could not take such army along any of the roads that led to the fortress because they were mined to explode
if large groups passed over them. The story ended with the General capturing the fortress by dividing his army into smaller groups, locating each on a different road, and having them converge on the fortress without detonating the mines. The story received by participants of the high surface similarity condition maintained both structural and superficial similarity with the target. It told about a surgeon who wanted to destroy a brain cancer with a type of ray. The main problem was that if such rays were strong enough to destroy the cancer, they would also destroy healthy brain tissue. The story ended with the surgeon curing his or her patient by dividing the ray up into batches of low intensity rays and sending them simultaneously from different directions. In so doing, the low intensity rays did not harm the healthy tissue but converged on the cancer at full strength. Results showed that the retrieval of superficially similar sources (88%) was significantly higher than the retrieval of superficially dissimilar ones (12%). In a second experiment, Keane (1987) further investigated the precise kinds of similarities that were required for retrieval to occur. Just as in the previous experiment, in one of the conditions, experimenters assessed the retrieval of the story The General during the activity of solving the Radiation problem. In the two remaining conditions, the Radiation problem was preceded by one of two variations of the original military story in which the General needed to use high intensity rays (or high powered laser beams, depending on the condition) to destroy an intercontinental missile entering the atmosphere at great speed, with the restriction that rays (beams) of the required intensity would lose accuracy due to overheating the atmosphere through which they traveled.
Retrieval of these two base analogs (58% and 53%, respectively) was much higher than that of the original military story, suggesting that retrieval does not require the base analog to belong to the same thematic domain as the target (i.e., medical situations). More importantly, the fact that the retrieval performance of these two conditions did not differ from each other was taken to demonstrate that retrieval does not require the base and the target to share lexically identical elements (e.g., two types of rays), it being sufficient that they share a semantically similar element (rays vs. laser beams). As opposed to studies of problem solving, studies of story reminding seldom impose a contextual separation between the phases, since the salience of the critical sources is reduced by means of embedding them among a larger number of distracter stories. During the retrieval phase, participants
read the target analog with the instruction to indicate which of the stories read during the previous session this new story reminds them of. Just as in the problem-solving literature, studies of story reminding reveal that sources bearing superficial similarity with the target are much more likely to be retrieved than sources that do not maintain such similarities (e.g., Catrambone, 2002; Gentner et al., 1993). The results from the problem-solving and the story-reminding traditions led researchers to conclude that superficial similarity represents a crucial precondition for analogical retrieval. To simulate this pattern of results, proponents of the structure mapping theory (Gentner, 1983) developed Many Are Called, Few Are Chosen (MAC/FAC, Forbus, Gentner, & Law, 1995), an algorithm that divides retrieval into two phases: MAC, a fast superficial filter, and FAC, a computationally expensive structural matcher. The MAC phase begins by generating content vectors for the target and every situation stored in LTM, with each content vector being generated by assigning a position in an ordered series to all concepts in LTM and counting how many times each concept appears in each of the stored situations. Upon calculating the dot products between the content vector of the target and the vectors of all situations in LTM, the MAC stage submits base situations yielding higher dot products (most of them superficially similar to the target) to the FAC stage. For each of these candidate sources, FAC starts by creating all possible correspondences between elements of the same formal type, with the added restriction that mapped relations must have identical meaning. The program then incrementally coalesces local matches into larger mappings that meet the constraints of parallel connectivity (if two predicates are mapped, their arguments must also be mapped) and one-to-one mapping (elements in one analog must map to only one element in the other analog).
Finally, FAC estimates the quality of global mappings as a function of their size, their depth, and the semantic similarity of their corresponding objects. This last criterion further increments MAC’s bias towards base analogs bearing superficial similarity with the target. Albeit different from MAC/FAC in terms of representations and underlying computations, other influential models such as Analogical Retrieval by Constraint Satisfaction (ARCS, Thagard, Holyoak, Nelson, & Gochfeld, 1990) or Learning and Inference with Schemas and Analogies (LISA, Hummel & Holyoak, 1997) also yield a dominance of surface-based remindings over purely relational matches.
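
The MAC stage described above can be illustrated with a minimal Python sketch. It is not the MAC/FAC implementation of Forbus et al. (1995): situations are reduced to flat lists of concept labels, and the three stored situations, the function names and the `top_k` cutoff are all assumptions introduced for illustration. The FAC structural matcher is omitted entirely.

```python
from collections import Counter


def content_vector(concepts, vocabulary):
    """Count how many times each vocabulary concept occurs in one situation."""
    counts = Counter(concepts)
    return [counts[concept] for concept in vocabulary]


def dot_product(u, v):
    """Dot product of two equal-length content vectors."""
    return sum(a * b for a, b in zip(u, v))


def mac_stage(target, memory, top_k=2):
    """Rank stored situations by content-vector overlap with the target.

    `top_k` is a hypothetical cutoff standing in for "situations yielding
    higher dot products"; it is not taken from the original model.
    """
    # Assign every concept a fixed position in an ordered series.
    vocabulary = sorted(set(target).union(*memory.values()))
    target_vector = content_vector(target, vocabulary)
    scores = {
        name: dot_product(target_vector, content_vector(concepts, vocabulary))
        for name, concepts in memory.items()
    }
    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_k]


# Toy long-term memory (assumed concept lists, loosely based on the stories above).
memory = {
    "surgeon": ["doctor", "ray", "tumor", "destroy", "tissue", "converge"],
    "general": ["general", "army", "fortress", "capture", "mine", "converge"],
    "picnic": ["family", "park", "food", "rain"],
}
target = ["doctor", "ray", "tumor", "destroy", "tissue"]  # Radiation problem

candidates = mac_stage(target, memory)
print(candidates)
```

With these toy representations, the superficially similar surgeon story obtains a dot product of 5, whereas The General, which shares its structure but none of its concept labels with the target, scores 0 and cannot be told apart from an unrelated story at this stage.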

In sum, the results of a large body of experimental studies have motivated the theorization and modeling of analogical retrieval as a process that relies very heavily on surface similarities between the target situation and the potential source analogs stored in LTM.

2.2 Naturalistic Studies

With the turn of the century, a number of observational studies focused on the use of analogies by trained professionals as they worked within their areas of expertise. In line with the experimental tradition, the observation of molecular biologists (Dunbar, 1997) and psychologists (Saner & Schunn, 1999) showed that most of the base analogs involved in their analogies tended to be thematically very close to the target situation. However, further classifications of the analogies in terms of the goal of the analogizer (e.g., fixing a failed experiment vs. accounting for an unexpected result vs. illustrating a complex idea) revealed that the goals of the analogizer can modulate the semantic distance of the generated analogies to some extent. In an influential study, Blanchette and Dunbar (2001) analyzed more than 400 articles that appeared in four important newspapers from Montreal prior to the 1995 referendum on the independence of Quebec. From a total of more than 200 analogies, around three-quarters were drawn to situations outside the fields of economy or politics, such as agriculture, family, sports, magic and religion. The authors interpreted their results as indicating that when people are drawing on their own memories to build analogies for a realistic and meaningful target situation, the retrieval of analogical sources is less constrained by superficial similarity than in the experimental tradition. In another observational study, Richland, Holyoak, and Stigler (2004) analyzed the spontaneous use of analogies by eighth-grade mathematics teachers. Transcripts of these lessons revealed that the degree of surface similarity of the sources varied greatly as a function of the type of knowledge the teacher was trying to impart. While instruction of procedures was grounded mainly on near analogies, the transmission of conceptual knowledge tended to be rooted on superficially dissimilar or even nonmathematical comparisons.
In the words of the authors, the fact that most of the analogies involved minimal perceptual similarity between source and target objects was taken to suggest that “in the classroom setting, teachers are more successful than typical laboratory participants at developing structural analogies” (Richland et al., 2004, p. 49).

In a recent study on the creation of new products in the domain of medical plastics, Christensen and Schunn (2007) documented the analogies generated by design engineers during product-development meetings. The transcripts revealed a mean of 11.3 analogies per hour of verbal data, which emerged during activities as diverse as developing concepts, solving design problems, planning data collection or evaluating mock-ups and prototypes. After sorting analogies in terms of their purpose, the authors found that superficially dissimilar analogies were almost as frequent as superficially similar analogies when the goal was to solve a problem, and twice as frequent when the goal was to communicate or explain ideas to other members of the group. Yet more recently, Kretz and Krawczyk (2014) analyzed the spontaneous analogies produced by faculty and graduate students of economics during weekly reading group meetings at the University of Texas. The authors documented an average of 16 analogies per hour of discussion, with an even proportion of superficially similar and superficially dissimilar analogies. An analysis of the relation between the purpose of the analogy and its semantic distance revealed that participants tended to use distant analogies when trying to generate visual images and close analogies for the purpose of exemplification. To summarize, a growing body of naturalistic studies on the spontaneous use of analogy in real-world activities suggests that professionals of various fields seem to flexibly access superficially dissimilar sources from their background knowledge in a manner that is certainly not predicted on the basis of traditional experimental results (e.g., Gentner et al., 1993; Keane, 1987).

2.3 Early Accounts of the Inconsistency Between Experimental and Naturalistic Results

An obvious mismatch between the experimental and the naturalistic tradition concerns the expertise of the participants: while the former invariably involve novice university students, the latter typically involve trained experts. To generalize to a nonexpert population the results of their observational study with journalists and politicians, Blanchette and Dunbar (2000) asked college students to generate persuasive analogies for another real-world political issue: the zero-deficit strategy. After providing participants with a description of the mechanisms by which public debts tend to increase, participants in the pro zero-deficit condition had to pretend they were hired by a nonprofit organization to generate analogies that could be used to
persuade the population to support massive cuts to spending in education, security and social programs on the grounds that future cuts would otherwise be more dramatic. Participants in the anti zero-deficit condition were asked to propose analogies to persuade the public that basic services should not be discontinued and that other ways of dealing with the deficit needed to be envisioned. Regardless of condition, this production paradigm (as Blanchette and Dunbar termed it) elicited an overwhelming majority of base analogs lacking surface similarities with the target, thus providing an in vitro replication of the previous naturalistic observation that journalists can easily access long-distance sources while generating analogies to real-world target situations. According to various authors (e.g., Dunbar, 2001; Hofstadter & Sander, 2013; O’keefe & Costello, 2008), the results of naturalistic studies call into question the psychological validity of laboratory experiments failing to obtain distant retrievals, as well as the accuracy of the psychological theories and computational models developed after such behavioral results. Dunbar (2001) interpreted these results as suggesting that naturalistic settings and ecologically valid tasks promote an abstract encoding of the base and target analogs, respectively. Given that the base and the target had received a structural encoding, the retrieval of the sources no longer requires that they maintain superficial similarity with the target. If naturally encountered situations are more prone to a structural encoding than the sources typically learned under experimental settings, where does such advantage originate? One of the many mismatches between naturally encountered situations and the typical experimental stimuli is that while the former tend to include an auditory component, the latter typically involve the presentation of written materials.
Upon experimentally demonstrating that an auditory presentation of both the base and the target analogs leads to better recall of purely relational matches than a written presentation, Markman, Taylor, and Gentner (2007) conjectured that such advantage could underlie the alleged retrieval advantage of naturally learned base analogs. Another potentially important difference between naturally versus experimentally encoded situations could reside in the fact that the former tend to be relatively more meaningful for participants (Hofstadter & Sander, 2013). Albeit appealing, these hypotheses are to be taken with caution, since the idea that naturally encoded situations are more advantageous for distant retrievals is at minimum counterintuitive. On the one hand, the temporal
separation between learning and retrieval tends to be very brief in experimental studies and very large in real life, spanning from months to years, or even decades. On the other hand, both the physical and the functional contexts that surround learning and transfer tend to be weakly or moderately separated in the laboratory and widely separated in natural settings. And on top of this, while the base and the target analogs typically used in experimental studies (e.g., Keane, 1987 or Gentner et al.’s 1993 stimuli) tend to be cast in ways that maximize their structural overlap, natural base and target situations were not created with the explicit goal of taking part in analogies and thus present large differences in size, frequent structural mismatches and a bulk of irrelevant information. Not surprisingly, many interesting interventions that work well in the vacuum chamber of psychology laboratories do not work under the noisy conditions that characterize real educational environments. With these caveats in mind, we shifted from trying to explain the alleged facilitation of distant retrieval in naturalistic settings to trying to assess whether naturalistic settings in fact support this kind of superficially unconstrained retrieval. Even though the production paradigm implemented by Blanchette and Dunbar (2000) can potentially exert a higher degree of control than purely observational approaches, it still falls short of demonstrating a negligible role of superficial similarities during naturalistic retrieval. Compared to the traditional two-phase procedure followed in experimental studies, the production paradigm bears three important methodological shortcomings. The first limitation resides in not implementing any means of distinguishing true instances of analogical retrieval from instances of analogy fabrication (i.e., ad hoc invention of base analogs).
For example, once the reasoner has understood the abstract structure of a target situation, he or she can trivially generate novel exemplars (e.g., infections of herpes/malaria/choose your favorite infection that were not treated in time, causing later treatment to be more difficult than would have been otherwise). Second, even if a means of distinguishing true retrievals from cases of analogy fabrication were implemented, the proportions of near versus far analogies reported by participants may not faithfully reflect the retrieval tendencies of the system, since under particular pragmatics like persuasion, participants could avoid reporting several surface similar analogies, which tend to be, by definition, very similar to each other. Finally, even if a means for revealing all retrieved sources were implemented, yet another insufficiency of the production paradigm concerns the unknown proportions of close versus
distant base analogs potentially available in LTM. To illustrate, consider a participant retrieving six near and six distant sources during an analogy generation task. In case this participant had, say, 10 near and 10 distant sources stored in LTM, the retrieval probabilities of near and distant base analogs could be considered superficially unconstrained (60% of each type). If, on the contrary, he or she had 10 near sources and 60 distant sources available in LTM, the very same retrieval outcome would now indicate that the retrieval of base analogs was superficially biased (60% of superficially similar vs. 10% of superficially dissimilar sources). Therefore, assessing the extent to which surface similarity determines the retrieval of naturally acquired base analogs requires knowing not only the number of far and near sources that were successfully retrieved, but also the number of instances of both types of base analogs that were potentially available for retrieval (i.e., for each kind of analogy, retrieval probability = number of retrieved sources / number of available sources).
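
The arithmetic of this illustration can be made explicit with a short Python sketch. The function name and the dictionary layout are assumptions introduced for illustration; the counts are those of the hypothetical participant described above.

```python
def retrieval_probability(retrieved, available):
    """Proportion of the available base analogs of one type that were retrieved."""
    if available <= 0:
        raise ValueError("available must be positive")
    if not 0 <= retrieved <= available:
        raise ValueError("retrieved must lie between 0 and available")
    return retrieved / available


# Same retrieval outcome (six near, six distant) under two hypothetical memories,
# each entry given as (retrieved, available).
balanced_memory = {"near": (6, 10), "distant": (6, 10)}
skewed_memory = {"near": (6, 10), "distant": (6, 60)}

for label, memory in (("balanced", balanced_memory), ("skewed", skewed_memory)):
    rates = {kind: retrieval_probability(*counts) for kind, counts in memory.items()}
    print(label, {kind: f"{rate:.0%}" for kind, rate in rates.items()})
```

The raw counts of retrieved sources are identical in both cases; only dividing by the number of available sources reveals whether retrieval was superficially biased.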

2.4 A Hybrid Paradigm That Retains the Strengths of Experimental and Naturalistic Studies

Needless to say, it is empirically impossible to determine exhaustively how many near and far analogs are stored in a person’s memory so as to carry out the above computation. To overcome this limitation, we took inspiration from the procedure used by Chen, Mo, and Honomichl (2004) to assess analogical retrieval after longer intervals than two-phase experimental studies permit. In that study, Chen et al. assessed Chinese and American participants’ retrieval of popular tales after analogous problems with different types of surface similarity (similarity of goal object × similarity of solution tool). Results showed a surprisingly high overall percentage of participants being reminded of the source tale, with slightly higher scores for participants receiving targets with relatively higher levels of surface similarity. The surface similarity manipulation was subtle, however: even in the least superficially similar condition there were some identical relations and objects shared between the source and the target (e.g., the goal of weighing a heavy object or the explicitly stated absence of a sufficiently large scale). Given that Keane (1987, see previous section) had shown that even in experimental settings the presence of at least one identical or very similar object suffices to elicit retrieval, Chen et al.’s results do not tell whether the popular tales would have been retrieved in the absence of such similarities.

Maximo Trench and Ricardo A. Minervino

More recently, Dehghani, Gentner, Forbus, Ekhtiari, and Sachdeva (2009) assessed the retrieval of culturally shared fables while participants resolved dilemmatic choices between two alternative courses of action, a very engaging and ecologically valid task somewhat related to those of Blanchette and Dunbar’s naturalistic studies. As in Chen et al., even in the conditions with lower surface similarity the average retrieval rates were higher than those typically obtained in laboratory experiments (between 70% and 90%). But again, the presence of several identical elements precludes using their results as a basis for inferring the extent to which naturalistic encodings and target tasks afford the retrieval of sources lacking identical elements with the targets.

2.4.1 Our Studies With Culturally Shared Base Analogs

Despite the abovementioned limitations in the selected materials, we reasoned that the idea of assessing the retrieval of culturally shared episodes during ecologically valid target tasks had the potential to bridge the gap between the experimental and the naturalistic traditions on analogical retrieval, since it could easily be adapted in ways that retain the ecological validity of the production paradigm without sacrificing the methodological control of the standard two-phase transfer experiments. First, just as in the production paradigm, participants of Trench and Minervino (2015a, Experiment 1) had to generate persuasive analogies in response to a highly realistic target situation, an activity that according to Blanchette and Dunbar (2000) promotes an encoding of the targets that emphasizes their abstract structure. Second, participants were asked to draw on their own extra-experimental memories, which according to authors such as Dunbar (2001) or O’Keefe and Costello (2008) receive a more abstract encoding than the base analogs typically used in traditional experiments.
Having retained these distinctive advantages of the production paradigm as originally implemented, our procedure also preserved those features of the standard experimental paradigm that afford controlling the distorting effects of analogy fabrication, report bias and potentially uneven availability of near versus distant base analogs in LTM. To control for this last factor, in Experiment 1 we calculated the probability of being reminded of specific episodes from popular movies (the sources) during the task of generating persuasive analogies for realistic situations that maintained different degrees of superficial similarity with such episodes (the targets). By restricting the data analysis to the retrieval of such specific base analogs, we were able to calculate (and therefore compare) the accessibility of close versus distant naturalistic sources in terms of quotients between the number of retrieved cases and the number of cases available for retrieval. In addition, the improbability of inventing a situation identical to a culturally shared episode helped ensure that all analogies built upon those episodes originated in the retrieval of such episodes from memory. Finally, the possibility of directly querying participants about whether they were reminded of the critical episode was key to neutralizing a possible bias in the report of near versus far retrievals.

As an example of our materials, we assessed the retrieval of Jurassic Park (a movie seen by almost all participants independently of the study) while dealing with an analogous situation that had either high or low levels of surface similarity. In Jurassic Park, a millionaire has cloned dinosaurs from the Jurassic Period out of fossil DNA taken from a mosquito. Despite receiving expert advice about the impossibility of exerting total control over biological phenomena, the millionaire opens a park to exhibit the dinosaurs to the public, with the consequence that dinosaurs end up breaking the security system of the park and attacking humans. Superficially similar targets were generated by replacing base objects and relations with similar ones. For instance, the superficially similar target for “Jurassic Park” stated that a businessman had replicated mammoths from the Pleistocene Era out of a frozen embryo found in a glacier and was entertaining the idea of opening a zoo with mammoths on show. Participants’ task consisted of using analogies to dissuade the main character from pursuing the project, warning him that as animal behaviors are not completely manageable, mammoths could destroy the zoo cages, thus endangering people. Superficially dissimilar target analogs were derived by replacing base objects and relations with new objects and relations that were less similar than in the near targets.
Continuing with the Jurassic Park set, the target stated that a certain astrophysicist was imitating Martian storms from digital images captured by a space probe. The target ended by stating that he was planning to let his colleagues enter the experimental zone to study these storms. Participants had to dissuade the main character from pursuing his plan on the grounds that, as extraterrestrial climatic phenomena are not well known, they could exert negative effects on his colleagues. Collapsing the results of all four sets of materials, superficially similar source analogs were retrieved in 70% of the trials. In contrast, superficially dissimilar sources were retrieved in only 15% of the cases (see Fig. 1A). This strong effect of superficial similarities on the retrieval of naturally acquired base analogs was consistent with a long experimental tradition using artificial stimuli (e.g., Gentner et al., 1993; Keane, 1987) and ran counter to the conclusions extracted from naturalistic studies. Given that the naturalistic encoding of our base analogs (i.e., the movies) happened to have a strong auditory component, the observed effect of surface similarities also failed to support Markman et al.’s (2007) conjecture about the role of auditory presentation in rendering naturally encoded sources more retrievable.

Figure 1 Retrieval of a naturally acquired base analog during activities of persuasive argumentation. (A) Retrieval of central episodes from popular movies (The Secret in Their Eyes, Shrek, Spiderman, Jurassic Park) after target situations that maintained high versus low degrees of superficial similarity with such episodes; bars contrast superficially similar and superficially dissimilar targets on a scale from 0% to 100%. (B) Retrieval of autobiographical episodes that maintained high versus low levels of surface similarity with the target (Sets 1 to 4); bars contrast superficially similar and superficially dissimilar sources on a scale from 0% to 100%. From Trench, M., & Minervino, R. (2015a). The role of surface similarity in analogical retrieval: Bridging the gap between the naturalistic and the experimental traditions. Cognitive Science, 39, 1292–1319.

The fact that the movie episodes used as base analogs were consistently recalled in response to the superficially similar targets suggests that participants had adequately encoded them in LTM prior to the study. However, the fictional nature of the stories prevented generalizing the results to the retrieval of naturally encoded situations at large. In an unpublished study with a similar procedure, we analyzed the retrieval of another kind of culturally shared source: extensively publicized political affairs. One such episode was the story of Sergio Schoklender, the older of two brothers who murdered their parents in the early 1980s. During his confinement in prison, Schoklender’s determination to study and to resist harassment by other convicts drew the attention of human rights activist Hebe de Bonafini, internationally famous for having stood against Argentina’s persecution of political dissidents. Once out of jail, Schoklender became increasingly involved with the administration of Bonafini’s foundation and began receiving large amounts of government funding for the construction of welfare houses (the “shared dreams program”). In 2011, it became apparent that Schoklender had defrauded the government, leaving thousands of poor families without the houses that the government had funded. Participants in the surface similarity condition were presented with a hypothetical situation about a young man who had pursued in-jail studies of social care during his imprisonment for having poisoned his stepfather.
After reading that the director of an NGO was planning to hire him to supervise a community kitchen for poor children, participants were asked to draw analogies that could be useful to dissuade the director of the NGO from hiring the prisoner, on the grounds that despite having paid his debt to society, his criminal background could still represent a risk for young children. In contrast, participants in the surface dissimilarity condition received a situation in which a dog that had severely attacked its owners was sent to a rehabilitation program, with the result that it learned to respect not only its trainers but also people in general. After reading that the director of an institute for the blind was planning to include the dog in an intervention program to assist blind people in their daily activities, participants were asked to use analogies to dissuade him from involving the dog, on the grounds that despite its impressive behavioral transformation, the aggressive background of the dog could still represent a risk for blind people. Just as in Trench and Minervino (2015a, Experiment 1), participants succeeded in recalling the critical culturally shared episode while working with the superficially similar target but failed to retrieve it in response to the superficially dissimilar situation.

2.4.2 Retrieval of Autobiographical Base Analogs

And what about the alleged advantage of “personally significant” episodes for being spontaneously recalled during the processing of a superficially dissimilar target? Even though it seems likely that the natural sources employed in the above studies had been better learned than the artificial stimuli employed in experimental studies (e.g., by Gick & Holyoak, 1980), they might have lacked the personal significance that characterizes autobiographic episodes. Hofstadter and Sander (2013) argue that when dealing with real-world situations, the semantically distant source situations people retrieve spontaneously and effortlessly from their own memories are, in general, extremely familiar. According to these authors, when making analogies we all depend on knowledge that is rooted in our experiences over a lifetime. This knowledge has been reconfirmed over and over again and has also been generalized over time, allowing it to be carried over fluidly to all sorts of new situations. In their own words: “It is very rare that, in real life, we rely on an analogy to a situation with which we are barely familiar at all” (Hofstadter & Sander, 2013, p. 339). Taking into account Hofstadter and Sander’s (2013) considerations about the key role of familiarity and personal significance in the retrieval of naturally encoded sources, in Trench and Minervino (2015a, Experiment 2) we set out to assess the effect of superficial similarity on the retrieval of autobiographical memories. During the first phase of the experiment, participants were given schematic descriptions of four different situations with the instruction to recall known episodes conforming to any of those descriptions.
Less than two weeks later, participants who had reported an exemplar of at least one of the given descriptions were asked to participate in a study on analogical argumentation, held in a different place and run by different experimenters from those administering the previous phase. After reading a realistic situation that, unbeknownst to them, was either superficially similar or superficially dissimilar to the source they had reported during the previous phase, they were asked to draw analogies to real episodes to dissuade the main character of that situation from carrying out his intended action (see materials in Table 1). In line with the results of Trench and Minervino (2015a, Experiment 1), results from Experiment 2 demonstrated a strong and uniform effect of superficial similarities on the retrieval of the autobiographic episodes that participants had reported during the first phase of the experiment (see Fig. 1B). This last result fails to support the thesis about the special status of familiar and mundane autobiographic episodes as candidates for superficially unconstrained analogical retrieval.

Table 1 Materials used in Trench and Minervino (2015a, Experiment 2)

Set 1
Descriptions used for eliciting an autobiographic base analog (phase 1)
Surface similarity condition: Have you ever consumed some new food to excess, with the consequence that you got disgusted with it?
Surface dissimilarity condition: Have you ever played some new game to excess, with the consequence that you got bored of it?
Target situation and analogy generation task (phase 2)
Suppose that a friend of yours, who has discovered passion fruit sorbet, is now interested in preparing his own passion fruit cheesecakes, toppings, daiquiris, etc. Please use analogies to known situations to convince him that by eating so much passion fruit he might get fed up with it.

Set 2
Descriptions used for eliciting an autobiographic base analog (phase 1)
Surface similarity condition: Have you ever stopped making the monthly payments of a debt, with the consequence that clearing the debt later was much more costly than it would have been originally?
Surface dissimilarity condition: Have you ever interrupted the prescribed treatment of an injury, with the consequence that later recovery was much more difficult than it would have been originally?
Target situation and analogy generation task (phase 2)
A friend of yours, who has acquired a costly van with a loan, is tempted to postpone the monthly payments of the plan until he gets a raise in his salary. Please use analogies to known situations to dissuade him from postponing the monthly payments of the loan, because otherwise the debt will get much harder to clear.

Set 3
Descriptions used for eliciting an autobiographic base analog (phase 1)
Surface similarity condition: Have you ever made a quick fix to an artefact, which was partially successful, with the consequence that you ended up never seeking a fully adequate solution?
Surface dissimilarity condition: Have you ever learned intuitively how to do something in a rather clumsy way, with the consequence that you ended up never learning the proper way of doing it?
Target situation and analogy generation task (phase 2)
A friend of yours has discovered that the water closet is leaking. He is about to seal it temporarily with suprabond, even though the result would not be completely satisfactory. Please use analogies to known situations to dissuade him from fixing it by himself, since it can prevent him from ever seeking a completely satisfactory solution in the future.

Set 4
Descriptions used for eliciting an autobiographic base analog (phase 1)
Surface similarity condition: Have you ever attended a meeting despite believing that you would not have fun, but with the result that you ended up having a great time?
Surface dissimilarity condition: Have you ever started doing a physical activity despite believing that you would not enjoy it, but with the result that you ended up enjoying it very much?
Target situation and analogy generation task (phase 2)
A friend of yours, who has been invited to a costume party, fears that she might have a hard time there, since she has not been to any costume parties before. Please use analogies to known situations to dissuade her from declining the invitation, since she might find the party more fun than she expects.

With the insufficiencies detected in the production paradigm as originally implemented remedied, the results of the above experiments converge in demonstrating that superficial similarities play a crucial role in the retrieval of naturally encoded sources during analogy generation. They demonstrate that a superficial bias governs analogical retrieval not only in laboratory conditions but also during the retrieval of participants’ own sources in the service of tasks of indubitable ecological validity. How, then, to explain the contrast between the massive failure of participants to retrieve a particular base analog in the laboratory or in the classroom and the profusion of distant analogies in naturalistic studies?

2.5 Alternative Means for Generating Distant Analogies

Anecdotal reports and experimental studies on creative cognition suggest that the frequent use of distant analogies can be aided by several mechanisms that do not conform to the standard definition of analogical retrieval, that is, the activation of a base analog from LTM during the processing of a specific target analog in working memory. As mentioned before, a trivial alternative to scanning LTM for potential analogs consists in inventing possible variations of the target situation. On top of the obvious usefulness of this heuristic for making a target topic more understandable to a potential listener, there is some evidence that by drawing analogies to hypothetical variations of a target situation, the analogizer can also enhance her own understanding of the target (e.g., Clement, 1988). Another alternative for generating one’s own distant analogies consists in either consciously retelling an analogy heard from others, or else inadvertently reusing conceptual metaphors related to the target topic at stake. Within a conceptual metaphor, only some aspects of the base domain are projected onto the target, while other aspects tend to be left aside (Kövecses, 2002; Lakoff, 1990; Turner, 1996). One heuristic for generating novel analogies consists in extending the standard mapping between the base and the target domain. As conceptual metaphors connect distant domains (e.g., discussion is thought of as war, Lakoff & Johnson, 1980), extending them tends to result in novel instances of a culturally shared distant analogy. These extensions could result in products like metaphorical expressions (Lakoff & Turner, 1989; Trench & Minervino, 2015b), idioms (Gibbs, 1994), stories (Turner, 1996) or images (Forceville, 2006), which can be employed in the service of literary writing, teaching, humor, advertising or argumentation.

As an example, in the A PROJECT IS A JOURNEY conceptual metaphor, some of the most frequently projected aspects of the base domain of traveling are the vehicle, the destination, the obstacles faced during the trip or the speed of the vehicle. Subtler aspects related to the base domain (e.g., the landscapes we see while traveling) are not usually transferred to the target. To take an example from one of the naturalistic studies discussed in the present work, in Blanchette and Dunbar (2001) some of the analogies that appeared in the press prior to the 1996 referendum on the independence of Quebec were created by extending conceptual metaphors. For instance, one of the anti-emancipation analogies compared emancipation to the act of abandoning an ocean liner to board a lifeboat in the middle of a storm, an analogy that involves extending the conceptual metaphor A PROJECT IS A JOURNEY (the change of vehicle in response to traveling difficulties is not usually employed as a base domain when thinking about projects). In a study on the use of analogies during argumentation (Olguin, Trench, & Minervino, in revision), we found that even though participants were explicitly asked to base their analogies on autobiographical episodes, many of the distant analogies were in fact extensions of conceptual metaphors.

When attempts to generate analogies by probing either one’s own memory or that of the culture do not yield satisfactory outcomes, suitable analogs are sometimes provided by the environment.
During creative worrying (Browne & Cruse, 1988; Olton, 1979), the reasoner is not deliberately devoted to solving a given problem, but she occasionally interleaves daily activities with lapses of time during which she revisits the problem. During those lapses, the social or physical environment can serendipitously present the reasoner with a relevant analog, leading to the establishment of a useful analogy. Just for the sake of illustration, one could picture Castoriadis in his home environment, recurrently switching attention back and forth between his domestic assignments and his political reflections about consumerism in modern societies. It is easy to see how the reading of Hansel and Gretel to his grandchildren would have loaded both analogs in working memory, leading to an easy mapping between the two situations.

In combination with the surface bias of our memory systems, these collateral mechanisms have the potential of explaining why a journalist would probably fail to retrieve Gick and Holyoak’s (1980) story The General during a typical experimental study of analogical retrieval but yet succeed in coming up with a couple of smart analogies during the making of a newspaper article.

2.6 The Surface Bias of Analogical Retrieval: Friend and Foe

As Gentner et al. (1993, p. 567) eloquently asked: “How can the human mind, at times so elegant and rigorous, be limited to this primitive retrieval mechanism?” The neural machinery capable of relational reasoning, a newcomer in evolutionary history, runs upon the output of retrieval systems that predate it by several million years. In terms of adaptation, memory’s tendency to base retrieval on readily processable surface cues can be thought to represent no big loss, since most things that look alike are alike relationally as well (the kind world hypothesis, Gentner, 1989; Medin & Ross, 1989). In the words of Dedre Gentner, if something looks and roars like a tiger, it probably is a tiger. More critically, the environment in which our ancestors evolved was so dangerous that risk avoidance was the top priority. The risk of overlooking a real danger outweighed the cost of missing a truly deep analogy. As an example, suppose that after experiencing an almost deadly encounter with a strange animal he had never seen before, a hominid later stumbled across another animal with a similar (though not identical) visual appearance. Falsely assuming that the two animals behave identically would surely incur some cost in terms of time and energy. But wrongly denying that they do could have a lethal outcome! Luckily enough, our everyday environment is not nearly as dangerous as that of our ancestors. However, there are many situations where retrieving literally similar sources still represents a better alternative than retrieving superficially dissimilar analogs.
Just as in category-based induction, where similar exemplars represent a more solid basis for inferences than dissimilar ones (Osherson, Smith, Wilkie, López, & Shafir, 1990; Rips, 1975), the fact that two situations maintain a wide array of surface similarities increases the probability that other less obvious features will also be shared. As stated by Hofstadter and Sander (2013, p. 156), “every act of thinking, no matter how small, relies on such analogies, and the tighter the analogy, the more unavoidable the conclusions it leads to would seem to be” (italics added).

To illustrate, consider the task of persuading somebody else that eating too much passion fruit will make him or her stop liking it. Even though analogies with (1) somebody who got fed up with ingesting too many lychees and (2) somebody who got tired of playing too many computer games might obtain similar scores on traditional measures of analogical soundness, it is clear that by relying on the same set of biological foundations, the precedent of having been fed up with lychees confers greater probability to the conclusion. The problem arises when a pressing situation, whether natural or artificial, is so novel or unique that literal similes are lacking. Despite the fact that superficially dissimilar analogs would now represent the most promising basis for inductive inferences, the surface bias of the memory systems will favor the retrieval of mere-appearance matches (Gentner et al., 1993), which involve similar elements but dissimilar structure. These situations get increasingly common as we move into unfamiliar territories and become ubiquitous in educational environments. What, if anything, can be done to make distant retrievals more likely?

3. OVERCOMING HUMAN LIMITATIONS FOR RETRIEVING DISTANT ANALOGS

Theorists and computational modelers of analogical thinking have been mostly interested in the spontaneous retrieval of base analogs in response to the representation of a target analog in working memory. Implicit in the description of influential computer simulations such as MAC/FAC, ARCS or LISA is the idea that the contents of working memory are continuously (and rigidly) selected as cues to probe LTM for analogically related information. Consistent with this “automatic” conception of retrieval mechanisms, the first interventions to promote interdomain retrieval sought to promote a more abstract encoding of the base analogs, so as to render them more accessible during later encounters with analogous situations lacking surface similarities. Two successful strategies consisted in coupling the base analog with its abstract schema (Goldstone & Wilensky, 2008; Ross & Kilbane, 1997) or with a second analogous situation (Catrambone & Holyoak, 1989; Gick & Holyoak, 1983; Ross & Kennedy, 1990) and asking participants to compare them. More stripped-down interventions included asking participants to discuss the base analog with another student (Schwartz, 1995), to explain the problem to themselves (Ahn, Brewer, & Mooney, 1992), to construct a structurally equivalent problem (Bernardo, 2001a) or to describe the base problem at a higher level of abstraction (Mandler & Orlich, 1993). When trying to promote transfer without even asking participants to elaborate on the base situations, a condition that is prevalent in noninteractive materials such as textbooks or audiovisual presentations, transfer advantages can still be obtained by removing irrelevant information (Goldstone & Sakamoto, 2003) or by replacing domain-specific terms of the base situation with domain-general ones (e.g., replacing “typing” by “writing,” Clement, Mawby, & Giles, 1994).

What all these interventions have in common is the highlighting of the abstract structure of the base analogs. The shared assumption is that as future target analogs will have a stronger match with such more abstract representations than they would with specific examples having mismatching features, the future retrievability of these relationally encoded base analogs will increase. Despite the relative success of interventions aimed at promoting a more abstract encoding of the sources, a sensible question concerns whether improving people’s initial encoding is the only way to favor the retrieval of analogs. As posited by Loewenstein (2010, this book series), if changing the initial encoding represents the only means for enhancing analogical reminding, then we can offer very little grounds for improvement to someone who has already acquired the relevant sources in a suboptimal manner. What is needed in these cases are interventions taking place at the time of retrieval.

3.1 Spontaneous Versus Voluntary Retrieval

Consistent with the dominant modeling of analogical retrieval as an automatic process, the overwhelming majority of behavioral studies (e.g., Gick & Holyoak, 1980; Holyoak & Koh, 1987; Kurtz & Loewenstein, 2007; Spencer & Weisberg, 1986) have assessed the spontaneous activation of a source in response to the processing of the target. In contrast, very few studies have dealt with voluntary retrieval, that is, with the deliberate search for source analogs in memory. In an exploratory study on learning computer-based text editing, Sander and Richard (1997) confronted participants with several text-editing challenges. While analogizing between text editing and typewriting (two domains at a similar level of abstraction) provided solutions to some of the tasks, other challenges could only be solved by analogizing to more abstract domains such as writing in general or manipulating objects. In the course of solving such problems, participants’ active search for suitable analogies showed a progression from more concrete to more abstract base analogs. Beyond this preliminary evidence, little is known about the extent to which participants deliberately search for analogous situations while performing activities like problem solving, teaching or argumentation. While the results of a handful of studies suggest that participants can generate many analogies upon explicit request (e.g., Blanchette & Dunbar, 2000; Trench, Oberholzer, Adrover, & Minervino, 2009), the extent to which an explicit indication can yield a transfer advantage over and above the baseline tendency to embark on this kind of exploration remained unclear.

Echoing the growing interest in the distinction between voluntary and involuntary reminding among studies of autobiographic memory (see Mace, 2010 for a review), we set out to extend this line of inquiry to the realm of analogical retrieval, with an eye on its potential contribution to educational interventions. In a recent study (Trench, Olguín, & Minervino, 2016; Experiment 1), we presented participants of various groups with a dilemmatic situation and asked them to persuade the protagonist of the story to pursue one of the possibilities. While one of the groups was further asked to base such arguments on analogies, a control group was not given this indication. Results showed that the number of analogies retrieved in the prompted condition was nearly 10 times higher than in the unprompted condition, where the spontaneous use of analogies was close to zero. These results show that at least for certain important pragmatics like communicating ideas to others, the mechanisms of analogical retrieval are not automatically triggered in response to the processing of the target and benefit largely from a deliberate disposition to search for analogous cases.
Despite this dramatic increase in the use of analogies due to explicit prompting, an analysis of the analogies produced by participants revealed a majority of near analogies over far analogies, thus suggesting that voluntary retrieval, just like spontaneous reminding, is heavily guided by surface similarities. As noted when discussing the results of naturalistic studies, however, the observed proportions of close versus distant remindings should not be read off straightforwardly as reflecting the natural tendencies of the retrieval systems, since they could also reflect many other factors such as analogy fabrication or an uneven availability of far versus close analogs in LTM. In a recent study, Martínez Frontera (2015) investigated whether the difference between the prompted and unprompted conditions would still hold under a traditional transfer paradigm. During the transfer phase of that study, she had two groups of 60 participants read a story about a dangerous

human disease (the H flu) which had only recently been eradicated. This target analog also stated that a group of scientists was planning to preserve the last samples of the virus for research purposes. Participants of one of the groups (the unprompted group) were asked to generate persuasive arguments that could be used to convince the scientists that the samples should be entirely destroyed. Participants of the prompted condition received a similar instruction, with the additional requirement to base their persuasive arguments on analogies to prior situations. The learning phase of both conditions took place between 30 and 45 min prior to the argumentation task and was presented incidentally during a weekly class of a psychology course. During this phase, participants read a base situation that was structurally analogous to the target situation, but which had either high or low levels of surface similarity with that target. In the superficially similar condition, the base analog was a story about a cow disease that had been successfully eradicated, with the samples of the virus being preserved in several laboratories of the United States and Russia. The story ended by stating that, after the mysterious disappearance of a test tube from one of the labs, there was a reappearance of the disease that killed thousands of cows. The base analog received by participants of the remaining condition was generated by replacing target elements with new elements that were semantically less similar than in the superficially similar condition. That story stated that at some time in the past, the United States and Russia had agreed to discontinue the production of the FLZ nuclear weapons, with extant missiles being kept in secure arsenals supervised by both countries. The story ended by stating that, after the mysterious disappearance of two missiles from one of the arsenals, a spread of nuclear radiation caused the death and illness of thousands of people.
The analysis of participants’ responses during the second phase replicated the results of Trench et al. (2016): while very few participants (12%) recalled the base analog without a prompt to base their arguments on analogous situations, twice as many participants did so in the prompted condition. With regard to the surface similarity manipulation, in both the prompted and the unprompted conditions the retrieval of superficially similar base analogs was much higher than that of superficially dissimilar ones. In sum, this evidence suggests that voluntary search can increase the number of analogous situations retrieved beyond those that would have been retrieved spontaneously, but without altering the basic tendency of the system to favor the retrieval of superficially similar situations. Work is underway to determine whether this finding generalizes to the retrieval of analogies during other educationally relevant activities like problem solving or hypothesis generation.

3.2 The Late Abstraction Principle

As reviewed in Section 3.1, promoting a more abstract encoding of the base analogs renders such sources more likely to be retrieved during a later encounter with a superficially dissimilar target. The limitation of this approach is mainly one of applicability, since it provides no solution to the retrieval of situations learned in conditions that were not especially engineered to enforce this kind of elaboration. To figure out a solution to the retrieval of suboptimally encoded learning, Kurtz and Loewenstein (2007), and Gentner, Loewenstein, Thompson, and Forbus (2009) reasoned that as retrieval depends on the degree of match between the stored items and the memory probe, the beneficial effect of schema abstraction should also apply when elaborating on the target analog at retrieval time (henceforth, the late abstraction principle). Assuming MAC/FAC to be an appropriate model of analogical retrieval, the authors hypothesized that a more abstract representation of the target analog could enhance distant retrievals in two ways. First, as more abstract target representations will have fewer specific object attributes and other low-level features, they will attract fewer mere-appearance matches that could compete with more appropriate analogical matches. Second, given that the total weight of an analog is distributed among the units of its corresponding vector (i.e., unit-vector normalization), the removal of surface features will entail allocating more weight to the relational structure of the target. Thus, the dot product between a structurally similar base analog and the schema will be higher than that between such base analog and an unabstracted target.
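The two effects just described can be illustrated with a toy computation. What follows is only a sketch under simplifying assumptions, not the MAC/FAC implementation: each analog is reduced to a bag of equally weighted features, and the feature names (`converge`, `divide_force`, `tumor`, and so on) are hypothetical labels invented for the illustration.

```python
import math

def unit_vector(features):
    """Rescale a feature -> weight mapping so that its L2 norm equals 1."""
    norm = math.sqrt(sum(w * w for w in features.values()))
    return {f: w / norm for f, w in features.items()}

def dot(a, b):
    """Dot product of two sparse vectors stored as dictionaries."""
    return sum(w * b.get(f, 0.0) for f, w in a.items())

# Hypothetical relational features shared by the base and the target.
RELATIONS = {"converge": 1, "divide_force": 1, "protect_surroundings": 1}

# Base analog (The General): shared relations plus its own surface features.
base = unit_vector({**RELATIONS, "army": 1, "fortress": 1})

# Unabstracted target (Radiation problem): relations plus surface features.
raw_target = unit_vector({**RELATIONS, "ray": 1, "tumor": 1, "patient": 1})

# Abstract schema of the target: relations only.
schema = unit_vector(dict(RELATIONS))

# A mere-appearance competitor: target-like surface, unrelated structure.
lookalike = unit_vector({"ray": 1, "tumor": 1, "patient": 1, "diagnose": 1})

score_raw = dot(base, raw_target)     # base analog probed with the raw target
score_schema = dot(base, schema)      # base analog probed with the schema

print("base vs. raw target:     ", round(score_raw, 3))
print("base vs. schema:         ", round(score_schema, 3))
print("lookalike vs. raw target:", round(dot(lookalike, raw_target), 3))
print("lookalike vs. schema:    ", round(dot(lookalike, schema), 3))
```

In this toy setting, dropping the surface features both raises the match with the structurally similar base and removes the overlap with the mere-appearance competitor, which under the raw target actually outscores the base.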
To test this prediction, Kurtz and Loewenstein (2007) assessed the effectiveness of an intervention that consisted in providing participants with a second (unsolved) problem that was isomorphic to the target problem to be solved and asking them to compare both problems prior to attempting their solution. The base analog read by participants during the learning phase was Gick and Holyoak’s (1980) story The General, and the targets were the Radiation and the Red Adair problems from Gick and Holyoak (1983). In this last problem, a fire in an oil well was so severe that it could only be suffocated by placing a large quantity of fire retardant at the base of the well, with the problem being that there was no single hose large enough to carry the quantity of foam needed to extinguish the fire. As was the case with the base comparison interventions, this “target-comparison” procedure resulted in enhanced transfer of the base solution as compared to the standard base-target condition. Results from further

controls confirmed that the comparative process, and not merely the provision of a second target, was key for successful performance. More importantly, they confirmed that the augmented proportion of correct solutions stemmed from transferring the learned procedure, and not simply from an enhanced ability to solve the target problem from first principles, that is, from a better intrinsic comprehension of the target problem. In subsequent work, Gentner et al. (2009) generalized the benefits of the target-comparison strategy to autobiographic memories and also simulated the process of backward transfer by means of feeding MAC/FAC (Forbus et al., 1995) with targets that consisted either in the original stories from the Karla the Hawk series of studies (Gentner et al., 1993) or in their respective abstract schemas, and having it run on a LTM comprising analogical matches, mere-appearance matches, and several filler stories. As suggested by the results of the target-comparison studies, the process of late analogical abstraction opens a promising avenue for retrieving base situations learned within contexts that were not specially engineered to highlight their underlying structures, and which may represent the vast majority of the learning conditions that people experience within and outside instructional settings (Loewenstein, 2010). In contrast to the potential applicability of the late abstraction principle, however, the specific target-comparison intervention still falls short of representing a truly portable cognitive strategy, since participants depend on the provision of a second isomorphic target for every target problem they are to solve. Given the originality of target abstraction as a means for promoting cross-domain retrieval of suboptimally encoded sources, our current research program aims at devising target elaborations capable of capitalizing on the late abstraction principle, but which could be implemented by analogizers in fully autonomous ways.
The remainder of the chapter reviews our successful and our failed attempts to generate portable interventions based on the late abstraction principle and finally discusses our intuition (shared with Loewenstein et al. but by no means undisputed) that late abstraction affords the retrieval of suboptimally encoded learning.

3.3 Deriving “Portable” Cognitive Strategies Based on the Late Abstraction Principle

Admittedly incremental, the heuristic we followed for deriving portable interventions based on the late abstraction principle consists of (1) surveying interventions credited with having promoted a more abstract encoding of the base analogs, (2) selecting those strategies that learners could conceivably

apply to the target without the ad hoc provision of target-related information and (3) assessing whether their application to the target analog in fact promotes the retrieval of superficially dissimilar sources.

3.3.1 Comparing Disanalogous Targets

Conceived as a minimal variation to the target-comparison intervention, Minervino, Olguín, and Trench (2016, Experiment 1) represents our first attempt to derive a portable intervention based on the late abstraction principle. The only difference with Kurtz and Loewenstein’s (2007) study was that participants were asked to compare the target problem to a random nonisomorphic problem, something that could in principle be achieved by participants without receiving any target-specific information. We speculated that the activity of comparing disanalogous problems might still lead to the extraction of abstract cues that could eventually favor the retrieval of distant analogs. Our hypothesis was that the canonical partition of each of the problems in terms of its components (i.e., its goal, restrictions, operators, etc.) would frame the establishment of a set of correspondences between the (dissimilar) concepts that fill such slots in each of the problems. This would in turn result in conceptualizing commonalities and differences between those fillers, which could eventually result in appropriate cues for retrieving distant analogs. As an example, consider that prior to solving the Radiation problem, participants were asked to compare such problem to Duncker’s (1945) Candle problem, where participants are presented with candles and a matchbox-size box filled with tacks, with the challenge of fixing the candle to a wall so that when lit, the wax would not drip onto the floor. Faced with the task of comparing both problems, an analysis of their respective restrictions could lead to considering them as two cases of “avoiding to affect nearby elements” (a structural similarity).
On the other hand, an analysis of the permitted operators could lead to conceptualizing them as “using a force vs. using some office items to reach the objective.” Similarities and differences of this kind tend to be more abstract than the particular elements in which they originate. Therefore, we speculated that including them in a representation of the target could aid the retrieval of distant sources by means of providing additional and more appropriate cues. An analysis of the comparisons produced in the experimental condition showed that participants who managed to construct abstract schemas had a greater probability of producing a convergence solution to the tumor problem than those who failed to produce such schemas. However, the fact that less than a quarter of participants could generate successful schemas prevented the comparison group from reliably outperforming participants in the control conditions.

3.3.2 Inventing Analogous Targets

In a subsequent attempt to identify a portable way of inducing a more abstract encoding of the target analog (Minervino et al., 2016; Experiment 2) we turned our attention to the activity of inventing an analogous problem. The construction of structurally equivalent problems had been considered a stringent test of students’ comprehension of different types of mathematics and algebra problems, as well as of their ability to relate them to the real-world situations to which they can refer (see, e.g., Nathan, Kintsch, & Young, 1992). In a series of studies using probability problems, Bernardo (2001a, 2001b) had demonstrated that the activity of constructing an isomorphic problem can be beneficial for later analogical transfer, on occasion even surpassing the problem-comparison task in efficacy. Evidence about the advantage of generating analogous exemplars has also been obtained outside the domain of mathematics and even of problem-solving in general. In an unpublished study described by Dunbar (2001), students who read fable-like stories with the task of generating analogous episodes were more likely to retrieve them during the processing of analogous stories lacking surface similarities. Based on the above studies, we ventured that the generation of an analogous situation would be effective in directing participants’ attention towards the structural features of the target problem being processed, thus enhancing the retrieval of potentially useful base analogs (the late abstraction principle). Sticking to the Radiation problem as an example of a possible target analog, a person could cope with the task of inventing an analogous problem by tentatively replacing the patient’s stomach with the stone of a fruit, guided by a surface resemblance such as their spatial location inside an object.
Next, she would have to relate the stone to another entity (e.g., a mold inside the stone), whose functionally relevant attributes (Keane, 1985) allowed it to play in the fruit scenario the same role played by the tumor within the Radiation problem (i.e., threatening the rest of the fruit). The chosen element, in turn, constrains the selection of possible operators to those that would eliminate the focus of the problem when applied at the required intensity (e.g., by boiling the whole fruit) but at the cost of harming an element that needs to be preserved (e.g., the pulp). As the above example illustrates, the derivation of an analogous target problem in a nonformal domain seems to demand a deep and systematic exploration of the relational structure of the problem that serves as a basis for problem construction. In terms of portability, the advantage of target construction over target comparison would lie in the fact that participants would be able to apply it to any potential target situation without the

need to be externally provided with a second target analog. In line with our expectations, participants in the fabrication condition outperformed participants in the standard transfer condition in generating convergence solutions to the Radiation problem. This increased performance could not be attributed to an effect of target invention on the ability to solve the target problem from first principles, since participants who invented an analogous target without having previously received a base analog did not outperform participants in the standard transfer condition (10% vs. 10%). Thus, the activity of constructing a novel unsolved target problem at test time seems to improve transfer by fostering the retrieval of the base story and its convergence solution. To have an estimate of how our target invention fares against target comparison in terms of transfer efficacy, we compared our results with a prior replication of Kurtz and Loewenstein (2007) that we had carried out with our own population of participants. A comparison between both experiments showed that target invention, though potentially more portable, was somewhat less effective than target comparison (25.71% vs. 34.29%, respectively). An analysis of the relation between participants’ success in inventing a problem and their later transfer performance revealed that those who succeeded in generating an appropriate problem were highly likely to transfer the convergence solution to the Radiation problem. The lower overall transfer performance of target construction as compared to target comparison thus originates in the fact that participants found it quite hard to come up with an analogous target.
To understand the sources of this difficulty, it should be noted that most prior interventions involving problem construction had been developed as ways of promoting and/or testing the acquisition of concepts and procedures in mathematical subdomains such as algebra (Nathan et al., 1992), arithmetic (Lampert, 1986; Rudnitsky, Etheredge, Freeman, & Gilbert, 1995) or probability (Bernardo, 2001a). Problem generation within formal domains seems to be characterized by a great freedom for selecting the objects that instantiate the given quantities of the problem (e.g., apples, candy, etc.), as well as by a set of easily accessible actions that can potentially reinstantiate the mathematical operations represented by the base relations (e.g., replacing buying by another instance of adding, such as putting, and replacing selling by another instance of subtracting, such as removing). In contrast, generating novel problems outside the realm of formal disciplines might require applying more aggressive rerepresentation mechanisms, as well as a more open and creative exploration of our (sometimes poor) knowledge about

certain domains. To illustrate, the generation of a problem like Red Adair on the basis of the Radiation problem could have begun by shifting the goal of the problem from destroying a tumor to suffocating a fire. Once this new goal was set, the remaining components of the problem must be adjusted in a coordinated fashion, a process of multiple constraint satisfaction that can on occasion be rather taxing (Hofstadter, 1985; Ward, Smith, & Finke, 1999). While the replacement of rays by fire retardant is straightforward, it is less easy to think about what target element could fill the role of the surrounding tissues. It is thus not surprising that participants failed to carry out these replacements in a complete and coordinated manner. Despite these intrinsic difficulties, however, it is conceivable that training participants in the generation of analogous problems could render this strategy easier to implement. In view of the much greater portability of target construction as compared to the ad hoc provision of a second analogous target, it seems that any investment in enhancing this ability would promise to pay off well.

3.3.3 Constructing Idealized Representations of the Target

In an ongoing study (Trench, Tavernini, & Goldstone, in preparation), we focused on the idealization of the target analog as another portable strategy potentially capable of enhancing the retrieval of distant analogs at retrieval time. Goldstone and Sakamoto (2003) had demonstrated that even though concrete instantiations of an abstract principle are more helpful than idealized representations for inducing an initial understanding of such principle, the reverse is true when it comes to recognizing such principle during a later encounter with a different instantiation of the principle.
To assess whether a comparable transfer advantage can be obtained by inducing a more idealized representation of the target analog at retrieval time, we had two groups of 30 participants learn how to solve a “collision” problem in which a plane and a helicopter traveled towards each other at different speeds (by using the $r_1 t + r_2 t = d$ formula). After a contextual separation, participants were presented with a “work” problem in which they had to calculate the time that two painters would need to jointly paint a wall, given the times that each of them would have needed to paint it on his or her own. Before being asked to actually solve the problem, both groups were presented with a set of manipulatives and were tasked with carrying out an approximate representation of the situation described by the target problem as it unfolded from the initial moment until the moment when the wall got completed. While participants in the concrete condition received a photograph of a horizontally laden wall and two smaller

4 cm × 1.3 cm rectangles printed with drawings of painters, participants in the idealized condition received similarly sized paper rectangles without any figurative illustrations (see Fig. 2). To rule out the possibility that an advantage of either condition in solving the target stemmed not from a transfer process but rather from a more accurate comprehension of the target per se, two additional groups received the same target problem and the same simulation materials but without having learned the base problem during a prior phase. In line with the late abstraction principle, while participants who constructed an idealized simulation of the target problem after receiving a base analog were nearly twice as likely to apply the base strategy as those constructing a concrete simulation, no such effect was found among the corresponding groups that had not received a base analog and its solution.
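The structural correspondence that makes the collision problem a usable base for the work problem can be spelled out as follows. This is a minimal sketch, assuming that the wall counts as one unit of work and that each painter's rate is the reciprocal of his or her solo time; the function names and the numbers in the usage lines are illustrative and do not come from the study materials.

```python
def meeting_time(rate_1, rate_2, distance):
    """Solve r1*t + r2*t = d for t (the collision-problem formula)."""
    return distance / (rate_1 + rate_2)

def joint_work_time(solo_time_1, solo_time_2):
    """Reuse the collision formula for a work problem.

    Mapping assumed here: speed -> fraction of the wall painted per hour,
    distance -> one whole wall (d = 1).
    """
    return meeting_time(1 / solo_time_1, 1 / solo_time_2, 1.0)

# Illustrative values only.
print(meeting_time(300, 100, 800))  # aircraft 800 km apart at 300 and 100 km/h
print(joint_work_time(6, 3))        # painters needing 6 h and 3 h on their own
```

Under these assumptions both problems reduce to the same computation, which is the sense in which applying the base strategy to the work problem counts as transfer.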

Figure 2 Manipulation of the degree of concreteness during the encoding of base and target analogs. (A) Snapshot of concrete versus idealized dynamic representations of the collective behavior of ants and food (an instance of the principle of competitive specialization) used as the main manipulation during the encoding of the base analog. (B) Concrete versus idealized representations of the situation model of a typical “work” problem, used as the main manipulation during the encoding of the target analog. (A) From Goldstone, R. L., & Sakamoto, Y. (2003). The transfer of abstract principles governing complex adaptive systems. Cognitive Psychology, 46, 414-466. (B) From Trench, M., Tavernini, M., & Goldstone, R. L. Enhancing the retrieval of distant analogs via idealizing target representations (in preparation).

Yet another alternative explanation for the observed facilitation could be that while idealized materials induced a concentric representation that dynamically matched that of the collision problem, the concrete materials could have invited a socially laden representation of the painters as both moving in the same direction, a depiction that is no less intrinsically adequate than the concentric one, but which did not happen to match that of the base problem. Counter to this hypothesis, an analysis of the situation models constructed by participants showed that the proportion of “concentric” representations did not differ across groups. Taken collectively, we regard these data as suggesting a parallelism between the abstraction process that takes place in tasks like problem comparison or problem construction and the kind of idealization induced by our manipulation. Akin to the advantage of abstract retrieval cues in the MAC/FAC simulations of the late abstraction principle, idealized representations of the target are (1) perceptually more similar than their alternative concrete representations to any base representation and (2) less likely to evoke spurious remindings that could eventually outcompete the critical base analog. Building incrementally on prior interventions credited with having promoted an abstract encoding of the base analogs, the above studies assessed whether applying such elaborations to the target analog could enhance the retrieval of distant sources, as predicted by the late abstraction principle. Most critically, the selection of self-accomplishable strategies such as target construction and target idealization was meant to overcome a serious applicability limitation of the target-comparison intervention: the need to provide analogizers with a specially tailored second analog for every new target situation they are to process.
As these mental operations do not require the provision of additional information about the target, learners can potentially apply them across a wide variety of environments, both formal and informal. Current work is underway to assess whether other successful interventions such as Mandler and Orlich (1993, see Section 3.1) could also be converted into portable target elaborations, as well as whether they generalize to other overlooked but educationally relevant activities like hypothesis generation or communication, for which the retrieval of far analogies could certainly be a valuable resource.

3.4 Can Late Abstraction Revive Inert Knowledge?

Most of the “enthusiasm” about interventions at recall time stems from the supposition that they can help participants retrieve suboptimally encoded

knowledge from LTM (see Gentner et al., 2009; Loewenstein, 2010). Even though neither our procedures nor those of Loewenstein et al. encouraged an abstract encoding of the sources, our results are silent about the extent to which the sources retrieved by participants in the late abstraction conditions had received an abstract processing during their initial encoding. Based on the widely accepted encoding-specificity hypothesis (Tulving, 1983; Tulving & Thomson, 1973), it could be argued that as participants’ spontaneous encodings of the base analogs vary along a continuum from the very shallow to the very deep, the late abstraction manipulations were only able to aid retrieval in those cases where the base analog happened to receive an abstract encoding akin to the one generated during late abstraction. Needless to say, empirically demonstrating the retrieval of suboptimally encoded sources would require assessing the nature of an individual person’s particular encoding without altering it in any way, something that might be difficult to achieve. In the meantime, such discussion can benefit from gaining further precision with regard to the different ways in which base encodings could be suboptimal, the mechanisms by which such suboptimally stored knowledge could eventually be retrieved, and the possible role of late abstraction in facilitating these particular kinds of analogical retrieval. One slightly suboptimal way of encoding the base analogs could consist in generating their abstract schema but without stressing its centrality relative to other more superficial aspects. As suggested by Gentner et al. (2009), the representation of the target could include its generic structure plus the specific superficial features or its generic structure but little about the specific superficial aspects.
While in the former case numerous surface matches will be retrieved, in the latter case there will be fewer surface matches to eventually outcompete and crowd out relational matches (Catrambone & Holyoak, 1989; Gentner et al., 1993). On top of this, the match between a schema and an example will be greater than the match between two analogous examples (Gentner et al., 2009). Target interventions like problem comparison or problem construction could potentially aid the retrieval of this kind of base analogs, since diminished surface competition alone should be capable of accounting for an effect of analogical abstraction on backward relational retrieval. Simulation studies using Forbus et al.’s (1995) MAC/FAC model of analogical retrieval bear out the computational plausibility of this account. Gentner et al.’s (2009) interpretation represents a parsimonious explanation of how the process of late abstraction could enhance access to base

analogs whose structural traits were generated but not especially highlighted. However, the success of various interventions aimed at inducing a more abstract encoding of base analogs suggests that participants are unlikely to encode the abstract structure of the source in ways that will neatly match the structure of the particular target they will have to solve. To illustrate, some participants might encode the goal of capturing the fortress as a case of “overpowering an object with a force,” a representation that could later be useful to describe the goal of destroying the tumor with rays in the Radiation problem. But now consider that instead of receiving the Radiation problem, participants were tasked with figuring out how to bring a certain amount of water to a town while attending to the restriction that sending such amount of water through either of the available canals alone would flood the surrounding lands. For this particular problem, it would have been more useful to conceptualize the goal of capturing the fortress as a case of, say, “carrying a resource to a destination.” There are innumerable ways to encode exemplars (Hofstadter & FARG, 1995; Markman & Ross, 2003), and people can hardly guess which one will best fit a target problem that they do not yet know. Even in cases like the above, late analogical abstraction could conceivably show a retrieval advantage over the raw processing of the target. Traditional models of analogical mapping have postulated several rerepresentation mechanisms aimed at revealing latent identities between initially nonidentical elements (e.g., minimal ascension, Falkenhainer, Forbus, & Gentner, 1989; decomposition, Gentner & Wolff, 2000; or coactivation over distributed representations, Hummel & Holyoak, 1997).
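As a rough illustration of the first of these mechanisms, minimal ascension can be thought of as climbing a concept hierarchy until two nonidentical concepts meet at a shared superordinate. The sketch below is a toy version and not the procedure of any of the cited models: the `PARENT` table is a hypothetical hand-built hierarchy, and the number of links climbed is used only as an informal measure of conceptual distance.

```python
# Hypothetical is-a links (child -> parent), invented for the illustration.
PARENT = {
    "destroy": "overpower",
    "capture": "overpower",
    "overpower": "act_on",
    "carry": "transfer",
    "transfer": "act_on",
}

def ancestors(concept):
    """Return the chain [concept, parent, grandparent, ...] up to the root."""
    chain = [concept]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain

def minimal_ascension(a, b):
    """Find the lowest shared superordinate of a and b.

    Returns (shared_concept, links_climbed), or None when the two concepts
    have no common ancestor in the hierarchy.
    """
    chain_a, chain_b = ancestors(a), ancestors(b)
    for steps_a, concept in enumerate(chain_a):
        if concept in chain_b:
            return concept, steps_a + chain_b.index(concept)
    return None

print(minimal_ascension("destroy", "capture"))    # two specific instances
print(minimal_ascension("overpower", "capture"))  # abstract cue vs. instance
```

In this toy hierarchy an abstract cue reaches a stored instance in fewer links than another specific instance does, which is one way of picturing why a more abstract target representation might reach sources whose encoding does not match it exactly.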
Employing a variant of minimal ascension, retrieval algorithms like ARCS (Thagard et al., 1990) can access base analogs having similar (but not identical) concepts by means of probing LTM with memory cues that were derived from target elements through conceptual links such as superordination or part-whole relationships (e.g., the target concept destroy could activate a source containing the relation capture based on both being instances of overpowering). Given that the connections between a superordinate concept and its instances require fewer links than those between two particular instances (Gentner, 1989), interventions promoting late abstraction could in principle be capable of boosting the retrieval of base analogs whose structural encoding did not neatly match the structure of the target. Yet another way in which an initial encoding could be considered suboptimal would consist in having stored a base situation under a poor or wrong principle, as compared to the more proper principle under which

the target situation is currently being framed (Gentner et al., 2009). For instance, imagine that an instructor wanted to illustrate the well-known regression fallacy (Tversky & Kahneman, 1974) with an example in which the son of an extraordinary soccer player displayed skills that, although competitive, were not as outstanding as those of his father (according to the topic of the lesson, people’s natural tendency to generate causal explanations for these situations is unwarranted, since such drops in performance could be more parsimoniously accounted for on purely statistical grounds). Now imagine that in an attempt to capitalize on her learners’ prior knowledge, the instructor asked them to recall further examples. Faced with such challenge, a successful retrieval cue could consist of a structure where the fact that a descendant of a high-performance player did not display such level of performance gave rise to causal interpretations. By promoting a shift between a particular type of extreme values (e.g., striking soccer skills) and more general categories of outstanding performance (e.g., extraordinary levels of any kind of professional achievement), interventions aimed at eliciting target elaborations could in principle enhance the retrieval of distant analogs that were indexed under a poor or wrong abstract principle. In sum, while Gentner et al. (2009) have modeled the beneficial effect of late abstraction for retrieving sources whose surface content has not been sufficiently deemphasized, there are other types of suboptimal encoding that may complicate subsequent retrieval (Loewenstein, 2010), and which deserve further investigation.
Despite the methodological difficulties mentioned above, we believe that the educational implications of the late abstraction principle justify efforts to assess whether suboptimally encoded sources do in fact get retrieved, as well as the extent to which interventions such as analogical comparison or analogy fabrication can foster their retrieval.

4. CONCLUSIONS

Realistic solutions to the problem of inert knowledge require an accurate estimation of our ability to retrieve distant analogs from memory. The first half of the present chapter began by reviewing the debate between experimental and naturalistic studies on the extent to which analogical retrieval relies on surface similarities. On one side, traditional experiments using a two-phase transfer paradigm consistently documented the improbability of retrieving analogous cases in the absence of at least some minimal amount of surface similarities. On the other side, a rapidly growing body of naturalistic studies reveals a frequent use of far analogies by experts and novices, a result that allegedly calls into question both the validity of experimental studies and the adequacy of the computer models engineered to mimic such patterns of results. After discussing the strengths and weaknesses of both empirical approaches, we presented our attempt to decide between these accounts by means of a hybrid paradigm that was explicitly designed to retain both the ecological validity of naturalistic studies and the methodological control of experimental procedures. In line with traditional experimental results, the application of this hybrid paradigm to three different kinds of naturally acquired sources (popular movies, highly publicized political affairs and autobiographic episodes) converged in demonstrating that surface similarity governs retrieval even when highly ecologically valid tasks and materials are involved. Given that our retrieval mechanisms do not distinguish the natural from the artificial, the inspection of natural environments should no longer be regarded as a privileged window into how retrieval difficulties could possibly be alleviated. The first half concluded by reviewing a series of cognitive strategies that do not conform to a stringent definition of analogical retrieval, but which could collectively explain why a person would fail to retrieve the critical distant analog during a typical experiment or classroom activity, yet succeed in generating far analogies under less constrained environments. These alternative routes to distant analogizing have rarely been the target of systematic investigation, with current evidence for their existence being mostly anecdotal. Future studies should address how reliable they are and via which cognitive mechanisms they operate.
In light of the limitations of our memory systems for accessing far analogs, the second half of the chapter began by reviewing early attempts to make distant connections more likely. Consistent with an implicit conception of our retrieval systems as responding passively to the representation of the target analog in working memory, several interventions were aimed at enforcing a more abstract encoding of the base analogs, thus rendering them more accessible during the future processing of superficially dissimilar targets. Despite their success, interventions following this approach have little to offer to someone needing to retrieve base situations whose encoding conditions had not emphasized their abstract features. The remainder of the chapter reviewed various attempts to help participants retrieve this kind of representation. Albeit commonsensical, one of the avenues we began to explore concerned whether certain pragmatics reliably elicit a search for analogs in memory. Results from our studies on persuasive argumentation suggest that while argumentation seldom elicits spontaneous analogical retrievals, access to both near and far base analogs increases significantly with the deliberate intention to base arguments on analogous cases. The other approach we have followed builds upon the late abstraction principle postulated by Dedre Gentner and colleagues, which states that the benefits of abstract representations should also apply to elaborations of the target analog at retrieval time. Until recently, the only evidence for the psychological reality of late abstraction came from Kurtz and Loewenstein (2007) and Gentner et al. (2009), who elicited a retrieval advantage by presenting the target analog together with another unsolved problem with similar structure and having participants compare them. Enthused by the potential of late abstraction to overcome the applicability limitations of interventions based on initial encoding, we developed a series of interventions designed to help participants capitalize on the late abstraction principle but without depending on the external provision of target-specific information. While the task of comparing the target to a random nonisomorphic problem proved unsuccessful, activities like constructing an idealized representation of the target problem or inventing a new problem with a similar structure increased participants’ ability to transfer the base solutions to the target problem. Despite the advantage of these interventions in terms of applicability, the debate remains as to whether their beneficial effect involves the retrieval of suboptimally encoded sources, as opposed to improving the retrieval of only those base analogs which had received a structural encoding in the first place.
We concluded by advancing several plausibility arguments for the thesis that the process of late abstraction can potentially enhance access to suboptimally encoded sources, but emphasized the need to address the difficult challenge of submitting such a hypothesis to empirical test.

REFERENCES

Ahn, W. K., Brewer, W. F., & Mooney, R. J. (1992). Schema acquisition from a single example. Journal of Experimental Psychology: Learning, Memory, and Cognition, 18, 391–412.
Arthus-Bertrand. (2009). Home [Motion Picture]. Los Angeles, CA: 20th Century Fox.
Bernardo, A. B. I. (2001a). Principle explanation and strategic schema abstraction in problem solving. Memory & Cognition, 29, 627–633.
Bernardo, A. B. I. (2001b). Analogical problem construction and transfer in mathematical problem solving. Educational Psychology, 21, 137–150.
Blanchette, I., & Dunbar, K. (2000). How analogies are generated: The roles of structural and superficial similarity. Memory & Cognition, 28, 108–124.


Blanchette, I., & Dunbar, K. (2001). Analogy use in naturalistic settings: The influence of audience, emotion, and goals. Memory & Cognition, 29, 730–735.
Browne, B. A., & Cruse, D. F. (1988). The incubation effect: Illusion or illumination? Human Performance, 1, 177–185.
Catrambone, R. (2002). The effects of surface and structural feature matches on the access of story analogs. Journal of Experimental Psychology: Learning, Memory, and Cognition, 28, 318–334.
Catrambone, R., & Holyoak, K. J. (1989). Overcoming contextual limitations on problem-solving transfer. Journal of Experimental Psychology: Learning, Memory, and Cognition, 15, 1147–1156.
Chen, Z., Mo, L., & Honomichl, R. (2004). Having the memory of an elephant: Long-term retrieval and use of analogues in problem solving. Journal of Experimental Psychology: General, 133, 415–433.
Christensen, B. T., & Schunn, C. D. (2007). The relationship of analogical distance to analogical function and pre-inventive structure: The case of engineering design. Memory & Cognition, 35, 29–38.
Clement, J. (1988). Observed methods for generating analogies in scientific problem solving. Cognitive Science, 12, 563–586.
Clement, C., Mawby, R., & Giles, D. (1994). The effects of manifest relational similarity on analog retrieval. Journal of Memory and Language, 33, 396–420.
Dehghani, M., Gentner, D., Forbus, K., Ekhtiari, H., & Sachdeva, S. (2009). Analogy and moral decision making. In B. Kokinov, D. Gentner, & K. Holyoak (Eds.), New frontiers in analogy research (pp. 1–10). Sofia: NBU Press.
Dunbar, K. (1997). How scientists think: Online creativity and conceptual change in science. In T. B. Ward, S. M. Smith, & S. Vaid (Eds.), Creative thought: An investigation on conceptual structures and processes (pp. 461–493). Washington, DC: APA Press.
Dunbar, K. (2001). The analogical paradox: Why analogy is so easy in naturalistic settings, yet so difficult in the psychology laboratory? In D. Gentner, K. J. Holyoak, & B. Kokinov (Eds.), The analogical mind: Perspectives from cognitive science (pp. 313–334). Cambridge, MA: The MIT Press.
Duncker, K. (1945). On problem solving. Psychological Monographs, 58 (5, Whole No. 270).
Falkenhainer, B., Forbus, K. D., & Gentner, D. (1989). The structure-mapping engine: Algorithm and examples. Artificial Intelligence, 41, 1–63.
Forbus, K., Gentner, D., & Law, K. (1995). MAC/FAC: A model of similarity-based retrieval. Cognitive Science, 19, 141–204.
Forceville, C. (2006). Non-verbal and multimodal metaphor in a cognitivist framework: Agendas for research. In G. Kristiansen, M. Achard, R. Dirven, & F. Ruiz de Mendoza (Eds.), Cognitive linguistics: Current applications and future perspectives (pp. 379–402). New York: Mouton de Gruyter.
Gentner, D. (1983). Structure-mapping: A theoretical framework for analogy. Cognitive Science, 7, 155–170.
Gentner, D. (1989). The mechanisms of analogical transfer. In S. Vosniadou, & A. Ortony (Eds.), Similarity and analogical reasoning (pp. 199–242). Cambridge, UK: Cambridge University Press.
Gentner, D., Loewenstein, J., Thompson, L., & Forbus, K. (2009). Reviving inert knowledge: Analogical abstraction supports relational retrieval of past events. Cognitive Science, 33, 1343–1382.
Gentner, D., & Markman, A. B. (2006). Defining structural similarity. The Journal of Cognitive Science, 6, 1–20.
Gentner, D., Rattermann, M. J., & Forbus, K. D. (1993). The roles of similarity in transfer: Separating retrievability from inferential soundness. Cognitive Psychology, 25, 431–467.


Gentner, D., & Wolff, P. (2000). Metaphor and knowledge change. In E. Dietrich, & A. Markman (Eds.), Cognitive dynamics: Conceptual change in humans and machines (pp. 295–342). Mahwah, NJ: LEA.
Gibbs, R. W., Jr. (1994). The poetics of the mind: Figurative thought, language and understanding. Cambridge: Cambridge University Press.
Gick, M. L., & Holyoak, K. J. (1980). Analogical problem solving. Cognitive Psychology, 12, 306–355.
Gick, M. L., & Holyoak, K. J. (1983). Schema induction and analogical transfer. Cognitive Psychology, 15, 1–38.
Goldstone, R. L., & Sakamoto, Y. (2003). The transfer of abstract principles governing complex adaptive systems. Cognitive Psychology, 46, 414–466.
Goldstone, R. L., & Wilensky, U. (2008). Promoting transfer by grounding complex systems principles. Journal of the Learning Sciences, 17, 465–516.
Hofstadter, D. R. (1985). Metamagical Themas: Questing for the essence of mind and pattern. London: Viking.
Hofstadter, D. R. (2001). Epilogue: Analogy as the core of cognition. In D. Gentner, K. J. Holyoak, & B. Kokinov (Eds.), The analogical mind: Perspectives from cognitive science (pp. 499–538). Cambridge, MA: The MIT Press.
Hofstadter, D. R., & Sander, E. (2013). Surfaces and essences: Analogy as the fuel and fire of thinking. New York: Basic Books.
Hofstadter, D. R., & The Fluid Analogies Research Group. (1995). Fluid concepts and creative analogies: Computer models of the fundamental mechanisms of thought. New York: Basic Books.
Holland, J. H., Holyoak, K. J., Nisbett, R. E., & Thagard, P. R. (1986). Induction: Processes of inference, learning and discovery. Cambridge, MA: The MIT Press.
Holyoak, K. J., & Koh, K. (1987). Surface and structural similarity in analogical transfer. Memory & Cognition, 15, 332–340.
Holyoak, K. J., & Thagard, P. R. (1995). Mental leaps: Analogy in creative thought. Cambridge, MA: The MIT Press.
Hummel, J. E., & Holyoak, K. J. (1997). Distributed representations of structure: A theory of analogical access and mapping. Psychological Review, 104, 427–466.
Keane, M. T. (1985). On drawing analogies when solving problems: A theory and test of solution generation in an analogical problem solving task. British Journal of Psychology, 76, 449–458.
Keane, M. T. (1987). On retrieving analogues when solving problems. Quarterly Journal of Experimental Psychology, 39, 29–41.
Kövecses, Z. (2002). Metaphor: A practical introduction. Oxford, NY: Oxford University Press.
Kretz, D. R., & Krawczyk, D. C. (2014). Expert analogy use in a naturalistic setting. Frontiers in Psychology, 5, 1333.
Kurtz, K., & Loewenstein, J. (2007). Converging on a new role for analogy in problem solving and retrieval: When two problems are better than one. Memory & Cognition, 35, 334–341.
Lakoff, G. (1990). The Invariance Hypothesis: Is abstract reason based on image-schemas? Cognitive Linguistics, 1, 39–75.
Lakoff, G., & Johnson, M. (1980). Metaphors we live by. Chicago: University of Chicago Press.
Lakoff, G., & Turner, M. (1989). More than cool reason: A field guide to poetic metaphor. Chicago: University of Chicago Press.
Lampert, M. (1986). Knowing, doing, and teaching multiplication. Cognition and Instruction, 3, 305–342.
Loewenstein, J. (2010). How one’s hook is baited matters for catching an analogy. In B. H. Ross (Ed.), The psychology of learning and motivation: Advances in research and theory (Vol. 53, pp. 149–182). San Diego, CA: Elsevier.


Mace, J. H. (2010). Involuntary remembering and voluntary remembering: How different are they? In J. H. Mace (Ed.), The act of remembering: Toward an understanding of how we recall the past. Wiley-Blackwell.
Mandler, J. M., & Orlich, F. (1993). Analogical transfer: The roles of schema abstraction and awareness. Bulletin of the Psychonomic Society, 5, 485–487.
Markman, A. B., & Ross, B. H. (2003). Category use and category learning. Psychological Bulletin, 129, 592–613.
Markman, A. B., Taylor, E., & Gentner, D. (2007). Auditory presentation leads to better analogical retrieval than written presentation. Psychonomic Bulletin & Review, 14, 1101–1106.
Martínez Frontera, L. (2015). Retrieval of base analogs from long-term memory: Differences between automatic and voluntary search (Unpublished Master’s thesis). Buenos Aires, Argentina: FLACSO-Universidad Autónoma de Madrid.
Medin, D. L., & Ross, B. H. (1989). The specific character of abstract thought: Categorization, problem-solving, and induction. In R. J. Sternberg (Ed.), Advances in the psychology of human intelligence (Vol. 5, pp. 189–223). Hillsdale, NJ: Erlbaum.
Minervino, R., Olguín, V., & Trench, M. (2016). Promoting interdomain analogical transfer: When creating a problem helps to solve a problem. Memory & Cognition (in press).
Nathan, M. J., Kintsch, W., & Young, E. (1992). A theory of algebra word problem comprehension and its implications for the design of computer learning environments. Cognition and Instruction, 9, 329–389.
Olguín, V., Trench, M., & Minervino, R. Attending to individual recipients’ knowledge when generating persuasive analogies (in revision).
Olton, R. M. (1979). Experimental studies of incubation: Searching for the elusive. Journal of Creative Behavior, 13, 9–22.
Osherson, D. N., Smith, E. E., Wilkie, O., López, A., & Shafir, E. (1990). Category-based induction. Psychological Review, 97, 185–200.
O’Keefe, D., & Costello, F. (2008). A fast computational model of analogical retrieval (and mapping). In B. C. Love, K. McRae, & V. M. Sloutsky (Eds.), Proceedings of the 30th Annual Conference of the Cognitive Science Society (pp. 2003–2008). Austin, TX: Cognitive Science Society.
Richland, L. E., Holyoak, K. J., & Stigler, J. W. (2004). Analogy use in eighth-grade mathematics classrooms. Cognition and Instruction, 22, 37–60.
Rips, L. (1975). Inductive judgments about natural categories. Journal of Verbal Learning and Verbal Behavior, 14, 665–681.
Ross, B. H. (1987). This is like that: The use of earlier problems and the separation of similarity effects. Journal of Experimental Psychology: Learning, Memory, and Cognition, 13, 629–639.
Ross, B. H. (1989). Distinguishing types of superficial similarities: Different effects on the access and use of earlier problems. Journal of Experimental Psychology: Learning, Memory, and Cognition, 15, 456–468.
Ross, B. H., & Kennedy, P. T. (1990). Generalizing from the use of earlier examples in problem solving. Journal of Experimental Psychology: Learning, Memory, and Cognition, 16, 42–55.
Ross, B. H., & Kilbane, M. C. (1997). Effects of principle explanation and superficial similarity on analogical mapping in problem solving. Journal of Experimental Psychology: Learning, Memory, and Cognition, 23, 427–440.
Rudnitsky, A., Etheredge, S., Freeman, S. J. M., & Gilbert, T. (1995). Learning to solve addition and subtraction word problems through a structure-plus-writing approach. Journal for Research in Mathematics Education, 26, 467–486.
Sander, E., & Richard, J.-F. (1997). Analogical transfer as guided by an abstraction process: The case of learning by doing in text editing. Journal of Experimental Psychology: Learning, Memory, & Cognition, 23, 1459–1483.


Saner, L., & Schunn, C. D. (1999). Analogies out of the blue: When history seems to retell itself. In M. Hahn, & S. Stoness (Eds.), Proceedings of the 21st Annual Conference of the Cognitive Science Society (pp. 619–624). Mahwah, NJ: Erlbaum.
Schwartz, D. L. (1995). The emergence of abstract representations in dyad problem solving. The Journal of the Learning Sciences, 4, 321–354.
Spencer, R. M., & Weisberg, R. W. (1986). Context-dependent effects on analogical transfer. Memory & Cognition, 14, 442–449.
Thagard, P., Holyoak, K., Nelson, G., & Gochfeld, D. (1990). Analog retrieval by constraint satisfaction. Artificial Intelligence, 46, 259–310.
Trench, M., & Minervino, R. (2015a). The role of surface similarity in analogical retrieval: Bridging the gap between the naturalistic and the experimental traditions. Cognitive Science, 39, 1292–1319.
Trench, M., & Minervino, R. (2015b). Creativity training from a continuist perspective: Reviving dormant analogies to generate novel metaphorical expressions. Creativity Research Journal, 27, 188–197.
Trench, M., Oberholzer, N., Adrover, J. F., & Minervino, R. (2009). La eficacia del paradigma de producción para promover la recuperación de análogos base interdominio. Psykhé, 18, 39–48.
Trench, M., Olguín, V., & Minervino, R. (2016). Seek, and Ye shall find: Differences between spontaneous and voluntary analogical retrieval. Quarterly Journal of Experimental Psychology, 69, 698–712.
Trench, M., Tavernini, M., & Goldstone, R. L. Enhancing the retrieval of distant analogs via idealizing target representations (in preparation).
Tulving, E. (1983). Elements of episodic memory. New York: Oxford University Press.
Tulving, E., & Thomson, D. M. (1973). Encoding specificity and retrieval processes in episodic memory. Psychological Review, 80, 352–373.
Turner, M. (1996). The literary mind: The origins of thought and language. Oxford: Oxford University Press.
Tversky, A., & Kahneman, D. (1974). Judgment under uncertainty: Heuristics and biases. Science, 185, 1124–1131.
Ward, T. B., Smith, S. M., & Finke, R. A. (1999). Creative cognition. In R. J. Sternberg (Ed.), Handbook of creativity (pp. 189–212). Cambridge: Cambridge University Press.

CHAPTER TWO

The Complexities of Learning Categories Through Comparisons

Erin Jones Higgins
Institute of Education Sciences, U.S. Department of Education, Washington, DC, United States

Contents

1. Introduction
1.1 Evidence That Comparison-Type Matters
2. Analogical Reasoning as a Lens for Understanding the Comparison Benefits
2.1 Benefits of Structural Alignment
2.2 Do People Spontaneously Notice Structural Commonalities Between Items?
2.3 Structural Alignment and Alignable Differences
3. Effects of Item Order on Learning
3.1 The Benefits of Interleaving
4. Integrating the Analogical Reasoning and Item Order Research Into a Coherent Framework
4.1 The Highlighter Hypothesis
5. Testing the Highlighter Hypothesis Through a Series of Studies
5.1 Are Between-Category Comparisons Better When Learners Can Rely on Feature Values to Determine Category Membership?
5.1.1 Predictions and Results
5.2 Are Within-Category Comparisons Better When the Relational Structure of a Category is Necessary for Determining the Category Membership, but Difficult to Notice?
5.2.1 Predictions and Results
5.3 What Are the Effects of Comparisons on Learning Relative to Cases Where No Comparisons Are Made During Learning?
5.3.1 Predictions and Results
6. Implications and Conclusions
6.1 Alternative Explanations
6.1.1 Attentional Bias Framework
6.1.2 Prior-Knowledge Explanation
6.1.3 Summary of Alternative Explanations
6.2 Comparisons in the Real World
6.3 Conclusions
Acknowledgments
References

Disclaimer: This article was written in the author’s nonofficial capacity and does not necessarily represent the views of the Institute of Education Sciences, the U.S. Department of Education, or the United States.

Psychology of Learning and Motivation, Volume 66 ISSN 0079-7421 http://dx.doi.org/10.1016/bs.plm.2016.11.002

© 2017 Elsevier Inc. All rights reserved.

Abstract

Comparisons have been suggested as central to category learning, yet we are only beginning to understand how different types of comparisons affect what people learn. Prior research has established that different ways of learning affect what information learners acquire, suggesting that different types of comparisons may also affect learning in different ways. An important comparison-type distinction in category learning is between-category versus within-category comparisons. The results of prior studies looking at these types of comparisons are mixed, so it remains unclear how each type of comparison affects category learning. This chapter lays out a framework based on the idea that the benefits of comparisons depend on both the type of comparison being made and the information that needs to be learned. Specifically, between-category comparisons highlight distinguishing information between categories while within-category comparisons highlight commonalities and the relational structure of items.

1. INTRODUCTION

When a math teacher goes through an example math problem on the board, she expects her students to learn more than just how to solve that specific problem. When a child learning to speak uses a new word like “dog,” part of what she needs to learn is what to apply that new word to (small dogs, big dogs) and what not to apply it to (lions and bears, for instance). Knowing what category an item belongs to is powerful because it enables the use of a large body of knowledge about that class of items, which can be drawn upon for a variety of tasks such as making decisions, solving problems, making predictions and constructing explanations. Given that categories underlie many cognitive tasks, it is critical to understand how they are learned. Learning about a category often occurs as a function of using it, so in developing an understanding of how categories are learned, it is important to consider how different kinds of active processing lead to differences in what is learned. Recent work has demonstrated that different types of learning tasks affect what category information is eventually acquired (Anderson, Ross, & Chin-Parker, 2002; Chin-Parker & Ross, 2002, 2004; Jones & Ross, 2011; Yamauchi & Markman, 1998; see Markman & Ross, 2003 for a review). One type of active processing that is thought to be central to category learning and use is comparison (e.g., Goldstone, Day, & Son, 2010; Spalding & Ross, 1994). Comparisons occur all the time and can serve a variety of purposes. Sometimes, these comparisons are explicitly made, like when a teacher tells students to think about how a story they are reading is similar to or different from the one they finished last week. Other times, comparisons happen spontaneously, without explicit prompting, like when a student is solving a set of math problems for homework and spontaneously notices some similarities and differences between the problem he just solved and the one he is about to solve. How does making comparisons affect a learner’s understanding of a situation and the items involved? How does it affect a person’s mental representation of the category those items are members of? It is not the case that experiences are objectively encoded and represented in the mind, but rather, they are perceived and subsequently stored in a subjective way based on our prior knowledge and the current context of an experience. This chapter presents a framework for thinking about how comparisons affect category learning and, specifically, the degree to which different types of comparisons may lead to differences in what is learned.

1.1 Evidence That Comparison-Type Matters

When considering the role of comparisons in category learning, it is important to consider that there are different types of information to be learned, such as the features that are important for distinguishing one category from another and the feature values that are common to most members of a category. Additionally, there are different types of comparisons learners can engage in, which may affect how learners allocate their attention. For example, when learning about birds, comparing a finch to a sparrow is likely to highlight information that distinguishes them (e.g., their different body shapes, coloring) while comparing one finch to another finch may highlight commonalities between the two birds (e.g., their beak shape). Learners may compare items from different categories or items from the same category, and even within a category, learners may be comparing items that share many feature values or items that share few feature values with each other. Consequently, understanding the role of comparisons in category learning requires knowing when, how and why particular types of comparisons are effective.


There is some evidence that supports this idea that different types of comparisons affect what and how features of an item are represented. Medin, Goldstone, and Gentner (1993) showed that people perceive an ambiguous item in a way that is consistent with the comparison they make. For example, if shown an item with three prongs and a shorter prong that could be interpreted either as a fourth prong or as another part of the item, participants were more likely to say the item had four prongs if they had compared it to another item with four distinct prongs and three prongs if they had compared it with another item with only three distinct prongs. These results demonstrate how influential the type of comparison can be in determining how features of an item are interpreted and encoded. The type of comparison can also affect what information is learned. Rittle-Johnson and Star (2009) showed that certain types of comparisons lead to better conceptual understanding of math problems than other types. Learners studied algebra problems by either comparing similar problems using the same solution method, different problem types using the same solution method or the same problem solved with two solution methods. Learners who compared different solution methods for the same problem showed greater conceptual knowledge (knowledge of various algebra concepts) and procedural flexibility (the ability to generate multiple solutions if asked, as well as the ability to choose appropriate, accurate and efficient solutions at test) than learners who compared different problem types solved with the same solution. 
The majority of research examining comparisons focuses on (1) how considering multiple items as opposed to one item at a time impacts problem solving and reasoning (e.g., Gentner, Loewenstein, & Thompson, 2003; Gick & Holyoak, 1983; Rittle-Johnson & Star, 2007) and (2) whether an interleaved or blocked item order leads to better learning (e.g., Kang & Pashler, 2012; Kornell & Bjork, 2008; Rohrer & Taylor, 2007; Taylor & Rohrer, 2010). These two lines of research are summarized below.

2. ANALOGICAL REASONING AS A LENS FOR UNDERSTANDING THE COMPARISON BENEFITS

2.1 Benefits of Structural Alignment

Research looking directly at the role of comparisons in the analogical-reasoning domain has shown benefits for comparing items from a single category as opposed to studying the same items one at a time. Analogical reasoning refers to the process of identifying how aspects of one item correspond to aspects of another item. One can think of making an analogy as similar to making a within-category comparison, as the goal in each is to determine how one item is like another item. When making an analogy, a mapping process is used to determine the one-to-one correspondences between the components of one item and the components of the other item. According to structure-mapping theory (Gentner, 1983), this mapping process boosts attention to the relational structure of each item and takes attention away from the item’s specific feature values. In other words, this structural alignment process highlights the role each feature plays and downplays the actual value for each feature. For example, when children are learning to solve division problems, they may use a word problem involving 10 apples that need to be divided between two people as an analogy to help them solve a new problem involving 90 dollars that needs to be divided among three cash registers. Even though the values of the features of the problems are different, the roles are the same. The 10 apples correspond to the 90 dollars (both are dividends), while the two people correspond to the three cash registers (both are divisors). Structure-mapping theory predicts that when relational structure needs to be learned, analogy can be a powerful tool for highlighting it, even if the feature values of the various items encountered during learning are very different.

A number of studies have demonstrated the benefits of structural alignment in learning relational structure (e.g., Doumas & Hummel, 2004; Gentner et al., 2003; Gentner & Namy, 1999; Gick & Holyoak, 1983; Kotovsky & Gentner, 1996; Namy & Gentner, 2002). Some of the earliest evidence for the benefits of comparison comes from Gick and Holyoak (1983).
Across a series of studies, they had participants study one or two examples of problems that required a solution using converging forces. In one story, an army general sends small groups of soldiers to attack a fortress from different locations, and in a second story, a fire fighter has multiple people spray hoses simultaneously from different locations to put out a fire. After being exposed to one or multiple problems where the solution was to use converging forces, participants attempted to solve Duncker’s (1945) radiation problem, where the solution is to have rays converge on a tumor from multiple directions in order to shrink it. Participants who actively compared across the army-general and fire-fighter problems were more likely to solve the radiation problem than those who did not compare. Gick and Holyoak (1983) argue that participants who compared across problems were more likely to solve the radiation problem because comparison facilitated the development of a problem schema that represented the structural characteristics of problems requiring a converging-forces solution. For example, these participants may have learned that all of the problems present a situation that requires a large amount of force to solve; however, due to the characteristics of the barrier between the force and the object the force needs to be applied to, only a small amount of force can be used at any given spot on the barrier. Participants who did not make comparisons were not able to abstract this schema, as they focused primarily on the surface features of the problem.

Gentner et al. (2003) demonstrated similar comparison benefits. In their study, participants either compared two examples of a negotiation strategy or studied each one independently. Those who compared the examples developed a better schema and consequently demonstrated better performance when asked to solve a novel problem requiring the same negotiation strategy. The explanation for their results is that comparisons benefit learning because they allow learners to see the underlying structure shared by both examples and ignore the surface features. On the other hand, when comparison processes are not engaged, learners focus on surface features of the examples and ignore the underlying structure.

Doumas and Hummel (2004) used the structure-mapping framework to motivate their study looking at the role of comparisons in relational-category learning. Relational categories are defined by the role each feature plays rather than each feature’s value, and Gentner and Kurtz (2005) point out that relational categories are at least as common as feature-based categories (categories defined by feature values rather than the relations between features) in the world.
Doumas and Hummel (2004) predicted that when learning to distinguish between categories requires learners to discover and predicate relations, analogical mapping (i.e., comparison across items from the same category) should facilitate learning. To address this prediction, they tested whether or not making a comparison across items from the same relational category affected classification performance. Participants learned about two categories of cells. Each cell had five features: its location, shape, membrane thickness, nucleus roundness and number of organelles. A higher-order relation between the cells’ membrane thickness and the roundness of the nucleus defined category membership while the other features’ values were random across items from both categories. All participants classified items with feedback for one block of trials. Next, one group of participants was given a mapping task, where their job was to map the elements of one item to the elements of another from the

same category. The other group of participants was simply told to study the two items. After the mapping task, participants performed another block of classification trials. The learners who participated in the mapping task showed higher classification performance in the second block of trials than the learners who were only given an opportunity to study the items, demonstrating the benefits of comparing items from the same category. Kotovsky and Gentner (1996) also demonstrated the importance of comparisons in discovering relations. In their first experiment, children of varying ages were shown a pattern of shapes and were asked to identify which of two other patterns was most like it. Succeeding at this pattern-matching task required children to notice relational commonalities between the presented pattern and the correct response. The surface similarity between the presented pattern and the responses was manipulated across trials. For half of the trials, the presented pattern and responses varied along the same dimension (e.g., the presented pattern was small circle, large circle, small circle and the correct match was the pattern of small square, large square, small square). For the other half of the trials, referred to here as cross-dimension trials, the presented pattern and responses varied along different dimensions (e.g., the presented pattern was small circle, large circle, small circle and the correct match was the pattern of white square, black square, white square). Within the same-dimension and cross-dimension trials, half were polarity-matched trials (as the previous two examples demonstrated) while the other half were polarity-mismatched trials (e.g., the presented pattern was white circle, black circle, white circle and the correct match was black square, white square, black square). Older children were able to successfully recognize the relational choice, regardless of the trial type.
Four-year-olds had a difficult time with this task, and performed at chance on all trial types with the exception of the same-dimension, polarity-matched trials. In subsequent experiments, Kotovsky and Gentner (1996) attempted to increase four-year-olds’ performance on the pattern-matching task in various ways. In these studies, there were only two trial types: same-dimension, polarity-matched trials and cross-dimension, polarity-matched trials. They found that progressive alignment (i.e., ordering the trials in such a way that learners viewed the easier same-dimension trials at the beginning of learning and the cross-dimension trials at the end of learning) and labeling (i.e., teaching children labels for the relational and nonrelational answer choices before having them do the pattern-matching task), both of which most likely encouraged children to make comparisons, improved

four-year-olds’ pattern-matching performance, even for the cross-dimension trials. Kotovsky and Gentner (1996) argue that progressive alignment was effective in the pattern-matching task because it prompted children to make comparisons across trials (due to the high similarity between trials at the beginning of the experiment). When making these comparisons, children most likely aligned their representations of the trials they were comparing on the basis of surface similarity, but when they made the comparisons, the common relational structure the trials shared was highlighted. Similarly, the label-learning task most likely encouraged children to compare across the answer choices that shared labels, leading them to develop a more abstract representation than they would have had they not completed the label-learning task. Kotovsky and Gentner (1996) use these results as evidence that comparisons enabled four-year-olds to develop an abstract representation of relations.
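The trial types in the Kotovsky and Gentner (1996) pattern-matching task can be made concrete with a small sketch. The encoding below is an illustrative assumption, not the authors' materials: each pattern is reduced to the dimension it varies on and a tuple of 0/1 values, where coding small and white as 0 and large and black as 1 is an arbitrary choice made here. The `Pattern`, `Trial`, `trial_type`, and `progressive_alignment` names are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Pattern(NamedTuple):
    dimension: str           # dimension the pattern varies on, e.g. "size" or "color"
    values: Tuple[int, ...]  # assumed coding: 0 = small/white, 1 = large/black


class Trial(NamedTuple):
    standard: Pattern  # the presented pattern
    match: Pattern     # the relational (correct) answer choice


def trial_type(trial: Trial) -> Tuple[str, str]:
    """Label a trial by dimension overlap and by polarity overlap."""
    same_dim = trial.standard.dimension == trial.match.dimension
    same_polarity = trial.standard.values == trial.match.values
    return (
        "same-dimension" if same_dim else "cross-dimension",
        "polarity-matched" if same_polarity else "polarity-mismatched",
    )


def progressive_alignment(trials: List[Trial]) -> List[Trial]:
    """Order trials so same-dimension trials come before cross-dimension ones.

    sorted() is stable, so the original order is kept within each group.
    """
    return sorted(trials, key=lambda t: trial_type(t)[0] != "same-dimension")


small_large_small = Pattern("size", (0, 1, 0))
white_black_white = Pattern("color", (0, 1, 0))
black_white_black = Pattern("color", (1, 0, 1))

trials = [
    Trial(small_large_small, white_black_white),  # cross-dimension, polarity-matched
    Trial(small_large_small, small_large_small),  # same-dimension, polarity-matched
    Trial(white_black_white, black_white_black),  # same-dimension, polarity-mismatched
]

# The follow-up experiments used polarity-matched trials only.
matched_only = [t for t in trials if trial_type(t)[1] == "polarity-matched"]
ordered = progressive_alignment(matched_only)

for t in trials:
    print(trial_type(t))
```

The sketch captures only the trial labels and the easy-to-hard ordering; it says nothing about how children actually align the patterns.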

2.2 Do People Spontaneously Notice Structural Commonalities Between Items?

In all of the studies reviewed so far (perhaps with the exception of Kotovsky & Gentner, 1996), participants have been explicitly told to make comparisons across items. In most cases, the items used in these studies shared relational structure but not surface features. An important question is whether people can make the same kinds of comparisons on their own without some sort of prompt to do so. In other words, can people notice structural commonalities across items on their own when the items do not share many surface features? The findings by Gentner et al. (2003) and Doumas and Hummel (2004) suggest that people fail to make comparisons across items of the same type, whose surface features vary, unless they are told to do so. In both studies, participants saw all of the same items, but the group of participants explicitly told to make comparisons outperformed the other group. However, there is some evidence showing that people can sometimes spontaneously make comparisons. These spontaneous comparisons typically occur when items of the same type share surface features (e.g., one sees two of the same type of math problem, which are both about car mechanics working on cars; Ross, 1984, 1987) although successful use of the comparison is determined by the relational correspondences (Ross, 1987). These findings show that when learning about items that are not similar on the surface, guided comparison is often necessary in order for people to notice structural

commonalities between those items. When learning about items whose surface features are similar, people are more likely to make effective comparisons on their own. The idea that people fail to notice similarities between items that do not share many surface features is consistent with a large body of expertise research, which shows that novices tend to sort items and think about items differently than experts. For example, Chi, Feltovich, and Glaser (1981) showed that physics novices sort physics problems based on their surface features (e.g., “these problems deal with blocks on an inclined plane”) while experts sort them by their underlying structural features (e.g., “these are conservation of energy problems”). In domains like physics where novices will be misled by the surface features of the problem, encouraging within-category comparisons may facilitate conceptual understanding. Children have the same tendency to focus on surface features and ignore the underlying relational structure of items (e.g., Imai, Gentner, & Uchida, 1994). Imai et al. (1994) used a task where children were shown a familiar item (e.g., an apple) and told a new, unfamiliar label for it (e.g., “This is a dax”). Next, they were shown three additional items: a perceptually similar, out-of-category item (e.g., a balloon), an item that shared the same taxonomic category but was perceptually dissimilar (e.g., a banana), and a thematic match that was also perceptually dissimilar (e.g., a knife). Their task was to choose the item that most likely shares the first item’s label. Children were more likely to select the perceptually similar, out-of-category item than the other two items. Gentner and Namy (1999; also see Namy & Gentner, 2002) extended this work through a number of studies that investigated how children approach this label-matching task when they are given the opportunity to make comparisons.
To invite comparison, children were presented two items and told that they shared a new, unfamiliar label. Next, they were asked to choose which of two new items (a perceptually similar, out-of-category item and an item that shared the same taxonomic category but was perceptually dissimilar) most likely shares the same new label with the items they had just compared. Children demonstrated that they were capable of matching based on taxonomic category as long as the comparison made prior to the label-matching task was between two items from the same taxonomic category. What is striking about this finding is that if either of the items from the comparison was presented alone, children would have been more likely to choose the perceptually similar, out-of-category item. However, when given the opportunity to compare, children overcame their

bias to focus on perceptual (i.e., surface) features and successfully recognized structural commonalities between items. In a recent follow-up study, Namy and Clepper (2010) showed that making contrasts (i.e., showing children perceptually similar items from other taxonomic categories and telling them they were not part of the target category) was not sufficient for overcoming this bias to choose the perceptually similar item over the conceptually similar item. Together, these results suggest that within-category comparisons in particular highlight the common relational structure that items from a category share (e.g., the functional commonality that both apples and bananas are edible) and downplay learners’ reliance on perceptual features. In summary, when novices in a domain encounter items whose surface features do not clearly predict category membership, they perseverate on them anyway. When the surface features are similar across items from the same category but not items from other categories, this bias to focus on surface features can lead people to acquire the relational structure (because they are more likely to spontaneously make comparisons between the items as in Ross, 1987). When the surface features are not similar across items from the same category, as is the case with most complex domains, people will still focus on surface features and will fail to notice the relational commonalities across items. In order for novices to become experts in domains where the relational structure of items is not obvious and surface features are not similar across items, novices need to do something that will emphasize the common relational structure across the items. Structure-mapping theory explains that comparisons highlight common relational structure, leading novices to eventually acquire a schema that reflects structural rather than surface features.
Therefore, in cases where novices fail to identify the relational structure of the items on their own, guided within-category comparisons will provide a means of bootstrapping their understanding of the relational structure.

2.3 Structural Alignment and Alignable Differences

While the emphasis in this section has been on uncovering the common relational structure across items using structural alignment, this same process can also highlight important differences between items (Gentner & Markman, 1994; Markman & Gentner, 2000). Determining the important differences that exist across items is not a trivial matter, as there are an infinite number of differences between items that one could list. How does one decide which differences matter and which differences do not?

Gentner and Markman (1994) and Markman and Gentner (2000) distinguish between two types of differences: alignable and nonalignable differences. An alignable difference is a feature-value difference between two items that occurs within a feature shared across both items. For instance, cars have four wheels and bicycles have two wheels. The difference between the two items is in the value of the wheel feature, which both items share. A nonalignable difference is a difference that does not have a corresponding feature across the two items. For example, a car can have a moonroof or a bicycle can have flat handlebars. There is no bicycle feature value that corresponds to a car’s moonroof (and vice versa). Alignable differences are favored over nonalignable differences. People are more likely to list alignable differences between items, are more likely to use them in making similarity judgments, and are more likely to attend to them when assessing similarities and differences between items (Markman & Gentner, 2000). In category learning, especially when the learning goal is to distinguish between similar categories, noticing alignable differences (i.e., a feature-value difference between two items that occurs within a feature shared across both items) is critical. Without an understanding of the relational structure of items within each category, it is impossible to know which between-category differences are alignable or nonalignable. Under circumstances where the relational structure is not yet understood, structure-mapping theory predicts that within-category comparisons would be helpful first even if the goal is to identify alignable differences. This may seem counter-intuitive, but the idea is that one needs an understanding of relational structure before being able to determine which differences are alignable.
Once the learner has an understanding of the relational structure of items from a category, he or she now has a framework for interpreting differences between categories. To summarize, while within-category comparisons may not be optimal for noticing critical between-category differences, this type of comparison may be critical for first establishing the framework necessary to evaluate whether a difference is alignable or not.
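The distinction between alignable and nonalignable differences can be sketched with the car and bicycle example from above. This is a deliberately simplified illustration: it assumes each item is a flat mapping from feature names to values, whereas structure-mapping theory aligns relational structure. The `compare_items` function and the feature names are hypothetical.

```python
def compare_items(a: dict, b: dict) -> dict:
    """Split a pairwise comparison into commonalities and two kinds of differences.

    Assumes items are flat feature -> value mappings (a simplification).
    """
    shared = a.keys() & b.keys()
    return {
        # Shared feature, same value.
        "commonalities": {f: a[f] for f in sorted(shared) if a[f] == b[f]},
        # Shared feature, different values: an alignable difference.
        "alignable": {f: (a[f], b[f]) for f in sorted(shared) if a[f] != b[f]},
        # Feature present in only one item: a nonalignable difference.
        "nonalignable_a": {f: a[f] for f in sorted(a.keys() - b.keys())},
        "nonalignable_b": {f: b[f] for f in sorted(b.keys() - a.keys())},
    }


car = {"wheels": 4, "moonroof": True}
bicycle = {"wheels": 2, "handlebars": "flat"}

result = compare_items(car, bicycle)
print(result["alignable"])       # {'wheels': (4, 2)}
print(result["nonalignable_a"])  # {'moonroof': True}
print(result["nonalignable_b"])  # {'handlebars': 'flat'}
```

In this toy encoding the shared feature set is given in advance. The argument in the text is that learners must first discover which features correspond, which is why within-category comparison may need to come first.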

3. EFFECTS OF ITEM ORDER ON LEARNING

3.1 The Benefits of Interleaving

When learning about categories, the learner’s goal is to generalize across items rather than remember specific items. When generalization is

the goal, interleaving different items from the same category between items from other categories can be thought of as an opportunity to compare across different types of items while blocking items from the same category together can be thought of as an opportunity to compare across items of the same type. Recently, a number of researchers have focused on whether interleaving items from multiple categories or blocking items from the same category facilitates learning. For instance, Kornell and Bjork (2008) had participants learn about paintings by multiple artists. When paintings from each artist were spaced between paintings from the other artists, learners were better able to classify novel paintings from those artists at test than if they had studied the paintings from each artist massed together. Kornell and Bjork (2008) argue that spacing allows for more effective discrimination learning. By having items from one category interspersed between items of other categories, learners could more easily compare items from different categories to see what features were diagnostic of category membership. Other researchers have shown similar effects (e.g., Kang & Pashler, 2012; Rohrer & Taylor, 2007; Taylor & Rohrer, 2010). In addition, researchers now have data suggesting that these interleaving effects are not due to increased temporal spacing between items of the same type (Carvalho & Goldstone, 2014a; Kang & Pashler, 2012; Vlach, Ankowski, & Sandhofer, 2012; Zulkiply & Burt, 2013), though temporal spacing could add a unique contribution to learning effects, which is separate from the benefits of interleaving (Birnbaum, Kornell, Bjork, & Bjork, 2013). 
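The difference between blocked and interleaved study orders can be sketched as follows. The artist and painting names are placeholders, and the round-robin ordering is only one assumed way to interleave; it is not the specific sequence used by Kornell and Bjork (2008). Counting category switches between adjacent items gives a rough measure of how many opportunities for between-category comparison a sequence offers.

```python
from itertools import zip_longest


def blocked(categories: dict) -> list:
    """All items from one category, then all items from the next."""
    return [(name, item) for name, items in categories.items() for item in items]


def interleaved(categories: dict) -> list:
    """Round-robin across categories, so neighbours come from different categories."""
    labelled = [[(name, item) for item in items] for name, items in categories.items()]
    rounds = zip_longest(*labelled)  # pads shorter categories with None
    return [pair for one_round in rounds for pair in one_round if pair is not None]


def category_switches(sequence: list) -> int:
    """Count adjacent pairs whose items belong to different categories."""
    return sum(1 for prev, cur in zip(sequence, sequence[1:]) if prev[0] != cur[0])


# Hypothetical study set: three artists with three paintings each.
paintings = {
    "artist_A": ["A1", "A2", "A3"],
    "artist_B": ["B1", "B2", "B3"],
    "artist_C": ["C1", "C2", "C3"],
}

print(category_switches(blocked(paintings)))      # 2
print(category_switches(interleaved(paintings)))  # 8
```

With nine items there are eight adjacent pairs. The blocked order crosses a category boundary in only two of them, while the round-robin order crosses one in all eight.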
The common explanation for these findings, initially suggested by Kornell and Bjork (2008) and further clarified by Kang and Pashler (2012) is that interleaving provides conditions that encourage learners to notice differences between categories while blocking provides conditions that encourage learners to notice similarities within a category. The discriminative contrast hypothesis (Birnbaum et al., 2013; Kang & Pashler, 2012; Kornell & Bjork, 2008) is that interleaving will facilitate category learning when learners need to notice differences while blocking will facilitate category learning when learners need to notice similarities. For example, when learning about birds, comparing a finch to a sparrow emphasizes the features that distinguish them (e.g., their different body shapes, coloring) while comparing one finch to another finch emphasizes similarities between the two birds (e.g., their beak shapes). An additional prediction made by this framework is that interleaving should be beneficial when the test task is classification, as classification focuses on discriminating between categories, while blocking should be more

beneficial for tasks that require knowledge of within-category similarities (e.g., making inferences about category members; Birnbaum et al., 2013). Despite the strong evidence in favor of the discriminative-contrast hypothesis, there is one finding that does not fit within this view, suggesting that perhaps the role of different types of comparisons in category learning is more complicated than this view suggests. Kurtz and Hovland (1956) showed a benefit for blocked presentations of items over interleaved presentations of items. In their experiment, participants learned about geometric patterns that varied on four features (size, shape, color and position). Each drawing could be classified into one of four categories, determined by rules. Participants saw items one at a time in either a blocked (massed) or interleaved (spaced) sequence, and those who learned through the blocked sequence performed better at test (both on a classification test as well as when asked to provide a verbal description of each category). Carvalho and Goldstone (2014b, 2015) have offered a different account of how item order affects category learning, one that explains why one might see a benefit for blocked study, as in the Kurtz and Hovland (1956) study. In their attentional bias framework, they propose that blocking emphasizes within-category similarities while interleaving better emphasizes between-category differences. Depending on the category structure to be learned, which in this case is defined by within- and between-category similarities, interleaving or blocking may be the better item order.
Their framework predicts, and empirical work supports (Carvalho & Goldstone, 2014a, 2014b; Zulkiply & Burt, 2013), that categories with high between- and within-category similarities (low discriminability categories) will be better learned through interleaving while categories with low between- and within-category similarities (high discriminability categories) will be learned better through blocking (Carvalho & Goldstone, 2014b). Others have made similar claims about the importance of category discriminability and similarity in the literature (e.g., Hammer, Bar-Hillel, Hertz, Weinshall, & Hochstein, 2008; Zulkiply & Burt, 2013). In their review of the literature in this area, Carvalho and Goldstone (2015) attempted to sort the studies in the literature on the basis of similarity, and the sorting appears to align with these predictions (studies in the literature that used low discriminability categories were better learned through interleaving and studies that used high discriminability categories were better learned through blocking).

4. INTEGRATING THE ANALOGICAL REASONING AND ITEM ORDER RESEARCH INTO A COHERENT FRAMEWORK

Both the research on analogical reasoning and the research on item order fall short of fully understanding how comparisons affect category learning. The analogy research typically focuses on learning one concept rather than multiple concepts. In a meta-analysis of 57 experiments on comparisons, Alfieri, Nokes-Malach, and Schunn (2013) demonstrated that comparisons consistently benefitted learning compared to control conditions such as studying with single cases or studying cases sequentially. However, the authors note that most of the studies they included focused on one concept and the learning goal was to identify similarities between instances of that concept, making it difficult to generalize these findings to questions about how comparisons affect learners’ ability to learn about differences between instances and discriminate between multiple concepts. The research on item order is conducted under the assumption that item order matters due to comparison, but these studies do not actually examine the act of making explicit comparisons during learning. Instead, it is inferred that comparisons are made based on seeing differences in learning between the two item orders. A small handful of these item-order studies have presented items simultaneously (Carvalho & Goldstone, 2014b; Kang & Pashler, 2012; Vlach et al., 2012; Wahlheim, Dunlosky, & Jacoby, 2011), though none of these studies asked participants to explicitly compare the presented items. In addition, the finding by Kurtz and Hovland (1956) that blocking facilitated category learning cannot be accounted for by the current version of the discriminative-contrast hypothesis.
The attentional bias framework provides a potential explanation for why some studies show an interleaving benefit while others show a benefit for blocking, and it makes some similar arguments that are made here, specifically, that item ordering provides an opportunity for particular types of comparisons. However, this framework has not yet provided a comprehensive account as to how it fits in with the analogical reasoning literature, which has also emphasized the value of comparisons in concept learning, and perceives the value of comparisons in a somewhat different fashion. Developing a comprehensive understanding of how comparisons affect learning requires a different perspective than the one that was taken in each of the research areas summarized here. First, learning a single category

is a process that occurs in the context of learning and using multiple categories, yet much of the analogical reasoning research on comparisons examines how people learn a single, complex concept without consideration of how that concept may be learned in relation to other concepts. Second, while interleaving is like between-category comparison and blocking is like within-category comparison, they are not the same as explicit between-category and within-category comparisons. Prior research has shown that for some concepts, explicit comparison (as opposed to presenting the same items sequentially without a prompt to compare) is necessary in order for learners to actively compare across items and benefit from the comparison (e.g., Loewenstein, Thompson, & Gentner, 2003; Rittle-Johnson & Star, 2007; Vendetti, Matlen, Richland, & Bunge, 2015). The remainder of this chapter considers how the act of explicitly making between-category and within-category comparisons affects learning.

4.1 The Highlighter Hypothesis

This chapter presents a different view of how between-category and within-category comparisons affect category learning, referred to here as the highlighter hypothesis. This framework was developed independently of the discriminative-contrast hypothesis and the attentional bias framework, based on the observation that the findings from the analogy and item-order research areas initially seem inconsistent but, upon closer look, are complementary, as they focus on very different types of category information (Higgins & Ross, 2011). More specifically, the analogical reasoning studies tend to focus on how learners acquire relational information while the item-order studies tend to focus on how learners acquire categories that can be learned by focusing on predictive feature values. The view presented here is that the benefits of between-category and within-category comparisons are not necessarily determined based on the task one needs to perform; instead, they are based on the information one needs to learn (relational or feature-based). Similar to the attentional bias framework, this framework predicts that factors such as category structure could modulate effects of between-category and within-category comparison benefits, even when the task one needs to perform is classification. However, unlike the attentional bias framework, this framework proposes that whether the category structure is relational or feature-based (as opposed to whether there is high or low discriminability) is the key component driving whether between-category or within-category comparisons will be more beneficial for learning. Given

that many complex categories are relational and learning relational structure is critical for learning many types of categories, it is important that this distinction is addressed. It is likely that other factors like category discriminability are critical to understanding the role of comparisons in category learning, and future research should seek to understand the interactions between these category structure factors. The highlighter hypothesis predicts that when learners can rely on feature values to determine category membership, it is more useful to highlight information that distinguishes between categories (i.e., which features are diagnostic of category membership and which values for those features are typical for each category). When learners must focus on learning the relational structure of items, then structural commonalities between items need to be highlighted because learners will not pick up on the items’ common relational structure on their own (e.g., Chi et al., 1981; Gentner & Namy, 1999; Kotovsky & Gentner, 1996).

5. TESTING THE HIGHLIGHTER HYPOTHESIS THROUGH A SERIES OF STUDIES

A straightforward way to evaluate the highlighter hypothesis is to find a case where learners can rely on feature values and a separate case requiring learners to focus on relations between features to determine category membership. The predictions are that (1) between-category comparisons will be more beneficial when learners can rely on feature values to determine category membership and (2) within-category comparisons will be more beneficial when learners need to acquire the relational structure of categories to determine category membership. To provide some initial evidence for this hypothesis, the details of three experiments are provided, which test these predictions by having participants learn categories through either between-category or within-category comparisons. Participants’ category knowledge was assessed with a classification test (and in some experiments, additional tests were used in conjunction with classification to determine what category knowledge participants had acquired).

5.1 Are Between-Category Comparisons Better When Learners Can Rely on Feature Values to Determine Category Membership?

The first experiment presented here (referred to hereafter as Experiment 1) evaluated the prediction that between-category comparisons will lead to

higher classification performance when learners can rely on feature values to determine category membership. This experiment was also an attempt to replicate Kornell and Bjork’s (2008) finding using explicit comparisons rather than inferring the role of comparisons from performance differences based on item order. Participants (n = 43) were randomly assigned to one of two between-subjects conditions: between-category-comparison learning or within-category-comparison learning. The categories were two artificial categories of aliens, called Deegers and Koozles, which varied along six binary features: arms, tails, antennae, legs, eyes and mouth. For each feature, there was a prototypical Deeger value and a prototypical Koozle value. The categories were determined by family resemblance (Rosch & Mervis, 1975), meaning the category members share many features but no single feature occurs across all members (see Table 1). Participants learned about the categories by either making comparisons between items from the two different categories or making comparisons between two items from the same category. On each trial, at the top of the screen, the between-category-comparison learners saw the prompt: “List how this Deeger (Koozle) and this Koozle (Deeger) are the same and different” (see Fig. 1 for an example). The within-category-comparison learners saw the prompt: “List how these Deegers (Koozles) are the same and different.” Participants typed out the similarities and differences between the two items in the response box provided and then clicked on a button to submit the response and move to the next trial. After learning, participants completed a 32-item classification test, which consisted of 8 Deegers and Koozles they had seen during learning as well as 24 they had not seen (see Table 2).

Table 1 Family-resemblance category structure for Experiment 1

Item type             Deeger    Koozle
Prototype             111111    000011
Learning exemplars    111011    000111
                      110111    001011
                      101111    010011
                      011111    100011

Each item is represented as six binary digits with each place indicating a feature (e.g., eyes, antennae, feet) and the 0 or 1 in each place indicating the value of that feature (e.g., square eyes or rectangular eyes). Prototypes were not shown during the learning phase. The last two features were always the “1” value during learning, reducing the set of learning items to 4 per category.
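The structure in Table 1 can be reproduced with a short sketch: each learning exemplar differs from its category's listed prototype on exactly one of the first four features, while the last two features stay fixed at 1. The function names are hypothetical, and the generation rule is inferred from the table rather than taken from the experiment's materials.

```python
DEEGER_PROTOTYPE = "111111"
KOOZLE_PROTOTYPE = "000011"  # as listed in Table 1, with the last two features at 1
N_VARYING = 4                # the last two features were held at 1 during learning


def flip(bit: str) -> str:
    """Return the opposite binary feature value."""
    return "1" if bit == "0" else "0"


def learning_exemplars(prototype: str, n_varying: int = N_VARYING) -> list:
    """Flip one of the first n_varying features at a time."""
    return [
        prototype[:i] + flip(prototype[i]) + prototype[i + 1:]
        for i in range(n_varying)
    ]


def no_defining_feature(exemplars: list, n_varying: int = N_VARYING) -> bool:
    """True if no varying feature takes the same value in every exemplar."""
    return all(len({e[i] for e in exemplars}) > 1 for i in range(n_varying))


deegers = learning_exemplars(DEEGER_PROTOTYPE)
koozles = learning_exemplars(KOOZLE_PROTOTYPE)

print(deegers)  # ['011111', '101111', '110111', '111011']
print(koozles)  # ['100011', '010011', '001011', '000111']
print(no_defining_feature(deegers), no_defining_feature(koozles))  # True True
```

The family-resemblance check covers only the four varying features. The two fixed features are shared by every learning item in both categories, so they cannot distinguish Deegers from Koozles during learning.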

Figure 1 Experiment 1 example between-category-comparison trial.

Table 2 Items presented during the classification test for Experiment 1

Deeger    Koozle
111111    000011
111110    000001
111101    000010
111011    000111
110111    001011
101111    010011
011111    100011
111100    000000
111010    000110
111001    000101
110110    001010
110101    001001
101110    010010
101101    010001
011110    100010
011101    100001

Each item is represented as six binary digits with each place indicating a feature (e.g., eyes, antennae, feet) and the 0 or 1 in each place indicating the value of that feature (e.g., square eyes or rectangular eyes).

5.1.1 Predictions and Results

The central question was whether between-category comparisons highlight information that distinguishes between categories more effectively than within-category comparisons. The prediction was that participants who learned through between-category comparisons would perform

better on the classification test than participants who learned through within-category comparisons. While participants in both conditions performed above chance, between-category-comparison learners had higher overall classification accuracy than within-category-comparison learners (see Table 3), as predicted. The same pattern was observed when overall accuracy was broken down into accuracy for items participants had seen during learning and items they had never seen (though note that the effect for novel items only approached statistical significance; p = .06). Importantly, and consistent with the highlighter hypothesis, these findings occurred when the categories could be distinguished on the basis of feature values of alignable features. These results are consistent with Kornell and Bjork’s (2008) finding that categories whose members are spaced between other categories’ members during learning are learned better than categories whose members are massed together.

5.2 Are Within-Category Comparisons Better When the Relational Structure of a Category is Necessary for Determining the Category Membership, but Difficult to Notice?

In this second experiment (referred to as Experiment 2 hereafter), the prediction tested was that learners benefit from within-category comparisons when the relational structure of the categories is difficult to notice and learners need to use that information to determine category membership. Participants (n = 28) were randomly assigned to one of two between-subjects conditions: between-category-comparison learning or within-category-comparison learning.

Table 3 Experiment 1 classification test mean accuracy by condition

                                          Between-category comparison   Within-category comparison
Accuracy for previously studied itemsᵃ    0.69 (0.18)                   0.58 (0.18)
Accuracy for novel itemsᵇ                 0.71 (0.17)                   0.60 (0.17)
Overall accuracyᶜ                         0.70 (0.16)                   0.60 (0.16)

Standard deviations are in parentheses.
ᵃ This difference was statistically significant, t(38) = 2.05, p = .047, d = 0.67.
ᵇ This difference was approaching statistical significance, t(38) = 1.94, p = .06, d = 0.63.
ᶜ This difference was statistically significant, t(38) = 2.16, p = .04, d = 0.70.


Table 4 Learning materials for Experiments 2 and 3

Morkels
  Operates on land; Works to gather harmful solids; Has a shovel
  Operates on the surface of the water; Works to clean spilled oil; Has a spongy material
  Operates in the stratosphere; Works to collect dangerous gaseous ions; Has an electrostatic filter

Krenshaws
  Operates on land; Works to clean spilled oil; Has an electrostatic filter
  Operates on the surface of the water; Works to collect gaseous ions; Has a shovel
  Operates in the stratosphere; Works to gather harmful solids; Has a spongy material

These materials and table are adapted from Rehder, B., & Ross, B. H. (2001). Abstract coherent categories. Journal of Experimental Psychology: Learning, Memory, and Cognition, 27, 1261-1275; Erickson, J. E., Chin-Parker, S., & Ross, B. H. (2005). Inference and classification learning of abstract coherent categories. Journal of Experimental Psychology: Learning, Memory, and Cognition, 31, 86-99.

The categories used were two categories of machines, called Morkels and Krenshaws, defined by whether the feature values of an item made sense together (i.e., formed a coherent machine; see Table 4; materials adapted from Erickson, Chin-Parker, & Ross, 2005; Rehder & Ross, 2001). If the three features made sense together (e.g., operates on land, works to gather harmful solids and has a shovel), it was a Morkel. If the three features did not make sense together (e.g., operates on land, works to clean spilled oil and has an electrostatic filter), it was a Krenshaw. The same feature values could appear in either category, making it difficult to abstract the structure of the categories by solely focusing on the individual values.

During the learning phase, participants completed 18 comparison trials, each consisting of multiple parts: two inference trials¹ (where an item was missing a feature and participants had to figure out the best value for the missing feature) and a trial where participants listed similarities and differences between the two items they had just seen during the inference trials. For each inference trial, participants were presented with an item that had two out of three of its features as well as its label. Below the item there were three possible values for the missing feature. Participants had to click on what they thought was the missing feature of the presented item. After choosing a feature, they received feedback (i.e., “Correct!” or “Incorrect”). After completing two inference trials, participants were presented with the two items they had just seen, side by side. Similar to Experiment 1, in the between-category-comparison condition, participants were prompted to “List how this Morkel (Krenshaw) and this Krenshaw (Morkel) are similar and different.” In the within-category-comparison condition, participants were prompted to “List how these Morkels (Krenshaws) are similar and different.” Participants typed out the similarities and differences between the two items in the response box provided and then clicked on a box to submit the response and move to the next trial.

After learning, participants performed two classification tests that assessed their relational knowledge. In the first test, participants were presented with 12 novel items (one at a time) that they had to classify. The feature values used to construct these items were completely new and unfamiliar, so the only way to successfully classify these items was to use the abstract coherence-based relation (see Table 5). In the second test, participants classified 18 pairs of features (one pair at a time). The feature values that made up the feature pairs were the same as those used during the learning phase.

¹ Based on pilot data, the decision was made to add these two inference trials, as these categories were difficult for participants to learn by just doing the comparison part of the trial. Adding in this additional opportunity for actively processing the items boosted performance, allowing us to see important comparison differences.

5.2.1 Predictions and Results

Both the novel classification test and the feature-pairs classification test assess relational knowledge, but each assesses it at a different level of specificity. For both tests, focusing on individual feature values will not be helpful because, during learning, each feature value appears equally often across both categories. It is the relations between the features that matter for category membership.
Therefore, in order to be successful on either or both of these tests, participants must focus on how feature values go together, rather than on individual feature values. For the novel classification test, the only way to successfully classify items is to understand that the feature values of items from one category make sense together while the feature values of items from the other category do not make sense together. In order for participants to be successful on the feature-pairs classification test, participants could either use the abstract, coherence-based relation or more specific relational knowledge (i.e., knowledge of feature correlations that occur within items from the learning phase; for example, knowing that when the feature values operates on land and has a shovel appear together, the item is in the Morkel category).
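The structure of the learning materials described above can be sketched in Python. This is an illustration rather than the procedure used in the experiments: the tuple layout (location, function, instrument), the shortened feature labels, and the `THEME` coding of which values "make sense together" are all assumptions introduced here, read off the Morkel rows of Table 4.

```python
from collections import Counter

# Learning items from Table 4 as (location, function, instrument) triples.
# Labels are shortened; the three-slot layout is an assumption for illustration.
MORKELS = [
    ("land", "gather harmful solids", "shovel"),
    ("water surface", "clean spilled oil", "spongy material"),
    ("stratosphere", "collect gaseous ions", "electrostatic filter"),
]
KRENSHAWS = [
    ("land", "clean spilled oil", "electrostatic filter"),
    ("water surface", "collect gaseous ions", "shovel"),
    ("stratosphere", "gather harmful solids", "spongy material"),
]

# Hypothetical theme coding: values that make sense together share a theme.
THEME = {
    "land": 0, "gather harmful solids": 0, "shovel": 0,
    "water surface": 1, "clean spilled oil": 1, "spongy material": 1,
    "stratosphere": 2, "collect gaseous ions": 2, "electrostatic filter": 2,
}


def value_counts(items) -> Counter:
    """How often each individual feature value occurs in a category."""
    return Counter(value for item in items for value in item)


def is_coherent(item) -> bool:
    """True when all three feature values belong to the same theme."""
    return len({THEME[value] for value in item}) == 1


# Individual values are non-diagnostic: both categories contain the same values.
print(value_counts(MORKELS) == value_counts(KRENSHAWS))
# The relation among values is diagnostic.
print([is_coherent(m) for m in MORKELS], [is_coherent(k) for k in KRENSHAWS])
```

In this sketch, counting individual feature values cannot separate the categories, whereas the relation among the values can, which is why both tests require relational knowledge.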


Table 5 Test materials for Experiments 2 and 3

Novel Morkels
  Operates in highway tunnels; Works to remove carbon dioxide; Has a large intake fan
  Operates in swamps; Works to remove malaria-ridden mosquitoes; Has a finely woven net
  Operates in war zones; Works to gather shards of metal; Has a large magnet
  Operates in parks; Works to gather discarded paper; Has a metal pole with a sharpened end
  Operates on the seafloor; Works to remove lost fishing nets; Has a hook
  Operates on the beach; Works to remove broken glass; Has a sifter

Novel Krenshaws
  Operates in highway tunnels; Works to remove lost fishing nets; Has a sifter
  Operates in swamps; Works to remove broken glass; Has a metal pole with a sharpened end
  Operates in war zones; Works to gather discarded paper; Has a finely woven net
  Operates in parks; Works to gather discarded paper; Has a hook
  Operates on the seafloor; Works to remove malaria-ridden mosquitoes; Has a large intake fan
  Operates on the beach; Works to remove carbon dioxide; Has a large magnet

These materials and table are adapted from Rehder, B., & Ross, B. H. (2001). Abstract coherent categories. Journal of Experimental Psychology: Learning, Memory, and Cognition, 27, 1261-1275; Erickson, J. E., Chin-Parker, S., & Ross, B. H. (2005). Inference and classification learning of abstract coherent categories. Journal of Experimental Psychology: Learning, Memory, and Cognition, 31, 86-99.

The prediction was that participants who learned through within-category comparisons would perform better on both types of classification tests than participants who learned through between-category comparisons. The results showed that across both classification tests, within-category-comparison learners had higher classification performance than between-category-comparison learners (see Table 6). In order to correctly classify items in the novel classification test, participants had to go beyond the individual surface features of the items and focus on the abstract, coherence-based relations that determined category membership. In order to correctly classify items in the feature-pairs classification test, participants could use specific knowledge of feature-based relations or the abstract, coherence-based rule.


Table 6 Experiment 2 mean accuracy by test type and condition

                                       Between-category comparison   Within-category comparison
Feature-pairs test accuracyᵃ           0.53 (0.22)                   0.81 (0.23)
Novel classification test accuracyᵇ    0.52 (0.17)                   0.70 (0.24)

Standard deviations are in parentheses.
ᵃ This difference was statistically significant, t(26) = 3.30, p = .003, d = 1.29.
ᵇ This difference was statistically significant, t(26) = 2.16, p = .04, d = 0.85.
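The effect sizes in the Table 6 notes can be approximated from the tabled means and standard deviations. The Python sketch below is a rough consistency check, not the analysis the author ran: it assumes equal group sizes (14 per condition, inferred from n = 28 and 26 degrees of freedom) and a pooled standard deviation, and the function names are hypothetical.

```python
import math


def cohens_d(m1: float, sd1: float, m2: float, sd2: float) -> float:
    """Cohen's d using a pooled SD, assuming equal group sizes."""
    pooled_sd = math.sqrt((sd1 ** 2 + sd2 ** 2) / 2)
    return (m1 - m2) / pooled_sd


def t_from_d(d: float, n_per_group: int) -> float:
    """Independent-samples t implied by d when both groups have n_per_group."""
    return d * math.sqrt(n_per_group / 2)


# Feature-pairs test: within-category (0.81, SD 0.23) vs between-category (0.53, SD 0.22).
d = cohens_d(0.81, 0.23, 0.53, 0.22)
t = t_from_d(d, n_per_group=14)  # assumption: 28 participants split evenly
print(round(d, 2), round(t, 2))
```

With the rounded values from the table this gives a d of roughly 1.24 and a t of roughly 3.29, close to the reported d = 1.29 and t(26) = 3.30; the small gap is what one would expect from rounding in the tabled means and standard deviations.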

5.3 What Are the Effects of Comparisons on Learning Relative to Cases Where No Comparisons Are Made During Learning?

Experiments 1 and 2 showed a clear pattern of when between-category and within-category comparisons are beneficial, following the predictions of the highlighter hypothesis. The third experiment (referred to hereafter as Experiment 3) extended the results and tested more specific predictions of this framework by including conditions in which no comparisons were explicitly prompted. These conditions are important not only for understanding how explicitly prompting comparisons affects learning (relative to not receiving prompts to compare), but also for providing a stronger test of the highlighter hypothesis. In the no-comparison conditions, learners may still be affected by the order of items during learning, and may even compare items spontaneously on their own, but perhaps not to the extent that they would when explicitly told to make a comparison. For categories in which relational structure must be learned, there are past findings demonstrating that making within-category comparisons leads to better later performance than not making comparisons (or making fewer comparisons; e.g., Doumas & Hummel, 2004; Gentner et al., 2003; Ross, 1987; Spalding & Ross, 1994). However, many of the effects of comparisons are inferred from studies involving differences in item order (e.g., Kornell & Bjork, 2008; Kurtz & Hovland, 1956) where it is assumed people are more likely to compare successive items than ones further apart in the list. This experiment more finely examined the effects of each type of comparison on learning by considering between-category-comparison learning and within-category-comparison learning relative to learning without comparisons (where items were presented one at a time in an order yoked to either the between-category-comparison condition or within-category-comparison condition), under circumstances where learners need to acquire the categories’ relational structures.

In this experiment, the same coherence-based categories from Experiment 2 were used, and participants (n = 60) were randomly assigned to one of the four learning conditions: between-category-comparison learning (BC learning), no-comparison learning with the item order yoked to the between-category-comparison condition (BN learning), within-category-comparison learning (WC learning), or no-comparison learning with the item order yoked to the within-category-comparison condition (WN learning). The procedure closely followed the procedure from Experiment 2. One difference was that participants performed 36 comparisons (rather than 18). This number was chosen in an attempt to avoid floor effects that might obscure differences between conditions. Participants in the BN condition performed 72 inference trials in an order that was yoked to the BC condition (for every two items, they saw one from each category). Participants in the WN condition performed 72 inference trials in an order that was yoked to the WC condition (every two items were from the same category). The inference trials were exactly the same as they were for each of the two comparison conditions; however, to equate for the amount of time the BC and WC learners were exposed to the items while writing their comparisons, the no-comparison learners viewed each inference trial’s feedback screen for 17956 ms, which was 2010 ms (amount of time WC and BC learners saw the feedback screen) plus half of the total time that it took for comparison learners to write their comparisons (as determined by a pilot experiment). After learning, relational knowledge was assessed through the same novel classification and feature-pairs classification tests used in Experiment 2.
As noted before, for the novel classification test, the only way to correctly classify an item is to use the abstract, coherence-based relation that determines category membership. In order for participants to correctly classify feature-pairs, they can either use the abstract, coherence-based relation or more specific feature-based relational knowledge. If participants display high feature-pairs classification performance but lower novel classification performance, it suggests that participants are relying to some degree on their knowledge of specific feature correlations, rather than the abstract coherence-based relation, to classify the feature-pairs.
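The exposure-time matching and trial counts described for Experiment 3 involve a small amount of arithmetic, worked through in the Python sketch below. Only the 17956 ms, 2010 ms, and 36-comparison figures come from the text; reading "total time" as the pilot-derived writing time per comparison trial is an assumption, and the variable names are hypothetical.

```python
# Values stated in the text.
FEEDBACK_MS = 2010        # feedback screen duration for BC and WC learners
NO_COMPARISON_MS = 17956  # feedback screen duration for BN and WN learners
COMPARISONS = 36          # comparison trials in Experiment 3
INFERENCE_PER_COMPARISON = 2

# Stated rule: no-comparison display = feedback time + half of the writing time.
half_writing_ms = NO_COMPARISON_MS - FEEDBACK_MS
# Assumption: "total time" refers to one comparison trial, which covers two items.
implied_writing_ms = 2 * half_writing_ms

# Each comparison trial contains two inference trials.
inference_trials = COMPARISONS * INFERENCE_PER_COMPARISON

print(half_writing_ms, implied_writing_ms, inference_trials)
```

On this reading, each no-comparison inference trial carried about 16 s of additional viewing time, and the 72 inference trials in the BN and WN conditions correspond to the items seen across the 36 comparison trials.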


5.3.1 Predictions and Results

It is important to consider three predictions when learners need to acquire at least some of the categories’ relational structure to classify new items or feature-pairs (feature values are insufficient since the same feature values appear equally in both categories). First, to the extent that explicit within-category comparisons help learners understand the underlying structure of a category, making within-category comparisons should lead to better performance than not making such comparisons even if the item order allows such comparisons easily (i.e., WC > WN). Second, to the extent that item order is helpful in making comparisons easier, then the two groups not making comparisons will still show a clear difference (WN > BN). Third, and perhaps most importantly for the highlighter hypothesis, it is not that all comparisons lead to better later performance, but rather that particular comparisons help learners to acquire the important information they need. For the groups with items in a between-category order (BC and BN), the prediction is that making comparisons will lead the learner astray, as they will focus on trying to find feature value differences between categories (for which there are none), making it even more difficult to learn the relational structure. Thus, the prediction is that for the between-category groups, being forced to make comparisons will hurt category learning (BC < BN), as making explicit comparisons will be worse than making no (or fewer) comparisons.

Feature-Pairs Classification Test Performance. The feature-pairs classification test assessed participants’ knowledge of specific feature correlations from the items they had studied during the learning phase. Table 7 displays the feature-pairs classification test results by condition.
A 2 × 2 ANOVA demonstrated a main effect of item order (within-category conditions > between-category conditions) and an interaction between item order and comparison. Follow-up independent t-tests showed that WC learners classified feature-pairs marginally more accurately than WN learners (though with ceiling effects in the WC condition), that WN learners and BN learners performed similarly, and that BC learners classified feature-pairs significantly worse than BN learners. Being explicitly prompted to make between-category comparisons actually hurt learners’ ability to learn the underlying structure of the categories relative to learners who received the same item ordering, but were not explicitly told to make the comparisons.

Table 7 Experiment 3 feature-pairs classification test mean accuracy by condition

                          Between-category item order   Within-category item order
Comparison learners       0.63 (0.19)                   0.96 (0.12)
No-comparison learners    0.76 (0.21)                   0.84 (0.22)

Standard deviations are in parentheses. A 2 × 2 ANOVA showed a main effect of item order (between or within), F(1, 56) = 18.63, p < .001, ηp² = 0.25, and an interaction between item order and comparison condition (yes or no), F(1, 56) = 8.04, p = .006, ηp² = 0.13. WN learners and BN learners performed similarly, t(28) = 0.92, p = .37, d = 0.35. WC learners classified feature-pairs marginally more accurately than WN learners, t(28) = 1.93, p = .06, d = 0.73. BC learners classified feature-pairs significantly worse than BN learners, t(28) = 2.054, p = .049, d = 0.78.

Novel Classification Test Performance. The novel classification test assessed participants’ knowledge of the abstract, coherence-based relation that defined the two categories. Table 8 displays the novel classification test results by condition. A 2 × 2 ANOVA demonstrated that, similar to the feature-pairs test, learners who were given an item order that encouraged within-category comparisons performed significantly better on the novel classification test than learners who were given an item order that encouraged between-category comparisons. However, the interaction found in the feature-pairs test (WC learners outperforming WN learners and BC learners performing worse than BN learners) was not found here. Planned independent t-tests showed a clear advantage of item order, with WN learning leading to higher performance than BN learning, but no statistically significant differences between BC and BN learners or between WC and WN learners. It is likely that the comparison between BC and BN learners was heavily impacted by floor effects, as BC learners’ performance was not statistically different from chance (and 73% of participants in the BC condition had a proportion of correct responses of less than 0.58). The real mystery of the novel classification performance data was the comparison between WC learners and WN learners.
Prior research has demonstrated the benefits of making within-category comparisons relative to not making comparisons (e.g., Doumas & Hummel, 2004; Gentner et al., 2003) and there was a significant difference in the feature-pairs test.

Table 8 Experiment 3 novel classification mean accuracy by condition

                          Between-category item order   Within-category item order
Comparison learners       0.54 (0.25)                   0.77 (0.23)
No-comparison learners    0.62 (0.22)                   0.77 (0.19)

Standard deviations are in parentheses. A 2 × 2 ANOVA showed only a main effect of item order (between or within), F(1, 56) = 11.01, p = .002, ηp² = 0.16. A planned t-test showed WN learners outperformed BN learners, t(28) = 2.10, p = .045, d = 0.79.

In summary, learners who saw items in a within-category item order throughout learning outperformed learners who saw items in a between-category item order, regardless of whether learners were prompted to make comparisons or not. This effect was demonstrated in both the novel classification test and the feature-pairs classification test. Most importantly, in the feature-pairs classification test, the predicted interaction between item order and whether or not learners were prompted to compare was observed. This test assessed learners’ knowledge of specific relational information (i.e., the feature correlations they had observed in the items they had learned about). WC learners outperformed WN learners, demonstrating that there are performance benefits for learners who make within-category comparisons when relational structure must be highlighted. BC learners performed significantly worse than BN learners. It is typically assumed that making comparisons should help learning; however, in this case, making comparisons was actually detrimental to learning. This is likely because between-category comparisons led learners to focus on the wrong information (feature-value differences across items, which were not as helpful here without an understanding of relational structure). The assumption, especially in the analogical-reasoning literature, has been that two is better than one, that is, that comparing is better than not comparing at all. This assumption held up when within-category comparisons were considered relative to no comparisons (e.g., Alfieri et al., 2013; Doumas & Hummel, 2004; Gentner et al., 2003; Gentner & Namy, 1999) and when there was no manipulation of the degree to which learners could compare and the manipulation was by item order (e.g., Kornell & Bjork, 2008).
However, these results show that comparison effects are more complicated than they initially seemed. There are circumstances where making comparisons will lead learners to focus on the wrong information and leave them worse off than they would have been had they not made any comparisons in the first place. On the other hand, the novel classification test results did not confirm all the predictions. Consistent with the second prediction, there was a main effect of item order. There was a trend consistent with the third prediction (BC < BN by 0.08), though floor effects made it difficult to see a reliable difference as was found in the higher-performance feature-pairs classification test.
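The three directional predictions (WC > WN, WN > BN, BC < BN) can be laid against the cell means in Tables 7 and 8 with a short Python sketch. This is a descriptive tally only: it compares the direction of the means as read from the tables and says nothing about statistical reliability, which is carried by the reported ANOVAs and t-tests. The dictionary layout and function name are hypothetical.

```python
# Cell means as read from Tables 7 and 8.
# First letter: item order (B = between, W = within); second: C = comparison, N = none.
FEATURE_PAIRS = {"BC": 0.63, "BN": 0.76, "WC": 0.96, "WN": 0.84}
NOVEL = {"BC": 0.54, "BN": 0.62, "WC": 0.77, "WN": 0.77}


def check_predictions(means: dict) -> dict:
    """Direction of each predicted difference in the observed means."""
    return {
        "WC > WN": means["WC"] > means["WN"],
        "WN > BN": means["WN"] > means["BN"],
        "BC < BN": means["BC"] < means["BN"],
    }


for test_name, means in [("feature-pairs", FEATURE_PAIRS), ("novel", NOVEL)]:
    print(test_name, check_predictions(means))
```

The tally mirrors the text: all three differences run in the predicted direction on the feature-pairs test, while on the novel classification test the WC and WN means are identical, so the first prediction is not supported there.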


The surprising result of the novel classification test was the failure to find the predicted advantage of comparisons for the within-category groups (WC > WN). This result is not consistent with prior work showing within-category-comparison benefits relative to learning without comparisons (e.g., Doumas & Hummel, 2004; Gentner et al., 2003; Gentner & Namy, 1999). The most plausible explanation is that WN learners took advantage of the order of items to make within-category comparisons during the long display times (almost 18 s). The large advantage of item order for the no-comparison groups (WN > BN) suggests the WN learners were taking advantage of the order of items to make effective comparisons. This experiment demonstrated both the benefits and costs of making different types of comparisons during learning. The predicted finding that when relational structure must be learned, learning through between-category comparisons leads to worse performance than learning with no comparisons (BC < BN) is novel and provides strong evidence in favor of the highlighter hypothesis. The finding that learning through within-category comparisons leads to better performance than learning with no comparisons (WC > WN) suggests the importance of providing learners with the opportunity to explicitly compare across items (at least when it comes to learning specific relational information).

6. IMPLICATIONS AND CONCLUSIONS

Comparisons have been suggested as central to category learning (e.g., Spalding & Ross, 1994), yet very few studies have examined the specific effects of comparisons in learning categories. Recent work demonstrates that the type of active processing one engages in during category learning affects what is learned (e.g., Anderson et al., 2002; Chin-Parker & Ross, 2002, 2004; Jones & Ross, 2011; Yamauchi & Markman, 1998), suggesting the importance of considering how different types of comparisons influence what information learners focus on and acquire. Given that people are constantly bringing to mind earlier examples when learning new things and that different types of comparisons occur all the time through one’s own thought processes, in the classroom, and through interactions with other people, it is essential that these effects are addressed in theories of category learning and use and more broadly in the literature.

This chapter proposed the highlighter hypothesis as a framework for thinking about how the type of comparison one makes can influence what information is learned about a set of categories. The central idea motivating this hypothesis is that each type of comparison highlights different category information. Between-category comparisons highlight (alignable) feature-value differences between categories. Within-category comparisons highlight commonalities as well as the relational structure shared across items from the same category. Evidence consistent with the highlighter theory was demonstrated through a series of experiments.

6.1 Alternative Explanations

There are multiple, plausible alternative explanations for the results of the experiments reported here, and certainly there are additional factors that need to be considered when developing a comprehensive understanding of how comparisons affect learning. These alternative accounts do not challenge the idea that different types of comparisons lead to differences in what is learned, but instead challenge the characterization of the differences between materials used in these experiments. Two of the most plausible alternative accounts are the attentional bias framework, which distinguishes the types of categories in terms of within-category and between-category similarity, and an explanation that has not received as much attention in the literature on this topic, which is that prior knowledge may predict whether within-category or between-category comparisons are beneficial.

6.1.1 Attentional Bias Framework

As noted above, Carvalho and Goldstone (2014a, 2014b, 2015) proposed the attentional bias framework to account for the item-order studies that have shown benefits for interleaved or blocked practice. They propose that category structure, and specifically between-category and within-category similarity, predicts whether interleaving or blocking will be more effective for learning. Across multiple studies, the general finding is that when within-category and between-category similarity are both low, blocking items from a category together leads to higher categorization performance; when within-category and between-category similarity are both high, interleaving leads to better categorization performance (Carvalho & Goldstone, 2014a, 2014b). They argue that within-category comparisons are more helpful when there are relatively few common feature values across items from a category, as these comparisons focus learners on commonalities.
Between-category comparisons are more helpful when very few feature values differ across categories, as these comparisons focus learners on differences. Zulkiply and Burt (2013) also manipulated the discriminability of categories and showed that interleaving is beneficial when categories are difficult to discriminate and massing is beneficial when categories are easier to discriminate. The studies presented here did not intentionally manipulate within- or between-category similarities, and this direction provides an interesting complement to this framework. However, it is unclear what this view predicts when within-category and between-category similarities are manipulated in opposite directions (e.g., high-low). In addition, a major point of earlier research (Gentner & Markman, 1995) is that similarity is not a simple idea, but that it is important to separate types of similarities (e.g., featural and relational) as well as types of differences (e.g., alignable and nonalignable).

6.1.2 Prior-Knowledge Explanation

The prior-knowledge explanation is that the amount of prior knowledge one has about the categories to be learned determines the benefits of each type of comparison². Prior knowledge may lead learners to develop expectations about the relevance of particular features or relations and may help them constrain their attention to information that is important. This explanation predicts that once learners know what pieces of information are important, between-category comparisons will be effective because learners will already know what alignable differences they should attend to during learning. On the other hand, within-category comparisons will be helpful when learners have very little prior knowledge of the categories being learned (or when their prior knowledge leads them to establish the wrong expectations). Prior knowledge certainly influences category learning. The highlighter hypothesis acknowledges this influence indirectly in terms of what information the learner needs to acquire, but perhaps a more specific analysis of prior-knowledge effects is needed. Although the prior-knowledge explanation might generally predict the results reported here (under the assumption that participants enter the study with accurate expectations for the aliens but not for the machines), it is not clear how this explanation would predict the finding that between-category comparisons can hurt learning relative to no comparisons when relational structure needs to be learned.

² Thanks to Andrei Cimpian for pointing out this possibility.

6.1.3 Summary of Alternative Explanations

There are plausible alternative explanations for the results of the experiments reported here. Regardless, the critical points made here and supported through the experiments presented are that (1) comparison type (defined here as within-category or between-category) plays a major role in determining what information learners acquire and (2) frameworks that seek to explain how comparisons and item ordering affect category learning should consider taking into account feature-based/relational category structure as a factor, as it appears to be important for distinguishing when particular learning tasks or types of comparisons are helpful for learners. Future work should consider how the other factors discussed here (prior knowledge and between- and within-category similarities) fit in with the feature-based/relational category structure factor. It is possible that these factors have complementary effects on the benefits of each type of comparison. For example, perhaps when learning categories where relational structure is important, having prior knowledge may reduce the benefits for within-category comparisons while having no prior knowledge may lead to larger benefits for within-category comparisons. It is also possible that within-category and between-category similarities interact with the predictions of the highlighter hypothesis in interesting ways.

6.2 Comparisons in the Real World Comparisons occur in many contexts during learning. Teachers may contrast two examples in order to illustrate a point. Through interactions with our surroundings, we are exposed to similar items from the same or different categories (e.g., a cat and a small dog). Learners may explicitly look for related examples, a common occurrence when solving math problems at the end of a textbook chapter. Sometimes, people may try to remember a similar (in some sense) earlier episode or example. Other times, they may be reminded of some earlier episode or example without intending to. All these cases allow learners to compare and make use of another case. The effects of comparisons depend upon both what is compared and how the comparison is made. This framework is intended to highlight that the effects of comparing between-category or within-category depend on the type of category being learned. Given that learners do not always have control over the order in which they see items, they may end up making a within-category comparison (or between-category comparison) when the other type of comparison would have been more useful. In addition, when learners intentionally call to mind an item they were


exposed to earlier to help classify a current one, they may be misled inadvertently. For example, there is evidence that people will not always make the right types of within-category comparisons, as novices tend to focus on superficial similarities (e.g., one sees two math problems, which are both about car mechanics working on cars and assumes they are part of the same category of problems even if the relational structure of each is different; Chi et al., 1981; Ross, 1984, 1987). Prior research and the findings presented here suggest that learners may need some help determining how to learn a new concept. They may not know the best item order to study or the types of comparisons that will lead to the best understanding of the underlying structure of a category. For instance, they may need a teacher or textbook to guide them to make appropriate comparisons. Vendetti et al. (2015) summarize the research on the instructional supports that can facilitate analogical reasoning in the classroom, and research in this area is ongoing. For instances where interleaving is helpful, Rohrer et al. have shown that in the context of a math classroom, students benefit from receiving particular orders of items on homework assignments. In multiple studies with seventh grade math students, mixing up different types of math problems on homework assignments led to large gains in performance on a later math test relative to the typical practice of including a single problem type on a homework assignment (Rohrer, Dedrick, & Burgess, 2014; Rohrer, Dedrick, & Stershic, 2015). In future work, these findings should be integrated into a tool or set of guidelines for education practitioners and students, which provides guidance for how best to teach concepts given the types of information students need to learn about those concepts (e.g., the best way to order examples, what types of comparisons would be most helpful).

6.3 Conclusions This chapter synthesized a large body of research from the analogical-reasoning, problem-solving and categorization domains to generate the highlighter hypothesis, which was effective in predicting the effects of different types of comparisons on category learning. A series of experiments showed that the benefits of each type of comparison depended on the type of information learners needed to focus on in order to determine category membership. Within-category comparisons were more beneficial when relational structure had to be learned while between-category comparisons were more beneficial when learners could rely on individual feature-values to determine category membership. This is a promising first


step in understanding how comparisons affect what information learners acquire about categories. Given the centrality of comparisons in category learning and the clear complexity of the issue, efforts should be made to continue examining these comparison effects in further detail and to continue integrating the numerous frameworks that exist to explain how comparisons affect learning.

ACKNOWLEDGMENTS I would like to acknowledge Brian Ross, who provided invaluable guidance throughout my graduate training, served as my dissertation advisor, and provided thoughtful feedback and suggestions for this chapter. I would like to acknowledge Aaron Benjamin, Andrei Cimpian, Kara Federmeier and John Hummel for their comments and feedback on the interpretation of the experiments presented here. I would also like to thank Robert Molitor, Audrey Merz, Kaitlin Costello and Brandon Mitchell for their assistance with data collection. The experiments reported here were previously reported by the author in a dissertation completed at the University of Illinois, Urbana-Champaign.

REFERENCES
Alfieri, L., Nokes-Malach, T. J., & Schunn, C. D. (2013). Learning through case comparisons: A meta-analytic review. Educational Psychologist, 48, 87–113.
Anderson, A. L., Ross, B. H., & Chin-Parker, S. (2002). A further investigation of category learning by inference. Memory and Cognition, 30, 119–128.
Birnbaum, M. S., Kornell, N., Bjork, E. L., & Bjork, R. A. (2013). Why interleaving enhances inductive learning: The roles of discrimination and retrieval. Memory and Cognition, 41, 392–402.
Carvalho, P. F., & Goldstone, R. L. (2014a). Effects of interleaved and blocked study on delayed test of category learning generalization. Frontiers in Psychology, 5, 1–10.
Carvalho, P. F., & Goldstone, R. L. (2014b). Putting category learning in order: Category structure and temporal arrangement affect the benefit of interleaved over blocked study. Memory and Cognition, 42, 481–495.
Carvalho, P. F., & Goldstone, R. L. (2015). What you learn is more than what you see: What can sequencing effects tell us about inductive category learning? Frontiers in Psychology, 6, 1–12.
Chi, M. T. H., Feltovich, P. J., & Glaser, R. (1981). Categorization and the representation of physics problems by experts and novices. Cognitive Science, 5, 121–152.
Chin-Parker, S., & Ross, B. H. (2002). The effect of category learning on sensitivity to within-category correlations. Memory and Cognition, 30, 353–362.
Chin-Parker, S., & Ross, B. H. (2004). Diagnosticity and prototypicality in category learning: A comparison of inference learning and classification learning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 30, 216–226.
Doumas, L. A. A., & Hummel, J. E. (2004). Structure mapping and the predication of novel higher-order relations. In Proceedings of the Twenty-Second Annual Conference of the Cognitive Science Society (pp. 333–338).
Duncker, K. (1945). On problem solving. Psychological Monographs, 58. Whole No. 270.
Erickson, J. E., Chin-Parker, S., & Ross, B. H. (2005). Inference and classification learning of abstract coherent categories. Journal of Experimental Psychology: Learning, Memory, and Cognition, 31, 86–99.


Gentner, D. (1983). Structure-mapping: A theoretical framework for analogy. Cognitive Science, 7, 155–170.
Gentner, D., & Kurtz, K. (2005). Learning and using relational categories. In W. K. Ahn, R. L. Goldstone, B. C. Love, A. B. Markman, & P. W. Wolff (Eds.), Categorization inside and outside the laboratory. Washington, DC: APA.
Gentner, D., Loewenstein, J., & Thompson, L. (2003). Learning and transfer: A general role for analogical encoding. Journal of Educational Psychology, 95, 393–408.
Gentner, D., & Markman, A. B. (1994). Structural alignment in comparison: No difference without similarity. Psychological Science, 5, 152–158.
Gentner, D., & Markman, A. B. (1995). Similarity is like analogy: Structural alignment in comparison. In C. Cacciari (Ed.), Similarity in language, thought, and perception (pp. 111–147). Brussels: Brepols.
Gentner, D., & Namy, L. (1999). Comparison in the development of categories. Cognitive Development, 14, 487–513.
Gick, M., & Holyoak, K. J. (1983). Schema induction and analogical transfer. Cognitive Psychology, 15, 1–38.
Goldstone, R. L., Day, S., & Son, J. Y. (2010). Comparison. In B. Glatzeder, V. Goel, & A. von Müller (Eds.), Towards a theory of thinking: Vol. II. On thinking (pp. 103–122). Heidelberg, Germany: Springer Verlag GmbH.
Hammer, R., Bar-Hillel, A., Hertz, T., Weinshall, D., & Hochstein, S. (2008). Comparison processes in category learning: From theory to behavior. Brain Research, 1225, 102–118.
Higgins, E. J., & Ross, B. H. (2011). Comparisons in category learning: How best to compare for what. In Proceedings of the 33rd Annual Conference of the Cognitive Science Society. Austin, TX: Cognitive Science Society.
Imai, M., Gentner, D., & Uchida, N. (1994). Children’s theories of word meaning: The role of shape similarity in early acquisition. Cognitive Development, 9, 45–75.
Jones, E. L., & Ross, B. H. (2011). Classification versus inference contrasted with real-world categories. Memory and Cognition, 39, 764–777.
Kang, S. H. K., & Pashler, H. (2012). Learning painting styles: Spacing is advantageous when it promotes discriminative contrast. Applied Cognitive Psychology, 26, 97–103.
Kornell, N., & Bjork, R. A. (2008). Learning concepts and categories: Is spacing the “enemy of induction”? Psychological Science, 19, 585–592.
Kotovsky, L., & Gentner, D. (1996). Comparison and categorization in the development of relational similarity. Child Development, 67, 2797–2822.
Kurtz, K. H., & Hovland, C. I. (1956). Concept learning with differing sequences of instances. Journal of Experimental Psychology, 51, 239–243.
Loewenstein, J., Thompson, L., & Gentner, D. (2003). Analogical learning in negotiation teams: Comparing cases promotes learning and transfer. Academy of Management Learning and Education, 2, 119–127.
Markman, A. B., & Gentner, D. (2000). Structure mapping in the comparison process. American Journal of Psychology, 113, 501–538.
Markman, A. B., & Ross, B. H. (2003). Category use and category learning. Psychological Bulletin, 129, 592–613.
Medin, D. L., Goldstone, R. L., & Gentner, D. (1993). Respects for similarity. Psychological Review, 100, 254–278.
Namy, L. L., & Clepper, L. E. (2010). The differing roles of comparison and contrast in children’s categorization. Journal of Experimental Child Psychology, 107, 291–305.
Namy, L. L., & Gentner, D. (2002). Making a silk purse out of two sow’s ears: Young children’s use of comparison in category learning. Journal of Experimental Psychology: General, 131, 5–15.
Rehder, B., & Ross, B. H. (2001). Abstract coherent categories. Journal of Experimental Psychology: Learning, Memory, and Cognition, 27, 1261–1275.


Rittle-Johnson, B., & Star, J. R. (2007). Does comparing solution methods facilitate conceptual and procedural knowledge? An experimental study on learning to solve equations. Journal of Educational Psychology, 99, 561–574.
Rittle-Johnson, B., & Star, J. R. (2009). Compared to what? The effects of different comparisons on conceptual knowledge and procedural flexibility for equation solving. Journal of Educational Psychology, 101, 529–544.
Rohrer, D., Dedrick, R. F., & Burgess, K. (2014). The benefit of interleaved mathematics practice is not limited to superficially similar kinds of problems. Psychonomic Bulletin and Review, 21, 1323–1330.
Rohrer, D., Dedrick, R. F., & Stershic, S. (2015). Interleaved practice improves mathematics learning. Journal of Educational Psychology, 107, 900–908.
Rohrer, D., & Taylor, K. (2007). The shuffling of mathematics practice problems boosts learning. Instructional Science, 35, 481–498.
Rosch, E., & Mervis, C. B. (1975). Family resemblances: Studies in the internal structure of categories. Cognitive Psychology, 7, 573–605.
Ross, B. H. (1984). Remindings and their effects in learning a cognitive skill. Cognitive Psychology, 16, 371–416.
Ross, B. H. (1987). This is like that: The use of earlier problems and the separation of similarity effects. Journal of Experimental Psychology: Learning, Memory, and Cognition, 13, 629–639.
Spalding, T. L., & Ross, B. H. (1994). Comparison-based learning: Effects of comparing instances during category learning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 20, 1251–1263.
Taylor, K., & Rohrer, D. (2010). The effects of interleaved practice. Applied Cognitive Psychology, 24, 837–848.
Vendetti, M. S., Matlen, B. J., Richland, L. E., & Bunge, S. A. (2015). Analogical reasoning in the classroom: Insights from cognitive science. Mind, Brain, and Education, 9, 100–106.
Vlach, H. A., Ankowski, A. A., & Sandhofer, C. M. (2012). At the same time or apart in time? The role of presentation timing and retrieval dynamics in generalization. Journal of Experimental Psychology: Learning, Memory, and Cognition, 38, 246–254.
Wahlheim, C. N., Dunlosky, J., & Jacoby, L. L. (2011). Spacing enhances the learning of natural concepts: An investigation of mechanisms, metacognition, and aging. Memory and Cognition, 39, 750–763.
Yamauchi, T., & Markman, A. B. (1998). Category learning by inference and classification. Journal of Memory and Language, 39, 124–148.
Zulkiply, N., & Burt, J. S. (2013). The exemplar interleaving effect in inductive learning: Moderation by the difficulty of category discriminations. Memory and Cognition, 41, 16–27.

CHAPTER THREE

Progress in Modeling Through Distributed Collaboration: Concepts, Tools and Category-Learning Examples
Andy J. Wills1, Garret O’Connell, Charlotte E.R. Edmunds and Angus B. Inkster
Plymouth University, Plymouth, United Kingdom
1 Corresponding author: E-mail: [email protected]

Contents
1. Introduction
2. Concepts
2.1 The Grid
2.2 Canonical Independently Replicated Phenomena
2.3 Canonical Independently Replicated Phenomenon Registration
2.4 Free, Open-Source Model Implementations
2.4.1 Open-Source Software
2.4.2 Archival Storage
2.4.3 Cross-Compatibility
2.4.4 Summary
2.5 Simulation Publication
2.5.1 A Technical Note About Stimulus Representations
2.6 Summary
3. Introduction to catlearn
3.1 Why R?
3.2 Stateful List Processors
4. Examples
4.1 ALCOVE Model Implementation
4.1.1 Description of ALCOVE
4.1.2 Implementation of ALCOVE
4.2 Proto-ALCOVE Model Implementation
4.3 Derivation of Canonical Independently Replicated Phenomena
4.3.1 Category Size
4.4 Canonical Independently Replicated Phenomena Registration
4.5 Input Representation Archive
4.6 Simulation Archive
4.7 Ordinal Adequacy Test
4.8 The Grid
4.9 Other Examples
5. Overview and Conclusion
5.1 Contributing to catlearn
5.2 Conclusion
Acknowledgments
References

Psychology of Learning and Motivation, Volume 66. ISSN 0079-7421. http://dx.doi.org/10.1016/bs.plm.2016.11.007. © 2017 Elsevier Inc. All rights reserved.

Abstract Formal modeling in psychology is failing to live up to its potential due to a lack of effective collaboration. As a first step towards solving this problem, we have produced a set of freely available tools for distributed collaboration. This article describes those tools and the conceptual framework behind them. We also provide concrete examples of how these tools can be used. The approach we propose enhances, rather than supplants, more traditional forms of publication. All the resources for this project are freely available from the catlearn website http://catlearn.r-forge.r-project.org/.

1. INTRODUCTION A formal psychological model is one that unambiguously specifies transformations from independent variables to dependent variables (Wills & Pothos, 2012). For example, in category learning, category structure might be one independent variable and classification accuracy might be one dependent variable (see Fig. 1). Wills and Pothos (2012) argued that: (1) formal models are important because they allow unambiguous comparison of the relative adequacy of theories, but that (2) little progress had been made because the comparisons had been too narrow. For example, Smith and Minda (2000) compared the performance of two models on variants of just one experiment. The route to faster progress in formal modeling, according to Wills and Pothos (2012), is to compare models across a much broader set of phenomena. Over the last few years, we have been trying to put that idea into action. We’ve concluded that the goal is not achievable by any individual or small group within a reasonable amount of time (e.g., one career). Collaboration would solve this problem, but a suitable framework for efficient collaboration has been lacking until now. In the sections that follow, we introduce a conceptual framework, and a set of practical, freely available tools, to support efficient distributed collaboration through technological means. We illustrate our ideas with examples from the field we know best, category learning, but

[Figure 1: panel (A) shows the six category structures, labeled 1 to 6; panel (B) plots Probability of Error against Learning Block, with one curve for each of Types 1 to 6.]

Figure 1 (A) Six different ways to classify eight stimuli into two different groups; the stimuli take one of two values on each of three dimensions (color, size, shape). (B) Mean errors in learning these category structures from feedback, as a function of amount of feedback (learning block). Reproduced from Wills, A. J., & Pothos, E. M. (2012). On the adequacy of current empirical evaluations of formal models of categorization. Psychological Bulletin, 138, 102–125. Copyright 2012 by American Psychological Association.

the ideas are general purpose and can be applied to any situation where formal models are evaluated against known phenomena. In providing this framework and these tools, we have lowered the “cost of entry” for collaboration. Our hope is that, by so doing, we will encourage others to adopt distributed collaboration as a central research methodology in the formal modeling of psychological processes. The form of distributed collaboration we envisage would be a cultural shift for psychology. Such a shift would be desirable, as evidenced by how well distributed collaboration works in other fields (e.g., software development). However, the shift will only happen if distributed collaboration is compatible with the traditional metrics by which researchers gain employment, promotion, and esteem, an issue we return to throughout the article. Our central point, though, is that the formal modeling of mental processes is simply too large a project for any one person or lab. The choice is between collaboration or failure.

2. CONCEPTS We propose, and have initiated, an open archive of formal psychological models, plus independently replicated data sets against which to test them and archival records of those simulations. The structure and content of this open archive is based around five central concepts, which we discuss


below: the “Grid” (Section 2.1), Canonical Independently Replicated Phenomena (CIRP, pronounced “syrup,”1 Section 2.2), open CIRP registration (Section 2.3), open-source model implementations (Section 2.4) and open simulation publication (Section 2.5), analogous to open data publication (e.g. Morey et al., 2016).

2.1 The Grid


The Grid is a way of representing the goal of broad comparisons of model adequacy and progress toward that goal. It is also an archival record of the success or failure of simulations of empirical phenomena. Fig. 2 illustrates one small part of the Grid. Each column of the Grid represents a distinct formal model. The number of these columns, at least within the published literature, is knowable but large. The central problem, as we will cover in more detail in Section 2.4, is that software to implement these models is either not publicly available, or available but incompatible with implementations of other models. Each row of the Grid represents an empirical phenomenon within the explanatory scope of at least one of the models. Determining what these

Figure 2 Illustration of a small part of the “Grid.” Columns represent formal models, rows represent empirical phenomena. The cell contents indicate: ordinal success of model (1), ordinal failure of model (0) and model not tested (NT). The examples in this figure are drawn from category learning, but the concept of the Grid applies across the formal modeling of psychological processes. S, H & J (1961) = Shepard et al. (1961), M & S (1978) = Medin and Schaffer (1978), ALCOVE (Attention Learning COVEring map, Kruschke, 1992), proto-ALCOVE (e.g., Johansen & Palmeri, 2002), COVIS (COmpetition between Verbal and Implicit Systems, Ashby et al., 1998). Image by Andy J. Wills. CC BY 4.0.

1 As the singular of phenomena is phenomenon, the acronym CIRP is correct for both singular and plural forms, cf. sheep.


rows should be is a more difficult task than it might first appear, as we will discuss in Section 2.2. The matrix of cells that is defined by the rows (phenomena) and columns (models) of the Grid poses a challenge in terms of sheer scale. Even with 10 models and 10 empirical phenomena, the Grid makes explicit the need to run up to 10 models × 10 phenomena = 100 different simulations. Each journal article reporting work on a formal model has, perhaps, on average, 2–4 such simulations. Even if the 25 or more journal articles that would, at this rate, be needed to fill this grid2 actually existed (and they most likely do not as yet), discovering what has already been done is laborious, as no search engine indexes journals in this way. For example, if one wanted to search for simulations of Medin and Schaffer (1978) with the ALCOVE (Attention Learning COVEring map) model (Kruschke, 1992), one might use Web of Science to retrieve all papers that cite both those articles. As of 14 July 2016, there are 316 such articles. So, at a minimum, one would have to read 316 abstracts to determine what simulations had been performed. In practice, the abstracts probably would not be conclusive in some cases, and one would have to retrieve the full text as well, and all this for one cell of the Grid. In summary, the Grid highlights the extent of the problem facing the formal modeling of psychological processes. The columns of the Grid (formal models) are not publicly available (or have other problems, see Section 2.4). Determining the rows of the Grid (the phenomena) is not straightforward, as we discuss in Section 2.2. The cells of the Grid (simulations of phenomena) are a hurdle in terms of both numerosity and lack of an appropriate record system, as we have just discussed.
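The scale argument above can be made concrete with a minimal sketch. It assumes a Grid stored as a plain dictionary keyed by (phenomenon, model) pairs, with the cell codes taken from the Figure 2 legend; the function names and the model and phenomenon labels are hypothetical and are not part of catlearn.

```python
from itertools import product

# Cell codes follow the Figure 2 legend: 1 = ordinal success,
# 0 = ordinal failure, "NT" = model not tested.
NOT_TESTED = "NT"


def empty_grid(models, phenomena):
    """Return a Grid in which every (phenomenon, model) cell is not tested."""
    return {cell: NOT_TESTED for cell in product(phenomena, models)}


def papers_needed(n_cells, sims_per_paper):
    """Articles needed to fill n_cells simulations, rounding up."""
    return -(-n_cells // sims_per_paper)


# Hypothetical labels, used only to reproduce the 10 x 10 example.
models = [f"model_{i}" for i in range(1, 11)]
phenomena = [f"phenomenon_{i}" for i in range(1, 11)]

grid = empty_grid(models, phenomena)
grid[("phenomenon_1", "model_1")] = 1  # record one ordinal success

untested = [cell for cell, outcome in grid.items() if outcome == NOT_TESTED]

print(len(grid))                    # 100 cells for 10 models x 10 phenomena
print(len(untested))                # 99 cells still to simulate
print(papers_needed(len(grid), 4))  # 25 papers at 4 simulations per paper
```

Keying the dictionary by cell makes the question "has this phenomenon been simulated with this model?" a single lookup, which is the record-keeping property the text says the journal literature lacks.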

2 100 simulations/4 simulations per paper = 25 papers.

2.2 Canonical Independently Replicated Phenomena We propose that each row of the Grid should be a Canonical Independently Replicated Phenomenon (CIRP). In this subsection, we first justify and discuss the criterion of independent replication, and then explain what, by our definition, makes an independently replicated phenomenon canonical. We also consider some consequences of adopting the CIRP approach. We propose that each row of the Grid should be an independently replicated phenomenon because modeling nonreal results wastes time. The waste of effort increases as the number of models increases: 10 models equals up to 10 wasted simulations for each nonreal result. Nonreal results also potentially lead to an inaccurate assessment of relative adequacy: a model should not be penalized for failing to accommodate a nonreal result, yet inclusion of nonreal results means this may happen. Thus, to guard against the wasted effort and potentially misleading conclusions that come from simulating nonreal results, we propose a minimum standard that relative adequacy assessments of formal models be restricted to independently replicated phenomena. Independent replication is beginning to become accepted as a data quality standard in psychology (e.g., Ledgerwood, 2014; Pashler & Wagenmakers, 2012). However, a criterion of independent replication (no shared authors) is not always applied when choosing phenomena against which to evaluate formal models. For example, when the Generalized Context Model (GCM, Nosofsky, 1984) was introduced, it was evaluated against a single experiment which, at that time, had not been replicated (Shepard, Hovland, & Jenkins, 1961). More recently, in a review of experiments supportive of the COVIS (COmpetition between Verbal and Implicit Systems) model (Ashby, Alfonso-Reese, Turken, & Waldron, 1998), none of the attempted independent replications were cited (Ashby & Maddox, 2011). The intention here is not to single out these authors for criticism, but to illustrate a general issue. We are aware of relatively few cases in formal cognitive modeling where independent replication is applied as a data-quality criterion and, even in those cases, it is only applied to a subset of the phenomena examined (e.g., Love, Medin, & Gureckis, 2004). There are a couple of objections that might be raised against independent replication as a minimum data-quality standard for model evaluation.
The first is that attempts to replicate are rare in psychology (Makel, Plucker, & Hegarty, 2012) and, where replications are attempted, they are often unsuccessful (Open Science Collaboration, 2015).3 Thus, one fear is that applying a criterion of independent replication would leave no phenomena to examine within a particular domain. The rational response to such a fear is to determine whether one’s fear is justified and, if it is, suspend formal modeling of that domain until such time as demonstrably real phenomena become available. Some domains of psychology contain independently replicated phenomena, as illustrated later in this section, and also in Section 4.3.

3 For a more extended discussion of this latter claim, see also Anderson et al. (2016) and Gilbert, King, Pettigrew, and Wilson (2016).


A second objection to using independent replication as a minimum data-quality standard is that it’s hard to define what counts as a replication. Pashler and Harris (2012) make a distinction between direct and conceptual replications and are critical of the latter. We agree with their central point but note that, in practice, there are some problems in distinguishing between a direct replication and a conceptual one. For example, if a direct replication is a study that is the same as the original study in every respect, then no replications are direct (e.g., they test different participants). Defining a direct replication as one identical in every respect except that they sample different members of the same tightly defined population (e.g., US undergraduates) is also likely to be prohibitively restrictive. The concept of a CIRP is a pragmatic response to the perhaps inevitable uncertainty about what counts as a replication. The search for CIRP should, we argue, favor independent replications that are as close to being direct as the published literature will allow. The determination of CIRP is an area in which the concept of ordinal success (Wills & Pothos, 2012) is useful. No two studies in psychology have ever produced literally the same results at a quantitative level. In the CIRP approach, a replication is considered to be successful if the key results are statistically significant and in the same direction as the original study. Another important aspect of defining a CIRP is to accept that some changes to stimuli or procedure may lawfully affect otherwise robust results. For example, Medin and Schaffer (1978) is one of the most highly cited publications in the history of categorization research.4 The Experiment 2 data set of that paper has been independently replicated a number of times (Blair & Homa, 2003; Minda & Smith, 2002; Nosofsky, Kruschke, & McKinley, 1992; Nosofsky, Palmeri, & McKinley, 1994; Rehder & Hoffman, 2005).
Although there is debate about the generality of the ordinal result across different stimulus sets and procedures (e.g., Nosofsky, 2000; Smith & Minda, 2000), there is consensus that the ordinal result is robust within the experimental conditions employed by Medin and Schaffer (e.g., Minda & Smith, 2002). Experiment 2 of Medin and Schaffer (1978) is thus an independently replicated phenomenon in category learning. It may have boundary conditions (it’s likely most psychological phenomena do) but, when you run the experiment the way the original authors did, you can expect to get the results they did. This is a solid basis

4 Web of Science reports 1457 citations as of 15 June 2016.


for the relative adequacy assessment of formal models. Note that, in the CIRP approach, the emphasis is on real effects, rather than effects that are both real and important. In a relative-adequacy assessment (Wills & Pothos, 2012), an effect is important if at least one current or future model cannot accommodate it. Therefore, it is seldom possible to know in advance whether an effect is important in this sense. The CIRP approach has a couple of necessary consequences for model evaluation. First, it places relative adequacy assessments of formal models slightly behind the empirical front line; “off the razor’s edge,” as Medin (2011) put it. The necessary lag this introduces is probably a good thing: this kind of broad model comparison is time consuming and one needs to ensure it is signal, rather than noise, that is being fitted to the models. The second necessary consequence of the CIRP approach is that there will always be more published experiments than CIRP. Unless the original experiment and its replication(s) are literally identical, this raises an issue: to which of the replications does one fit a model? One possibility is to fit each replication separately. Although such an approach has obvious advantages in terms of completeness, it has disadvantages in terms of clarity and efficiency. The CIRP approach involves identifying a replication that is representative of what is known empirically and is sufficiently well specified in terms of psychological stimulus representation, and empirical results, to allow meaningful modeling. This is one sense in which a CIRP is a Canonical Phenomenon. Deciding which experiment among replications is canonical is a matter of judgment and a judgment that is unlikely to receive consensual approval in every case. The CIRP approach does not solve this problem, but does require that the introduction of a CIRP is accompanied by an explanation of the choices made (so that the choices are transparent and open to rational challenge).
One strength of the CIRP approach is that it separates empirical phenomena from the interpretation placed upon those phenomena. A replication that is successful but whose authors favor a different explanation to the original study is still a successful replication. The theoretical legwork comes in the form of comparing formal models to phenomena. Separating phenomena from formal models should make it easier for groups with different theoretical approaches to work in a distributed-collaborative manner.
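The ordinal criterion for a successful replication described in this section (key result statistically significant and in the same direction as the original) can be sketched as a small predicate. This is an illustration under stated assumptions, not the authors' procedure: effects are assumed to be summarized as signed differences, the 0.05 threshold is an assumed default, and both function names are hypothetical.

```python
def same_direction(original_effect, replication_effect):
    """True when both signed effects are nonzero and share a sign."""
    return original_effect * replication_effect > 0


def replication_succeeds(original_effect, replication_effect, p_value, alpha=0.05):
    """Ordinal success: significant AND in the original direction.

    alpha=0.05 is an assumed conventional threshold; effect sizes are
    compared only by sign, never by magnitude.
    """
    return p_value < alpha and same_direction(original_effect, replication_effect)


# Illustrative numbers only; they are not taken from any study.
print(replication_succeeds(0.30, 0.12, p_value=0.01))   # same sign, significant
print(replication_succeeds(0.30, -0.12, p_value=0.01))  # reversed direction
print(replication_succeeds(0.30, 0.12, p_value=0.20))   # not significant
```

Because only the sign and the significance test enter the decision, two studies with different effect magnitudes can still count as the same phenomenon, which matches the point that no two studies produce literally the same quantitative results.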


2.3 Canonical Independently Replicated Phenomenon Registration In Section 2.2, we proposed that each row of the Grid should comprise a CIRP. The concept discussed in the current subsection is that CIRP should be recorded in a freely available store of target data sets that is open not only for inspection, but also for addition and improvement by the community.5 The existence of such a store is important for distributed collaboration on the formal modeling of psychological processes. This is because relative-adequacy assessments of models require that the models have been assessed against a commonly defined set of phenomena. If this is not apparent, imagine the alternative situation where model A is assessed against phenomena 1–3 and is found to accommodate 1 and 2 but not 3. Model B is assessed against phenomena 4–6 and is found to accommodate all three phenomena. Which is the more adequate model, A or B? The answer is one cannot tell, because model B has not been assessed against phenomena 1–3 and model A has not been assessed against phenomena 4–6. We further propose that this freely available store of CIRP should have a standard format, not only for the recording of target data, but also for concise documentation of each data set. Such documentation would need to specify, at a minimum, (1) the source of the data set, (2) references that establish independent replication of the phenomenon and (3) reasons for choosing this particular data set as the CIRP from among the set of replications. We provide a more detailed specification of our proposed documentation format in Section 4.4. Such documentation is brief and is not intended as a replacement for more fulsome publications in traditional outlets. Rather, our expectation is that CIRP registrations would be one tangible product of synthetic review articles published in high-quality journals.
In the same way that data publication increases the citation rate of empirical reports (Piwowar, Day, & Fridsma, 2007), CIRP registration could act to increase the citation rate of review articles, as the review, not the CIRP documentation, would be the definitive reference for the means by which the CIRP had been determined.
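The comparability argument made earlier in this subsection (models A and B assessed against non-overlapping phenomena) can be illustrated with a minimal Python sketch. The data structure, the function names and the later results for model B on phenomena 1–3 are hypothetical and serve only as illustration; they are not part of catlearn or of any published simulation.

```python
# Hypothetical record of which phenomena each model has been assessed against.
# True = accommodates the phenomenon, False = does not; absent = not assessed.
assessments = {
    "A": {1: True, 2: True, 3: False},
    "B": {4: True, 5: True, 6: True},
}


def comparable_phenomena(results, model_x, model_y):
    """Return the sorted phenomena that both models were assessed against."""
    return sorted(set(results[model_x]) & set(results[model_y]))


def relative_adequacy(results, model_x, model_y):
    """Count accommodated phenomena per model over the shared set only.

    Returns None when the models share no phenomenon, because no
    relative-adequacy statement can then be made.
    """
    shared = comparable_phenomena(results, model_x, model_y)
    if not shared:
        return None
    return {m: sum(results[m][p] for p in shared) for m in (model_x, model_y)}


# With disjoint phenomena, the question "A or B?" has no answer.
print(relative_adequacy(assessments, "A", "B"))

# Made-up results for model B on phenomena 1-3 make a comparison possible.
assessments["B"].update({1: True, 2: False, 3: True})
print(relative_adequacy(assessments, "A", "B"))
```

A shared store of CIRP is what would make the second situation the norm, because every model would be assessed against the same rows.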

5 One must also accept the possibility that a CIRP might be deprecated in the event a phenomenon turns out to be a false positive despite being independently replicated.


2.4 Free, Open-Source Model Implementations

One of the main sources of inefficiency in formal modeling is the lack of publicly available model implementations. For example, at the time of writing, the majority of the formal models of category learning had no publicly available implementation.6 The amount of effort this wastes is large. For example, most of the work involved in writing Edmunds and Wills (2016), which established that COVIS could accommodate the result shown in Fig. 1, came from having to implement the COVIS model. The lack of a publicly available implementation increases the entry cost to using a formal model and hence disincentivizes the exploration of models by those outside the research group that created them. We thus propose that another key resource for distributed collaboration is a freely accessible store of model implementations. Below, we introduce and discuss the three attributes we consider essential in such an archive: open-source software (Section 2.4.1), archival storage (Section 2.4.2), and cross-compatibility (Section 2.4.3).

2.4.1 Open-Source Software

It is important for distributed collaboration that model implementations be open source. Open-source software can be examined by others, so they can understand how it works. The fact they can do this also increases the chances that bugs are detected. Just as important, the free availability of source code means others can build upon and improve what the author has provided, either in terms of improving the efficiency or flexibility of the implementation, or by using components of the code in their own development of other formal models. This approach to software development is often known as “free software,” where “free” refers to freedom rather than cost; it reflects the freedom of users to do what they wish with the code (Stallman, 2015; Williams, 2002).
The ALCOVE model (Kruschke, 1992) is an example of good practice in this regard because its source code is publicly available (Kruschke, 1991). The source code provided by Kruschke is an example of good practice for another reason too: it requires only open-source tools (e.g., Stallman & The GCC Team, 2016) to turn the source code into a working model implementation. The importance and strength of this open-tools approach can be illustrated by comparison to the alternative: use of proprietary tools. For example, MATLAB (MathWorks, 2016) is a proprietary tool used by some modelers. Its source code is not available, so researchers cannot see for themselves how MATLAB works. They cannot modify or improve it, and they have to take on trust that it is error free. The vendor can change the way a proprietary tool operates such that things you have written no longer work and decline to sell or support the older versions of the tool for which your code did work.7 In contrast, with open-source tools, if the developers choose to change the tool in a way that causes you a problem, you can choose to continue to use the old version (the old version remains freely available indefinitely). Incidentally, users of MATLAB may wish to consider OCTAVE (Eaton, 2014), an open-source tool that runs MATLAB code with little to no modification.

In summary, we propose that a publicly accessible archive of model simulations should be open source, both in terms of the model code itself and in terms of requiring only open-source tools to turn the source code into a working model implementation.

6 Examples include: COVIS (Ashby et al., 1998), ATRIUM (Erickson & Kruschke, 1998), KRES (Rehder & Murphy, 2003).

2.4.2 Archival Storage

Although Kruschke’s ALCOVE implementation is commendable for being open source and based on open tools, there are other respects in which it could be improved. For example, Kruschke’s ALCOVE implementation is only available on his personal website (Kruschke, 1991). Contrast this to research article publication. Few authors or readers of research articles would be satisfied with a publication approach that involved posting only on a personal website. One central reason for the existence of journals, and other archival systems (e.g., arXiv; Ginsparg, 1999), is the security others gain from knowing that the material upon which they base their own research cannot disappear at the whim (or demise) of the original author. Computer code should be accorded no lesser status.
Specifically, our proposal is that model implementations should be stored in a way that could reasonably be considered archival.

7 This really happens. SPSS (IBM, 2016) will not open its own output files if they were generated before 2007 (University of Massachusetts, n.d.). Hypercard programs cannot be run on modern hardware (Oren, 2004). Visual Basic 6 support was withdrawn (Microsoft, 2008), despite a long public campaign (Ramel, 2016). In contrast, ALCOVE was written on a discontinued computer architecture and operating system (Markoff, 1989; Wikipedia, 2016b), yet compiles with minimal modification on modern machines.


We further propose that the storage should not only be archival, but also have version control (e.g., Collins-Sussman, Fitzpatrick, & Pilato, 2011). Version control means that any changes to the code since publication are recorded, that each update is assigned a unique version number, and that all versions remain publicly available. This means that researchers are able to say that their simulation was based on a particular numbered version of a model implementation, and be confident that this version will continue to be publicly available, even if superseded by a more recent version. This is essential for the reproducibility of published simulations.

In summary, our proposal is that model implementations should be stored in a system that could reasonably be considered to be archival, and which has effective version control. One example of good practice in this regard is the SUSTAIN model (Supervised and Unsupervised STratified Adaptive Incremental Network; Love et al., 2004), as archived by Gureckis (2014). Another is the DIVA model (Kurtz, 2007), as archived by Conaway (2016).

2.4.3 Cross-Compatibility

Two model implementations, even those written in the same language, can operate in a sufficiently different way that their operation cannot be easily combined in a study of relative model adequacy. For example, Kruschke’s ALCOVE code (Kruschke, 1991) reads parameters and stimulus inputs from a text file, works through the presented stimuli in order, and outputs each response to another text file. In contrast, the code for a different model used by O’Connell et al. (2016) generates the stimulus representations as part of the main code. Thus, in the former case, the model and data sets are separated, while in the latter they are integrated. These kinds of differences substantially increase the effort involved in assessing different models against a common set of phenomena.
The solution is for different models to have, as far as is possible, a common structure for inputs and outputs. This is what we mean by cross-compatibility, and we propose that model implementations should be cross-compatible.

2.4.4 Summary

Distributed collaboration on the determination of relative model adequacy would be facilitated by the existence of a free, publicly available, open-source, version-controlled archive of cross-compatible model implementations. As in the case of CIRP registration, such an approach can only get traction if it supplements rather than supplants traditional measures of academic esteem. It seems to us that such an archive meets that goal. In particular, it benefits those who have created these models by assisting the widespread adoption of the models by the community. This should in turn boost citations of the journal articles introducing and investigating those models. So, like CIRP registration, publicly available model implementations are intended to enhance, rather than supplant, traditional publication routes.
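The practical benefit of cross-compatibility (Section 2.4.3) can be illustrated with a short Python sketch. It assumes a hypothetical common structure in which every model is a function taking a state and a list of trials and returning its predictions plus its final state. The two toy models and all names below are invented for illustration; they are not models from the literature, and this is not how any particular archive is implemented.

```python
import math


def constant_model(state, trials):
    """Toy model: predicts the same category-1 probability on every trial."""
    return {"p": [state["p_fixed"] for _ in trials], "state": dict(state)}


def running_mean_model(state, trials):
    """Toy model: predicts the smoothed running proportion of category-1 trials."""
    n1, n = state["n1"], state["n"]
    p = []
    for _stimulus, teacher in trials:
        p.append((n1 + 1) / (n + 2))  # prediction is made before feedback
        n1 += teacher
        n += 1
    return {"p": p, "state": {"n1": n1, "n": n}}


def mean_log_likelihood(model, state, trials):
    """Score any model that follows the common (state, trials) structure."""
    out = model(state, trials)
    total = 0.0
    for p1, (_stimulus, teacher) in zip(out["p"], trials):
        total += math.log(p1 if teacher == 1 else 1.0 - p1)
    return total / len(trials)


# Each trial: (stimulus coordinates, teacher), where teacher is 1 or 0.
trials = [((0.4, 0.4), 1), ((0.8, 0.4), 1), ((0.8, 0.8), 1), ((0.4, 0.8), 0)]
models = {
    "constant": (constant_model, {"p_fixed": 0.5}),
    "running mean": (running_mean_model, {"n1": 0, "n": 0}),
}
for name, (model, state) in models.items():
    print(name, round(mean_log_likelihood(model, state, trials), 3))
```

Because both models accept the same inputs and return the same kind of output, a single scoring function serves both; adding a third model requires no change to the comparison code.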

2.5 Simulation Publication

If the rows (CIRP) and columns (models) of the Grid (Fig. 2) were freely available, there would still be a need for some mechanism by which the results of individual simulations (cells) could be archived. Given the already-discussed need for an archive of CIRP and model implementations, it seems efficient to locate a reproducible account of simulations in the same location. As in these other cases, it is important that this archive of simulations enhances rather than supplants traditional publication routes. We envisage that researchers will generate simulations in pursuit of some broader goals. For example, they may wish to compare the relative adequacy of a few models across a range of replicated phenomena. The results of this investigation would, one assumes, be the subject of peer-reviewed journal article publication in the normal way. What the archive of simulations would provide is the code necessary to reproduce the conclusion recorded in the Grid. The Grid, through the mechanism of registering simulations, also becomes a central searchable database of all relevant simulations. This should further enhance distributed collaboration and, by so doing, improve citation metrics.

2.5.1 A Technical Note About Stimulus Representations

This subsection is a discussion of a technical issue that the casual reader can safely skim. The technical issue concerns the fact that formal models of cognitive processes seldom start with a retinal representation of stimuli. Rather, they assume the presence of certain perceptual processes prior to the input of the model and provide a somewhat more abstract representation of the experimental stimuli. For example, one common approach in the modeling of category learning is to represent each stimulus as a point in a multidimensional psychological stimulus space (Shepard, 1957, 1987).
The input representations chosen by a modeler when simulating a given experiment are an essential component of the model specification. The simulation cannot be reproduced without the stimulus representations, and different stimulus representations could lead to different conclusions. The technical question that arises is how one handles stimulus representations within an archive of simulations. One approach is to include stimulus representations in each of the individual simulation archives. However, we advocate a slightly different approach. Although different models sometimes use different input representations, it is also the case that some models are input compatible. For example, some versions of exemplar and prototype models use the same input representations and differ only in how that input is subsequently processed (e.g., Nosofsky, 1987). It therefore makes sense to publish stimulus representations separately from simulations, as simulations are specific to one model-CIRP combination (i.e., they concern one cell of the Grid), while stimulus representations can be common to multiple cells in a given row. Separate publication of stimulus representations also provides a convenient location for brief documentation and justification of the choices made in constructing these stimulus representations. Documentation of a stimulus-representation archive would briefly explain the way in which the representation was determined, citing (where appropriate) the empirical data used in the construction of the representations (e.g., a multidimensional scaling solution).
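The idea of input-compatible models can be sketched as follows. The example is a simplified Python illustration in the general spirit of exemplar and prototype accounts, not a reimplementation of any published model: one stimulus representation is defined once and then consumed by two different evidence computations. The coordinates, category labels, the sensitivity value c and all function names are hypothetical.

```python
import math

# Hypothetical stimulus representation, published once:
# stimulus label -> coordinates in a two-dimensional psychological space.
stimulus_space = {
    "s1": (0.2, 0.3),
    "s2": (0.3, 0.2),
    "s3": (0.8, 0.7),
    "s4": (0.7, 0.8),
}
category_of = {"s1": "A", "s2": "A", "s3": "B", "s4": "B"}


def similarity(u, v, c=2.0):
    """Exponential decay of similarity with city-block distance."""
    return math.exp(-c * sum(abs(a - b) for a, b in zip(u, v)))


def exemplar_evidence(x, category):
    """Summed similarity of x to every stored member of the category."""
    return sum(
        similarity(x, stimulus_space[s])
        for s, k in category_of.items()
        if k == category
    )


def prototype_evidence(x, category):
    """Similarity of x to the average member of the category."""
    members = [stimulus_space[s] for s, k in category_of.items() if k == category]
    prototype = tuple(sum(dim) / len(members) for dim in zip(*members))
    return similarity(x, prototype)


# Both computations read the same representation and differ only in processing.
probe = (0.25, 0.25)
for evidence in (exemplar_evidence, prototype_evidence):
    print(evidence.__name__, evidence(probe, "A") > evidence(probe, "B"))
```

If the representation were revised (for example, after a new scaling study), only `stimulus_space` would change, and both simulations would pick up the revision.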

2.6 Summary

We have outlined what we see as the necessary components for distributed collaboration on formal modeling in psychology. In summary, what is needed is a free, open, documented, version-controlled archive of (1) CIRP (Canonical Independently Replicated Phenomena), (2) cross-compatible model implementations and (3) simulations.

3. INTRODUCTION TO catlearn

In this section we introduce catlearn, the framework we have written to support distributed collaboration on formal modeling in psychology. catlearn is a free and open-source extension (“package”) of the R language (R Core Team, 2016). Like R itself, catlearn is made available under the GPL (General Public License; Free Software Foundation, 2007), which ensures the software is, and will remain, free and open source. The project title, catlearn, was once an acronym (CATegory LEARNing), but the goals of the framework have outgrown the acronym, and the word catlearn should now be treated as an arbitrary proper noun.


3.1 Why R?

Why is catlearn based around R? R is perhaps best known to psychologists as software for statistical analysis. It provides similar functionality to other statistical analysis software used by psychologists, but also many other features.8 Of particular relevance to formal modeling, R supports nonlinear optimization (the process by which formal models are typically fitted to data). Due to R’s increasing popularity as an analysis tool for psychologists, situating catlearn within R brings the advantage of a familiar environment that contains many of the tools needed to inspect and analyze the output of model simulations. R is also easy to learn, relative to more low-level languages such as C.

Another strength of R is that it provides a simple system for documentation to be incorporated within catlearn. Preceding a command by a question mark, e.g., ?t.test, brings up documentation relevant to that command. This inbuilt documentation system provides an ideal means to document the rows (CIRP), columns (model implementations) and cells (simulations) of the Grid (Fig. 2) and also to document the commands that generate input representations. R packages can also include data sets, which are loaded using the data command, e.g., data(USArrests). This means that, when combined with the documentation system, R packages are well suited to CIRP registration.

A further strength of R is that it has dedicated version-controlled archival systems. Those wishing to contribute to catlearn can do so at any time using the version-controlled archive provided by R-Forge (Theußl & Zeileis, 2009). Users interested in the daily-updated development version of catlearn (the “unstable” version) can also download it from R-Forge. Users who prefer a slightly older but more fully tested version (the “stable” version) can download it from the Comprehensive R Archive Network (CRAN). CRAN is a robust version-controlled archive hosted simultaneously in over 120 different locations worldwide.
One possible objection to situating catlearn within R is that R code doesn’t always run as fast as code written in some other languages. In many cases, this is not an issue from the user’s perspective as code written in R still runs instantaneously. However, model implementations may, in some cases, benefit from being written in a compiled language. This can be achieved within R in a number of ways, for example, through use of the Rcpp package (Eddelbuettel, 2013; Eddelbuettel & Francois, 2011). Rcpp allows one to write time-critical pieces of code in C++ and integrate them seamlessly into R. R also links to a number of other languages, including FORTRAN, Java, Python and MATLAB (via OCTAVE, Gaujoux, 2015).

8 The Comprehensive R Archive Network (R Foundation, 2016) lists 8993 packages that extend the base functions of R (20 August 2016).

3.2 Stateful List Processors

The catlearn package, as described so far, provides the framework for a free, open, documented, version-controlled archive of CIRP, model implementations, input representations and simulations. One thing that the above discussion did not consider was cross-compatibility of models. To facilitate model comparisons, each model implementation should, as far as possible, require similar inputs and provide similar outputs. Of course, this correspondence cannot be complete, as formal models differ in their assumptions about input representations and have different parameters. While some differences are perhaps inevitable, it makes sense to eliminate unnecessary differences in implementation. In pursuit of that goal, model implementations within catlearn employ the stateful list processor schema. This schema is most easily described with the aid of a concrete example, which is provided in Section 4.

4. EXAMPLES

In the current section, we provide some specific examples of how catlearn can be used as an archive of model implementations, CIRP and simulations. The examples are written from a user perspective, rather than a developer perspective. In other words, they assume a situation where you are using content already present in the catlearn package, rather than adding new content yourself. Adding content to catlearn is discussed in Section 5.1.

Readers may find the following examples easier to follow if they have a working version of R and the catlearn package in front of them. A guide on how to install both can be found at the catlearn project website (Catlearn Research Group, 2016). Once you have installed R and catlearn, type library(catlearn) in R to make catlearn available in your current session. Typing ?catlearn then provides a brief help screen, with instructions of how to get a list of available commands. Each command then has its own help file (e.g. ?shin92).


The examples that follow are drawn from category learning but, to reiterate, catlearn is a general-purpose framework for the formal modeling of psychological processes. Our examples come from category learning simply because that is the area with which we are most familiar. Our examples concern one of the central unresolved questions of category learning: are categories represented by prototypes (e.g., Reed, 1972) or by the storage of specific examples (e.g., Medin & Schaffer, 1978) or something else (e.g., Nosofsky et al., 1994)? We start by describing the implementation of an exemplar model (Section 4.1) and a prototype model (Section 4.2), within catlearn. We then derive a CIRP (Section 4.3) and discuss its documentation within catlearn (Section 4.4). Next, we discuss an input-representation archive (Section 4.5), which is used by an archived simulation of our example CIRP with a formal model (Section 4.6). After that, we introduce the use of an Ordinal Adequacy Test (OAT) to automatically evaluate whether the simulation successfully accommodates the CIRP (Section 4.7) and show how the Grid acts as a central record of simulation results (Section 4.8). Finally, we invite the reader to explore some of the other content of the catlearn package (Section 4.9).

4.1 ALCOVE Model Implementation

In this section, we describe the implementation of the ALCOVE model (Kruschke, 1992) within the slpALCOVE command of the catlearn package. The first subsection is a mathematical description of ALCOVE, as some understanding of how ALCOVE works is required to understand how the slpALCOVE command works. This section can be skimmed by those already familiar with ALCOVE. For the mathematically unconfident, it should be sufficient to get the gist of this mathematical subsection. The second subsection describes the implementation of ALCOVE within catlearn.

4.1.1 Description of ALCOVE

ALCOVE (Kruschke, 1992) is one of the most influential models of category learning.9 Fig. 3 summarizes its architecture. ALCOVE is a connectionist model that assumes stimuli are represented as points in a multidimensional psychological stimulus space (Fig. 4A). Thus, each stimulus is represented by a vector, which we will denote here as x. For example, for stimuli varying in size and angle, one might write x = (0.4 0.5), where the two values represent the psychological size and angle of the presented stimulus.

9 Web of Science reports 814 citations as of 16 June 2016.

Figure 3 Architecture of the ALCOVE model (Kruschke, 1992). The gray quadrilateral is a three-dimensional depiction of a two-dimensional plane, representing a psychological stimulus space. Points h1, h2 and h3 represent the location of radial-basis (“exemplar”) units within that space. Point x represents the presented stimulus. The lettered circles are category representations, and the arrows connecting to them are variable-strength connection weights from the radial-basis units. α1 and α2 represent the attention allocated to each of the two dimensions of the space; the arrows beside these α illustrate that dimensional allocation acts to stretch and squash psychological space in ALCOVE. Image by Andy J. Wills. CC BY 4.0.

Figure 4 (A) Representing the similarity structure of stimuli 1, 2 and 3 in a two-dimensional geometric space; in this example, the dimensions of this space are readily interpretable as size and angle. (B) Euclidean distance (distance² = x² + y²). (C) City-block distance (distance = x + y). (D) An exponential decay relationship between similarity and distance in psychological space. Reproduced from Wills, A. J., & Pothos, E. M. (2012). On the adequacy of current empirical evaluations of formal models of categorization. Psychological Bulletin, 138, 102–125. Copyright 2012 by American Psychological Association.

Presentation of a stimulus leads to the activation of radial-basis nodes (Cheney, 1966). Radial-basis nodes, like stimulus representations, can be considered as points in stimulus space (Fig. 3). In virtually all applications of ALCOVE there is exactly one radial-basis node for each unique training stimulus. Although these radial-basis nodes are often called “exemplar” nodes, this description is something of a misnomer as, in most applications, all the nodes exist before training begins. It is perhaps better to think of these radial-basis nodes as a simplification of the abstract concept behind ALCOVE, which is that there are radial-basis nodes randomly scattered across stimulus space (the “COVEring map” of ALCOVE). However one prefers to think about it, the architecture of the radial-basis layer of ALCOVE is fully specified by the matrix h, which has the same number of columns as there are radial-basis units (j columns) and the same number of rows (i rows) as there are psychological stimulus dimensions. For example, in a simple size-angle experiment with four stimuli, the architecture of the radial-basis layer might be described as

$$h = \begin{bmatrix} 0.4 & 0.8 & 0.8 & 0.4 \\ 0.4 & 0.4 & 0.8 & 0.8 \end{bmatrix}$$

where each column represents the location of one training exemplar in stimulus space. ALCOVE computes the activations of each of the radial basis nodes with the following equation:

$$a_j^{h} = \exp\left[-c\left(\sum_i \alpha_i \left|h_{ji} - x_i\right|^{r}\right)^{q/r}\right] \tag{1}$$

This equation specifies that the activation of each radial-basis node is a decreasing function of its distance from the presented stimulus. Where r = 2, that distance is Euclidean (Fig. 4B); where r = 1, the distance is city-block (Fig. 4C). Euclidean distance is typically used for integral stimuli, city-block for separable stimuli (see Garner, 1976). Where q = 1, the decreasing function is exponential (Fig. 4D); where q = 2, it is Gaussian. Exponential decay is typically used (Shepard, 1987); occasionally Gaussian decay is used where stimuli are highly confusable (Ennis, 1988). Stimulus space can be uniformly contracted or expanded using c (see Fig. 5C). c is largely treated as an arbitrarily variable parameter (Wills & Pothos, 2012), although psychologically it is intended to represent cognitive discriminability or memorability of stimuli, so if information about this could be derived independently for a set of stimuli then it would constrain model fitting somewhat (minimally, c would need to be a non-decreasing function of discriminability/memorability). In related models, application to amnesic data takes this form (e.g., Nosofsky & Zaki, 1998).
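As an illustration of Eq. (1), the following Python sketch computes the radial-basis activations for the example matrix h and the example stimulus x = (0.4 0.5) used earlier in this section. It is a plain reading of the equation rather than the code used by catlearn. The function name is hypothetical, as is the value c = 2.0; the attention values assume equal attention summing to one.

```python
import math


def radial_basis_activations(x, h, alpha, c, r=1, q=1):
    """Eq. (1): activation of each radial-basis node for stimulus x.

    h is stored as laid out in the text: one row per stimulus dimension,
    one column per radial-basis node, so h[i][j] is node j on dimension i.
    """
    n_dims, n_nodes = len(h), len(h[0])
    activations = []
    for j in range(n_nodes):
        # Attention-weighted distance between node j and the stimulus.
        dist = sum(alpha[i] * abs(h[i][j] - x[i]) ** r for i in range(n_dims))
        activations.append(math.exp(-c * dist ** (q / r)))
    return activations


h = [[0.4, 0.8, 0.8, 0.4],
     [0.4, 0.4, 0.8, 0.8]]
x = [0.4, 0.5]
alpha = [0.5, 0.5]  # assumed: equal attention to both dimensions

# City-block distance with exponential decay (r = 1, q = 1).
print(radial_basis_activations(x, h, alpha, c=2.0))

# Euclidean distance with exponential decay (r = 2, q = 1).
print(radial_basis_activations(x, h, alpha, c=2.0, r=2, q=1))
```

The node closest to the stimulus (the first column of h) receives the highest activation, and raising c makes activation fall off more steeply with distance.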

Figure 5 (A) Geometric representation of two categories, each of four stimuli (category membership denoted by color of dot). (B) Stretching along the x axis and compression along the y axis, thereby increasing within-category similarity and decreasing between-category similarity. (C) Overall expansion of psychological similarity space. Reproduced from Wills, A. J., & Pothos, E. M. (2012). On the adequacy of current empirical evaluations of formal models of categorization. Psychological Bulletin, 138, 102–125. Copyright 2012 by American Psychological Association.

In Eq. (1), $\alpha_i$ represents dimensional attention on dimension i. Dimensional attention acts as a multiplier to distance, stretching psychological space uniformly across one axis (see Fig. 5B). The model is typically initialized with equal attention to all dimensions, conventionally summing to unity, e.g., $\alpha$ = (0.5 0.5). The activation process represented by Eq. (1) results in a vector of radial-basis node activations, e.g., $a^h$ = (0.4 0.8 0.4 0.8). Radial-basis node activation propagates forward to a set of output (category) nodes, which then have activation $a^o$. There is one output node for each category, and each radial-basis node has one variable weight connection to each output node. The weight-state of the model is thus a matrix of the form:

$$w = \begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}$$

which has k rows (one for each output node) and j columns (one for each hidden node). Activation of the output nodes is calculated with the standard connectionist equation

$$a_k^{o} = \sum_j w_{kj}\, a_j^{h} \tag{2}$$

The forward propagation of activation ends with a standard exponential ratio rule to convert activation to response probability

$$P(K) = \frac{\exp\left(\phi\, a_K^{o}\right)}{\sum_k \exp\left(\phi\, a_k^{o}\right)} \tag{3}$$

where $\phi$ is a nonnegative response-scaling parameter. Low values of $\phi$ lead to approximately probability-matching behavior (category selection probability is proportional to the ratio of output node activations). High values of $\phi$ lead to approximately winner-take-all behavior (the category with the highest activation is always selected). We note in passing that this exponential ratio rule is probably a poor model of categorical decisions (Wills, Reimers, Stewart, Suret, & McLaren, 2000).

In ALCOVE, learning is driven by “teacher” (t) values. The presence of a category label is represented by a teacher signal of 1; absence of a category label is typically represented by −1. The teacher is typically considered to be “humble.” This means that if the output activation is more extreme than the +1/−1 teaching value, then the output activation is used as the teaching signal. Learning of connection weights from radial-basis nodes to output nodes uses a standard summed-error term:10

$$\Delta w_{kj} = \lambda_w \left(t_k - a_k^{o}\right) a_j^{h} \tag{4}$$

where $\lambda_w$ is the associative learning-rate parameter, which can range from 0 to 1. This equation acts to change connection weights in the direction that most rapidly reduces error.

Attentional weights are also learned. This is achieved by the backpropagation of error (Rumelhart, Hinton, & Williams, 1986; Werbos, 1974) to the radial-basis nodes in the standard manner:

$$\beta_j = \left[\sum_k \left(t_k - a_k^{o}\right) w_{kj}\right] a_j^{h} \tag{5}$$

This back-propagated error is then used to change the attentional weight for each dimension:

$$\Delta \alpha_i = -\lambda_\alpha \sum_j \beta_j\, c \left|h_{ji} - x_i\right| \tag{6}$$

where $\lambda_\alpha$ is the attention learning-rate parameter, which again can range from zero to one. Implementations of ALCOVE constrain attentional weights to be nonnegative. ALCOVE’s attentional learning system acts to stretch and squash psychological stimulus space (Fig. 5B) in the directions that most rapidly reduce error.
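The full trial cycle described by Eqs. (1) to (6) can be sketched in plain Python as follows. This is an illustrative reading of the equations, not the slpALCOVE code. The function name and all parameter values are hypothetical. Two choices are assumptions of the sketch rather than statements from the text: the back-propagated error of Eq. (5) is computed with the weights as they stood before the Eq. (4) update, and the attention update follows Eq. (6) as written, which corresponds to the error gradient when r = q = 1.

```python
import math


def alcove_trial(x, t, h, w, alpha, c, phi, lw, la, r=1, q=1):
    """Run one ALCOVE trial; return (response probabilities, new w, new alpha).

    h: dimensions x nodes; w: categories x nodes; t: teacher values (+1 / -1).
    """
    n_dims, n_nodes, n_cats = len(h), len(h[0]), len(w)
    # Eq. (1): radial-basis node activations.
    a_h = [
        math.exp(-c * sum(alpha[i] * abs(h[i][j] - x[i]) ** r
                          for i in range(n_dims)) ** (q / r))
        for j in range(n_nodes)
    ]
    # Eq. (2): output node activations.
    a_o = [sum(w[k][j] * a_h[j] for j in range(n_nodes)) for k in range(n_cats)]
    # Eq. (3): exponential ratio rule.
    expo = [math.exp(phi * a) for a in a_o]
    p = [e / sum(expo) for e in expo]
    # Humble teacher: an output more extreme than the teacher produces no error.
    teach = [max(t[k], a_o[k]) if t[k] > 0 else min(t[k], a_o[k])
             for k in range(n_cats)]
    err = [teach[k] - a_o[k] for k in range(n_cats)]
    # Eq. (5): back-propagated error (assumption: pre-update weights are used).
    beta = [sum(err[k] * w[k][j] for k in range(n_cats)) * a_h[j]
            for j in range(n_nodes)]
    # Eq. (4): associative weight update.
    new_w = [[w[k][j] + lw * err[k] * a_h[j] for j in range(n_nodes)]
             for k in range(n_cats)]
    # Eq. (6): attention update, constrained to be nonnegative.
    new_alpha = [
        max(0.0, alpha[i] - la * sum(beta[j] * c * abs(h[i][j] - x[i])
                                     for j in range(n_nodes)))
        for i in range(n_dims)
    ]
    return p, new_w, new_alpha


h = [[0.4, 0.8, 0.8, 0.4],
     [0.4, 0.4, 0.8, 0.8]]
w = [[0.0] * 4 for _ in range(2)]
alpha = [0.5, 0.5]
params = dict(c=2.0, phi=2.0, lw=0.1, la=0.1)  # hypothetical values

# Present the same stimulus twice, with category 1 as the correct answer.
p1, w, alpha = alcove_trial([0.4, 0.4], [1, -1], h, w, alpha, **params)
p2, w, alpha = alcove_trial([0.4, 0.4], [1, -1], h, w, alpha, **params)
print(p1, p2, alpha)
```

On the first presentation all weights are zero, so both categories are equally likely; on the second presentation the learned weights favor the correct category.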

10 See Le Pelley (2004) for a discussion of summed- and separate-error term equations.


Figure 6 ALCOVE implemented as a stateful list processor in catlearn. Image by Andy J. Wills. CC BY 4.0.

4.1.2 Implementation of ALCOVE

The function slpALCOVE in the catlearn package implements ALCOVE as a stateful list processor.11 In this section, we explain how slpALCOVE works and, by so doing, also introduce the more general concept of a stateful list processor, a structure that is well suited to many formal models of cognitive processes. As with all R commands, the ? query returns help documentation for slpALCOVE (type ?slpALCOVE to see it). Although complete, such documentation is typically concise rather than tutorial in nature. Where possible, one should cite, within the help documentation, another tutorial publication. In the case of slpALCOVE, the current article serves that purpose.

Fig. 6 provides an illustration of the basic operating principles of the slpALCOVE stateful list processor. The slpALCOVE function takes two primary inputs from the user: st and tr. Input st (“state”) is a list containing the model parameters and the model’s initial state. For slpALCOVE, the parameters are c, r, q, phi ($\phi$), lw ($\lambda_w$), la ($\lambda_\alpha$) and h. These parameters were defined above. List st also contains the model’s initial state: w contains the initial connection weights, and alpha ($\alpha$) contains the initial attentional weights. These model states were defined above. The only other entry in st is colskip, which is an instruction to the model implementation to ignore the first N columns in the training matrix.

11 The slp in slpALCOVE stands for Stateful List Processor.

Object tr (“training”) is a matrix, where each row is one trial that is presented to the network. The nature of a list-processor architecture is that the model processes all of these trials and in the order they are presented. The ordering of trials (e.g., through randomization) is undertaken by other functions that generate tr, not by the list processor. The first column of tr, ctrl (“control”), is normally zero, but can be set to other values to change the mode of operation of the model mid-list. The options currently implemented in slpALCOVE are: 1 = reset the model to its initial state, 2 = freeze learning on the current trial. The latter option allows no-feedback test phases to be implemented in the standard way (i.e., by running the model with learning rates set to zero). The former option allows the list processor to run the same model with the same parameters on multiple runs of the experiment. This can be useful for averaging out order effects.

After ctrl, there are a variable number of optional columns that can contain any numerical information the user wishes. Typically, these will be used to contain information such as experimental condition, block and trial. These columns are ignored by the list processor; it just needs to be told how many columns there are to ignore, using the colskip parameter. The value of colskip should be set to equal the number of optional columns, plus one. Setting colskip incorrectly can lead to unpredictable behavior. After these optional columns, the next set of columns contain the input representation, x, and then the teaching signals, t, both of which were defined above. The final set of columns, m, should in most circumstances be set to zero. Setting an m column to 1 indicates that, on that trial, that stimulus dimension was not presented.
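The column layout of the training matrix described above can be made concrete with a small Python sketch that splits one row into its parts. The sketch only illustrates the layout (ctrl, optional columns, x, t, m) and the role of colskip; it is not catlearn code. The function name and the example row are hypothetical, and the assumption that there is one m column per stimulus dimension is inferred from the description of m.

```python
def split_trial_row(row, colskip, n_dims, n_cats):
    """Split one training-matrix row into its named parts.

    Layout assumed from the description above:
    ctrl | optional columns | x (n_dims) | t (n_cats) | m (n_dims)
    with colskip = number of optional columns + 1.
    """
    expected = colskip + 2 * n_dims + n_cats
    if len(row) != expected:
        raise ValueError(f"expected {expected} columns, got {len(row)}")
    x_end = colskip + n_dims
    t_end = x_end + n_cats
    return {
        "ctrl": row[0],
        "optional": row[1:colskip],
        "x": row[colskip:x_end],
        "t": row[x_end:t_end],
        "m": row[t_end:],
    }


# Hypothetical trial: ctrl, condition, block, trial, x1, x2, t1, t2, m1, m2
row = [0, 1, 1, 3, 0.4, 0.5, 1, -1, 0, 0]
parts = split_trial_row(row, colskip=4, n_dims=2, n_cats=2)
print(parts)
```

With three optional columns, colskip is 4. A colskip that is too small or too large shifts every later column, which is one way to see why an incorrect value leads to unpredictable behavior; this sketch catches only the cases where the row length no longer matches.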
This is useful in some cases where stimuli have multiple components, not all of which are presented on every trial. In addition to st and tr, the slpALCOVE function takes a number of other optional input arguments. These set the operating mode for the model. Most models have a number of different variants, and ALCOVE is no exception. If these options are not set (as in the example in Fig. 6), slpALCOVE runs the version of ALCOVE specified in the previous section. For the sake of brevity, the alternative options are not discussed here, but are documented in ?slpALCOVE. So, slpALCOVE takes tr and st as input. The simulation is run by creating tr and st, and then entering the command slpALCOVE(st,tr). A concrete example follows later in this article. When the simulation completes it

returns a list, as illustrated in Fig. 6. The list has three components: p, w and alpha. Component p is a matrix that has one column for each category unit of the model and one row for each trial of the simulation. The numbers in the matrix are the predicted probability of each category response on each trial. Matrix p can easily be recombined with the training matrix, tr, for easier interpretation of the model’s predictions. This combined matrix can then be analyzed using the same techniques as used to analyze participant data in R (e.g., through the use of the aggregate command). The other two parts of the output list are w and alpha, which give the values of the connection and attentional weights at the end of the simulation. This information can be used to explore how the model has learned the category structure with which it was presented. The returning of final w and alpha is the property of slpALCOVE that leads us to describe it as a stateful list processor. In other words, it returns not only the model’s predictions but also its (final) state. The stateful property of the implementation is important because it avoids an otherwise serious limitation of the list-processor architecture. Specifically, in nonstateful list processors, the input to a model cannot be made contingent on its previous output. In category learning, the most common example in which input needs to be contingent on output is in training to criterion. In modeling training to criterion, one might run the model until the probability of a correct response exceeds some threshold (e.g., 0.99). A nonstateful list processor cannot do this because the number of training items has to be specified from the outset (it is the number of rows in the tr matrix). A stateful list processor can handle training to criterion because it returns its final state. 
For example, to model training to criterion where the criterion is checked once per block of trials, one presents a single block of trials to slpALCOVE and checks whether the returned probabilities exceed the criterion. If they do not, one sets the initial state of the model (w, alpha) to the values returned by slpALCOVE and runs another block of trials. Thus, the model can “pick up where it left off” because it can be returned to the state it was in at the end of the last block.
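The block-by-block procedure can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the catlearn code: slp_toy is a hypothetical stand-in (a simple delta-rule network with an exponential response rule, with no exemplar units and no attention weights alpha), train_to_criterion is a hypothetical helper, and the parameter values and the 0.9 criterion are arbitrary. What it shares with the description above is the interface: the function takes a state and a trial matrix, honors ctrl codes 1 (reset) and 2 (freeze), and returns both its predictions and its final state.

```python
import math

def slp_toy(st, tr):
    """Toy stateful list processor: a delta-rule network, NOT ALCOVE."""
    colskip, lr, phi = st["colskip"], st["lr"], st["phi"]
    n_cats, n_dims = len(st["w"]), len(st["w"][0])
    w = [row[:] for row in st["w"]]           # work on a copy of the initial state
    p = []
    for row in tr:
        ctrl = row[0]
        if ctrl == 1:                         # 1 = reset to the initial state
            w = [r[:] for r in st["w"]]
        x = row[colskip:colskip + n_dims]
        t = row[colskip + n_dims:colskip + n_dims + n_cats]
        out = [sum(wk * xi for wk, xi in zip(w_k, x)) for w_k in w]
        expo = [math.exp(phi * o) for o in out]
        total = sum(expo)
        p.append([e / total for e in expo])   # response probabilities for this trial
        if ctrl != 2:                         # 2 = freeze learning on this trial
            for k in range(n_cats):
                err = t[k] - out[k]
                for i in range(n_dims):
                    w[k][i] += lr * err * x[i]
    return {"p": p, "w": w}                   # predictions plus final state

def train_to_criterion(st, block, correct, criterion=0.9, max_blocks=200):
    """Run one block at a time, carrying the returned state into the next block."""
    state = dict(st)
    for n_blocks in range(1, max_blocks + 1):
        out = slp_toy(state, block)
        p_correct = [p_row[c] for p_row, c in zip(out["p"], correct)]
        state = {**state, "w": out["w"]}      # pick up where the model left off
        if min(p_correct) > criterion:
            return n_blocks, state
    return None, state                        # criterion not reached

# Hypothetical setup: two input dimensions, two categories, no optional columns.
st = {"colskip": 1, "lr": 0.1, "phi": 2.0, "w": [[0.0, 0.0], [0.0, 0.0]]}
block = [
    [0, 1, 0, 1, -1],   # ctrl, x1, x2, t1, t2: stimulus 1 belongs to category 1
    [0, 0, 1, -1, 1],   # stimulus 2 belongs to category 2
]
n_blocks, final_state = train_to_criterion(st, block, correct=[0, 1])
```

Because the state is passed back in explicitly, running two blocks in a single call and running one block, carrying the state forward, and running a second block give the same final weights. The number of blocks therefore does not have to be fixed before the simulation starts.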

4.2 Proto-ALCOVE Model Implementation

Although ALCOVE is typically considered as an exemplar-like model, it is also possible to use it as a prototype model (e.g., Johansen & Palmeri, 2002).

out