The Use of Psychological Testing for Treatment Planning and Outcomes Assessment [1 ed.] 0805811621, 9780805811629


English Pages 656 [664] Year 1994


The Use of Psychological Testing for Treatment Planning and Outcome Assessment


MARK E. MARUISH


THE USE OF PSYCHOLOGICAL TESTING FOR TREATMENT PLANNING AND OUTCOME ASSESSMENT

Edited by
Mark E. Maruish
NCS Assessments, National Computer Systems

LAWRENCE ERLBAUM ASSOCIATES, PUBLISHERS
1994
Hillsdale, New Jersey    Hove, UK

Copyright © 1994, by Lawrence Erlbaum Associates, Inc. All rights reserved. No part of the book may be reproduced in any form, by photostat, microform, retrieval system, or any other means, without the prior written permission of the publisher.

Lawrence Erlbaum Associates, Inc., Publishers
365 Broadway
Hillsdale, New Jersey 07642

Library of Congress Cataloging-in-Publication Data

The Use of psychological testing for treatment planning and outcome assessment / edited by Mark E. Maruish.
p. cm.
Includes bibliographical references and indexes.
ISBN 0-8058-1162-1 (alk. paper)
1. Psychological tests. 2. Mental illness--Diagnosis. 3. Psychiatry--Differential therapeutics. 4. Mental illness--Treatment--Evaluation. 5. Psychiatric rating scales. I. Maruish, Mark E. (Mark Edward)
[DNLM: 1. Psychological Tests. 2. Mental Disorders--diagnosis. 3. Patient Care Planning. 4. Projective Techniques. WM 145 U84 1994]
RC473.P79U83 1994
616.89’075--dc20
DNLM/DLC for Library of Congress    93-37257 CIP

Books published by Lawrence Erlbaum Associates are printed on acid-free paper, and their bindings are chosen for strength and durability. Printed in the United States of America


Contributors

Thomas M. Achenbach, University of Vermont
Robert P. Archer, Eastern Virginia Medical School
C. Clifford Attkisson, University of California, San Francisco
Larry E. Beutler, University of California, Santa Barbara
James A. Ciarlo, University of Denver
James R. Clopton, Texas Tech University
C. Keith Conners, Duke University Medical Center
Susan E. Costin, Texas A&M University
Lynn DellaPietra, Hahnemann University
Leonard R. Derogatis, Hahnemann University
William O. Faustman, Department of Veterans Affairs Medical Center, Palo Alto and Stanford University School of Medicine
Raymond D. Fowler, American Psychological Association
Antonio A. Goncalves, University of Miami
Roger L. Greene, Pacific Graduate School of Psychology
Thomas K. Greenfield, Medical Research Institute of San Francisco and University of California, San Francisco
William Henry, Vanderbilt University
L. Michael Honaker, American Psychological Association
Joel Katz, The Toronto Hospital and the University of Toronto
Randy Katz, The Toronto Hospital and the University of Toronto
Rex B. Kline, Concordia University, Montreal, Canada
David Lachar, University of Texas Medical School at Houston
Michael J. Lambert, Brigham Young University
Lewis Lazarus, Hahnemann University
Mark E. Maruish, NCS Assessments, National Computer Systems
Theodore Millon, University of Miami and Harvard Medical School
Kevin L. Moreland, Fordham University
Leslie C. Morey, Vanderbilt University
Frederick L. Newman, Florida International University
Brian F. Shaw, The Toronto Hospital and the University of Toronto
Douglas K. Snyder, Texas A&M University
Charles D. Spielberger, University of South Florida
Sumner J. Sydeman, University of South Florida
Phylis Wakefield, University of California, Santa Barbara
Irving B. Weiner, University of South Florida
Rebecca E. Williams, University of California, Santa Barbara
Mark J. Woodward, University of Miami

Contents

Foreword
Preface

PART I: GENERAL CONSIDERATIONS

1. Introduction
   Mark E. Maruish
2. Psychological Tests in Screening for Psychiatric Disorder
   Leonard R. Derogatis and Lynn DellaPietra
3. Use of Psychological Tests/Instruments for Treatment Planning
   Larry E. Beutler, Phylis Wakefield, and Rebecca E. Williams
4. Use of Psychological Tests for Outcome Assessment
   Michael J. Lambert
5. Criteria for Selecting Psychological Instruments for Treatment Outcome Assessment
   Frederick L. Newman and James A. Ciarlo
6. Selection of Design and Statistical Procedures for Progress and Outcome Assessment
   Frederick L. Newman

PART II: ADULT ASSESSMENT INSTRUMENTS

7. Minnesota Multiphasic Personality Inventory-2
   Roger L. Greene and James R. Clopton
8. Millon Clinical Multiaxial Inventory-II
   Antonio A. Goncalves, Mark J. Woodward, and Theodore Millon
9. Personality Assessment Inventory
   Leslie C. Morey and William Henry
10. SCL-90-R, Brief Symptom Inventory, and Matching Clinical Rating Scales
    Leonard R. Derogatis and Lewis Lazarus
11. Rorschach Assessment
    Irving B. Weiner
12. Beck Depression Inventory and Hopelessness Scale
    Randy Katz, Joel Katz, and Brian F. Shaw
13. State-Trait Anxiety Inventory and State-Trait Anger Expression Inventory
    Charles D. Spielberger and Sumner J. Sydeman
14. Marital Satisfaction Inventory
    Douglas K. Snyder and Susan E. Costin
15. Katz Adjustment Scales
    James R. Clopton and Roger L. Greene
16. Brief Psychiatric Rating Scale
    William O. Faustman
17. Client Satisfaction Questionnaire-8 and Service Satisfaction Scale-30
    C. Clifford Attkisson and Thomas K. Greenfield

PART III: CHILD AND ADOLESCENT INSTRUMENTS

18. Minnesota Multiphasic Personality Inventory-Adolescent
    Robert P. Archer
19. Millon Adolescent Personality Inventory and Millon Adolescent Clinical Inventory
    Mark J. Woodward, Antonio A. Goncalves, and Theodore Millon
20. Personality Inventory for Children and Personality Inventory for Youth
    David Lachar and Rex B. Kline
21. Child Behavior Checklist and Related Instruments
    Thomas M. Achenbach
22. Conners Rating Scales
    C. Keith Conners

PART IV: FUTURE DEVELOPMENTS

23. Future Directions in the Use of Psychological Assessment for Treatment Planning and Outcome Assessment: Predictions and Recommendations
    Kevin L. Moreland, Raymond D. Fowler, and L. Michael Honaker

Author Index
Subject Index

To Ginny, Katie, and Abby— those who make it all worthwhile

Foreword

W. Grant Dahlstrom
University of North Carolina at Chapel Hill

Experienced clinicians and beginning students share a common need: current information on assessment devices that they may not be very familiar with from their previous coursework. For beginners, it is a joy to come upon sources that have been prepared by individuals involved in the original development of such instruments and that trace the subsequent developments of each device up to the present. For practitioners who may be called on to employ, in some critical circumstance, a test or procedure that they may not have used for some time, it is valuable to have on hand the materials that have been assembled here by an editor with a critical eye for what practitioners need to know.

Whether as new topics or as subjects for needed review, students and practitioners alike will also benefit from treatments, in their proper contexts, of such issues as base rates and hit rates, false positives and false negatives, sensitivity and specificity, or the cost-effectiveness of various professional procedures. Reminders of these important considerations are brought to the attention of their readers by the authors of these contributions in the settings in which they make important differences in decisions about screening, referrals, or readiness for discharge or treatment termination. It is refreshing to find these technical issues taken up with clarity and documented with real world data. All too often, they are encountered by students only in textbooks in theory courses, usually with examples provided from some field other than psychological assessment. For the practitioner, they may be quite hazy topics recalled, if at all, only with some vague feelings of concern and worry. The presentations here should serve to clarify (and calm) users of assessment devices of many different kinds.


Preface

Over the past several years, the American people have witnessed some rather dramatic changes in the state of their health-care system. These changes, prompted by the out-of-control costs of health services, have reshaped the way in which health care is delivered and paid for. The resulting developments have affected not only physical health-care services but mental health services as well.

The practice of test-based psychological assessment has not entered this new era unscathed. Limitations placed on total monies allotted for all aspects of psychological treatment, as well as those specifically related to reimbursement for psychological testing, have had an impact on the practice of psychological testing. However, for those skilled in its use, the ability of testing to help quickly identify psychological problems, plan treatment, and document the effectiveness of that treatment presents many potentially rewarding opportunities during a time when health-care organizations need to provide problem-focused limited treatment, demonstrate the effectiveness of the treatment to payor and patient, and implement a philosophy of quality improvement.

With the opportunity at hand, it is now up to those with skills and training in psychological assessment to make the most of this chance to contribute to (and benefit from) efforts to control or eliminate the health-care crisis that confronts us. However, this task may not be as simple as it would appear. Many clinical psychologists, other applied psychologists, and other professionals schooled in the use of psychological tests have actually had relatively limited exposure to the full range of applications of testing in day-to-day clinical practice. For many, their formal testing courses and practicum and internship experiences have focused primarily on the use of testing for symptom identification, personality description, and diagnostic purposes, while minimally addressing how test results can assist in planning treatment or assessing the impact of that treatment. Consequently, although the basic skills are there, there is a need for many well-trained clinicians to develop or expand their knowledge and skills in psychological testing to better apply them for treatment planning and outcome assessment purposes. It is this very need, together with the related need of students in graduate-level testing courses, that has served as the impetus for the development of this book.

In contemplating the contents of this volume, it was decided that the most informative and useful approach would be to address separately aspects of four broad topical areas. The first area concerns issues and recommendations to be considered in the use of psychological testing, in general, for treatment planning and outcome assessment. The second and third areas deal with issues related to the use of specific psychological tests and scales. The fourth section concerns the future of psychological testing.

Part I is devoted to addressing general considerations that apply to the need for and use of psychological testing for treatment planning and outcome assessment purposes. The introductory chapter provides an overview of the status of the health-care delivery system today and the ways in which testing can contribute to making that system more cost-effective. Two chapters are devoted to general issues related to treatment planning, and three chapters focus on issues related to outcome assessment. The first of the two planning chapters deals with the use of psychological tests for screening purposes, with particular emphasis on screening in nonpsychiatric settings. Screening can serve as the first step in the treatment planning process; thus, it is a topic that warrants the reader’s attention. The second of these chapters presents a discussion of the research that suggests how testing may be predictive of differential response to treatment and its outcome.

The three chapters on the use of testing for outcome assessment are complementary. Chapter 4 provides a somewhat general overview of the use of testing for outcome assessment purposes, discussing some of the history of outcome assessment, its current status, its measures and methods, individualizing outcome assessment, and the distinction between clinically and statistically significant differences in outcome assessment. The next two chapters expand on the groundwork laid in this chapter.
Chapter 5 presents an updated discussion of a set of specific criteria that can serve as an invaluable guide to clinicians in the selection of psychological measures for assessing treatment outcome. Chapter 6 provides a rather extensive discussion of statistical procedures and research design issues related to the measurement of treatment progress and outcome with psychological tests. This discussion is presented with full knowledge of the understandable distaste that many clinicians have for statistics. However, knowledge and skills in this area are particularly important and needed by clinicians wishing to establish and maintain an effective treatment evaluation process within their particular setting.

The next two sections address the use of specific psychological instruments for planning and outcome assessment purposes. Part II focuses on instruments that are intended exclusively or primarily for use with adult populations. Part III deals with child and adolescent instruments. Some may consider the chapters in these two sections as being the “meat” of this volume, because many provide “how-to” instructions for tools that are commonly found in the clinician’s armamentarium. However, these instrument-specific chapters are no more or less important than those found in the first section. They are only extensions and are of limited value outside of the context of those first few beginning chapters.

In determining the contents of the second and third sections, two major questions needed to be answered. The first question concerned which of the hundreds of currently available psychological instruments should have a chapter devoted to its use for treatment planning and outcome assessment. The second involved the actual chapter content, that is, what issues or questions are of the greatest concern or relevance to clinicians and thus should be dealt with in each of these instrument-specific chapters. Arriving at resolutions to these matters was not easy.

Instruments, adult and child/adolescent, that were candidates for topics of the instrument-specific chapters were evaluated against several selection criteria. These included the popularity of the instrument among clinicians; the acceptance of the instrument on the basis of its psychometric qualities; in the case of recently released instruments, the potential for the instrument to become widely accepted and widely used; the perceived usefulness of the instrument for treatment planning and outcome assessment purposes; and the availability of a recognized expert on the instrument (preferably its author) to contribute a chapter to this volume. In the end, the instrument-specific chapters selected for inclusion were those judged to be of greatest interest and utility to the majority of the intended audience. Readers currently using other tests should view the instruments selected for inclusion as being worthy of consideration, should they be evaluating new tools for treatment planning and/or evaluation purposes.

A decision regarding the general content of the instrument-specific chapters was somewhat more difficult to arrive at. However, in the end, the contributors were asked to develop a chapter that addressed three important questions: What does the instrument do and how was it developed? How should one use this instrument for treatment planning? How should it be used to assess treatment outcome? Guidelines were provided to assist the contributors in addressing these questions. Many of the contributors adhered strictly to these guidelines; others modified the contents of their chapter to reflect and emphasize what they judged to be important for the reader to know about the instrument when using it for planning and outcome assessment purposes. Regardless of the approach taken, each of these chapters provides information that will be useful to those wishing to employ the instrument for the stated purposes.

Part IV offers a discussion about the future of psychological assessment. This single chapter was written to inform the reader of anticipated advances in the field of testing, as well as anticipated legislative and medical care mandates that may affect the manner in which psychological testing may be used in years to come.

It is important to recognize that this volume is not intended to be a definitive work on the topic. However, we hope the reader will find the contributions useful for a better understanding of general and test-specific considerations and approaches related to treatment planning and outcome assessment, and in effectively implementing these in their daily practice. We hope, as well, that this work will stimulate further endeavors in investigating the application of psychological testing for these purposes.

Acknowledgments

The development of this volume could not have taken place without the assistance and support of several people. First and foremost, I have been quite fortunate to have some of the leaders in the construction, use, and interpretation of psychological tests as contributors to this volume. I express my utmost gratitude for their time, efforts, and expertise. Without them, this book would not have come into being.

I also thank the management and staff of the NCS Assessments Division of National Computer Systems. They provided me with various forms of support that facilitated the development of this book. I am also grateful for the input, encouragement, and moral support that came from my coworkers at NCS while working on this project. In particular, I acknowledge the input regarding the content and development of this book that was provided by Terri Foley, Nancee Meuser, and Denny Morrison.

Finally, I express special appreciation to my family (Ginny, Katie, and Abby) for all of the sacrifices they made for me during my work on this book. Their love, understanding, and tolerance enabled me to persevere during the more difficult phases of this project.

Mark E. Maruish


PART I

GENERAL CONSIDERATIONS

Chapter 1

Introduction

Mark E. Maruish
NCS Assessments Division, National Computer Systems

The use of psychological tests in the assessment of human behavior always has been viewed as one of the major hallmarks of applied psychology, particularly clinical psychology. However, the potential contribution of test-based psychological assessment has not always been fully appreciated. Spielberger (1992) recently discussed the decrease in interest in assessment that began in the 1960s. He attributed this decline to several factors. As originally reported in Megargee and Spielberger (1992), these included a growing emphasis on teaching behavioral treatment approaches in graduate programs, an increase in the use of psychotropic medications, American psychiatry’s change of focus from syndromes and personality structure to symptoms, professional time constraints resulting in a preference for brief screenings over full battery assessments, and the questioning of the utility of assessment for treatment planning purposes.

However, both Watkins (1991) and Spielberger (1992) also have observed a “growing Renaissance in psychological assessment throughout the world” (Spielberger, 1992, p. 6) during the past decade. From his review of psychological assessment surveys conducted over the past 30 years, Watkins (1991) concluded that

[P]sychological assessment has long been and continues to be a highly and clearly important role in psychological training and practice. Assessment is a major component of psychological training programs, and most practicing psychologists, regardless of work setting, provide assessment services and spend a fair portion of their professional time doing so. (p. 431)

On the other hand, Spielberger pointed to the growing membership in assessment-oriented organizations (e.g., the Society for Personality Assessment) and the number of and attendance at various related conferences throughout the world as evidence of this contention.

Again, as originally reported in Megargee and Spielberger, Spielberger attributed this renewed interest to multiple factors: realization that assessment can assist in psychological intervention, increased demands for personality assessment in nonclinical areas such as business and education, the continuation of basic assessment training in graduate programs to meet internship program demands, and assessment’s contributions to research (e.g., by way of construct definition) in all areas of psychology.

Weiner (1992) similarly noted how psychodiagnostic assessment has rebounded as a field of endeavor during the past 15 years. He pointed to better test materials, a wider range of application, increases in both the requests and appreciation for psychodiagnostic consultation, more investment in assessment activities by professionals, and innovations in how the practice is conducted.

This observed resurgence of appreciation for and interest in assessment coincides with changes in the health-care industry that are beginning to impact the manner in which psychological services are provided. These changes reflect the alarming trend that has occurred in the health-care field over the past couple of decades (i.e., the skyrocketing of health-care costs in the United States). This includes the cost of mental health and related services. As a result, there has been a drive to contain health-care costs through a variety of means, while striving to continuously improve the quality of the services delivered. This latter movement, frequently referred to as “continuous quality improvement” or “total quality management,” is neither a new concept nor one that is indigenous to the health-care industry. Rather, it is one that has been applied in business settings for decades and is now gaining increasing attention in the health-care field as the crisis taking place within it escalates.

One may wonder how the role of psychological assessment may be enhanced by or benefit from the current state of affairs in the mental health-care arena. As Maruish (1990) explained,

First, consider that the handwriting on the wall appears to be pointing to one scenario. With limited dollars available for treatment, the delivery of cost-efficient, effective treatment will be dependent on the ability to clearly identify the patient’s problem(s). Based on this and other considerations, the most appropriate treatment modality . . . must then be determined. Finally, the organization will have to show that it has met the needs of each client. . . . It is in all of these functions—problem identification, triage/disposition, and outcome measurement—that psychological assessment can make a significant contribution to the success of the organization. (p. 5)

The movement to control costs and improve quality presents a challenge to psychologists and other clinicians to survive in the rapidly changing mental health-care market. At the same time, it also presents to those skilled in assessment a heretofore unequaled opportunity to both capitalize on their skills and contribute to the development of an effective, streamlined model of mental health-care delivery. It is this challenge and opportunity that has inspired and motivated the development of this book.

To better understand both the challenge and the opportunity, it is important for the reader to grasp the full magnitude of the health-care crisis, what is currently being done about it, and what will be done in the near future. Equally important is an understanding of continuous quality improvement as a general approach to product development and service delivery, as well as general approaches to its implementation in the health-care field. This will allow a better appreciation of the general discussion of the use of psychological testing for treatment planning and outcome assessment.

For the remainder of this chapter, the term assessment refers to test-based assessment. This is in contrast to psychological assessment, which is conducted via patient and collateral interviews, record reviews, or other means without the benefit of psychological test results. However, test-based assessment does not preclude consideration of information from these other data sources.

The Health-Care Crisis and Its Solutions

A few years ago, Ellwood (1988) noted that the healthcare system has become an organism guided by misguided choices; it is unstable, confused, and desperately in need of a central nervous system that can help it cope with the complexities of modern medicine. The problem is our inability to measure and understand the effect of the choices of patients, payers, and physicians on the patient’s aspirations for a better quality of life. The result is that we have uninformed patients, skeptical payers, frustrated physicians, and besieged health care executives. (p. 1550)

Zimet (1989) provided examples of these misguided choices in the field of mental health service delivery in his discussion on the mental health-care revolution. In commenting on the state of affairs at that time, he noted how much more amenable Medicare, Medicaid, and other insurance plans were to inpatient treatment as opposed to less costly outpatient treatment. He pointed to data drawn from Kiesler’s (1980) article, indicating that 70% of the mental health dollar went to cover hospitalization costs. These are two signs of the disarray from which the health-care system is now trying to recover.

THE COST OF HEALTH CARE: GENERAL CONSIDERATIONS The cost of health care, in general, is out of control. Pallak and Cummings (in press) observed that medical and mental health-care costs have increased faster than that which would be predicted on the basis of the population increase, inflation, or the growth of the gross national product (GNP). Numerous articles report staggering statistics that should raise the concern of consumers, providers, and third-party payors. For instance, Zimet (1989) reported that American health-care costs rose from $50 billion in 1960 to $90 billion in 1973 to $500 billion (or 11% of the GNP) in 1986. Resnick (1992) reported that, in 1991, an estimated $735 billion was spent on health care. Zimet (1989) cited Califano’s (1986)

prediction that $1.5 trillion (or 15% of the GNP) will go to health care by the year 2000. Similar statistics were reported by Kiesler and Morton (1988) and Broskowski (1991). In addition, Kiesler and Morton cited Uyeda and Moldawsky’s (1986) report of a 19.2% increase in hospital costs from 1979 to 1982, whereas Broskowski (1991) noted that Ber-

man’s (1987) figures indicated that the rate of health-care cost growth surpassed that of the inflation rate. Health-care costs have had a tremendous impact on American businesses. Zimet (1989) reported that, in 1988, the average increase in health insurance premiums was 20%, with Medicare rising 38.5% and Blue Cross/Blue Shield rising 38% for federal workers. He also noted that, during the first 9 months of 1987, General Motors spent $2.3 billion on medical coverage, while bringing in profits of $2.7 billion during that same period of time. Kiesler and Morton (1988) reported that, by 1984, the cost of health insurance premiums to corporate America

was $90 billion; citing Califano (1986), they indicated that this represented

38% of corporate America’s pretax profits at that time. Broskowski (1991) cited trends identified in a Foster-Higgins (1989) report of a 1988

survey of health benefit costs of more than 1,600 small, medium, and large companies with over 10 million employees across 40 specific industries. Of particular note was the increase in the average per-employee health-care cost from $1,645 in 1984 to $2,354 in 1988. Healthcare cost increases over the previous year were experienced by a great majority of the

5

6

companies (88%). Slight variations in cost occurred as a function of geographical location, whereas moderate variations occurred as a function of industry being considered (e.g., energy-petroleum vs. retail wholesale trades).

What contributed to the dramatic increase in health-care costs? Both Kiesler and Morton (1988) and Broskowski (1991) offered their own perspectives. Kiesler and Morton pointed to several factors. First, a passive risk-sharing insurance system came into being in this country. Planning for the implementation of new initiatives in health care was inadequate. Various inflationary practices, such as fee for service, cost-based reimbursement, and incentives for choosing the inpatient over outpatient treatment, became a part of the system. The tax-exempt status of corporate health-care costs lessened consumer price sensitivity and the sensitivity of the health-care market to rises in income and/or decreases in product price. In addition, Kiesler and Morton (1988) indicated that some health-related values held by

Americans also might have contributed to the problem. These values can be summed up best by attitudes such as, “Americans have a right to health care,” “My doctor knows what is best for me,” and “It’s good, so more must be even better.” The following also contributed to the increase in costs: broader insurance coverage, more access to service providers, and, consequently, more utilization of services. Finally, Kiesler and Morton (1988) pointed to the

tendency of the American system to deny coverage for the initial (and often inexpensive) costs of treatment, whereas insuring the later (and frequently more expensive) dollar costs of care. Broskowski (1991) related increases in health-care costs to two general factors: the increased price for health-care services and increased health-care utilization. He presented a somewhat detailed scenario explaining a complex, interactive set of circumstances that contributed significantly to the problem of increased costs. These circumstances included insurance companies’ willingness to cover higher hospital charges, their passing the costs onto employers who could provide such benefits to their employees because of tax incentives, a consequent lack of incentive on hospitals’ part to operate effectively and competitively, and federal legislation incenting hospitals to increase staff, space, and equipment. Also, insurance plans began to incent inpatient over outpatient treatment; at the same time, there was no impetus for hospitals to stop inappropriate admissions. Broskowski (1991) also pointed to societal factors that contributed to increased costs. These included the increased need for health-care services by an aging American population, cost-insensitive

societal expectations for cures for most medical conditions, increasingly stringent standards of care, and increased costs for diagnostic and treatment procedures ordered to protect hospitals and physicians from medical malpractice suits. Lifestyle behaviors (e.g., smoking, overeating), inadequate funding for preventive care, expectations for access to new medical technology, and increases in lengths of stay (LOSs) or readmissions for the same incident of illness were noted to have further contributed to the problem.

MENTAL HEALTH-CARE COSTS

How have the costs of mental health care contributed to the overall costs of health care? The statistics again are startling. In 1985, Frank and Kamlet (cited in Broskowski, 1991) reported that treatment costs for mental health, alcohol, and drug abuse problems steadily increased at a rate exceeding that of other insured conditions. Kiesler and Morton (1988) indicated that the percentage of the U.S. population utilizing mental health services rose from 14% to 26% from 1957 to 1976. Winegar (1992) noted a 350% increase in adolescent psychiatric hospital admissions from 1980 to 1984. As cited in Richardson and Austad (1991), the 1988 article


“Developments in the Health Care Marketplace” pointed to data that suggested that, at the time, an estimated 20%-25% of all health-care claims were mental health-care claims. Shulman (also cited in Richardson & Austad, 1991) indicated that 75% of these costs could be attributed to residential and inpatient treatment. In the previously reported Foster-Higgins (1989) survey of over 1,600 employers (cited in Broskowski, 1991), these costs amounted to 9.6% of the total costs for their medical plans. In terms of absolute dollars, various authors have reported somewhat conflicting figures. Zimet (1989) indicated that, by 1986, $15 billion was spent on mental health care, whereas Harwood, Napolitano, and Kristiansen (cited in Kiesler & Morton, 1988) reported that, in 1983, even excluding substance abuse costs, mental illness had resulted in direct and

indirect costs totaling $73 billion. Regardless of which figures more accurately represent the current state of affairs, it is apparent that the costs incurred in the treatment of mental illness have been significant. Several reasons for these huge mental health-care expenditures have been reported. In addition to the increased coverage for mental health services, Kiesler and Morton (1988) pointed to increased access to providers as a factor in the increased use of mental health services. They cited Mechanic’s (1980) statistics on the fivefold increase in the number of mental health professionals that occurred between 1947 and 1977. Zimet (1989) pointed to another set of factors related to psychiatric hospitals’ exemption from a fixed-rate, prospective payment system based on diagnostic-related groups (DRGs). He attributed this to difficulties in tying particular LOSs to individual diagnostic groups. Noting that, at that time, psychiatric hospitals generally tended to have longer LOSs than general hospitals with or without psychiatric units, and that insurance companies tended to view outpatient psychotherapy less favorably than inpatient treatment (in terms of what they are willing to reimburse), Zimet (1989) very cogently summed up the then state of affairs: the reimbursement limitations imposed on hospital stays for physical illness do not apply to mental illness, addiction, and rehabilitation. Psychiatric hospitals, therefore, continue to attract investors, with the results that psychiatric hospitals are being built at an unprecedented rate. Inpatient psychiatric and addiction treatment has become the most lucrative hospital market, having few limitations, such as LOS, that general hospitals face with their patients. 
In turn, mental health professionals (mostly psychiatrists because they have hospital practices) and their patients face none of the restrictions on payment that are involved for outpatient psychotherapy by third party payers during a patient’s hospitalization. (p. 705)

Broskowski (1991) also pointed to the exemption of psychiatric disorders from the DRG payment system and higher utilization as having contributed to the increasing costs of mental health, alcohol abuse, and drug abuse treatment. He attributed increased utilization to several

factors, including less stigma associated with mental illness and its treatment; increased substance abuse throughout all segments of the population; popularization of various forms of personal growth and psychotherapy; and (as also indicated by Kiesler and Morton) increased access to treatment providers, which he attributed to support made available for training and community-based treatment. It is important for the reader to realize that, in addition to the bleaker aspects of the state of

affairs, there is an up side to the utilization of the huge amount of mental health-care resources. Both Zimet (1989) and Kiesler and Morton (1988) pointed to the medical offset that can accrue from the utilization of mental health services. Essentially, they were referring to the well-documented reduction in medical costs that can come from mental health treatment. As Zimet (1989) indicated,


Those needing psychological care reduce their overall health care costs significantly by being in psychotherapy. On the other hand, those who do not require psychotherapy have lower health costs for a variety of reasons. . . . Corporations and the federal government have not attended to the results of these [medical] offset studies as yet. However, they are aware that a very high percentage of physician visits are not due to physical illness but rather emotional problems. (p. 707)

Reactions to the Crisis

Acknowledgment that health-care costs were out of control led to drastic changes. These changes are reflected in the way in which health-care services are delivered today.

GENERAL HEALTH-CARE CONSIDERATIONS

Kiesler and Morton (1988) and Broskowski (1991) described initial cost-containment efforts that included risk shifting and risk capitation. The former approach involves shifting the financial risk from employers and insurers to the customer via plans involving such things as higher premiums and higher deductibles. As a result of this tactic, the consumer would be expected to be more cost-sensitive and to exert more influence and control over the health-care industry. Risk capitation involves putting a limit on the amount that will be paid for a given treatment. Here, there is a shift of risk from the insurer to the service provider, as exemplified by Medicare’s prospective payment system (PPS) based on DRGs. According to Kiesler and Morton (1988), risk shifting and capitation have been succeeded by two major forms of corporatization. The first is the formation of managed care organizations, including health maintenance organizations (HMOs), preferred provider organizations (PPOs), and independent practice associations (IPAs). The second is the growth of the privately owned hospital chains. According to Winegar (1992), however, this latter trend

seems to be diminishing from what was observed in the 1980s, at least as far as psychiatric and substance abuse facilities are concerned. Winegar attributes this to managed mental health care’s efforts to limit unnecessary hospital-based care. In addition, some health-care corporations were noted to have diversified their risks by investing in different health-care enterprises (e.g., hospitals, hospital supplies, and testing laboratories), or to have integrated vertically their provider system, thus providing all types of services to all consumers within a given insured group.

Along with the changes discussed previously, Kiesler and Morton (1988) cited Rodriguez’s (1985) observation that the free market places more responsibility on the purchaser of health services, while decreasing the authority that has, in the past, resided with the service provider. Kiesler and Morton (1988) also stated that “. . . in the private sector, the power base is shifting from providers to corporate officials and oversight boards, not necessarily provider-controlled, whose decisions are based on evidence that the treatment is needed and cost-effective” (p. 997). For example, this power was reflected in the single-buyer self-insurance programs found in some large corporations. Kiesler and Morton projected the regulations imposed by such programs to result in many cost-saving practices, including increases in outpatient care with accompanying decreases in inpatient care, decreases in service intensity and inefficiency, and improvements in the documentation of treatment outcomes. They projected that a similar shift also might occur in the public sector under certain conditions.


In summing up their findings, Kiesler and Morton (1988) stated that,

Overall, immense changes are taking place. We predict that public and private cost-containment efforts, in tandem with rapid restructuring of the industry and growing consensus about the limits of free marketplace competition should lead to declining provider autonomy, increasing integration of services, increasing emphasis on treatment outcomes, increasing management purview and control, changing power bases, and (eventually) more consumer control via government and administrative decree. This extraordinary restructuring of our nation’s health care system is needed, is occurring, and will continue for some time. (p. 997)

Consistent with Kiesler and Morton’s view was Ellwood’s (1988) observation that “. . . patients, payers, and executives of health care organizations have both higher expectations and greater power” (p. 1550).

MANAGED HEALTH CARE

The prominence of managed care in the delivery of health care in the United States today has been touched on previously. In fact, Cummings (1992) contended that the modal delivery system in universal health care is likely to be adopted in this country before the end of the century. Because of its current and future impact on the provision of health-care services and the providers of those services, it is worthwhile to discuss this trend in more detail.

The increased provision of health-care services via managed care organizations over the past couple of decades has been documented by several authors. For example, Rundle (cited in Zimet, 1989) reported that, from 1984 to 1987, the number of written free choice commercial health-care plans (i.e., allowing the insured to freely select among health providers and hospitals) dropped from 96% to 40%. During the same period, fee for service plans, having restrictions similar to those of HMO plans, rose from 3% to 44%. Broskowski’s (1991) review of the literature seemed to indicate that, whereas the growth of HMOs is leveling off, PPOs are increasing in popularity. As Zimet (1989) noted, “The almost total disappearance of free choice indemnity plans appears to be only a matter of time” (p. 704).

Managed care organizations work to control both the utilization and price of services provided in several ways. Broskowski (1991) indicated that the principal means of reducing service utilization include developing plans that offer financial incentives to use efficient providers, increasing cost sharing, requiring authorization prior to receiving treatment, and employing utilization review (UR) procedures during the time the treatment is being rendered. According to the Foster-Higgins survey (cited in Broskowski, 1991), these organizations controlled price via coverage of less expensive but equally effective treatments, claims review, prospective payments for DRGs, capitation of payments for specific groups of beneficiaries, and negotiation of fee-for-service to selected providers offering quality and efficient services.

Where does the delivery of mental health services fit within a managed care setting? Richardson and Austad (1991) cited the Levin, Glasser, and Jaffee (1988) survey of 304 managed health-care organizations. In this study published in 1988, 97% of the HMOs

surveyed offered some type of mental health coverage as part of their basic benefit. The median number of outpatient therapy sessions offered per year by the surveyed organizations was 20, whereas the median number of covered inpatient days was 30.

In addition, attempts to control the costs of mental health service delivery are similar to those for managed health care in general. According to Haas and Cummings (1991), these may include increased co-payments by the insured, limiting the individual yearly total


covered costs and number of treatment episodes (or inpatient LOS), limiting treatment to specific disorders or use of specific techniques, and/or requiring pre-authorization for treatment. Richardson and Austad’s (1991) scenario of how the insured access managed mental health-care services is similar to Haas and Cummings’ observations.

A Time of Opportunities for Psychology

Despite the concerns (ethical and otherwise) about and resistance to participation in managed mental health-care programs that have been voiced by some professionals, the managed health-care system is not going away (Cummings, 1992; Richardson & Austad, 1991; Zimet, 1989). Oddly enough, the current trend in mental health-care delivery has placed psychologists in an advantageous position. Cummings (1992) felt that “managed care needs psychology, the premier psychotherapy profession, to maintain its clinical integrity.” Richardson and Austad (1991) cited Anderson and Fox (1987) when pointing to managed care’s striving for simultaneously providing quality care and saving on costs as the “essence” of managed health care, whereas Schulman (1989) indicated that psychologists could take advantage of efforts toward cost containment through “the marketing of maximum efficiency.” Kiesler and Morton (1988) and Zimet (1989) pointed to the psychologist’s training in the area of wellness and prevention, as well as their knowledge of the effect of environment and genetics on behavior as being beneficial in this new age of health-care delivery. Along with Broskowski (1991), they also viewed the psychologist’s traditional dissociation with the provision of the more expensive inpatient care as being a plus. In addition, Kiesler and Morton pointed to the psychologist’s empirical- and outcome-oriented approach as an advantage that can be used in the interest of the public. But perhaps the greatest chance for psychologists to contribute in the evolving health-care marketplace is afforded by their training and skills in the area of

assessment. Generally, clinical assessment (of which testing can be considered a part) may be viewed as “the process by which clinicians gain understanding of the patient necessary for making informed decisions” (Korchin, 1976, p. 124). According to Korchin and Shuldberg (1981),

“the basic justification for assessment is that it provides information of value to the planning, execution, and evaluation of treatment” (p. 1154)—areas of critical importance in the new

health-care environment. Thus, it seems that the skills possessed by most psychologists and that provide them with the greatest opportunity in this new health-care environment are those related to psychological testing. Consistent with this picture is Maruish’s (1990) view of how psychological testing can play an important role in the delivery of mental health care in the future. While expanding on his ideas of what will be needed in the future, he noted that

What will be valued are self-administered, multidimensional, and problem-oriented scales, as well as other data-gathering instruments (e.g., psychosocial histories) that can quickly and economically highlight the problems requiring attention. Those instruments that also prove useful in selecting the most cost-effective, appropriate treatment will be worth their weight in gold. In addition, psychometrically sound self-report measures, sensitive to changes in psychological disturbance and administered in pre- and post-treatment fashion, can meet the organization’s need to demonstrate treatment effectiveness. Therapist rating scales and client satisfaction scales, indicating the degree to which the client is pleased with the services they have received, can also be used for this purpose. Overall, the use of one or more of these measures will serve as a powerful means of both monitoring and improving the organization’s level of care and marketing itself as a provider of quality mental health services. (p. 5)
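The pre- and post-treatment use of a self-report measure described in the passage above can be illustrated with a small calculation. The following Python sketch is only one possible way to summarize such data, not a procedure taken from the works cited; the function name, the scores, and the assumption that lower scores mean less disturbance are all hypothetical.

```python
from statistics import mean, stdev


def summarize_change(pre_scores, post_scores):
    """Summarize paired pre/post symptom scores (assumes lower = less disturbance)."""
    if len(pre_scores) != len(post_scores) or len(pre_scores) < 2:
        raise ValueError("need at least two paired pre/post scores")
    spread = stdev(pre_scores)
    if spread == 0:
        raise ValueError("pretreatment scores must vary to standardize the change")
    # Positive change means the post-treatment score is lower than the pretreatment score.
    changes = [pre - post for pre, post in zip(pre_scores, post_scores)]
    mean_change = mean(changes)
    return {
        "mean_change": mean_change,
        # Mean change expressed in pretreatment standard deviation units.
        "standardized_change": mean_change / spread,
        # Number of patients whose score dropped at all.
        "n_improved": sum(1 for change in changes if change > 0),
    }


# Hypothetical scores for five patients on an unnamed symptom scale.
pre = [30, 24, 28, 22, 26]
post = [18, 20, 15, 22, 16]
summary = summarize_change(pre, post)
print(summary)
```

A summary of this kind would be only a starting point. Whether a given amount of change is clinically meaningful depends on the instrument, its norms, and its reliability, none of which this sketch takes into account.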


The interest of many health-care organizations in adopting a philosophy of and implementing procedures for continuous quality improvement (CQI; InterStudy, 1991; Maruish, 1991) has created other opportunities for psychological testing. This follows from the fact that, “. . . defining the processes of a system, then measuring and evaluating results in order to improve those processes is at the heart of continuous quality improvement” (InterStudy, 1991). As is discussed further later, with its utility in measuring results, psychological testing has the potential to make significant contributions and demonstrate its value to the CQI efforts of mental health-care organizations. An often overlooked, yet important, aspect of psychological testing is its ability to add standardization to the treatment planning, outcome assessment, and CQI components of service delivery. The application of specific tests or batteries of tests to served patients provides valuable information that can affect the overall quality of the service both now and in the future, and permits clinicians and third-party payors to communicate with and among each other with a common language. This may result in therapeutic benefits. This matter is not addressed further in this chapter. However, it is a consideration of which the reader should remain mindful.

PSYCHOLOGICAL TESTING FOR TREATMENT PLANNING

Strupp recently commented (cited in Butcher, 1990) on the distinction that treatment planning currently enjoys. Note the consistency of the following with that cited from Maruish (1990) earlier:

Why has treatment planning come to assume such prominence in recent years? Among important and interrelated reasons, one should mention concerted efforts to make psychotherapy more efficient and cost effective, the growing influence of “third parties” (insurance companies and the federal government) that are called upon to foot the bill for psychological as well as medical treatments, and society’s disenchantment with open-ended forms of psychotherapy without clearly defined goals. . . . Today there are [many] treatment options whose potential value must be studied in great detail and depth. The search for the “right” therapy for the “right” patient continues. . . . (p. iii)

How is psychological testing viewed with regard to its potential role in treatment planning? The opinions and arguments regarding its use vary. For example, in their recent work on the Millon Clinical Multiaxial Inventory (MCMI), Choca, Shanley, and Van Denberg (1992) identified some common objections to the use of psychological testing in general for treatment planning purposes. One is that the information most useful for therapeutic purposes only can be obtained through the face-to-face therapeutic interview. Related to this is the argument that testing actually interferes with the therapeutic process. Another objection comes from those who feel that knowledge of assessment results has no impact on the

therapy process or its outcome. Lastly, Choca et al. (1992) noted that the therapist’s knowledge of test results may negatively color or influence his or her attitude toward the patient. Appelbaum (1990) pointed to other common arguments against testing, including the expense of testing, the contention that a good interviewer can obtain all necessary information without tests, and the stagnation or decline of the clinician’s interviewing skills because of his or her reliance on testing for information-gathering purposes. He also pointed out that some feel that testing is counterproductive, with possible consequences being the dehumanization of the patient, the development of negative transference in the patient, and the revelation, to both patient and therapist, of some information prior to the patient’s readiness to adequately process this information.


Regardless of these opinions and arguments, the literature suggests that psychological testing is alive and well; Spielberger’s (1992) and Weiner’s (1992) previously noted comments attest to this fact. In his review of psychological assessment surveys conducted during the years 1960 to early 1990, Watkins (1991) concluded that approximately 50%-75% of psychologists devoted approximately 10% of their time to assessment services. In addition, Choca et al. (1992) pointed to recent test usage surveys that demonstrated that “. . . a substantial number [of clinical psychologists] continue to do psychological testing . . . and apply their findings to their treatment recommendations” (p. 188). What role does or can psychological testing play in effective treatment planning? Watkins’ (1991) review of the literature indicated that tests are used chiefly to supply information about the patient’s personality structure, respond to assessment needs, and for diagnostic purposes. Butcher (1990) viewed the contribution of psychological assessment with instruments such as the revised Minnesota Multiphasic Personality Inventory (MMPI-2) in treatment planning as being found not only in the assistance that it can provide in developing a plan consistent with the personality and external resources of the individual client, but also in its potential for allowing the client (at least in some instances) to find out more about him or herself and for facilitating client-patient communication. He saw assessment via objective tests as having the potential of providing a shortcut and/or a second opinion in identifying the patient’s problems as well as clues to his or her nature. It also may allow for the revelation of otherwise unforeseen obstacles to therapy and possible areas of individual growth.
In addition, pretreatment psychological testing can identify problems and related personality dimensions (e.g., defenses, fears) outside the patient’s awareness, permit a comparison or evaluation of the patient’s problems vis-a-vis a normative frame of reference, and provide a structured vehicle for conveying feedback to a patient regarding the nature and extent of his or her problems. Similar to Butcher, Appelbaum (1990) advocated the use of testing for not only quickly identifying or diagnosing problems, but also for offering a second opinion. He saw tests as having the potential to assist in identifying latent weaknesses and strengths, as well as making the clinician aware of the complexity of the patient’s personality. In addition, and possibly most importantly, the results of assessments may serve as a guide or reference point during the course of treatment. Perhaps the extent to which psychological testing can contribute generally to treatment planning can be summed up best by Strupp (cited in Butcher, 1990), who noted that, “careful assessment of a patient’s personality resources and liabilities is of inestimable importance. It will predictably save money and avoid misplaced therapeutic effort; it can also enhance the

likelihood of favorable treatment outcomes for suitable patients” (pp. v-vi). Before concluding discussion on this topic, it is important to note one more way in which psychological testing can contribute significantly to treatment planning. This is in the area of screening for psychopathological conditions. Screening generally is viewed as a clinical activity in which a patient is administered a relatively gross, brief procedure intended to identify signs or symptoms that are pathognomonic of the presence and/or severity of an underlying disorder (e.g., depression). These procedures may be in any of several forms, including tests and examinations, structured or semistructured interviews or rating scales completed by the clinician or the patient, a review of the patient’s chart, or other such means of gathering important patient information. Most of the generally accepted procedures designed for this purpose yield conclusions that have a high probability of being correct. However, this information is often limited, with positive findings suggesting the need for further, more extensive evaluation to verify the findings or clarify their implications (e.g., what type of depression is the instrument identifying as being


present). Other screening procedures are designed to assess nonpathologic features, such as the general level of functioning related to the presence or level of a specific construct, such as intelligence. A discussion of this latter type of screening procedure is not undertaken in this volume.

The field of medicine is replete with routine screening procedures (e.g., blood tests, urinalysis, tuberculosis titer, review of systems). In the area of psychology, screeners may take the form of structured and semistructured interviews/rating scales, the mental status exam, or various forms of brief psychometric tests. Instruments such as the SCL-90-R, Beck

Depression Inventory and Hopelessness Scale (BDI and BHS), the State-Trait Anxiety Inventory (STAI), and the Hamilton Rating Scale for Depression (HRSD) are probably among the more widely used psychological screening instruments. Some might classify lengthy, multiscale instruments such as the Minnesota Multiphasic Personality Inventory (MMPI) and its updated, revised version, the MMPI-2, as psychological screeners. However, the length of each of these instruments would preclude them from inclusion in a group of screeners, at least according to the criteria noted earlier. Although probably not commonly thought of in this way, the manner in which screening and psychological screeners are used can play an integral part in treatment planning. In fact, when used for problem identification and triage/disposition activities, their use becomes the first step in developing a treatment plan for the patient. Use of screening procedures, either routinely or selectively, can help determine the presence of a problem or disorder that otherwise might have gone unidentified, thus avoiding expenses that might be incurred by missing its identification. Screening also can assist in determining the urgency for treatment of the patient’s problems, whether there is a need for additional assessment and/or the most appropriate therapeutic regimen (e.g., psychotherapy vs. psychotropic medication vs. a combination of both), all of which can have cost-saving implications. Thus, the clinician’s knowledge and skills in the use of screening instruments can serve to increase his or her value to organizations wishing to curb the costs of providing health care to their clientele.
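The earlier point that a positive screening result usually calls for further evaluation can be made concrete with a short calculation. The Python sketch below applies Bayes' rule to estimate the proportion of positive screens that reflect a true case. The sensitivity, specificity, and base rate used are hypothetical values chosen for illustration; they are not figures for any instrument named in this chapter, and the function name is likewise an assumption.

```python
def positive_predictive_value(sensitivity, specificity, base_rate):
    """Proportion of positive screens that are true cases, via Bayes' rule."""
    for name, value in (
        ("sensitivity", sensitivity),
        ("specificity", specificity),
        ("base_rate", base_rate),
    ):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    # Fraction of the screened population that has the disorder and screens positive.
    true_positives = sensitivity * base_rate
    # Fraction that does not have the disorder but screens positive anyway.
    false_positives = (1.0 - specificity) * (1.0 - base_rate)
    if true_positives + false_positives == 0:
        raise ValueError("no positive screens are possible with these inputs")
    return true_positives / (true_positives + false_positives)


# Hypothetical screener: 90% sensitivity, 90% specificity, 10% base rate.
ppv = positive_predictive_value(0.90, 0.90, 0.10)
print(f"Share of positive screens that are true cases: {ppv:.2f}")
```

Under these assumed values, only about half of the positive screens would correspond to a true case, because the disorder is uncommon in the screened group. This is one way to see why positive findings are typically followed by more extensive assessment before a treatment plan is settled.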

PSYCHOLOGICAL TESTING FOR OUTCOME ASSESSMENT

It appears imminent that all types of service delivery units will be required to demonstrate their effectiveness. The Joint Commission on Accreditation of Healthcare Organizations (JCAHO; 1991) sees the 1990s as a time when purchasers and users of health-care services are expecting an objective report of the results of that care. Patterson (1990) related similar observations, whereas InterStudy (1991) reported that businesses and organizations purchasing health care for their employees are demanding “meaningful, measurable assurances of quality” from those contracted to supply health-care services. Linden and Wen (1990) reported that clinicians as well as researchers are required to demonstrate effectiveness of their psychotherapeutic treatments to affect needed legislation and acquire financing for clinical services, training, and research activities. However, despite the fact that valid and reliable measures of treatment outcome are available, the

implementation of objective measurement of outcome in the mental health service arena has been slow to occur (Newman, 1991; Pallak & Cummings, in press; Psychotherapy Finances, 1992). In a recent survey of psychological test users (Assessment Applications, 1991), the

vast majority of the respondents (customers of a large test publisher and distributor) reported that measuring patient outcomes was important or very important, whereas the percentage of respondents indicating they have an objective system for measuring patient progress or


treatment efficacy was quite low (29% and 27%, respectively). That only a little more than half of these practitioners are feeling much pressure to initiate an outcome program probably has a lot to do with this situation. Pallak and Cummings (in press) reported that cost rather than clinical issues have been the major reason for the relatively recent concerns over outcome issues. One may relate the issue of cost to the concerns of the three parties (the purchaser, the provider, and the patient) who have become sensitive to the rising costs in health-care delivery (Geigle & Jones, 1990). In Geigle and Jones’ (1990) report of a meeting of providers, employers, insurers, and others on the topic of outcomes measurement, the purchasers of health-care services were noted to view “outcomes-oriented quality measurement” as having potential for ensuring value for the money spent on their employees’ health-care benefits. They also were optimistic that quality of care will reduce costs associated with health insurance, disability and workers’ compensation, and other costs accrued from extended employee absences. This is consistent with a Psychotherapy Finances (1992) report that major employers are wanting to see graphic evidence that money spent on mental health care is being more efficiently used than it has been in past years. Also, Pallak and Cummings (in press) noted how purchasers of mental health services are looking increasingly at both financial and clinical efficacy variables when making their buying decisions. As for their group of providers (in this case, doctors and hospitals), Geigle and Jones (1990) reported outcomes measurement as being viewed as a means of both ensuring their highest level of performance and the best care for the patient. These providers also reported that the use of outcome measures augments their ability to compete for health-care contracts. Both points should be relevant to individual providers and organizations.
Additionally, the latter point is one that others have drawn attention to before. For example, Psychotherapy Finances (1992) reported that those providing care via managed-care contracts are going to find it more difficult to secure the higher paying and more satisfying contracts unless they can demonstrate effectiveness. As noted in the article:

Clearly, the ability to measure the clinical effectiveness of psychotherapy is emerging as an important marketing tool for both managed care companies and therapists. Some employee assistance programs, for example, are considering using outcomes data as a means of matching therapists with clients to increase the likelihood of good results. And some therapists see outcome data not just as a way to sell themselves to managed care firms but also as a way to discover how changes in a therapy approach might affect clinical results and practice income. (pp. 1-2)

Patients’ obvious concerns about their health-care benefits are now becoming important to both the purchaser and the provider. As Wiggins (1992) recently noted, “The patient, not the clinician, is the ultimate judge of the quality and benefit of care. Managed care should focus on the patient’s view, not just the doctor’s professional opinion” (p. 3). Geigle and Jones (1990) noted that patients’ involvement, via formal and informal solicitation of their opinions and other types of data, is being sought out by health-care professionals to the point where they are becoming the central figures in the outcomes measurement movement.

JCAHO (1991) has quite succinctly summarized its view of what it will take to survive in the future health-care market. As they indicated, “Organizations that are able to demonstrate that they provide care in an effective and efficient manner are likely to be the winners in this new environment [of accountability]” (Joint Commission on Accreditation of Healthcare Organizations, 1991, p. 1). Few would argue with this view, or that it also holds true for all health-care organizations, including those providing mental health-care services, or that it applies to individual providers, not just organizations. As related to the focus of this book, the question then becomes: What type of outcome assessment information will be most sought by and/or useful to purchasers, providers, and patients of mental health services?

The answer to this question may vary according to whom one is speaking, but some general trends are likely to emerge. Using a working definition of outcomes measure as being “any measurement system used to uncover or identify the health outcome of treatment for the patient” (p. 8), Geigle and Jones (1990) found that the ideal measure of quality (of treatment) is the patient’s quality of life as might be reflected in the lessening of psychological distress or impairment or in his or her return to work. Subscribers to Psychotherapy Finances (1992) have indicated that some employers and payors want to know whether the patient has returned to work, whether outpatient treatment prevents more costly inpatient treatment, and/or whether the patient felt more positive about himself or herself and resumed his or her regular activities. As with Geigle and Jones’ impressions, follow-up to determine effectiveness subsequent to treatment (e.g., 6-12 months later) also is recommended as a means to obtain important outcome information.

Pallak and Cummings (in press) suggested that information provided by valid and reliable psychometric instruments is important and useful. They pointed to the realization within managed-care settings that such instruments exist as another impetus for the relatively recent interest in measuring outcomes within these types of settings. It is unclear what types of instruments they were referring to when they stated:

patient reactions, judgments, and perceptions of their status and well-being have been recognized as important indicators of clinical effectiveness. Patient responses to questionnaires provide a quantitative measure of outcome and improvement.

However, they do suggest that measures of abnormal personality are to be included here. What is clear is the relevance and importance of measures of client satisfaction. Pallak and Cummings (in press) noted that:

in general, patient questionnaires measuring patient’s perceptions of problems and judgments about progress provide a quantifiable index of patient reactions by which to gauge or measure improvement. . . . [Also,] perception of satisfaction with treatment and perception of treatment effectiveness in terms of dealing with problems on the part of the patient are critical indicators of progress.

Related to these comments are the observations of Cleary and McNeil (1988) about the increased interest in conducting research in this area. Similar to that indicated earlier, Cleary and McNeil noted how, in an environment of competition for patients, providers of care are attending more to the patients’ concerns about the type of care they receive. Researchers also are finding more government support for research pertaining to health services.

INTERSTUDY’S OUTCOME MANAGEMENT SYSTEM

Newman (1991) pointed out and provided an example of how assessment data used for progress or outcome tracking can be related to variables such as treatment approach, costs, or reimbursement criteria, and thus can provide objective support for decisions regarding continuation of treatment, discharge, or referral to another type of treatment (e.g., from outpatient to inpatient treatment). It is thinking such as this that prompted InterStudy to begin development of its Outcomes Management System (OMS). Its potential for affecting the manner in which health care may one day be provided merits particular attention here.

Ellwood (1988) defined outcomes management as “. . . a technology of patient experience designed to help patients, payers, and providers make more rational medical care-related choices based on better insight into the effect of these choices on the patient’s life” (p. 1551). The development of the OMS was undertaken as a collaborative effort by InterStudy, a Minnesota-based health-care think tank, and independent researchers throughout the United States to establish “a mechanism for systematically assessing, tracking, and analyzing health outcomes that are important to patients” (InterStudy, 1991, p. 1). From the outset, the goals of this system included the measurement of the patient’s functional status, particularly changes resulting from therapeutic interventions, over time; collection and maintenance of data in a standard format to allow for comparison of outcomes across sites; and incorporation of methods for accounting for an organization’s effects on the patient’s health or quality of life.

Aside from the development and maintenance of a national database and a membership of payor and provider organizations who will use this data to direct their policy and practice decisions, the OMS consists of two additional components (InterStudy, 1991). The first is the 36-item Health Status Questionnaire developed by Ware (cf. Ware & Sherbourne, 1991) as a shortened version of the instrument designed for use in the RAND Corporation’s Health Insurance Study (Brook, Ware, Davies-Avery, et al., 1979). The Short Form-36 Questions (SF-36) contains items related to the patient’s functional status and quality of life and is constructed to be used with all patients receiving health-care services, including mental health services. A number of condition-specific instruments, referred to as “TyPE (Technology of Patient Experience) specifications,” also have been or are in the process of being designed to accompany the SF-36 and are to be administered before, during, and after the termination of treatment (InterStudy, 1991). Relevant to the present discussion are those TyPEs developed for various mental disorders. Work has been completed, is currently being completed, or is planned by collaborative efforts undertaken by InterStudy and various organizations and institutions on TyPEs for disorders such as depression, alcohol abuse, drug abuse, schizophrenia, and panic disorder.

It is hoped that information from the SF-36 and the TyPE specifications will one day find itself with other data (e.g., outcome-related demographics, treatment rendered, data on lifestyle) in an OMS national database (InterStudy, 1991). Access to such data will benefit the purchaser, the provider, and the recipient of the service. The purchaser will be able to make sound medical care service selections based on cost and quality. The provider will have more information on which to base decisions related to selection of therapies, case management over the long term, and improvement in the functional status of the patient. Through their health-care provider, patients will be able to obtain more accurate information about what to expect following treatment. All in all, the OMS offers promise for today’s troubled health-care system from many perspectives. Only time will tell whether InterStudy and its affiliates will be successful in establishing such a powerful tool for health-care decisionmaking.

PSYCHOLOGICAL TESTING IN A CQI SYSTEM

Implementing a regimen of psychological testing for the planning of treatment and/or assessing its outcome has a place in all organizations where the delivery of cost-efficient, quality services has become a primary goal. However, additional benefits can accrue from testing when it is incorporated within an ongoing program of service evaluation and continuous quality improvement (CQI).

Although its principles were espoused by Americans, the CQI philosophy initially was implemented by the Japanese in rebuilding their economy after World War II. Today, many U.S. organizations have sought to balance quality with cost by implementing CQI procedures. Simply put, CQI is a process of the continuous setting of goals, measuring progress toward the achievement of those goals, and subsequently reevaluating them in light of the progress that has (or has not) been made toward them. Underlying the CQI process are a few simple assumptions. First, those organizations that can produce high-quality products or services at the lowest possible cost have the best chance of surviving and prospering in today’s competitive market. It is less costly to prevent errors than to correct them, and the process of preventing errors is a continuous one. In addition, it is assumed that the workers within the organization are motivated and empowered to improve the quality of their product or service based on the information they receive about their work. More information about CQI can be found in several sources (e.g., Berwick, 1989; Dertouzos, Lester, & Solow, 1989; Donabedian, 1980, 1982, 1985; Johnson, 1989; Scherkenback, 1987; Shewhart, 1939; Walton, 1986).

The continuous setting of, measurement of progress toward, and reevaluation of goals characteristic of the CQI process is being employed by many health-care organizations as part of their efforts to survive in this competitive, changing market. At least in part, this move also reflects what InterStudy (1991) has viewed as a “shifting from concerns about managing costs in isolation to a more comprehensive view that supplements an understanding of costs with an understanding of the quality and value of care delivered” (p. 1). InterStudy defined quality as a position or view that should lead all processes within a system. In the case of the health-care system, the most crucial of these processes is that of patient care. InterStudy pointed out that, with a CQI orientation, these processes need to be well defined, agreed on, implemented unvaryingly when delivering care, and provide measurable results that subsequently lead to conclusions about how the processes might be altered to improve the results of care. Simply stated, InterStudy saw CQI as implying “. . . a system that articulates the connections between inputs and outputs, between processes and outcomes . . . , a way of organizing information in order to discover what works, and what doesn’t” (1991, p. 1).

In the mental health arena, as in other areas of health care, CQI is concerned with the services delivered to customers. Here, the customer may include not only the patient being treated, but also the employer through whom the health-care plan is offered and the third-party payor who selects/approves the provider of the service in the form of coverage that is available through the health-care plan. Psychological testing can help the provider focus on delivering the most efficient and effective treatment to satisfy the needs of all his or her “customers,” thus contributing to the CQI effort.

Perhaps the most apparent way in which testing can augment the CQI process is through its contributions in the area of outcomes assessment. Through repeated administrations of tests for all patients at intake, and later at one or more points in the treatment process, an organization can obtain a good sense of how effective the organization as a whole or individual clinicians or treatment program/units are in providing services to their clients. This testing might include problem-oriented measures as well as measures of patient satisfaction with services delivered. Considered in light of other nontest data, this may result in changes in service delivery goals such as the implementation of more effective problem identification and treatment planning procedures. For example, Newman’s (1991) graphic example of how data can provide support for treatment decisions can be extended to indicate how various levels of depression (as measured by the Beck Depression Inventory) may be served best by various types of treatment (e.g., inpatient vs. outpatient).
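
As a rough illustration of the repeated-administration idea described above, the following sketch summarizes the change between an intake score and a later score for each treatment unit. It is only a sketch under stated assumptions: the records, the unit labels, the score values, and the function name `mean_change_by_unit` are all hypothetical, and the scores stand in for any symptom scale on which lower totals indicate less distress. No actual instrument, scoring rule, or data set from the sources cited here is implied.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (patient_id, unit, intake_score, followup_score).
# Scores are made-up totals on a scale where lower means less distress.
records = [
    ("p1", "outpatient", 28, 15),
    ("p2", "outpatient", 22, 18),
    ("p3", "inpatient", 35, 20),
    ("p4", "inpatient", 31, 27),
]


def mean_change_by_unit(rows):
    """Return the average intake-minus-follow-up change for each unit."""
    changes = defaultdict(list)
    for _patient, unit, intake, followup in rows:
        # A positive change means the score dropped between administrations.
        changes[unit].append(intake - followup)
    return {unit: mean(values) for unit, values in changes.items()}


summary = mean_change_by_unit(records)
for unit, change in summary.items():
    print(f"{unit}: mean improvement of {change} points")
```

A summary of this kind would be only one input to a CQI review. As the text notes, it would need to be considered alongside satisfaction measures and other nontest data before any service delivery goal is revised.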


JCAHO’S AGENDA FOR CHANGE

Before concluding this section, it is important to note JCAHO’s new focus, their Agenda for Change, born out of the changes occurring in the 1980s (Joint Commission on Accreditation of Healthcare Organizations, 1991). From the increasing demands for greater efficiency in and more evidence for the effectiveness of service delivery came the realization that value must become the new goal in health care. This Agenda for Change represents a JCAHO initiative aimed at supporting the goal of value in health care by way of its expertise in the areas of standard setting, evaluation, decisionmaking, and education. Implied here is JCAHO’s recognition that its traditional focus on analyzing capabilities needs to be augmented by the tracking of the actual performance of its member organizations.

JCAHO’s Agenda for Change is guided by four major concepts directly related to CQI: (a) In health-care organizations, the activities of all members (clinical, support, administrative, and so on) affect patient outcomes; (b) CQI should be a priority goal in these organizations; (c) JCAHO standards should focus on those activities that are essential to quality of care in these organizations; and (d) JCAHO should gather and assess data that relate to compliance with standards as well as to actual performance as it pertains to CQI and provide feedback regarding same (Joint Commission on Accreditation of Healthcare Organizations, 1991). These concepts lead to specific objectives for JCAHO, not the least of which is a redirection of its standards. Included here is an increased emphasis on establishing and maintaining a CQI program within the organization, methodologic requirements of quality assessment and improvement activities, and management of information supporting these same activities. This emphasis on CQI within the Agenda for Change provides yet another opportunity for psychological testing to make a significant contribution within mental health-care organizations as it relates to their efforts directed toward surviving within a rapidly changing market.

A Final Word

The United States currently is experiencing a crisis that is having a significant impact on both the accessibility of health care and the manner in which it is practiced. The negative impact on the social, economic, physical, and psychological well-being of its citizens is significant. On the up side, the problem is recognized, remedies have begun to be applied, and some degree of control already has begun to take place.

The delivery of mental health services has not been spared from the relatively drastic measures that have been implemented to keep down the cost of health-care services. Consequently, the type of psychological services and the manner in which they are offered have undergone change. Some may view these changes as restrictive, not in the interest of the patient and his or her individual needs, and generally negative. Others see the benefit that can be derived from practitioners being forced to become more efficient in their methods and more accountable for the services that they provide. From this latter standpoint, the overall effect is one that has a positive impact on the patient, the clinician, and the state of the health-care delivery system in this country.

As was suggested previously, the psychologist’s training in psychological testing should provide him or her with an edge in surviving in the evolving mental health service delivery market. Maximizing his or her ability to use the “tools of the trade” to facilitate problem identification, subsequent planning of appropriate treatment, and measuring and documenting the effectiveness of his or her efforts can only prove to be an aid in the clinician’s striving for optimal efficiency in service delivery. It is hoped that the information and guidance provided by the many distinguished contributors to this edited volume will assist practicing psychologists and psychologists-in-training to maximize the resources available to them and thus to prosper in the emerging new health-care arena.

This is a time of anxiety and uncertainty. It is also a time of great opportunity. How one chooses to face the current state of affairs is a matter of personal and professional choice.

References

Anderson, M. D., & Fox, P. (1987, Spring). Lessons learned from Medicaid managed care approaches. Health Affairs, 72-86.
Appelbaum, S. A. (1990). The relationship between assessment and psychotherapy. Journal of Personality Assessment, 54, 791-801.
Assessment Applications. (1991, Summer). Readership survey results: Outcomes. p. 6.
Berman, K. (1987). Health insurance rates keep climbing. Business Insurance, 21, 1, 34.
Berwick, D. M. (1989). Sounding board: Continuous improvement as an ideal in health care. New England Journal of Medicine, 320, 53-56.
Brook, R. H., Ware, J. E., Davies-Avery, A., et al. (1979). Overview of adult health status measures fielded in RAND’s health insurance study. Medical Care, 19, 787.
Broskowski, A. (1991). Current mental health care environments: Why managed care is necessary. Professional Psychology: Research and Practice, 22, 6-14.
Butcher, J. N. (1990). The MMPI-2 in psychological treatment. New York: Oxford University Press.
Califano, J. (1986). America’s health care revolution. New York: Random House.
Choca, J. P., Shanley, L. A., & Van Denberg, E. (1992). Interpretative guide to the Millon Clinical Multiaxial Inventory. Washington, DC: American Psychological Association.
Ciarlo, J. A., Brown, T. R., Edwards, D. W., Kiresuk, T. J., & Newman, F. L. (1986). Assessing mental health treatment outcome measurement techniques. DHHS Pub. No. (ADM) 86-1301. Washington, DC: U.S. Government Printing Office.
Cleary, P. D., & McNeil, B. J. (1988). Patient satisfaction as an indicator of quality care. Inquiry, 25, 25-36.
Cummings, N. A. (1992, August). Future practice patterns: Independent practice and managed mental health care. Paper presented at the meeting of the American Psychological Association, Washington, DC.
Dertouzos, M. L., Lester, R. K., & Solow, R. M. (1989). Made in America: Regaining the productive edge. Cambridge, MA: MIT Press.
Donabedian, A. (1980). Explorations in quality assessment and monitoring: The definition of quality and approaches to its assessment (Vol. I). Ann Arbor, MI: Health Administration Press.
Donabedian, A. (1982). Explorations in quality assessment and monitoring: The criteria and standards of quality (Vol. II). Ann Arbor, MI: Health Administration Press.
Donabedian, A. (1985). Explorations in quality assessment and monitoring: The methods and findings in quality assessment: An illustrated analysis (Vol. III). Ann Arbor, MI: Health Administration Press.
Ellwood, P. M. (1988, May). Outcomes management: A technology of patient experience. Paper presented at the meeting of the Massachusetts Medical Society. Presented as a special report in the New England Journal of Medicine, 318, 1549-1556.
Foster-Higgins. (1989). Health care benefits survey. 1988. Princeton, NJ: Author.
Geigle, R., & Jones, S. B. (1990). Outcomes measurement: A report from the front. Inquiry, 27, 7-23.
Haas, L. J., & Cummings, N. A. (1991). Managed outpatient mental health plans: Clinical, ethical, and practical guidelines for participation. Professional Psychology, 22, 45-51.
InterStudy (1991a, May). An introduction to InterStudy’s Outcome Management System. Handout presented at the sixth meeting for the Outcomes Management, Minneapolis, MN.
InterStudy (1991b). Preface. The InterStudy Quality Edge, 1, 1-3.
Johnson, P. L. (1989). Keeping score: Strategies and tactics for winning the quality war. New York: Harper & Row.
Joint Commission on Accreditation of Healthcare Organizations (1991). The Joint Commission’s Agenda for Change: Indicator development and testing. Oakbrook Terrace, IL: Author.
Kiesler, C. A. (1980). Mental health policy as a field of inquiry for psychology. American Psychologist, 35, 1066-1080.
Kiesler, C. A., & Morton, T. L. (1988). Psychology and public policy in “health care revolution.” American Psychologist, 43, 993-1003.
Korchin, S. J. (1976). Modern clinical psychology. New York: Basic Books.
Korchin, S. J., & Schuldberg, D. (1981). The future of clinical assessment. American Psychologist, 36, 1147-1158.
Levin, B. L., Glasser, J. H., & Jaffee, C. L. (1988). National trends in coverage and utilization of mental health, alcohol, and substance abuse services within managed health care systems. American Journal of Public Health, 78, 1222-1223.
Linden, W., & Wen, F. K. (1990). Therapy outcome research, health care policy, and the continuing lack of accumulated knowledge. Professional Psychology: Research and Practice, 21, 482-488.
Maruish, M. (1990, Fall). Psychological assessment: What will its role be in the future? Assessment Applications, 5.
Maruish, M. (1991, Summer). Continuous quality improvement and mental health service delivery. Assessment Applications, 7-8.
Mechanic, D. (1980). Mental health and social policy (rev. ed.). Englewood Cliffs, NJ: Prentice-Hall.
Megargee, E. I., & Spielberger, C. D. (1992). Reflections on fifty years of personality assessment and future directions for the field. In E. I. Megargee & C. D. Spielberger (Eds.), Personality assessment in America (pp. 170-190). Hillsdale, NJ: Lawrence Erlbaum Associates.
Newman, F. L. (1991, Summer). Using assessment data to relate patient progress to reimbursement criteria. Assessment Applications, 4-5.
Pallak, M. S., & Cummings, N. A. (in press). Outcomes research in managed mental health care: Issues, strategies and trends. In S. A. Shueman, S. L. Mayhew, & B. S. Gould (Eds.), Managed behavioral health care: A search for precision. Springfield, IL: Charles C. Thomas.
Patterson, D. Y. (1990). Managed Care: An approach to rational psychiatric treatment. Hospital and Community Psychiatry, 41, 1092-1095.
Psychotherapy Finances. (1992). Managed care: Will your income be tied to proving your effectiveness? 18, 1-3.
Resnick, R. J. (1992, November). National health is coming: An update. Minnesota Psychologist, 41, 9.
Richardson, L. M., & Austad, C. S. (1991). Realities of mental health practice in managed-care settings. Professional Psychology: Research and Practice, 22, 52-59.
Rodriguez, A. R. (1985). Current and future directions in reimbursement for psychiatric services. General Hospital Psychiatry, 7, 341-348.
Scherkenback, W. W. (1987). The Deming route to quality and productivity: Road maps and roadblocks. Rockville, MD: Mercury Press/Fairchild Publications.
Shewhart, W. A. (1939). Statistical methods from the viewpoint of quality control. Washington, DC: U.S. Department of Agriculture Graduate School.
Shulman, J. (1989, August). Managed mental health care: Positioning yourself for the future. Paper presented at the meeting of the American Psychological Association, New Orleans, LA.
Spielberger, C. D. (1992, Spring). New Horizons for personality assessment. SPA Exchange, pp. 6-7.
Uyeda, M. K., & Moldawsky, S. (1986). Prospective payment and psychological services. American Psychologist, 41, 60-63.
Walton, M. (1986). The Deming management method. New York: Dodd, Mead & Company.
Ware, J. E., & Sherbourne, C. D. (1991). The SF-36 short form health status survey: I. Conceptual framework and item selection. Boston, MA: New England Medical Center Hospital, International Resource Center for Health Care Assessment.
Watkins, C. E. (1991). What have surveys taught us about the teaching and practice of psychological assessment? Journal of Personality Assessment, 56, 426-437.
Weiner, I. B. (1992, Summer). Current developments in psychodiagnosis. The Independent Practitioner, 12, 114-119.
Wiggins, J. G. (1992, May). Practice guidelines: Fundamentally wrong. APA Monitor.
Winegar, N. (1992). The clinician’s guide to managed mental health care. New York: Haworth Press.
Zimet, C. N. (1989). The mental health care revolution: Will psychology survive? American Psychologist, 44, 703-708.

Chapter 2

Psychological Tests in Screening for Psychiatric Disorder

Lynn DellaPietra
Hahnemann University

Screening typically is not considered to be integral to traditional treatment planning or outcome assessment paradigms, and scant mention is made of it in contemporary authoritative documents on these topics. In addition, many clinical psychologists complete their graduate educations and never are exposed to formal screening models or concepts. This state of affairs is curiously inconsistent with the realities of contemporary mental health care in terms of a number of important issues. First, from a historical perspective, psychology represents one of the pioneering disciplines in the development of screening models and selection algorithms. Second, as mentioned below, one of the fundamental tenets of screening concerns the idea that early detection can improve substantially the probability of delivering effective treatment. Third, at a time when run-away health-care costs have become one of the major problems facing our society, we should be seeking to identify rather than ignore cost-effective methodologies that have the potential to significantly reduce costs. Finally, borrowing from the old baseball adage, “You can’t hit what you can’t see,” we must respond constructively to the reality that the large majority of individuals with psychiatric disorders in our society (with estimates as high as 75%) are never seen by a mental health professional. The major proportion of cases seen in our system are attended by primary care physicians.

These data compel us to realize that, although specific techniques of treatment planning and outcome assessment may be valid and reliable, they are focused on the “tip of the iceberg.” They are essentially irrelevant for a large majority of individuals who could derive benefit from their application, because these individuals infrequently are recognized and rarely enter treatment with mental health professionals. This being the case, the routine screening for mental disorders of major cohorts at risk (e.g., college students, medical patients, elderly) has the potential not only to identify a greater proportion of individuals with psychological disorders, but to do so at an earlier stage of their illnesses, thereby significantly reducing cumulative morbidity.


Overview of Screening

THE CONCEPT OF SCREENING

Screening has been defined traditionally as, “. . . the presumptive identification of unrecognized disease or defect by the application of tests, examinations or other procedures which can be applied rapidly to sort out apparently well persons who probably have a disease from those who probably do not” (Commission on Chronic Illness, 1987, p. 45). Screening is an operation conducted in an ostensibly well population to identify occult instances of the disease or disorder in question. Some authorities make a distinction between screening and case finding, which is specified as the ascertainment of disease in populations composed of patients with other disorders. Using such a distinction, the detection of psychiatric disorders among medical patients, for example, would more precisely fit the criteria for case finding than screening. In practice, there appears to be little difference between the two processes. For this reason, we have chosen to use the term screening for both operations.

Regardless of its specific manifestation, the screening process represents a relatively unrefined sieve that is designed to segregate the cohort under assessment into “positives” who presumptively have the condition, and “negatives” who are ostensibly free of the disorder. Screening is not a diagnostic procedure per se. Rather, it represents a preliminary filtering operation that identifies those individuals with the highest probability of having the disorder in question for subsequent specific diagnostic evaluation. Individuals found negative by the screening process typically are not evaluated further.

The conceptual underpinning for screening rests on the premise that the early detection of unrecognized disease in apparently healthy individuals carries with it a measurable advantage in achieving effective treatment and/or cure of the condition. Although logical, this assumption should be verified empirically for each specific disorder, because it is not always valid. In certain conditions, early detection does not measurably improve our capacity to alter morbidity or mortality, either because diagnostic procedures are unreliable or effective treatments for the condition are not yet available.

In an attempt to facilitate a better appreciation of the particular health problems that lend themselves to effective screening systems, the World Health Organization (WHO) published guidelines for effective health screening programs (Wilson & Jungner, 1968). A version of these criteria is listed next.

1. The condition should represent an important health problem that carries with it notable morbidity and mortality.
2. Screening programs must be cost-effective, that is, the incidence/significance of the disorder must be sufficient to justify the costs of screening.
3. Effective methods of treatment must be available for the disorder.
4. The test(s) for the disorder should be reliable and valid so that detection errors (i.e., false positives or false negatives) are minimized.
5. The test(s) should have high cost-benefit, that is, the time, effort, and personal inconvenience to the patient associated with taking the test should be substantially outweighed by its potential benefits.
6. The condition should be characterized by an asymptomatic or benign period, during which detection will significantly reduce morbidity and/or mortality.
7. Treatment administered during the asymptomatic phase should demonstrate significantly greater efficacy than that dispensed during the symptomatic phase.
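
The detection errors named in the fourth criterion can be made concrete with a small worked sketch. It assumes a screened cohort has been cross-classified against a later diagnostic evaluation into true positives, false positives, false negatives, and true negatives, and it computes the standard epidemiological indices of sensitivity, specificity, and positive and negative predictive value. These index names, the function `screening_metrics`, and the counts used are illustrative assumptions and are not taken from the WHO guidelines or from any study cited here.

```python
def screening_metrics(tp, fp, fn, tn):
    """Summarize screening accuracy from a 2 x 2 classification table.

    tp: screened positive, disorder confirmed (true positives)
    fp: screened positive, disorder absent (false positives)
    fn: screened negative, disorder present (false negatives)
    tn: screened negative, disorder absent (true negatives)
    """
    return {
        # Proportion of true cases that the screen flags.
        "sensitivity": tp / (tp + fn),
        # Proportion of non-cases that the screen correctly passes over.
        "specificity": tn / (tn + fp),
        # Proportion of screen positives who actually have the disorder.
        "ppv": tp / (tp + fp),
        # Proportion of screen negatives who are actually free of it.
        "npv": tn / (tn + fn),
    }


# Hypothetical cohort of 1,000 people, 100 of whom have the disorder.
metrics = screening_metrics(tp=80, fp=90, fn=20, tn=810)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these made-up counts the screen detects most true cases, yet fewer than half of the people it flags have the disorder. That pattern fits the description of screening as a preliminary filter that must be followed by a specific diagnostic evaluation.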

Some experts challenge the idea that psychiatric disorders, and the screening systems designed to detect them, conclusively meet all of these criteria. For example, it is arguable

DEROGATIS AND DELLAPIETRA

whether our treatments are truly effective for certain psychiatric conditions (e.g., schizophrenia), and we have not demonstrated definitively for some conditions that treatments initiated during asymptomatic phases (e.g., “maintenance” antidepressant treatment) are more efficacious than treatment initiated during acute episodes of manifest symptoms. Nevertheless, it generally is understood that psychiatric conditions and the screening paradigms designed to identify them do meet the WHO criteria in most instances, and that the consistent implementation of such systems has the potential to improve substantially the quality and cost-efficiency of our health care.

The impetus for the development and routine implementation of effective psychiatric screening systems for medical and community cohorts arises not only from the increases in morbidity and mortality associated with undetected psychiatric disorders (Hawton, 1981; Kamerow, Pincus, & MacDonald, 1986; Regier et al., 1988), but also from several additional factors. First, it is currently well established that between two thirds and three quarters of individuals with psychiatric disorders either go completely untreated or are treated by nonpsychiatric physicians; these individuals never are seen by a mental health-care professional (Dohrenwend & Dohrenwend, 1982; Regier, Goldberg, & Taube, 1978; Weissman, Myers, & Thompson, 1981; Yokopenic, Clark, & Aneshensel, 1983). Second, although there is a significant correlation or comorbidity between physical illness and psychiatric disorder (Barrett, Barrett, Oxman, & Gerber, 1988; Fulop & Strain, 1991; Rosenthal et al., 1991), the detection by primary care physicians of the most prevalent psychiatric disorders (i.e., anxiety and depressive disorders) is routinely poor (Linn & Yager, 1984; Nielson & Williams, 1980). Recognition rates in medical cohorts frequently fall below 50%. In addition, research consistently has demonstrated higher prevalence rates for psychiatric disorders among the medically ill (Wells, Golding, & Burnam, 1988), and confirmed that high health-care utilizers have elevated levels of psychological distress and psychiatric diagnoses (Katon et al., 1990).

Effective psychiatric screening programs designed with current methods not only would significantly reduce psychiatric and medical morbidity, but almost certainly would have a beneficial impact on health-care costs. The number of unnecessary diagnostic tests would be reduced, lengths of stay would be diminished, and the demand for health care among the groups with the highest utilization rates would be decreased. More importantly, those individuals in the community whose disorders currently go undetected would be identified early and treated before the pervasive morbidity associated with chronicity sets in.

THE EPIDEMIOLOGIC SCREENING MODEL

Because most psychologists are not specifically familiar with screening paradigms, we briefly review the basic epidemiologic screening model. Essentially, a cohort of individuals who are apparently well, or in the instance of case finding present with a condition distinct from the index disorder, are evaluated by a test to determine if they are at high risk for a particular (index) disorder or disease. As outlined earlier, the disorder must have sufficient incidence or consequence to be considered a serious public health problem, and be characterized by a distinct early or asymptomatic phase during which detection will improve the results of treatment substantially. The screening test (e.g., pap smear, Western blot) should be both reliable (i.e., consistent in its performance from one administration to the next) and valid (i.e., be capable of identifying those with the index disorder and eliminating individuals who do not have the condition). In psychometric terms, this form of validity traditionally has been referred to as

2

SCREENING FOR PSYCHIATRIC DISORDER

TABLE 2.1
Epidemiologic Screening Model

                          Actual
Screening Test      Cases      Noncases
Test positive         a           b
Test negative         c           d

Note. Sensitivity (Se) = a/(a + c); false negative rate (1 - Se) = c/(a + c); specificity (Sp) = d/(b + d); false positive rate (1 - Sp) = b/(b + d); positive predictive value (PPV) = a/(a + b); negative predictive value (NPV) = d/(c + d).

“predictive” validity. In epidemiologic models, the predictive validity of the test is apportioned into two distinct partitions: the degree to which the test correctly identifies those individuals who actually have the disorder, termed its sensitivity; and the extent to which those free of the condition are correctly identified as such, its specificity. Correctly identified individuals with the index disorder are referred to as true positives, whereas those accurately identified as being free of the disorder are termed true negatives. Misidentifications of healthy individuals as affected are labeled false positives, and affected individuals missed by the test are referred to as false negatives. The basic fourfold epidemiologic table, as well as the algebraic definitions of each of these validity indices, are given in Table 2.1. Sensitivity and specificity are a screening test’s most fundamental validity indices; however, other parameters can markedly affect a test’s performance. In particular, the prevalence or base rate of the disorder in the population under evaluation can have a powerful effect on the results of screening. Two other indicators of test performance, predictive value of a positive and predictive value of a negative, reflect the interactive effects of test validity and prevalence. These indices also are defined in Table 2.1, although their detailed discussion is postponed until a later section.
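The definitions in Table 2.1 can be made concrete with a short computational sketch. The Python code below is offered only as an illustration of the arithmetic; the names `ScreeningIndices` and `screening_indices`, and the cell counts in the example, are hypothetical and are not drawn from any study cited in this chapter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScreeningIndices:
    sensitivity: float
    specificity: float
    false_negative_rate: float
    false_positive_rate: float
    ppv: float
    npv: float


def screening_indices(a: int, b: int, c: int, d: int) -> ScreeningIndices:
    """Compute the Table 2.1 indices from the fourfold table.

    a = test positive, actual case     (true positives)
    b = test positive, actual noncase  (false positives)
    c = test negative, actual case     (false negatives)
    d = test negative, actual noncase  (true negatives)
    """
    if min(a, b, c, d) < 0:
        raise ValueError("cell counts must be non-negative")
    if 0 in (a + c, b + d, a + b, c + d):
        raise ValueError("every row and column total must be greater than zero")
    se = a / (a + c)
    sp = d / (b + d)
    return ScreeningIndices(
        sensitivity=se,
        specificity=sp,
        false_negative_rate=1 - se,
        false_positive_rate=1 - sp,
        ppv=a / (a + b),
        npv=d / (c + d),
    )


# Hypothetical cohort of 1,000 screened individuals (illustrative counts only).
result = screening_indices(a=80, b=90, c=20, d=810)
print(result)
```

In this invented cohort the base rate is 10% (100 cases among 1,000 individuals). Even with a sensitivity of .80 and a specificity of .90, fewer than half of those who test positive are actual cases, which illustrates why prevalence must be considered alongside sensitivity and specificity.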

Screening Tests for Psychiatric Disorders

A HISTORY OF SCREENING MEASURES

The predecessors of modern psychological screening instruments date back to the late 19th and early 20th centuries. Sir Francis Galton (1883) created the prototype psychological questionnaire as part of an exposition for the World Fair. The first self-report symptom inventory, the Personal Data Sheet, was developed by Robert Woodworth (1918) as part of the effort to screen American soldiers entering World War I for psychiatric disorders. At approximately the same time, psychiatrist Adolph Meyer constructed the first psychiatric rating scale, the Phipps Behavior Chart, in 1914 (Kempf, 1914). Since the initial efforts of these distinguished innovators, hundreds of comparable instruments have been produced and published. In the current chapter, we briefly review seven such instruments, five self-report inventories and two clinical rating scales, to familiarize the reader with the array of screening measures available. Obviously, this essay is not the appropriate place for a comprehensive review of psychological screening tests. Rather, our goal is to familiarize the reader with a number of instruments that are exemplary of their class.

GENERAL PSYCHOMETRIC PRINCIPLES

Fundamental to a realistic appreciation of the psychometric basis for psychiatric screening is the realization that we are first and foremost involved in psychological measurement. Psychologists are schooled rigorously in the awareness that the principles underlying psychological assessment are no different from those that govern any other form of scientific measurement. However, a major distinction that characterizes psychological measurement resides in the object of measurement: it is usually a hypothetical construct. By contrast, measurement in the physical sciences usually concerns tangible entities, which are measured via ratio scales with true zeros and equal intervals and ratios throughout the scale continuum (e.g., weight, distance, velocity). In quantifying hypothetical constructs (e.g., anxiety, depression, impulsivity), measurement occurs on ordinal-approaching-interval scales, which of necessity are more primitive and have substantially larger errors of measurement (Luce & Narens, 1987). Psychological measurement is no less scientific; however, it is less precise.

Reliability. All scientific measurement is based on consistency or replicability; reliability concerns the degree of replicability inherent in measurement. To what extent would a symptom inventory provide the same results upon re-administration? To what degree do two clinicians agree on a psychiatric rating scale? Conceived differently, reliability can be thought of as the converse of measurement error. It represents that proportion of variation in measurement that is due to true variation in the attribute under study, as opposed to random or systematic error variance. Reliability can be conceptualized as the ratio of true score variation to the total measurement variance. It specifies the precision of measurement and thereby sets the theoretical limit of measurement validity.

Validity. Just as reliability indicates the consistency of measurement, validity reflects the essence of measurement: the degree to which an instrument measures what it is designed to measure. It specifies how well an instrument measures a given attribute or characteristic of interest. Establishing the validity of a screening instrument is more complex and programmatic than determining its reliability, and rests on more elaborate theory. Although the validation process involves many types of validity experiments, the most explicitly applicable to the screening process is predictive validity. Essentially, the predictive validity of an assessment device hinges on its degree of correlation with an external reference criterion, some sort of gold standard. In the case of screening tests, the external criterion usually takes the form of a comprehensive laboratory and/or clinical diagnostic evaluation that definitively establishes the presence or absence of the index condition. Critical to a genuine appraisal of predictive validity is the realization that it is highly specific in nature. To say that a particular screening test is valid has little or no scientific meaning; tests are valid only for specific screening purposes. We make an explicit note of validation specificity here only because there appears to be some confusion on this issue relative to psychological tests. Psychological tests employed in screening for psychiatric disorder(s) must be validated specifically in terms of the diagnostic assignments they are designed to predict. A specific unidimensional test (e.g., a depression scale) should be validated in terms of its ability to accurately predict clinical depressions; it should be of little value in screening for other psychiatric disorders except by virtue of the high comorbidity of depression with numerous other conditions (Maser & Cloninger, 1990) and the pervasive nature of depressive symptoms in many medical illnesses.
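The statement that reliability is the ratio of true score variation to total measurement variance, and that it limits validity, can be sketched numerically. The example below assumes the classical test theory decomposition, in which total variance is the sum of true-score and error variance, and uses the standard classical result that a validity coefficient cannot exceed the square root of the test's reliability (taking the criterion to be perfectly reliable). Neither formula is spelled out in the text above, and the function names and variance values are hypothetical.

```python
import math


def reliability(true_variance: float, error_variance: float) -> float:
    """Reliability as true-score variance divided by total variance.

    Assumes total variance = true-score variance + error variance.
    """
    if true_variance < 0 or error_variance < 0:
        raise ValueError("variances must be non-negative")
    total = true_variance + error_variance
    if total == 0:
        raise ValueError("total variance must be greater than zero")
    return true_variance / total


def validity_ceiling(test_reliability: float) -> float:
    """Upper bound on a validity coefficient under classical test theory,
    assuming a perfectly reliable external criterion."""
    if not 0.0 <= test_reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return math.sqrt(test_reliability)


# Hypothetical variance components for a symptom inventory.
r_xx = reliability(true_variance=16.0, error_variance=4.0)
ceiling = validity_ceiling(r_xx)
print(f"reliability = {r_xx:.2f}, validity ceiling = {ceiling:.2f}")
```

With these invented values, four fifths of the observed variance reflects true variation, and no validity coefficient for the instrument could be expected to exceed roughly .89.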

Generalizability. Like reliability and validity, generalizability is a fundamental psychometric characteristic of test instruments used in psychiatric screening paradigms. Many clinical conditions and manifestations are altered systematically as a function of parameters such as age, gender, race, and the presence or absence of a comorbid medical illness. When validation coefficients (i.e., sensitivity and specificity) for a particular test are established relative to a specific diagnostic condition, they may vary considerably if the demographic and health parameters of the cohort on which they were established are altered significantly. To cite examples, it is well established that men are more constrained than women in reporting emotional distress, so much so that well-constructed tests measuring symptomatic distress report distinct sets of norms for the two genders (Nunnally, 1978). Another illustration resides in the alteration of the phenomenologic characteristics of depression across age. Depression in the young tends toward less dramatic affective display, and progresses through the classic clinical delineations of young and middle adult years, to the geriatric depressions of the elderly, which are more likely to be characterized by dementia-like cognitive dysfunctions. Any single test is unlikely to perform with the same degree of validity across shifts in relevant parameters. Therefore, generalizability must be established empirically and cannot be assumed merely from research on populations different from the cohort under study.

Self-Report Versus Clinical Judgment. Although advocates and adherents argue the differential merits of self-report versus clinician ratings, a great deal of evidence suggests that the two techniques have strengths and weaknesses of roughly the same magnitude. Neither approach can be said to function generally more effectively than the other in screening for psychiatric disorder. Each screening situation must be assessed and evaluated separately, and the parameters of each must be weighed objectively to determine which type of instrument is best suited for the task at hand. Traditionally, self-report inventories have been used more frequently as screening tests than clinical rating scales.
This is probably because the self-report modality of measurement has much to recommend it to the task of screening. Self-report measures tend to be brief and inexpensive, and they are tolerated well by the individuals being screened. These features lend the important attributes of cost-efficiency and cost-benefit to self-report. Self-report scales are transportable; they may be used in a variety of settings and they minimize professional time and effort, because their administration, scoring, and evaluation require little or no professional input. Recently, such tests have been adapted for use on personal computers. Interactive computerized testing enables test administration, scoring, evaluation, and storage of results entirely by computer, reducing both professional and technical support time. Finally, perhaps the greatest advantage of self-report resides in the fact that the test is being completed by the only person experiencing the phenomena: the respondent. A clinician, no matter how skilled or well trained, can never know the actual experience of the respondent; rather, he or she must be satisfied with an apparent or deduced rendition of the phenomena. This last feature of self-report tests also can represent their greatest potential source of error (i.e., patient bias in reporting). Because the test respondent is providing the test data, there exists the opportunity to consciously or unconsciously distort the responses given.

Although patient bias does represent a potential difficulty for self-report, empirical studies have indicated that such distortions represent a problem only in situations where there is obvious personal gain associated with response distortions. Otherwise, this problem usually does not represent a major source of bias (Derogatis, Lipman, Rickels, Uhlenhuth, & Covi, 1974a). There is also the possibility that response sets such as acquiescence or attempts at impression management may result in systematic response distortions, but such effects tend to add little error variance in most realistic clinical screening situations. Probably the greatest limitation of self-report arises from the inflexibility of the format; a

line of questioning cannot be altered or modified depending on how the individual responds to previous questions. In addition, only denotative responses can be appreciated; facial expressions, tones of voice, attitudes and postures, and cognitive/emotional status of the respondent are not an integral part of the test data. This inflexibility extends to the fact that the respondent also must be literate to read the questions. The psychiatric rating scale or interview is a viable alternative to self-report instruments in designing a screening paradigm. The clinical rating scale introduces professional judgment into the screening process and is inherently more flexible than self-report. The clinician has both the expertise and freedom to delve, in more detail, into any area of history, thought, or behavior that will deliver relevant information on the respondent’s mental status. The clinician also carries the capacity to clarify ambiguous answers and probe areas of apparent contradiction. In addition, because of his or her sophistication in psychopathology and human behavior, there is the theoretical possibility that more complex and sophisticated instrument design may be utilized in developing psychiatric rating scales. On the negative side, just as self-report is subject to patient bias, clinical rating scales are subject to equally powerful interviewer biases. Training sessions and videotaped interviews are utilized to reduce systematic errors of this type. However, interviewer bias never can be eliminated completely. Furthermore, the fact that a professional clinician is required to make the ratings significantly increases the costs of screening. Lay interviewers have been trained to do such evaluations in some instances, but they rarely are as skilled as professionals, and the costs of their training and participation must be weighed into the equation as well. 
Finally, the more flexibility is built into the interview, the more time it is likely to take for the clinician to complete the ratings. At some point on this continuum, the test will no longer resemble a screening test, but will begin to take on the characteristics of a comprehensive diagnostic interview.

Both modalities of measurement are designed to quantify the respondent’s status in such a way as to facilitate a valid evaluation of his or her caseness. Both approaches lend themselves to actuarial quantitative methods, which allow for a normative framework to be established within which to evaluate individuals. Most importantly, both approaches “work”; it depends on the nature of the screening task, the resources at hand, and the experience of the investigators doing the study to determine which method will work best in any particular situation.

SCREENING TESTS

In this section, we have provided a brief synopsis of seven popular psychological tests and rating scales that frequently are employed as psychiatric screening instruments. Our assessment is not intended to be a comprehensive review, but rather to touch on the nature of each measure and provide some information about its psychometric characteristics and background. In the case of commercially available tests (e.g., SCL-90-R,¹ BSI,² GHQ), detailed discussions and comprehensive psychometric data are available from their published administration manuals. Scholarly reviews provide analogous information in the cases of the others (CES-D, BDI, SDS, HAS, HRDS). Five of the screening tests are self-report, whereas the remaining two are clinician rated. Table 2.2 provides a brief summary of their characteristics.

¹SCL-90-R® is a registered trademark of Leonard R. Derogatis, Ph.D.
²BSI™ is a trademark of Leonard R. Derogatis, Ph.D.

TABLE 2.2
Psychiatric Screening Tests in Common Use

Instrument   Author (Year)       Mode    Description   Items        Time
SCL-90-R     Derogatis (1975)    S-R     Multidim.     90           15-20 min
BSI          Derogatis (1975)    S-R     Multidim.     53           10-15 min
GHQ          Goldberg (1972)     S-R     Multidim.     60, 30, 12   5-15 min
CES-D        Radloff (1977)      S-R     Unidim.       20           10 min
BDI          Beck (1961)         S-R     Unidim.       21           5-10 min
SDS          Zung (1965)         S-R     Unidim.       20           10 min
HAS          Hamilton (1959)     Clin.   Bidim.        14           20 min
HRDS         Hamilton (1960)     Clin.   Unidim.       21           30+ min

Note. S-R = self-report; Clin. = clinician rated. [The Sensitivity/Specificity and Application (population) columns of this table are illegible in the source.]

SCL-90-R®/BSI™. The SCL-90-R (Derogatis, 1977, 1983) is a 90-item, multidimensional, self-report symptom inventory derived from the Hopkins Symptom Checklist (Derogatis, Lipman, Rickels, Uhlenhuth, & Covi, 1974b) and first published in 1975. The inventory measures symptomatic distress in terms of nine primary dimensions and three global indices of distress. The dimensions include somatization, obsessive-compulsive, interpersonal sensitivity, depression, anxiety, hostility, phobic anxiety, paranoid ideation, and psychoticism. Several matching clinical rating scales (e.g., the Hopkins Psychiatric Rating Scale and the SCL-90 Analogue Scale), which measure the same nine dimensions, are also available. SCL-90-R norms have been developed for adult community nonpatients, psychiatric outpatients, psychiatric inpatients, and adolescent nonpatients. Geriatric and other specialized norms for the test are currently under development.

The Brief Symptom Inventory (BSI) (Derogatis, 1992; Derogatis & Melisaratos, 1983; Derogatis & Spencer, 1982) is the brief form of the SCL-90-R. The BSI measures the same nine symptom dimensions and three global indices using only 53 items. Dimension scores on the BSI correlate highly with comparable SCL-90-R scores (Derogatis & Spencer, 1982), and the brief form shares most psychometric characteristics of the longer scale. Both the SCL-90-R and the BSI have been used as outcome measures in an extensive array of research studies, among them a number of investigations focusing specifically on screening. A recent published bibliography for the SCL-90-R (Derogatis, 1990) lists almost 500 research reports in which the test has been utilized. Although not utilized as extensively as its parent instrument, the BSI also has been shown to be sensitive to psychological distress and psychiatric disorder in a number of research contexts (Cochran & Hale, 1985; O’Hara, Ghonheim, Heinrich, Metha, & Wright, 1989). Both the SCL-90-R and the BSI have been translated into 26 languages.

General Health Questionnaire (GHQ). The GHQ originally was developed as a 60-item, multidimensional, self-report symptom inventory by Goldberg (1972). Subsequent to its publication (Goldberg & Hillier, 1979), four subscales were factor-analytically derived: somatic symptoms, anxiety and insomnia, social dysfunction, and severe depression. The GHQ is one of the most widely used screening tests for psychiatric disorder internationally, its popularity arising in part from the fact that several brief forms of the test are available. The GHQ30 and GHQ12 represent brief forms of the original GHQ, and also follow the basic four subscale format of the longer parent scale, but avoid including physical symptoms as indicators of distress (Malt, 1989). The GHQ has been validated for use in screening and outcome assessment in numerous populations, including the traumatically injured, cancer patients, geriatric populations, and many community samples (Goldberg & Williams, 1988).

CES-D. The Center for Epidemiological Studies Depression Scale (CES-D) was developed by Radloff (1977). It is a brief, unidimensional, self-report depression scale composed of 20 items that assess the respondent’s perceived mood and level of functioning within the past 7 days. Four fundamental dimensions (depressed affect, positive affect, somatic problems, and interpersonal problems) have been identified as basic to the CES-D. The CES-D has been used effectively as a screening test with a number of community samples (Comstock & Helsing, 1976; Frerichs, Aneshensel, & Clark, 1981; Radloff & Locke, 1985), as well as medical (Parikh, Eden, Price, & Robinson, 1988) and clinic populations (Roberts, Rhoades, & Vernon, 1990). Recently, Shrout and Yager (1989) demonstrated that the CES-D could be shortened to five items and still maintain adequate sensitivity and specificity, as long as prediction was limited to traditional two-class categorizations.

Beck Depression Inventory (BDI). The BDI is a unidimensional, self-report depression measure that employs 21 items to measure depression and was developed by Beck and his colleagues (Beck, Ward, & Mendelson) in 1961. Each of the items represents a characteristic symptom of depression (e.g., pessimism, self-contempt) on which the respondent is to rate himself or herself on a 4-point scale. These scores are then summed to yield a total depression score. Beck’s justification for this system is that the frequency of depressive symptoms is distributed along a continuum from nondepressed to severely depressed. In addition, number of symptoms is viewed as correlating with intensity of distress and severity of depression. A short (13-item) version of the BDI was introduced in 1972 (Beck & Beck, 1972), with additional psychometric evaluation accomplished subsequently (Reynolds & Gould, 1981). The BDI has been used as a screening device with renal dialysis patients, as well as with medical inpatients and outpatients (Craven, Rodin, & Littlefield, 1988). Recently, Whitaker et al. (1990) used the BDI with 5,108 community adolescents; they noted that it performed with moderate validity in screening for major depression in this previously undiagnosed population.

Self-Rating Depression Scale (SDS). The SDS was published by Zung in 1965 (Zung, 1965). It is a 20-item, self-report, unidimensional measure of depression. Each item is scaled via four response choices, ranging from “none or a little of the time” to “most or all of the time.” The items represent symptoms of depression and provide a quantitative measure of duration of depressive symptomatology. The SDS has been used extensively with geriatric populations, and has proved valid in applications with elderly depressed outpatients, elderly medical patients, and elderly community residents (Zung & Zung, 1986). Sensitivity and specificity for the SDS are particularly impressive with this group, in light of the added complexity of diagnosing depression in the elderly. As is the case with the SCL-90-R/BSI, there is a matching clinical rating scale based on the SDS, the Depression Status Inventory (DSI) (Zung, 1972). The DSI is assessed along the

same value range as the SDS, and the two measures have been reported to correlate .87 (Zung, 1974). In addition, a 10-question Short Zung Interviewer-Assisted Depression Rating Scale (Short Zung IDS), which is administered by a clinician, has been described by Tucker, Ogle, Davison, and Eilenberg (1987). This instrument is reported to be valuable in the assessment of elderly patients who may not be able to complete the standard self-report SDS on their own.

Hamilton Anxiety Scale (HAS). The HAS is a 14-item clinician rating scale published by Hamilton in 1959. Each item represents a clinical feature of anxiety, requiring the clinician to rate the client on a 5-point scale from (0) “not present” to (4) “very severe.” The items reflect both somatic (e.g., cardiovascular, respiratory, gastrointestinal, and genitourinary) and psychic/cognitive (e.g., memory and concentration impairment) manifestations of anxiety. The HAS is designed to yield two separate subscores for “psychic anxiety” and “somatic anxiety.” The HAS has been used with children as well as adults (Kane & Kendall, 1989), coronary artery bypass patients (Erikkson, 1988), general medical/surgical patients (Bech, Grosby, & Husum, 1984), psychiatric outpatients (Riskind, Beck, Brown, & Steer, 1987), and many other groups. In addition to these applications, the HAS has become accepted as a standard outcome measure in clinical anxiolytic drug trials.

Hamilton Rating Scale for Depression (HRDS)/(HAM-D). The HRDS is similar to the HAS, in that both provide quantitative assessments of the severity of a clinical disorder. The HRDS was developed by Hamilton in 1960 and was revised in 1967 (Hamilton, 1967). It consists of 21 items, each measuring a depressive symptom. Hamilton has recommended using only 17 items when scoring, because of the uncommon nature of the remaining items (e.g., depersonalization). Hedlund and Vieweg (1979) reviewed the psychometric and substantive properties of the

HRDS in two dozen studies, and gave it a very favorable evaluation. More recently, Bech (1987) completed a similar review, and concluded that the HRDS is an extremely useful scale for measuring depression. A Structured Interview Guide for the HRDS (SIGH-D) is also available (Williams, 1988); it provides standardized instructions for administration and has been shown to improve interrater reliability. Like the HAS for anxiety, the HRDS has become a standard outcome measure in antidepressant drug trials.
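Several of the instruments reviewed above are scored by summing item ratings; the BDI, for example, sums 21 items, each rated on a 4-point scale. The sketch below illustrates that kind of summed scoring. It assumes the four response options are coded 0 through 3, which the description above does not state, and the function name `summed_score` and the example ratings are hypothetical. It is not a substitute for the scoring rules in any test's published manual.

```python
from typing import Sequence


def summed_score(ratings: Sequence[int], n_items: int = 21,
                 low: int = 0, high: int = 3) -> int:
    """Return the total score after checking item count and rating range.

    Defaults reflect a 21-item inventory with ratings assumed to be coded 0-3.
    """
    if len(ratings) != n_items:
        raise ValueError(f"expected {n_items} ratings, got {len(ratings)}")
    for position, rating in enumerate(ratings, start=1):
        if not low <= rating <= high:
            raise ValueError(
                f"item {position}: rating {rating} is outside {low}-{high}"
            )
    return sum(ratings)


# Hypothetical respondent (invented ratings for 21 items).
example = [1, 0, 2, 1, 0, 0, 3, 1, 1, 0, 2, 0, 1, 1, 0, 0, 2, 1, 0, 1, 0]
total = summed_score(example)
print(total)
```

Rejecting incomplete or out-of-range protocols before summing is a design choice of this sketch; a total computed from missing items would not be comparable to the norms against which scores are interpreted.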

Psychological Screening in Specific Settings

COMMUNITY SETTINGS

By far the most comprehensive data on the prevalence of psychiatric disorders in the community have been developed from the recent NIMH Epidemiologic Catchment Area (ECA) investigation, a study of psychiatric disorders in the community involving nearly 20,000 individuals. These results make explicit the fact that psychiatric disorders are highly prevalent in our society. This is so regardless of whether we assess lifetime (Robins et al., 1984), 6-month (Blazer et al., 1984; Myers et al., 1984) or 1-month (Regier et al., 1988) prevalence estimates. Detailing the latter, the 1-month prevalence for any psychiatric disorder, across all demographic parameters, was 15.4%, which is similar to European and Australian estimates, which ranged from 9% to 16% (Regier et al., 1988).

In terms of specific diagnoses, the overall rate for affective disorders was 5.1%, whereas that for anxiety disorders was 7.3% (Regier et al., 1988). Six-month prevalence estimates for affective disorders ranged from 4.6% to 6.5% across the five ECA sites (Myers et al., 1984), whereas 6-month estimates for anxiety disorders, recently updated by Weissman and Merikangas (1986), reveal rates for panic disorder ranging from 0.6% to 1.0%. Agoraphobia showed prevalences from 2.5% to 5.8% across the various ECA sites. Clearly these data demonstrate that psychiatric disorders are a persistent and demonstrable problem that affects substantial numbers of our community population. Unfortunately, there currently is no effective system for screening individuals in the community per se; we must wait until they seek medical advice or treatment for a disorder, and formally enter the healthcare system. At that point, primary care “gatekeepers” have the first, and in most instances the only, opportunity to identify psychiatric morbidity.
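The prevalence figures above bear directly on how any screening test would perform in the community, because predictive values depend on the base rate as well as on sensitivity and specificity. The sketch below applies Bayes' theorem to two of the ECA estimates cited here (15.4% for any disorder and 1.0% for panic disorder). The test characteristics used (sensitivity .80, specificity .90) are purely hypothetical and do not describe any instrument reviewed in this chapter; the function name is likewise an assumption.

```python
from typing import Tuple


def predictive_values(sensitivity: float, specificity: float,
                      prevalence: float) -> Tuple[float, float]:
    """Return (PPV, NPV) for a test applied at a given prevalence."""
    for name, value in (("sensitivity", sensitivity),
                        ("specificity", specificity),
                        ("prevalence", prevalence)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie between 0 and 1")
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    if true_pos + false_pos == 0 or true_neg + false_neg == 0:
        raise ValueError("predictive values are undefined for these inputs")
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Hypothetical test (Se = .80, Sp = .90) at two prevalences cited in the text.
for label, prevalence in (("any disorder, 1-month", 0.154),
                          ("panic disorder, upper 6-month estimate", 0.010)):
    ppv, npv = predictive_values(0.80, 0.90, prevalence)
    print(f"{label}: PPV = {ppv:.3f}, NPV = {npv:.3f}")
```

Under these assumed test characteristics, a positive result is considerably more informative when screening for any disorder than when screening for a condition as uncommon as panic disorder, even though the test itself is unchanged.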

MEDICAL SETTINGS

In medical populations, prevalence estimates of psychiatric disorder are increased substantially over community rates. This is particularly true of anxiety and affective disorders, which account for the majority of psychiatric diagnoses assigned to medical patients (Barrett et al., 1988; Derogatis et al., 1983; Von Korff, Dworkin, LeResche, & Kruger, 1988). In recent reviews of psychiatric prevalence in medical populations, Barrett et al. (1988) observed prevalence rates of 25% to 30%, whereas Derogatis and Wise (1989) reported prevalence estimates for a broad range of medical cohorts that varied from 22% to 33%. These authors concluded, “In general, it appears that up to one-third of medical inpatients reveal symptoms of depression; of these, 20 to 25% manifest more substantial depressive syndromes” (p. 101). Concerning anxiety, Kedward and Cooper (1966) observed a prevalence rate of 27% in their study of a London general practice, whereas Schulberg et al. (1985)

observed a combined rate of 8.5% for phobic and panic disorders among American primary care patients. In another contemporary review, Wise and Taylor (1990) concluded that 5% to 20% of medical inpatients suffer the symptoms of anxiety, whereas 6% receive formal anxiety diagnoses. They further determined that depressive phenomena are even more prevalent among medical patients, citing reported rates of depressive syndromes of 11% to 26% in inpatient samples.

With prevalence rates such as these, and the acknowledged escalations in morbidity and mortality associated with psychiatric disorders, there is little doubt that screening programs for psychiatric disorders in medical populations could achieve impressive utility. Potential therapeutic gains associated with psychiatric screening would be enhanced further and magnified by the fact that attendant related problems such as substance abuse, inappropriate diagnostic tests, and high utilization of health-care services also would be minimized. Early and accurate identification of occult mental disorders in individuals with primary medical conditions would lead to a significant improvement in their well-being, and also would help relieve the fiscal and logistic strain on the health-care system.

Physician Recognition of Psychiatric Disorder. It is now well established that, in the United States, primary care physicians represent a de facto mental health-care system (Burns & Burke, 1985; Regier et al., 1978, 1982). There is reliable evidence that one fifth to one third of the primary care population suffer from at least one psychiatric condition (typically an anxiety or depressive disorder) (Derogatis & Wise, 1989), rendering the competence with which primary care physicians recognize psychiatric disorders a critical issue. Unfortunately, current evidence suggests that only a minor fraction of prevalent psychiatric disorders are detected in primary care, a deficiency that has considerable implications for both the physical and psychological health of our patients (Seltzer, 1989). Also, because undetected psychiatric disorders are associated with increased morbidity and mortality, as well as enhanced use of health-care facilities (Katon et al., 1990; Wells et al., 1988), the costs of the failure to

detect them are magnified substantially. Unaided Physician Recognition. During the past decade, a substantial number of studies have been reported documenting both the magnitude and nature of the problem of undetected psychiatric disorder among primary care physicians (Davis, Nathan, Crough & Bairnsfather, 1987; Jones, Badger, Ficken, Leepek, & Anderson, 1987; Kessler, Amick, & Thompson, 1985; Schulberg et al., 1985). The data from these studies establish rates of accurate physi-

cian diagnosis of psychiatric conditions, which range from a low of 8% (Linn & Yager 1984) to a high of 53%, observed more recently by Shapiro et al. (1987) with an elderly cohort. Although the methodology and precision of these studies undoubtedly has improved over the course of the decade (Anderson & Harthorn, 1989; Rand, Badger, & Coggins, 1988), rates of

accurate physician diagnosis have remained below 33% for the most part. A summary of these investigations, along with their characteristics and accurate detection rates, appear in Table.2:3,

Aided Physician Recognition. The data from the investigations outlined previously suggest that proactive steps must be taken to facilitate the accurate recognition of psychiatric conditions among primary care doctors. This is particularly the case in the light of contemporary changes in health care, which suggest that in the future nonpsychiatric physicians will be playing a greater rather than a lesser role in this regard. If primary care physicians cannot identify psychiatric conditions correctly, they can neither treat them adequately themselves nor refer them to appropriate mental health professionals. Such a situation ultimately will degrade the quality of our health-care systems further, and help deny effective treatment to those who, in many ways, need it most.


TABLE 2.3
A Review of Recent Research on Rates of Accurate Identification of Psychiatric Morbidity in Primary Care

Investigator                Study Sample                                          Criterion  Correct Diagnosis (%)
Andersen & Harthorn (1989)  120 primary care physicians                           DKI        33% affective disorder; 48% anxiety disorder
Davis et al. (1987)         377 family practice patients                          Zung       15% mild symptoms; 30% severe symptoms
Jones et al. (1987)         20 family physicians/51 patients                      DIS        21%
Rand et al. (1988)          36 family practice residents/520 patients             GHQ        16%
Kessler et al. (1985)       1,452 primary care patients                           GHQ        19.7%
Linn & Yager (1980)         150 patients in a general medical clinic              Zung       8%
Schulberg et al. (1985)     294 primary care patients                             DIS        44%
Shapiro et al. (1987)       1,242 patients at university internal medical clinic  GHQ        53%
Zung et al. (1983)          41 family medicine patients                           Zung       15%

Note. DKI = Diagnostic Knowledge Inventory; Zung = Self-Rating Depression Scale; DIS = Diagnostic Interview Schedule; GHQ = General Health Questionnaire.
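As a rough check on the statement that rates of accurate physician diagnosis fall mostly below 33%, the figures in Table 2.3 can be tallied directly. The sketch below is illustrative only: the variable names are hypothetical, the pairing of each rate with its study follows our reading of the table, and studies that report two rates are simply counted twice.

```python
# Accurate-detection rates (%) as read from Table 2.3.
# Studies reporting two rates contribute two entries.
detection_rates = [
    ("Andersen & Harthorn (1989), affective disorder", 33.0),
    ("Andersen & Harthorn (1989), anxiety disorder", 48.0),
    ("Davis et al. (1987), mild symptoms", 15.0),
    ("Davis et al. (1987), severe symptoms", 30.0),
    ("Jones et al. (1987)", 21.0),
    ("Rand et al. (1988)", 16.0),
    ("Kessler et al. (1985)", 19.7),
    ("Linn & Yager (1980)", 8.0),
    ("Schulberg et al. (1985)", 44.0),
    ("Shapiro et al. (1987)", 53.0),
    ("Zung et al. (1983)", 15.0),
]

THRESHOLD = 33.0  # the "below 33%" benchmark used in the text

# Entries strictly below the benchmark, and the overall range of rates.
below = [name for name, rate in detection_rates if rate < THRESHOLD]
rates = [rate for _, rate in detection_rates]

print(f"{len(below)} of {len(detection_rates)} rates fall below {THRESHOLD:.0f}%")
print(f"range: {min(rates)}% to {max(rates)}%")
```

Because the studies differ in samples, criteria, and definitions of a correct diagnosis, a tally of this kind is only a descriptive summary, not a pooled estimate.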

There is some evidence that primary care physicians can identify accurately the prevalence of psychiatric disorders and the nature of these conditions. They estimate prevalence to be between 20% and 25% in their patient populations, and perceive anxiety and depressive disorders to be the most prevalent conditions they encounter (Fauman, 1983; Orleans, George, Houpt, & Brodie, 1985). To identify and overcome the problems inherent in detecting psychiatric conditions in primary care, a number of investigators have studied the effects of introducing a diagnostic aid for primary care doctors, in the form of results from a psychological screening test. Although far from unanimous, the studies completed during the past decade have concluded that, in the appropriate situation, screening tests can significantly improve physician detection of psychiatric conditions.

Linn and Yager (1984), using the Zung SDS, found an increase from only 8% correct diagnosis to 25% in a cohort of 150 general medical patients. Similarly, Zung, Magill, Moore, and George (1983) reported an increase in correct identification from 15% to 68% in family medicine outpatients with depression. Likewise, Moore, Silimperi, and Bobula (1978) observed an increase in correct diagnostic identification from 22% to 56% working with family practice residents. Not all studies have shown such dramatic improvements in diagnostic accuracy, however. Hoeper, Nyczi, and Cleary (1979) found essentially no improvement in diagnosis associated with making GHQ results available to doctors, and Shapiro et al. (1987) reported only a 7% increase in accuracy when GHQ scores were made accessible. The question of aided recognition of psychiatric disorders is a complex one, with numerous patient and doctor variables playing an important role. Nonetheless, the results of the studies on aided recognition appear promising, and an excellent contemporary review of the issues involved has been written by Anderson and Harthorn (1990).

Problems Unique to Psychiatric Disorders. The prototypic psychiatric disorder is a hypothetical construct, with few pathognomonic clinical or laboratory indicators and a pathophysiology and etiology that remain opaque. For these reasons, unique problems can arise in the detection of these disorders, particularly in medical patients. To begin with, the highly prevalent anxiety and depressive disorders have a multitude of somatic symptoms associated with them. These subjective manifestations are difficult to differentiate from those arising from verifiable physical causes. Schurman, Kramer, and Mitchell (1985) indicated that 72% of visits to primary care doctors resulting in a psychiatric diagnosis presented with somatic symptoms as the primary complaint. Katon et al. (1990) and Bridges and Goldberg (1984) both indicated that presentation with somatic symptoms as primary complaints is a key reason for misdiagnosis of psychiatric disorders in primary care. In their study of high health-care utilizers, Katon et al. (1990) reported that the high utilization group had SCL-90-R scores elevated by more than three quarters of a standard deviation on the anxiety, depression, and somatization subscales.

A second problem, more specific to the chronically or terminally ill, has to do with the misperception of clinical depressions as demoralization reactions (Derogatis & Wise, 1989). Most serious chronic illnesses, and those that inevitably result in mortality, have a period of disaffection and demoralization as a natural aspect of the illness. These negative affective responses are a natural reaction to the loss of vitality and well-being associated with being chronically ill and, where appropriate, the anticipated loss of life. Physicians familiar with caring for such patients (e.g., cancer, emphysema patients) frequently mistake true clinical depressions (for which effective treatments are available) for reactive demoralized states, which are a natural part of the illness. They then fail to initiate a therapeutic regimen on the grounds that such mood states are part of the primary medical condition. There is good evidence that such reactive states can be distinguished reliably from major clinical depressions (Snyder, Strain, & Wolf, 1990), and patients suffering such painful comorbid conditions are done a substantial disservice if we fail to diagnose and treat their disorders adequately.

Although understandable, this composite of problems has a highly regressive impact on our overall health-care system. As it is now structured, we have a system where the large majority of psychiatric disorders are seen by primary care physicians, who undeniably leave a plurality of these conditions undetected. Of the cases they do identify, only a small minority are ever referred to mental health specialists, even though such conditions are known to be of a chronic and recurrent nature, and the primary care gatekeepers admit that they feel less than fully competent to treat them. Undetected or improperly treated anxiety and depressive disorders are known to be disproportionately associated with substance abuse, alcoholism, excessive diagnostic tests, suicide, excessive utilization of the health-care system, and spiraling health-care costs. Unless a means is discovered to provide comprehensive psychiatric education to our primary care physicians, our hope of ever developing an efficient, cost-effective health-care system will remain illusory.

ACADEMIC SETTINGS

Recollections of college days usually bring to mind idyllic images of youthful abandon and the pursuit of personal growth and pleasure, unencumbered by the tedious demands and stresses of everyday adult life. Unfortunately, the realities of contemporary student life paint a very different portrait. The period of undergraduate and graduate studies represents a phase in the life cycle of rapid change, high stress, and previously unparalleled demands on coping resources. In the light of this reality, it is not surprising that it also represents a phase of life associated with a high incidence of psychiatric morbidity. Numerous studies have reported prevalence rates of psychological disorders in university populations.

Telch, Lucas, and Nelson (1989) investigated panic disorder in a sample of 2,375 college students and found that 12% reported at least one panic attack in their lifetime. Furthermore, 2.36% of the sample met DSM-III-R criteria for panic disorder. Craske and Krueger (1990) reported lifetime prevalence of nocturnal panic attacks in 5.1% of their 294 undergraduates. Prevalence of daytime panic attacks was also 5.1%, but only 50% of those reporting nocturnal panic also reported daytime panic.

Disorders that are especially salient in college populations include addiction, eating disorders, and depression. West, Drummond, and Eames (1990) found that, of 270 college students, 25.6% of men and 14.5% of women reported drinking large quantities of alcohol weekly. This same sample included 20% of men and 6% of women who had damaged property after drinking in the past year. Seay and Beck (1984) administered the Michigan Alcohol Screening Test (MAST) to 395 undergraduates and discovered 25% to be problem drinkers and 7% to be alcoholics. However, only 1% of the students were aware that they had a drinking problem. Eating disorders, especially bulimia, are relatively common in college populations, because the average age at onset is between adolescence and early adulthood (American Psychiatric Association, 1987). In a study of 1,040 college students, Striegel-Moore, Silberstein, French, and Rodin (1989) found rates of bulimia of 3.8% for females and 0.2% for males. In a study of 69 college women, Schmidt and Telch (1990) reported the prevalence of personality disorders in three groups. Of a group defined as bulimic, 61% met criteria for at least one personality disorder, whereas “nonbulimic binge eaters” had a 13% prevalence. Prevalence of personality disorder in the control group was only 4%. Most bulimics (57%) who exhibited personality disorder met criteria for borderline personality disorder. Wells, Klerman, and Deykin (1987) reported that 33% of their sample of 424 adolescents met the standard criteria for depression using the CES-D. When more stringent duration criteria were applied, the rate fell to 16%. Even more troubling are the results of a study by McDermott, Hawkins, Littlefield, and Murray (1989), which revealed that 65% of the 331 college women and 51% of the 241 college men whom they surveyed met criteria for depression using the CES-D. Furthermore, 10% of this sample reported contemplating self-injurious behavior during the previous week. Suicidal ideation was reported by 8%, and 1% said they thought about suicide “most or all of the time” during the past week.

The results of these studies make it apparent that university students suffer from a considerable amount of psychiatric morbidity. A critical question then becomes, to what degree is this morbidity detected by university health centers? As in the community, university physicians carry much of the burden for detecting psychological disorders because of the nature of their contacts with students. These physicians invariably treat many patients who present with somatic complaints that are actually manifestations of an underlying psychological disorder. The relative homogeneity of the age of the student group does carry some advantages with it, but beyond that fact, the university physician is essentially in the same position relative to recognizing psychological disorders as his or her primary care counterparts. Because there have been no empirical studies regarding the accuracy of university physicians’ diagnoses, it must be assumed that they are comparable to their primary care colleagues. As reviewed earlier, primary care doctors show rates of accurate identification ranging from 8% (Linn & Yager, 1984) to 53% (Shapiro et al., 1987), with a majority of the rates remaining below 33% (Derogatis, DellaPietra, & Kilroy, 1992). This is obviously an unsatisfactory level of detection, given the prevalence of disorder in college populations.

Aided Recognition of Disorders. The use of aided recognition screening paradigms appears to be one strategy that may improve the rate of detection of psychological disorders in this important population. As we learned from similar approaches in primary care, significant improvement in recognition rates can be achieved through the implementation of such systems. The components required are screening tests that are valid indicators of psychiatric morbidity in this age group, and a systematic and meaningful system of application within universities.

Several screening instruments have been used successfully with adolescent and college student populations and have been shown to have adequate sensitivity and specificity. The BDI and the CES-D are two unidimensional measures that have been used with college populations. Whitaker et al. (1990) used the BDI with 5,108 adolescents and found it to have moderate validity in this population. Schmidt and Telch (1990) also used the BDI to measure depression in college women with bulimia. McDermott et al. (1989) used the CES-D to investigate health-related practices and events, and depression in college students. They found the scale to be practical and reliable. The General Behavior Inventory (Depue, Krauss, Spoont, & Arbisi, 1989) was used to detect unipolar and bipolar conditions in university students and found to have adequate sensitivity (unipolar, .78; bipolar, .76) and high specificity (.99 for both conditions).

Two multidimensional instruments that also have proved useful with college populations are the General Health Questionnaire (GHQ) and the SCL-90-R. The GHQ was used by Szulecka, Springett, and De Pauw (1986) to identify first-year undergraduates who might be good candidates for psychotherapy, whereas the SCL-90-R and its briefer counterpart, the BSI, have been utilized in a number of studies of distress in university students. For example, Benjamin, Kaszniak, Sales, and Shanfield (1986) used the BSI to measure distress in law and medical students, and Johnson, Ellison, and Heikkinen (1989) employed the SCL-90-R to describe the type and severity of psychological symptoms in university students attending a counseling center. College norms are currently under development for both of these latter tests.
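Sensitivity and specificity such as those quoted above for the General Behavior Inventory become predictive values only once a prevalence is specified. The following is a minimal sketch, assuming Bayes' theorem applied to the reported unipolar figures (.78 and .99); the function name and the three prevalence values are hypothetical and chosen purely for illustration, not taken from the studies cited.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a screening test via Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Sensitivity and specificity reported for the General Behavior Inventory
# (unipolar conditions). The prevalence values below are ASSUMED.
GBI_SENSITIVITY = 0.78
GBI_SPECIFICITY = 0.99

for assumed_prevalence in (0.02, 0.05, 0.16):
    ppv, npv = predictive_values(GBI_SENSITIVITY, GBI_SPECIFICITY, assumed_prevalence)
    print(f"prevalence {assumed_prevalence:.0%}: PPV = {ppv:.2f}, NPV = {npv:.2f}")
```

With sensitivity and specificity held fixed, the positive predictive value falls as prevalence falls, so the yield of any screening program depends on the base rate of disorder in the population screened.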

It appears that there are a variety of instruments currently available that can improve the rate of detection of psychological disorders in university settings. It is now incumbent on academic decision makers to formally integrate such measures into university mental health screening systems.

Implementation of a Screening System. There are two broad methods of integrating screening for psychological disorders into the university setting. The first is to take a preventive approach. Szulecka et al. (1986) used this method with entering students at Nottingham University. At the time of registration, students were given the GHQ. Students scoring high (i.e., showing more distress) were then split into two groups: an intervention group (IG) and a control group (CG). There was also a matched group (MG) of students scoring low on the GHQ. The IG subjects were offered an interview to discuss their feelings about the GHQ and adjustment to college life. They were made aware of counseling and support services on campus. At the end of the year, all students again took the GHQ. Although many results failed to reach statistical significance, they supported a number of clinically relevant trends. Compared to the CG, the IG showed more improvement in GHQ scores at follow-up, made fewer consultations to physicians, had fewer withdrawals from the university, and had fewer students fail out of school. When asked for their reactions to the questionnaire, many students saw it as “evidence of care, and it strengthened their confidence in the health center.” The results suggest that identifying vulnerable students upon admission and offering help can have beneficial effects. Although the CG made more consults to physicians, they were less likely to return, showing that students in need of help may not always seek it or follow through. For this reason, an active outreach program seems essential.

A second approach to the problem, one that integrates an active outreach component, was described by Clark, Levine, and Kinney (1988/1989). Although this process is discussed with specific focus on the prevention, identification, and treatment of bulimia, it is a generic procedure that can be applied to psychiatric disorders in general. Basically, the authors advocate assaulting the problem from all quarters. By enlisting faculty, staff, the library, campus media, counselors, physicians, and peers, awareness of the available resources for treating psychological disorder will be heightened. Some examples of services to be developed include: courses, workshops, and public lectures about psychological disorders (given by faculty); information sessions (given by counselors, ministers, residence hall coordinators); accessible reading materials and lists of people/organizations to contact (in the library); public service announcements (in all campus media); and peer support groups. Once students are made aware of the nature of psychological problems and the availability of services, counselors, physicians, and “recovered” patients can provide psychological screening, therapy, medical assessment and treatment, and support.

Obviously, these two approaches to implementing screening are not mutually exclusive, and perhaps combining them would lead to an even more effective way to identify and treat psychological disorders on campus. Thus, students could be screened prior to admission and those identified as vulnerable would be offered intervention. Those who “slip through” the screening process, or those who develop psychological problems after entering school, hopefully would seek help as a result of the campus’ campaign for awareness. Such a comprehensive program would be sure to reach a great majority of students in need.

Screening for Cognitive Impairment

Screening for cognitive impairment, especially when dealing with geriatric populations, is extremely important, because it is estimated that up to 70% of patients with an organic mental disorder (OMD) go undetected (Strain et al., 1988). Because some OMDs are reversible if discovered early enough, screening programs in high-risk populations can have a very high utility. Even in conditions found to be irreversible, early detection and diagnosis can help in the development of a treatment plan and the education of family members.

INSTRUMENTS WITH A GENERAL VERSUS SPECIFIC FOCUS

There are several instruments available that provide quick and efficient screening of cognitive functioning. Most of these address the general categories of cognitive functioning covered in the standard mental status examination, including: attention, concentration, intelligence, judgment, learning ability, memory, orientation, perception, problem solving, psychomotor ability, reaction time, and social intactness (McDougall, 1990). However, not all instruments include items from all of these categories. These general instruments can be contrasted with another class of cognitive screening measures characterized by a more specific focus. For example, the Stroke Unit Mental Status Examination (SUMSE) was designed specifically to identify cognitive deficits and plan rehabilitation programs for stroke patients (Hajek, Rutman, & Scher, 1989). Another example of a screening instrument with a specific focus is the Dementia of Alzheimer’s Type Inventory (DAT), designed to distinguish Alzheimer’s disease from other dementias (Cummings & Benson, 1986). The more specific types of measures tend to be less common, perhaps due to their limited range of applicability.

Unlike other screening tests, the great majority of cognitive impairment scales are administered by an examiner. Of the instruments to be reviewed in this monograph, none is a self-report measure. There are no pencil-and-paper inventories that can be completed by the respondent alone. Instead, these screening measures are designed to be administered by a professional and require a combination of oral and written responses. Most of the tests are highly transportable, however, and can be administered by a wide variety of health-care workers.

COGNITIVE SCREENING INSTRUMENTS

The following section provides a brief summary of nine popular cognitive impairment screening measures. This is not intended to be an exhaustive review, but rather to provide some data on the nature of each measure and its psychometric properties (see Table 2.4).

TABLE 2.4
Screening Instruments for Cognitive Impairment

Instrument  Author/Date             Description and Purpose                                                      Application  Sensitivity/Specificity
MMSE        Folstein et al. (1975)  11 items, to determine level of cognitive impairment                         1,3,4        .83/.99
CCSE        Jacobs et al. (1977)    30 items, to detect presence of organic mental disorder                      2,3,4,5      .73/.90
SPMSQ       Pfeiffer (1975)         10 items, to detect presence of cognitive impairment                         1            .55-.88/.72-.96
HSCS        Faust & Fogel (1989)    15 items, to estimate presence, scope, and severity of cognitive impairment  2            .94/.92
MSQ         Kahn et al. (1960)      10 items, to quantify dementia                                               1,4,5,6      .55-.96/NA

Note. 1 = community populations; 2 = cognitively intact; 3 = hospital inpatients; 4 = medical patients; 5 = geriatric; 6 = long-term care patient.

Mini-Mental State Examination (MMSE). The MMSE was developed by Folstein, Folstein, and McHugh (1975) to determine the level of cognitive impairment. It is an 11-item scale measuring six aspects of cognitive function: orientation, registration, attention and calculation, recall, language, and praxis. Scores can range from 0 to 30, with lower scores indicating greater impairment. The MMSE has proved successful at assessing levels of cognitive impairment in many populations, including community residents (Kramer, German, Anthony, Von Korff, & Skinner, 1985), hospital patients (Teri, Larson, & Reifler, 1988), residents of long-term care facilities (Lesher & Whelihan, 1986), and neurological patients (Dick et al., 1984). However, Escobar et al. (1986) suggested using another instrument with Spanish-speaking individuals because the MMSE may overestimate dementia in this population. Roca et al. (1984) also recommended other instruments for patients with less than 8 years of schooling for similar reasons. In contrast, the MMSE may underestimate cognitive impairment in psychiatric populations (Faustman, Moses, & Csernansky, 1990). Finally, the MMSE has lower sensitivity with mildly impaired individuals, who are more likely to be labeled as nondemented (Doyle, Dunn, Thadani, & Lenihan, 1986). As such, the MMSE is most useful for patients with moderate to moderately severe dementia.

Cognitive Capacity Screening Examination (CCSE). The CCSE is a 30-item scale designed to detect diffuse organic disorders, especially delirium, in medical populations. The instrument was developed by Jacobs, Bernhard, Delgado, and Strain (1977) and is recommended if delirium is suspected. The items include questions of orientation, digit recall, serial sevens, verbal short-term memory, abstractions, and arithmetic, all of which are helpful in detecting delirium (Baker, 1989). The CCSE has been used with geriatric patients (McCartney & Palmateer, 1985), as well as hospitalized medical-surgical patients (Foreman, 1987). In one comparison study of several similar brief screening instruments, the CCSE was shown to be the most reliable and valid (Foreman, 1987). Like the MMSE, the CCSE also is influenced by the educational level of the subject. However, unlike the MMSE, the CCSE cannot differentiate levels of cognitive impairment or types of dementias, and is most appropriate for cognitively intact patients (Judd et al., 1986).

Short Portable Mental Status Questionnaire (SPMSQ). The SPMSQ (Pfeiffer, 1975) is a 10-item scale for use with community and/or institutional residents. This scale is unique in that it has been used with rural and less-educated populations (Baker, 1989). The items assess orientation, and recent and remote memory; however, visuospatial skills are not tested. The SPMSQ is a reliable detector of organicity (Haglund & Schuckit, 1976), but should not be used to predict the progression or course of the disorder (Berg, Edwards, Danzinger, & Berg, 1987).

High Sensitivity Cognitive Screen (HSCS). This scale was designed to be as sensitive and comprehensive as lengthier instruments while still being clinically convenient. It was developed by Faust and Fogel (1989) for use with native English-speaking subjects aged 16 to 65 who have at least an eighth-grade education and are free from gross cognitive dysfunction. The 15 items include reading, writing, immediate and delayed recall, and sentence construction, among others. The HSCS has shown adequate reliability and validity, and is best used to estimate presence, scope, and severity of cognitive impairment (Faust & Fogel, 1989). The HSCS cannot pinpoint specific areas of involvement; like most of these scales, it should represent a first step toward cognitive evaluation, not a substitute for a standard neuropsychological assessment.

Mental Status Questionnaire (MSQ). The MSQ is a 10-item scale developed by Kahn, Goldfarb, Pollack, and Peck (1960). It has been used successfully with medical geriatric patients (LaRue, D’Elia, Clark, Spar, & Jarvik, 1986), community residents (Shore, Overman, & Wyatt, 1983), and long-term care patients (Fishback, 1977). Disadvantages of this measure include its sensitivity to education and ethnicity of the subject, its reduced sensitivity with mildly impaired individuals, and its omission of tests of retention, registration, and cognitive processing (Baker, 1989).

Other Instruments. Three measures are appropriate for primary care use because their main function is simply to rule out or detect the presence of dementia. FROMAJE (Libow, 1981) classifies individuals into normal, mild, moderate, and severe dementia groups, and has been used successfully with long-term care patients (Rameizl, 1984). The Blessed Dementia Scale (Blessed, Tomlinson, & Roth, 1968) measures changes in activities and habits, personality, interests, and drives, and is useful for determining presence of dementia, although not its progression. Finally, the Global Deterioration Scale (GDS; Reisberg, Ferris, deLeon, & Crook, 1982) distinguishes between normal aging, age-associated memory impairment, and primary degenerative disorder (such as Alzheimer’s disease). The GDS is useful for assessing the magnitude and progression of cognitive decline (Reisberg, 1984).

One final measure of particular interest to the field of psychiatry is worth mentioning here. The Cognitive Levels Scale (Allen & Allen, 1987) is designed to measure cognitive impairment and social dysfunction in patients with mental disorders. Cognitive impairment is classified according to six levels (profoundly disabled to normal), and has implications for patients’ functioning at home and at work.
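The application codes in Table 2.4 lend themselves to a simple lookup when choosing an instrument for a given setting. The sketch below is one hypothetical way to encode the table; the class and function names are our own, the item counts and codes follow our reading of Table 2.4, and only the five tabulated instruments are included.

```python
from dataclasses import dataclass

# Application codes from the note to Table 2.4.
APPLICATIONS = {
    1: "community populations",
    2: "cognitively intact",
    3: "hospital inpatients",
    4: "medical patients",
    5: "geriatric",
    6: "long-term care patients",
}


@dataclass(frozen=True)
class Instrument:
    name: str
    items: int
    applications: frozenset


# Item counts and application codes as listed in Table 2.4.
INSTRUMENTS = [
    Instrument("MMSE", 11, frozenset({1, 3, 4})),
    Instrument("CCSE", 30, frozenset({2, 3, 4, 5})),
    Instrument("SPMSQ", 10, frozenset({1})),
    Instrument("HSCS", 15, frozenset({2})),
    Instrument("MSQ", 10, frozenset({1, 4, 5, 6})),
]


def instruments_for(code):
    """Names of instruments whose listed applications include `code`."""
    if code not in APPLICATIONS:
        raise ValueError(f"unknown application code: {code}")
    return [inst.name for inst in INSTRUMENTS if code in inst.applications]


for code, label in APPLICATIONS.items():
    print(f"{label}: {', '.join(instruments_for(code))}")
```

A lookup of this kind only narrows the field; the cautions discussed above (education, language, and severity effects) still govern the final choice of instrument.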

COGNITIVE SCREENING IN GERIATRIC POPULATIONS

Screening the geriatric patient often can be a challenging enterprise for a number of diverse reasons. First, these patients often present with sensory, perceptual, and motor problems that seriously constrain the use of standardized tests. Poor vision, diminished hearing, and other physical handicaps can undermine the appropriateness of tests that are dependent on these skills. Similarly, required medications can cause drowsiness or inalertness, or in other ways interfere with optimal cognitive functioning. Illnesses such as heart disease and hypertension, common in the elderly, also have been shown to affect cognitive functioning (Libow, 1977). These limitations require screening instruments that are flexible enough to be adapted to the patient with handicaps or illnesses, and yet be sufficiently standardized to allow normative comparisons.

Another difficulty with this population involves distinguishing cognitive impairment from aging-associated memory loss, and from characteristics of normal aging. This distinction requires a sensitive screening instrument because the differences between these conditions are often subtle. Normal aging and dementia can be differentiated through their different effects on such functions as language, memory, perception, attention, information processing speed, and intelligence (Bayles & Kaszniak, 1987). The Global Deterioration Scale (GDS) is a screening test designed for this specific purpose, and has been shown to describe the magnitude of cognitive decline and to predict functional ability (Reisberg, Ferris, deLeon, & Crook, 1988).

A final problem encountered when screening in geriatric populations is the comorbidity of depression. Depression is one of several disorders in the elderly that may imitate dementia, resulting in a syndrome known as pseudodementia. These patients have no discernible organic impairment, and the symptoms of dementia usually will remit when the underlying affective disorder is treated. Variability of task performance can distinguish these patients from truly demented patients, who tend to have an overall lowered performance level on all tasks (Wells, 1979). If depression is suspected, it should be the focus of a distinct diagnostic work-up.

COGNITIVE SCREENING AMONG INPATIENT MEDICAL POPULATIONS

When attempting to screen for cognitive impairment in medical populations, several of the limitations mentioned earlier as pertaining to geriatric populations also will apply, because the groups often overlap. Medical patients often are constrained by their illness and may not be able to respond in the required manner to the test. In addition, these patients often are bedridden, necessitating the use of a portable, bedside instrument. Perhaps the most demanding issue when evaluating this population is discriminating between the dementing patient and the patient with acute confusional states, or delirium. This is particularly important, not only because of the increased occurrence of delirium in medical patients, but because if left untreated delirium can progress to an irreversible condition. Delirium can have multiple etiologies, such as drug intoxication, metabolic disorders, fever, cardiovascular disorders, or effects of anesthesia. The elderly and medical patients are both susceptible to misuse or overuse of prescription drugs, as well as metabolic or nutritional imbalances. Hypothyroidism, hyperparathyroidism, and diabetes are a few of the medical conditions that often are mistaken for dementia (Albert, 1981). In addition, cognitive impairment also can be caused by infections, such as pneumonia.

Fortunately, three cardinal characteristics enable us to distinguish dementia from delirium. First is the rate of onset of symptoms. Delirium is marked by acute or abrupt onset of symptoms, whereas dementia is a more gradual progression. Second is the impairment of attention. Delirious patients have special difficulty sustaining attention on tasks such as serial sevens and digit span. Third is nocturnal worsening, which is characteristic of delirium, but not dementia (Mesulam & Geschwind, 1976).

COGNITIVE SCREENING IN PRIMARY CARE SETTINGS

As mentioned previously, many cases of cognitive impairment go undetected. This may be because the early stages of cognitive dysfunction are often quite subtle and many of these cases first present to primary care physicians (Mungas, 1991), who tend to have their principal focus on other systems. Also, many are unfamiliar with the available procedures for detecting cognitive impairment, whereas others are reluctant to add a formal cognitive screening to their schedule of procedures. Although brief, the 10 to 30 minutes required by most of the instruments remain a formidable requirement considering that, on average, a family practice physician spends 7 to 10 minutes with each patient. Because cognitive screening techniques are highly transportable and actuarial in nature and may be administered by a broad range of health-care professionals, the solution to introducing such screening in primary care may be to train nurses or physician’s assistants to conduct screening. Such an approach would not add to the burden of physicians, and would at least effect an initiation of such programs so that we can realistically evaluate their utility.

Methodological Issues in Screening for Psychiatric Disorders

THE PROBLEM OF LOW BASE RATES

Almost 40 years ago, a paper appeared in the psychological literature (Meehl & Rosen, 1955) that sensitized psychologists to the dramatic impact of low base rates on the predictive validity of psychological tests. The authors pointed out that attempts to predict rare attributes or events, even with highly valid tests, would result in substantially more misclassifications than correct classifications if the prevalence of the event was sufficiently low. Knowledge of this important but little known fact remained limited to the field of psychometrics at the time. However, 11 years later, Vecchio (1966) published a report in the medical literature dealing with essentially the same phenomenon. In the latter article, where the substantive aspects of the report dealt with screening tests in medicine, the information reached a much wider audience. Knowledge of the special relationship between low base rates and the predictive validity of screening tests has since become well established.

To be precise, low prevalence does not equally affect all aspects of a test’s validity; its impact is felt only in the validity partition that deals with correctly classifying positives or cases. Predictive validity concerning negatives, or noncases, is minimally impaired, because with extremely low prevalence even a test with moderate validity will perform adequately. This relationship is summarized in Table 2.5, which synopsizes data originally given by Vecchio (1966). In the example developed by Vecchio, the sensitivity and specificity of the screening test are given as .95, values that do not represent realistic validity coefficients for a psychological screening test. Table 2.6 provides a more realistic example of the relationship between prevalence and positive predictive value, based on a hypothetical cohort of N = 1,000. In this example, validity coefficients (i.e., sensitivity and specificity) are more consistent with those that might be genuinely anticipated for such tests.

The data in Tables 2.5 and 2.6 make it clear that as prevalence drops below 10%, the predictive value of a positive experiences a precipitous decline. In the first example, when prevalence reaches 1%, the predictive value of a positive is only 16%, which means, in practical terms, that in such situations five out of six positives will be false positives. The predictive value of a negative remains extremely high throughout the range of base rates depicted, and is essentially unaffected by low prevalence situations.

TABLE 2.5
Predictive Values of Positive and Negative Tests at Varying Prevalence (Base) Rates

Prevalence or      Predictive Value    Predictive Value
Base Rate (%)      of a + (%)          of a - (%)
      1                 16.1                99.9
      2                 27.9                99.9
      5                 50.0                99.7
     10                 67.9                99.4
     20                 82.6                98.7
     50                 95.0                95.0
     75                 98.3                83.7
    100                100                 100

Note. Sensitivity and specificity = 95%.

TABLE 2.6
Relationship of Prevalence (Base Rate) and Positive Predictive Value
Assumed Test Sensitivity = 0.80; Assumed Test Specificity = 0.90

Prevalence = 0.30
                 Actual Disorder
Test             Pos.     Neg.     Total
  +              240       70       310
  -               60      630       690
Total            300      700
Pos. Predict. Val. = 240/310 = 77%

Prevalence = 0.05
                 Actual Disorder
Test             Pos.     Neg.     Total
  +               40       95       135
  -               10      855       865
Total             50      950
Pos. Predict. Val. = 40/135 = 30%

Prevalence = 0.01
                 Actual Disorder
Test             Pos.     Neg.     Total
  +                8       99       107
  -                2      891       893
Total             10      990
Pos. Predict. Val. = 8/107 = 7.5%

The example from Table 2.6 is more realistic, in that the validity coefficients are more analogous to those commonly reported for psychological screening tests. In the screening situation depicted here, the predictive value of a positive drops from 77% when prevalence equals 30% (e.g., the rate of psychiatric disorders among specialized medical patients), to 7.5% when prevalence falls to 1%. In the latter instance, 12 out of 13 positives would be false positives.
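The dependence of predictive values on prevalence described for Tables 2.5 and 2.6 follows directly from Bayes' rule. The short Python sketch below is offered only as an illustration, not as part of the original analyses: the function names `predictive_values` and `cohort_counts` are hypothetical, and the code assumes that sensitivity and specificity stay fixed as prevalence changes and that prevalence lies strictly between 0 and 1.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a screening test, using Bayes' rule.

    All three arguments are proportions between 0 and 1.
    """
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


def cohort_counts(sensitivity, specificity, prevalence, n):
    """Expected 2 x 2 cell counts (TP, FP, FN, TN) for a cohort of size n."""
    cases = prevalence * n
    noncases = n - cases
    tp = sensitivity * cases
    fn = cases - tp
    tn = specificity * noncases
    fp = noncases - tn
    return tp, fp, fn, tn


if __name__ == "__main__":
    # Sensitivity and specificity of .95, as in the Table 2.5 example.
    for prev in (0.01, 0.05, 0.10, 0.50):
        ppv, npv = predictive_values(0.95, 0.95, prev)
        print(f"prevalence={prev:.2f}  PPV={ppv:.3f}  NPV={npv:.3f}")

    # Sensitivity .80, specificity .90, N = 1,000, as in the Table 2.6 example.
    for prev in (0.30, 0.05, 0.01):
        tp, fp, fn, tn = cohort_counts(0.80, 0.90, prev, 1000)
        print(f"prevalence={prev:.2f}  TP={tp:.0f}  FP={fp:.0f}  "
              f"PPV={tp / (tp + fp):.3f}")
```

Under these assumptions, the sketch shows the same pattern as the tables: the predictive value of a positive falls steeply as prevalence drops, while the predictive value of a negative changes very little.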

SEQUENTIAL SCREENING: A KEY FOR LOW BASE RATES

Although screening for psychiatric disorders in general is not usually affected by problems of low base rates, there are specific phenomena (e.g., suicide) and diagnostic categories (e.g., panic disorder) that reveal prevalences that are quite low. In addition, as Baldessarini, Finklestein, and Arana (1983) noted, the particular population being screened can markedly affect the quality of screening outcome. A good example of the latter distinction involves the dexamethasone suppression test (DST) when used as a screen for major depressive disorder (MDD). The DST functions relatively effectively as a screen for MDD on inpatient affective disorders units where the prevalence of MDD is quite high. However, in general medical practice where the prevalence of MDD is estimated to be about 5%, the DST results in unacceptable rates of misclassification. The validity of the DST is insufficient to support effective screening performance in populations with low base rates of MDD.

A method designed to help overcome low base rate problems commonly is referred to as sequential screening. In a sequential screening paradigm, there are two phases to screening and two screening tests. Phase I involves a less refined screen, whose primary purpose is to correctly identify individuals without the condition and eliminate them from consideration in Phase II evaluation. The initial screening also has the important effect of raising the prevalence of the index condition in the remaining cohort. In Phase II, a separate test of equal or superior sensitivity is then utilized. Because the base rate of the index condition has been raised significantly by Phase I screening, the performance of the Phase II screen will involve much lower levels of false positive misclassification. A hypothetical example of sequential screening is given in Table 2.7.

TABLE 2.7
Hypothetical Example of Sequential Screening as a Strategy for Dealing With Low Base Rates

Phase I
N = 10,000; Sensitivity = .90; Specificity = .90
Prevalence (Base Rate) = 4%
Predictive Value of a Positive = 360 / (360 + 960) = 0.272 or 27.2%

Phase II
N = 1,320; Sensitivity = .90; Specificity = .90
Prevalence (Base Rate) = 27.2%
Predictive Value of a Positive = (.272)(.90) / [(.272)(.90) + (.728)(.10)] = 0.77 or 77%

In Phase I of the hypothetical screening, a highly valid instrument with sensitivity and specificity equal to .90 is used in a large population cohort (N = 10,000) with a prevalence of 4% for the index condition. Because of the low base rate, the predictive value of a positive is only 27.2%, meaning essentially that less than one out of every three positives will be true positives. The 1,320 individuals screened positive from the original cohort of 10,000 subsequently become the cohort for Phase II screening. With an equally valid, independent test (sensitivity and specificity = .90) and a base rate of 27.2%, the predictive value of a positive in Phase II rises to 77%, representing a substantial increase in the level of screening performance.

Sequential screening essentially zeros in on a high-risk subgroup of the population of interest by virtue of a series of consecutive sieves. These have the effect of eliminating from consideration individuals with low likelihood of having the disorder, and simultaneously raising the base rate of the condition in the remaining sample. Sequential screening can become expensive because of the increased number of screening tests that must be administered. However, in certain situations where prevalence is low (e.g., HIV screening in the general population) and the validity of the screening test already is close to maximum, it may be the only method available to minimize errors in classification.
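The arithmetic of the two-phase example in Table 2.7 can be traced with a few lines of Python. This is a minimal sketch under stated assumptions rather than a prescribed procedure: the names `screen_phase` and `sequential_screen` are hypothetical, and the code assumes that the Phase II test is independent of the Phase I test given true disorder status, so that its sensitivity and specificity are unchanged among Phase I positives.

```python
def screen_phase(n, prevalence, sensitivity, specificity):
    """Apply one screening test to a cohort.

    Returns the expected number of screen positives and the predictive
    value of a positive, which is also the prevalence of the index
    condition among those positives.
    """
    cases = n * prevalence
    noncases = n - cases
    true_pos = sensitivity * cases
    false_pos = (1.0 - specificity) * noncases
    n_positive = true_pos + false_pos
    ppv = true_pos / n_positive
    return n_positive, ppv


def sequential_screen(n, prevalence, phases):
    """Run consecutive screens; each phase tests only the prior positives.

    `phases` is a list of (sensitivity, specificity) pairs.
    Returns a list of (number of positives, PPV) tuples, one per phase.
    """
    results = []
    for sensitivity, specificity in phases:
        n, prevalence = screen_phase(n, prevalence, sensitivity, specificity)
        results.append((n, prevalence))
    return results


if __name__ == "__main__":
    # Values taken from the hypothetical example in Table 2.7.
    outcome = sequential_screen(10_000, 0.04, [(0.90, 0.90), (0.90, 0.90)])
    for phase, (n_pos, ppv) in enumerate(outcome, start=1):
        print(f"Phase {phase}: positives={n_pos:.0f}  PPV={ppv:.3f}")
```

Because each phase passes only its positives forward, the prevalence entering Phase II equals the Phase I predictive value of a positive, which is the mechanism the text describes as raising the base rate in the remaining cohort.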

ROC ANALYSIS

Although some screening tests operate in a qualitative fashion, depending on the presence or absence of a key indicator, psychological screening tests function, as do many others, along a quantitative continuum. The individual being screened must obtain a probability or score above some criterion threshold or cutoff to be considered a positive or a case. The cutoff value usually is determined to be the value that will maximize correct classification and minimize misclassification relative to the index disorder. If the relative consequences of one type of error are considered more costly than the other (e.g., false negative = missed fatal but potentially curable disease), the cutoff value often will be adjusted to take this differential utility into account. Although elegant quantitative methods exist to estimate optimal threshold values (Weinstein et al., 1980), they are just as often selected by simple inspection of cutoff tables and their associated sensitivities and specificities.

The selection of a cutoff value automatically determines both the sensitivity and specificity of the test because it defines the rates of correct identification and misclassification. However, it is intuitively obvious that multiple cutoff values are possible, and that there is an entire distribution of them. It follows that corresponding distributions of sensitivities and specificities must also exist, associated with those cutoffs. Viewed from this perspective, a test should not be thought of as being characterized by a sensitivity and specificity. Rather, it should be perceived as possessing an entire distribution of sensitivities and specificities associated with the distribution of possible threshold values.

Receiver operating characteristic (ROC) analysis is a method that enables the visualization of the entire distribution of sensitivity/specificity combinations for all possible cutoff values. As such, it enables the selection of a test criterion threshold to be based on substantially more information, and represents a more sophisticated clinical decision process. ROC analysis was first developed by Swets (1964) in the context of signal detection paradigms in psychophysics. Subsequently, applications of the technique were developed in the areas of radiology and medical imaging (Hanley & McNeil, 1982; Metz, 1978; Swets, 1979). More recently, Madri and Williams (1986) and Murphy et al. (1987) introduced and applied ROC analysis to the task of screening for psychiatric disorders.

Typical ROC comparison curves are presented for two hypothetical tests in Fig. 2.1. Such curves are developed by plotting corresponding values of a test’s sensitivity (true positive rate) on the vertical axis, against the complement of its specificity (false positive rate) on the horizontal axis, for the entire range of possible cutting scores from lowest to highest. The ROC curve demonstrates the discriminative capacity of a test at each possible definition of threshold (cutoff score) for psychiatric disorder. If the discriminative capacity of a test is no better than chance, the curve will follow a diagonal straight line from the origin of the graph (lower left) to its uppermost right corner. This line is termed the line of no information. The ROC curve rises from the origin (point 0,0) to its termination point (1,1) on the plane defined earlier. To the extent that a test has discriminative ability, the curve will bow in a convex manner toward the upper left corner of the graph. The greater the deviation toward the upper left corner, the greater discriminative ability the test has for the particular application at hand. An ROC summary statistic describing the discriminative capacity of a test is referred to as the “area under the curve” (AUC). When the ROC curve follows the line of no information, the AUC = .50; in the situation of theoretically optimal discrimination, the ROC curve would follow the outline of the ordinate of the graph from point 0,0 to point 0,1, and then move at right angles to point 1,1. In this situation, the AUC would equal 1.0.

FIG. 2.1. ROC curves for two hypothetical psychiatric screening tests. The vertical axis is the true-positive rate (sensitivity), the horizontal axis is the false-positive rate (1 - specificity), and the diagonal is the line of no information. From “Performance of Screening and Diagnostic Tests” by J. M. Murphy et al., 1987, Archives of General Psychiatry, 44, pp. 550-555. Copyright 1987 by American Medical Association. Reprinted by permission.

Although ROC analysis has been introduced to the area of screening for psychiatric disorders only recently, investigators already have found multiple uses for the technique. In addition to simply describing the distribution of validity coefficients for a single test, ROC analysis has been used to compare various screening tests (Grant, Hasin, & Harford, 1989; Weinstein, Berwick, Goldman, Murphy, & Barsky, 1989), aid in the validation of new tests, compare different scoring methods for a particular test (Birtchnell, Evans, Deahl, & Master, 1989), contrast the screening characteristics of a test in different populations (Burnam, Wells, Leake, & Landsverk, 1988; Hughson, Cooper, McArdle, & Smith, 1988), and even assist in validating a foreign language version of a standard test (Chong & Wilkinson, 1989). Although ROC analysis does not represent a complete solution for the complex problems of psychiatric screening, it significantly increases the information available to the decision maker and provides a relatively precise method for making decisions.
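The construction of an ROC curve and its AUC described in this section can be illustrated with a small Python sketch. It is one simple way to carry out the computation, not the procedure used in any of the cited studies: the function names `roc_points` and `auc` are hypothetical, higher scores are assumed to indicate the disorder, a score at or above the cutoff is treated as a positive, both cases and noncases must be present, and the area is approximated with the trapezoidal rule. Cutoffs are swept from highest to lowest so that the points trace the curve from the origin to point (1, 1).

```python
def roc_points(scores, labels):
    """Return (false-positive rate, true-positive rate) pairs.

    scores: test scores, where a higher score suggests the disorder.
    labels: 1 for a true case, 0 for a noncase (both must occur).
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]  # a cutoff above every score: no positives
    for cutoff in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


if __name__ == "__main__":
    # Illustrative, made-up scores for eight respondents.
    scores = [3, 5, 6, 8, 9, 11, 12, 15]
    labels = [0, 0, 1, 0, 1, 0, 1, 1]
    curve = roc_points(scores, labels)
    for fpr, tpr in curve:
        print(f"1 - specificity={fpr:.2f}  sensitivity={tpr:.2f}")
    print(f"AUC={auc(curve):.3f}")
```

Each point on the resulting curve corresponds to one candidate cutoff, so inspecting the list of points is equivalent to inspecting a cutoff table of sensitivities and specificities.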

Conclusion

Given the important advances in the field over the past two decades, little doubt remains that psychiatric disorders now meet the WHO criteria for conditions appropriate for the development of effective health screening programs (Wilson & Junger, 1968). The magnitude of the health problem they represent is extensive, and the morbidity and mortality associated with these conditions are imposing. We currently possess valid, cost-efficient psychological tests to identify these conditions, and the efficacy of our treatment regimens for many psychiatric disorders is quite good (Regier et al., 1988). Although evidence concerning the incremental advantage of early detection remains somewhat equivocal, evidence is compelling that, left to their natural courses, such conditions will result in chronic, compound morbidity of both a physical and psychological nature (Derogatis & Wise, 1989; Katon et al., 1990; Regier et al., 1988).

Also, as indicated in the introduction, it is of little consequence to develop effective systems of treatment planning and outcome assessment if the great majority of individuals who would benefit from their application are lost to the system. In large measure, this undesirable reality results from the fact that a substantial majority of patients with psychiatric conditions are never seen by mental health professionals, and up to 20% are never seen by any health-care professional. The majority of individuals with psychiatric morbidity encountered in our health-care system are attended by primary care physicians who have insufficient training in the identification and treatment of these conditions. This fact results in a substantial plurality of these cases going unrecognized, and of those in whom a correct diagnosis is made, only a minority are referred to mental health professionals. Typically, primary care physicians have shown a preference to treat these cases personally, even though they feel less than completely competent to do so.

Current evidence suggests that, in the immediate future, primary care physicians will play more rather than less of a gatekeeper role in our health-care system relative to psychiatric disorders. This being the case, it seems imperative that we rapidly develop mechanisms to effectively educate these professionals and help them identify psychiatric disorders. Although the future may hold the promise of biological markers bringing enhanced refinement to the identification of psychiatric morbidity (Jefferson, 1988; Targum, 1990; Tollefson, 1990), current psychological screening methods can deliver valid, cost-effective identification of these conditions now. Considering the cost-benefit involved, such systems should be implemented in an extensive and effective manner as soon as possible.

References

Albert, M. (1981). Geriatric neuropsychology. Journal of Consulting and Clinical Psychology, 49(6), 835-850.

Allen, C., & Allen, R. (1987). Cognitive disabilities: Measuring the social consequences of mental disorders. Journal of Clinical Psychiatry, 48(5), 185-190.

American Psychiatric Association. (1987). Diagnostic and statistical manual of mental disorders (3rd ed., rev.). Washington, DC: Author.

Anderson, S. M., & Harthorn, B. H. (1989). The recognition, diagnosis, and treatment of mental disorders by primary care physicians. Medical Care, 27(1), 869-886.

Baker, F. (1989). Screening tests for cognitive impairment. Hospital and Community Psychiatry, 40(4), 339-340.

Baldessarini, R. J., Finklestein, S., & Arana, G. W. (1983). The predictive power of diagnostic tests and the effect of prevalence of illness. Archives of General Psychiatry, 40, 569-573.

Barrett, J. E., Barrett, J. A., Oxman, T. E., & Gerber, P. D. (1988). The prevalence of psychiatric disorders in a primary care practice. Archives of General Psychiatry, 45, 1100-1106.

Bayles, K., & Kaszniak, A. (1987). Communication and cognition in normal aging and dementia. Boston: Little, Brown.

Bech, P. (1987). Observer rating scales of anxiety and depression with reference to DSM-III for clinical studies in psychosomatic medicine. Advances of Psychosomatic Medicine, 17, 55-70.

Bech, P., Grosby, H., & Husum, B. (1984). Generalized anxiety and depression measured by the Hamilton Anxiety Scale and the Melancholia Scale in patients before and after cardiac surgery. Psychopathology, 17, 253-263.

Beck, A. T., & Beck, R. W. (1972). Screening depressed patients in family practice: A rapid technic. Postgraduate Medicine, 52(6), 81-85.

Beck, A. T., Ward, C., & Mendelson, M. (1961). An inventory for measuring depression. Archives of General Psychiatry, 4, 53-63.

Benjamin, G., Kaszniak, A., Sales, B., & Shanfield, S. (1986). The role of legal education in producing psychological distress among law students and lawyers. American Bar Foundation Research Journal, 2, 225-252.

Berg, G., Edwards, D., Danzinger, W., & Berg, L. (1987). Longitudinal change in three brief assessments of SDAT. Journal of the American Geriatrics Society, 35(3), 205-212.

Birtchnell, J., Evans, C., Deahl, M., & Master, N. (1989). The Depression Screening Instrument (DSI): A device for the detection of depressive disorders in general practice. Journal of Affective Disorders, 16, 269-281.

Blazer, D., George, G. K., Landerman, R., Pennybacker, M., Melville, M. L., Woodbury, M., Mantor, K. G., Jordan, K., & Locke, B. (1984). Psychiatric disorders: A rural/urban comparison. Archives of General Psychiatry, 41, 959-970.

Blessed, G., Tomlinson, B., & Roth, M. (1968). The association between quantitative measures of dementia and of senile change in the cerebral gray matter of elderly. British Journal of Psychiatry, 114, 797-811.

Bridges, K., & Goldberg, D. (1984). Psychiatric illness in in-patients with neurological disorders: Patient’s view on discussions of emotional problems with neurologists. British Medical Journal, 289, 656-658.

Burnam, M. A., Wells, K. B., Leake, B., & Landsverk, J. (1988). Development of a brief screening instrument for detecting depressive disorders. Medical Care, 26(8), 775-789.

Burns, B. J., & Burke, J. D. (1985). Improving mental health practices in primary care. Public Health Reports, 100, 294-299.

Chong, M., & Wilkinson, G. (1989). Validation of 30- and 12-item versions of the Chinese Health Questionnaire (CHQ) in patients admitted for general health screening. Psychological Medicine, 19, 495-505.

Clark, L., Levine, M., & Kinney, N. (1988/1989). A multifaceted and integrated approach to the prevention, identification, and treatment of bulimia on college campuses. Journal of College Student Psychotherapy, 3(2-4), 257-298.

Cochran, C. D., & Hale, W. D. (1985). College students norms on the Brief Symptom Inventory. Journal of Clinical Psychology, 31, 176-184.

Commission on Chronic Illness. (1987). Chronic illness in the United States (Vol. 1). Cambridge: Commonwealth Fund, Harvard University Press.

Comstock, G. W., & Helsing, K. J. (1976). Symptoms of depression in two communities. Psychological Medicine, 6, 551-564.

Craske, M., & Krueger, M. (1990). Prevalence of nocturnal panic in a college population. Journal of Anxiety Disorders, 4, 125-139.

Craven, J. L., Rodin, G. M., & Littlefield, C. (1988). The Beck Depression Inventory as a screening device for major depression in renal dialysis patients. International Journal of Psychiatry in Medicine, 18(4), 365-374.

Cummings, J., & Benson, F. (1986). Dementia of the Alzheimer Type: An inventory of diagnostic clinical features. Journal of the American Geriatrics Society, 34(1), 12-19.

Davis, T. C., Nathan, R. G., Crough, M. A., & Bairnsfather, L. E. (1987). Screening depression with a new tool: back to basics with a new tool. Family Medicine, 19, 200-202.

Depue, R., Krauss, S., Spoot, M., & Arbisi, P. (1989). General Behavior Inventory: Identification of unipolar and bipolar affective conditions in a nonclinical university population. Journal of Abnormal Psychology, 98(2), 117-126.

Derogatis, L. R. (1977). SCL-90-R: Administration, scoring and procedures manual-I. Baltimore: Clinical Psychometric Research.

Derogatis, L. R. (1983). SCL-90-R: Administration, scoring and procedures manual-II. Baltimore: Clinical Psychometric Research.

Derogatis, L. R. (1990). SCL-90-R: A bibliography of research reports 1975-1990. Baltimore: Clinical Psychometric Research.

Derogatis, L. R. (1992). Administration, scoring and procedures manual for the Brief Symptom Inventory II (2nd ed.). Baltimore: Clinical Psychometric Research.

Derogatis, L., DellaPietra, L., & Kilroy, V. (1992). Screening for psychiatric disorder in medical populations. In M. Fava, G. Rosenbaum, & R. Birnbaum (Eds.), Research designs and methods in psychiatry (pp. 145-170). Amsterdam: Elsevier.

Derogatis, L. R., Lipman, R. S., Rickels, K., Uhlenhuth, E. H., & Covi, L. (1974a). The Hopkins Symptom Checklist (HSCL). A self-report symptom inventory. Behavioral Science, 19, 1-15.

Derogatis, L. R., Lipman, R. S., Rickels, K., Uhlenhuth, E. H., & Covi, L. (1974b). The Hopkins Symptom Checklist (HSCL). In P. Pinchot (Ed.), Psychological measurements in psychopharmacology (pp. 79-111). Basel: Karger.

Derogatis, L. R., & Melisaratos, N. (1983). The Brief Symptom Inventory: An introductory report. Psychological Medicine, 13, 595-605.

Derogatis, L. R., Morrow, G. R., Fetting, J., Penman, D., Piasetsky, S., Schmale, A. M., Henrichs, M., & Carnicke, C. L. M. (1983). The prevalence of psychiatric disorders among cancer patients. Journal of the American Medical Association, 249, 751-757.

Derogatis, L. R., & Spencer, P. M. (1982). BSI Administration and procedures manual I. Baltimore: Clinical Psychometric Research.

Derogatis, L. R., & Wise, T. N. (1989). Anxiety and depressive disorders in the medical patient. Washington, DC: American Psychiatric Press.

Dick, J., Guiloff, R., Stewart, A., Blackstock, J., Bielawska, C., Paul, E., & Marsden, C. (1984). Mini-Mental State Examination in neurological patients. Journal of Neurology, Neurosurgery, and Psychiatry, 47, 496-499.

Dohrenwend, B. P., & Dohrenwend, B. S. (1982). Perspectives on the past and future of psychiatric epidemiology. American Journal of Public Health, 72(11), 1271-1279.

Doyle, G., Dunn, S., Thadani, I., & Lenihan, P. (1986). Investigating tools to aid in restorative care for Alzheimer’s patients. Journal of Gerontological Nursing, 12(9), 19-24.

Erikkson, J. (1988). Psychosomatic aspects of coronary artery bypass graft surgery: A prospective study of 101 male patients. Acta Psychiatrica Scandinavica, 77(Suppl. 340), 112.

Escobar, J., Burnam, A., Karno, M., Forsythe, A., Landsverk, J., & Golding, J. (1986). Use of the Mini-Mental State Examination (MMSE) in a community population of mixed ethnicity: Cultural and linguistic artifacts. Journal of Nervous and Mental Disease, 174, 607-614.

Fauman, M. A. (1983). Psychiatric components of medical and surgical practice, II. Referral and treatment of psychiatric disorders. American Journal of Psychiatry, 140, 760-763.

Faust, D., & Fogel, B. (1989). The development and initial validation of a sensitive bedside cognitive screening test. Journal of Nervous and Mental Disease, 177(1), 25-31.

Faustman, W., Moses, J., & Csernansky, J. (1990). Limitations of the Mini-Mental State Examination in predicting neuropsychological functioning in a psychiatric sample. Acta Psychiatrica Scandinavica, 81, 126-131.

Fishback, D. (1977). Mental status questionnaire for organic brain syndrome with a new visual counting test. Journal of the American Geriatrics Society, 35(4), 167-170.

Folstein, M., Folstein, S., & McHugh, P. (1975). Mini-Mental State. Journal of Psychiatric Research, 12, 189-198.

Foreman, M. (1987). Reliability and validity of mental status questionnaires in elderly hospitalized patients. Nursing Research, 36(4), 216-220.

Frerichs, R. R., Areshensel, C. S., & Clark, V. A. (1981). Prevalence of depression in Los Angeles County. American Journal of Epidemiology, 113(6), 691-699.

Fulop, G., & Strain, J. J. (1991). Diagnosis and treatment of psychiatric disorders in medically ill inpatients. Hospital and Community Psychiatry, 42(4), 389-394.

Galton, F. (1883). Inquiries into human faculty and its development. New York: Macmillan.

Goldberg, D. (1972). The detection of psychiatric illness by questionnaire. Oxford: Oxford University Press.

Goldberg, D., & Hillier, V. F. (1979). A scaled version of the General Health Questionnaire. Psychological Medicine, 9, 139-145.

Goldberg, D., & Williams, P. (1988). A user’s guide to the General Health Questionnaire. Windsor: Nfer-Nelson.

Grant, F. B., Hasin, D. S., & Harford, T. C. (1989). Screening for major depression among alcoholics: An application of receiver operating characteristic analysis. Drug and Alcohol Dependence, 23, 123-131.

Haglund, R., & Schuckit, M. (1976). A clinical comparison of tests of organicity in elderly patients. Journal of Gerontology, 31(6), 654-659.

Hajek, V., Rutman, D., & Scher, H. (1989). Brief assessment of cognitive impairment in patients with stroke. Archives of Physical Medicine and Rehabilitation, 70, 114-117.

Hamilton, M. (1959). The assessment of anxiety states by rating. British Journal of Medical Psychology, 32, 50-55.

Hamilton, M. (1960). A rating scale for depression. Journal of Neurosurgery Psychiatry, 23, 50-55.

Hamilton, M. (1967). Development of a rating scale for primary depressive illness. British Journal of Social and Clinical Psychology, 6, 278-296.

Hanley, J. A., & McNeil, B. J. (1982). The meaning and use of the area under a receiver operating characteristic (ROC) curve. Diagnostic Radiography, 143(1), 29-36.

Hawton, K. (1981). The long-term outcome of psychiatric morbidity detected in general medical patients. Journal of Psychosomatic Research, 25(3), 237-243.

Hedlund, J. L., & Vieweg, M. D. (1979). The Hamilton Rating Scale for Depression: A comprehensive review. Journal of Operational Psychiatry, 10(2), 149-165.

Hoeper, E. W., Nyczi, G. R., & Cleary, P. D. (1979). Estimated prevalence of RDC mental disorders in primary care. International Journal of Mental Health, 8, 6-15.

Hughson, A. V. M., Cooper, A. F., McArdle, C. S., & Smith, D. C. (1988). Validity of the General Health Questionnaire and its subscales in patients receiving chemotherapy for early breast cancer. Journal of Psychosomatic Research, 32(4/5), 393-402.
Jacobs, J., Bernhard, M., Delgado, A., & Strain, J. (1977). Screening for organic mental syndromes in the medically ill. Annals of Internal Medicine, 86, 40-46.
Jefferson, J. W. (1988). Biologic systems and their relationship to anxiety. Psychiatric Clinics of North America, 11(2), 463-472.
Johnson, R., Ellison, R., & Heikkinen, C. (1989). Psychological symptoms of counseling center clients. Journal of Counseling Psychology, 36(1), 110-114.
Jones, L. R., Badger, L. W., Ficken, R. P., Leeper, J. D., & Anderson, R. L. (1987). Inside the hidden mental health network: Examining mental health care delivery of primary care physicians. General Hospital Psychiatry, 9, 287-293.
Judd, B., Meyer, J., Rogers, R., Gandhi, S., Tanahashi, N., Mortel, K., & Tawaklna, T. (1986). Cognitive performance correlates with cerebrovascular impairments in multiinfarct dementia. Journal of the American Geriatrics Society, 34(5), 355-360.
Kahn, R., Goldfarb, A., Pollack, M., & Peck, A. (1960). Brief objective measures for the determination of mental status in the aged. American Journal of Psychiatry, 117, 326-328.
Kamerow, D. B., Pincus, H. A., & MacDonald, D. I. (1986). Alcohol abuse, other drug abuse, and mental disorders in medical practice: Prevalence, cost, recognition, and treatment. Journal of the American Medical Association, 255, 2054-2057.
Kane, M. T., & Kendall, P. C. (1989). Anxiety disorders in children: A multiple-baseline evaluation of a cognitive-behavioral treatment. Behavior Therapy, 20(4), 499-508.
Katon, W., Von Korff, M., Lin, E., Lipscomb, P., Russo, J., Wagner, E., & Polk, E. (1990). Distressed high utilizers of medical care: DSM-III-R diagnoses and treatment needs. General Hospital Psychiatry, 12, 355-362.
Kedward, H. B., & Cooper, B. (1966). Neurotic disorders in urban practice: A 3-year followup. Journal of College of General Practice, 12, 148-163.
Kempf, E. J. (1914). The behavior chart in mental diseases. American Journal of Insanity, 7, 761-772.
Kessler, L. G., Amick, B. C., & Thompson, J. (1985). Factors influencing the diagnosis of mental disorder among primary care patients. Medical Care, 23, 50-62.

Kramer, M., German, P., Anthony, J., Von Korff, M., & Skinner, E. (1985). Patterns of mental disorders among the elderly residents of eastern Baltimore. Journal of the American Geriatrics Society, 11(4), 236-245.
LaRue, A., D’Elia, L., Clark, E., Spar, J., & Jarvik, L. (1986). Clinical tests of memory in dementia, depression, and healthy aging. Psychology and Aging, 1(1), 69-77.
Lesher, E., & Whelihan, W. (1986). Reliability of mental status instruments administered to nursing home residents. Journal of Consulting and Clinical Psychology, 54(5), 726-727.
Libow, L. (1977). Senile dementia and pseudosenility: Clinical diagnosis. In C. Eisdorfer & R. Fredel (Eds.), Cognitive and emotional disturbance in the elderly. Chicago: Year Book Medical Publishing.
Libow, L. (1981). A rapidly administered, easily remembered mental status evaluation: FROMASJE. In L. S. Libow & F. T. Sherman (Eds.), The core of geriatric medicine (pp. 85-91). St. Louis: C. V. Mosby.
Linn, L., & Yager, J. (1984). Recognition of depression and anxiety by primary care physicians. Psychosomatics, 25, 593-600.
Luce, R. D., & Narens, L. (1987). Measurement scales on the continuum. Science, 236, 1527-1532.
Madri, J. J., & Williams, P. (1986). A comparison of validity of two psychiatric screening questionnaires. Journal of Chronic Disorders, 39, 371-378.
Malt, U. F. (1989). The validity of the General Health Questionnaire in a sample of accidentally injured adults. Acta Psychiatrica Scandinavica, 80(Suppl. 355), 103-112.
Maser, J. D., & Cloninger, C. R. (1990). Comorbidity of mood and anxiety disorders. Washington, DC: American Psychiatric Press.
McCartney, J., & Palmateer, L. (1985). Assessment of cognitive deficit in geriatric patients: A study of physician behavior. Journal of the American Geriatrics Society, 33(7), 467-471.
McDermott, R., Hawkins, W., Littlefield, E., & Murray, S. (1989). Health behavior correlates of depression among university students. Journal of American College Health, 38, 115-119.
McDougall, G. (1990). A review of screening instruments for assessing cognition and mental status in older adults. Nurse Practitioner, 15(11), 18-28.
Meehl, P. E., & Rosen, A. (1955). Antecedent probability and the efficiency of psychometric signs, patterns, or cutting scores. Psychological Bulletin, 52, 194-216.

Mesulam, M., & Geschwind, N. (1976). Disordered mental status in the postoperative period. Urologic Clinics of North America, 3, 199-215.
Metz, C. E. (1978). Basic principles of ROC analysis. Seminars in Nuclear Medicine, 8, 283-298.
Moore, J. T., Silimperi, D. R., & Bobula, J. A. (1978). Recognition of depression by family medicine residents: The impact of screening. Journal of Family Practice, 7, 509-513.

Mungas, D. (1991). In-office mental status testing: A practical guide. Geriatrics, 46, 54-66.
Murphy, J. M., Berwick, D. M., Weinstein, M. C., Borus, J. F., Budman, S. H., & Klerman, G. L. (1987). Performance of screening and diagnostic tests. Archives of General Psychiatry, 44, 550-555.
Myers, J. K., Weissman, M. M., Tischler, G. L., Holzer, C. E., III, Leaf, P. J., Orvaschel, H., Anthony, J. C., Boyd, J. H., Burke, J. D., Kramer, M., & Stoltzman, R. (1984). Six-month prevalence of psychiatric disorders in three communities. Archives of General Psychiatry, 41, 959-970.
Nielson, A. C., & Williams, T. A. (1980). Depression in ambulatory medical patients. Archives of General Psychiatry, 37, 999-1009.
Nunnally, J. (1978). Psychometric theory. New York: McGraw-Hill.
O’Hara, M. N., Ghonheim, M. M., Heinrich, J. V., Metha, M. P., & Wright, E. J. (1989). Psychological consequences of surgery. Psychosomatic Medicine, 51, 356-370.
Orleans, C. T., George, L. K., Houpt, J. L., & Brodie, H. (1985). How primary care physicians treat psychiatric disorders: A national survey of family practitioners. American Journal of Psychiatry, 142, 52-57.
Parikh, R. M., Eden, D. T., Price, T. R., & Robinson, R. G. (1988). The sensitivity and specificity of the Center for Epidemiologic Studies Depression Scale in screening for post-stroke depression. International Journal of Psychiatry in Medicine, 18(2), 169-181.
Pfeiffer, E. (1975). A short portable mental status questionnaire for the assessment of organic brain deficit in elderly patients. Journal of the American Geriatrics Society, 23(10), 433-441.
Radloff, L. S. (1977). The CES-D Scale: A self-report depression scale for research in the general population. Applied Psychological Measurement, 1, 385-401.
Radloff, L. S., & Locke, B. Z. (1985). The Community Mental Health Assessment Survey and the CES-D Scale. In M. M. Weissman, J. K. Myers, & C. G. Ross (Eds.), Community survey of psychiatric disorder. New Brunswick: Rutgers University Press.
Rameizl, P. (1984). A case for assessment technology in long-term care: The nursing perspective. Rehabilitation Nursing, 9(6), 29-31.
Rand, E. H., Badger, L. W., & Coggins, D. R. (1988). Toward a resolution of contradictions: Utility of feedback from the GHQ. General Hospital Psychiatry, 10, 189-196.
Regier, D. A., Boyd, J. H., Burke, J. D., Rae, D. S., Myers, J. K., Kramer, M., Robins, L. N., George, L. K., Karno, M., & Locke, B. Z. (1988). One month prevalence of mental disorders in the United States. Archives of General Psychiatry, 45, 977-986.
Regier, D. A., Goldberg, I. D., Burns, B. J., Hankin, J., Hoeper, E. W., & Nycz, G. R. (1982). Specialist/generalist division of responsibility for patients with mental disorders. Archives of General Psychiatry, 39, 219-224.
Regier, D., Goldberg, I., & Taube, C. (1978). The de facto U.S. Mental Health Services system: A public health perspective. Archives of General Psychiatry, 35, 685-693.

Regier, D. A., Robert, M. A., Hirschfeld, R. M., Goodwin, F. K., Burke, J. D., Lazar, J. B., & Judd, L. L. (1988). The NIMH Depression Awareness, Recognition, and Treatment Program: Structure, aims, and scientific basis. American Journal of Psychiatry, 145(11), 1351-1357.
Reisberg, B. (1984). Stages of cognitive decline. American Journal of Nursing, 84(2), 225-228.
Reisberg, B., Ferris, S., deLeon, M., & Crook, T. (1982). The Global Deterioration Scale for assessment of primary degenerative dementia. American Journal of Psychiatry, 139(9), 1136-1139.
Reisberg, B., Ferris, S., deLeon, M., & Crook, T. (1988). Global Deterioration Scale (GDS). Psychopharmacology Bulletin, 24(4), 661-663.
Reynolds, W. M., & Gould, J. W. (1981). A psychometric investigation of the standard and short form Beck Depression Inventory. Journal of Consulting and Clinical Psychology, 49, 306-307.
Riskind, J. H., Beck, A. T., Brown, G., & Steer, R. A. (1987). Taking the measure of anxiety and depression: Validity of the reconstructed Hamilton scales. Journal of Nervous and Mental Disorders, 175(8), 474-479.
Roberts, R. E., Rhoades, H. M., & Vernon, S. W. (1990). Using the CES-D Scale to screen for depression and anxiety: Effects of language and ethnic status. Psychiatry Research, 31, 69-83.
Robins, L. N., Helzer, J. E., Weissman, M. M., Orvaschel, H., Greenberg, E., Burke, J. D., & Regier, D. A. (1984). Lifetime prevalence of specific psychiatric disorders in three sites. Archives of General Psychiatry, 41, 949-958.
Roca, P., Klein, L., Kirby, S., McArthur, J., Vogelsang, G., Folstein, M., & Smith, C. (1984). Recognition of dementia among medical patients. Archives of Internal Medicine, 144, 73-75.

Rosenthal, T. L., Miller, S. T., Rosenthal, R. H., Sadish, W. R., Fogleman, B. S., & Dismuke, S. E. (1991). Assessing emotional interest at the internist’s office. Behavioral Research & Therapy, 29(3), 249-252.
Schmidt, N., & Telch, M. (1990). Prevalence of personality disorders among bulimics, nonbulimic binge eaters, and normal controls. Journal of Psychopathology and Behavioral Assessment, 12(2), 160-185.
Schulberg, H. C., Saul, M., McClelland, M., Ganguli, M., Christy, W., & Frank, R. (1985). Assessing depression in primary medical and psychiatric practices. Archives of General Psychiatry, 42, 1164-1170.
Schurman, R. A., Kramer, M., & Mitchell, J. B. (1985). The hidden mental health network. Archives of General Psychiatry, 42, 89-94.
Seay, T., & Beck, T. (1984). Alcoholism among college students. Journal of College Student Personnel, 25(1), 90-92.
Seltzer, A. (1989). Prevalence, detection and referral of psychiatric morbidity in general medical patients. Journal of the Royal Society of Medicine, 82, 410-412.
Shapiro, S., German, P., Skinner, E., Von Korff, M., Turner, R., Klein, L., Teitelbaum, M., Kramer, M., Burke, J., & Burns, B. (1987). An experiment to change detection and management of mental morbidity in primary care. Medical Care, 25, 327-339.
Shore, D., Overman, C., & Wyatt, R. (1983). Improving accuracy in the diagnosis of Alzheimer’s disease. Journal of Clinical Psychiatry, 44, 207-212.
Shrout, P. E., & Yager, T. J. (1989). Reliability and validity of screening scales: Effect of reducing scale length. Journal of Clinical Epidemiology, 42(1), 69-78.
Snyder, S., Strain, J. J., & Wolf, D. (1990). Differentiating major depression from adjustment disorder with depressed mood in the medical setting. General Hospital Psychiatry, 12, 159-165.
Strain, J. J., Fulop, G., Lebovits, A., Ginsberg, B., Robinson, M., Stern, A., Charap, P., & Gany, F. (1988). Screening devices for diminished cognitive capacity. General Hospital Psychiatry, 10, 16-23.
Striegel-Moore, R., Silberstein, L., Frensch, P., & Rodin, J. (1989). A prospective study of disordered eating among college students. International Journal of Eating Disorders, 8(5), 499-509.
Swets, J. A. (1964). Signal detection and recognition by human observers. New York: Wiley.
Swets, J. A. (1979). ROC analysis applied to the evaluation of medical imaging techniques. Investigative Radiology, 14, 109-121.
Szulecka, T., Springett, N., & De Pauw, K. (1986). Psychiatric morbidity in first-year undergraduates and the effect of brief psychotherapeutic intervention: A pilot study. British Journal of Psychiatry, 149, 75-80.
Targum, S. D. (1990). Differential responses to anxiogenic challenge studies in patients with major depressive disorder and panic disorder. Society of Biological Psychiatry, 28, 21-34.
Telch, M., Lucas, J., & Nelson, P. (1989). Nonclinical panic in college students: An investigation of prevalence and symptomatology. Journal of Abnormal Psychology, 93(3), 300-306.
Teri, L., Larson, E., & Reifler, B. (1988). Behavioral disturbance in dementia of the Alzheimer type. Journal of the American Geriatrics Society, 36(1), 1-6.
Tollefson, G. D. (1990). Differentiating anxiety and depression. Psychiatric Medicine, 8(2), 27-39.
Tucker, M. A., Ogle, S. J., Davison, J. G., & Eilenberg, M. D. (1987). Validation of a brief screening test for depression in the elderly. Age and Ageing, 16, 139-144.
Vecchio, T. J. (1966). Predictive value of a single diagnostic test in unselected populations. New England Journal of Medicine, 274, 1171.
Von Korff, M., Dworkin, S. F., LeResche, L., & Kruger, A. (1988). An epidemiologic comparison of pain complaints. Pain, 32, 173-183.
Weinstein, M. C., Berwick, D. M., Goldman, P. A., Murphy, J. M., & Barsky, A. J. (1989). A comparison of three psychiatric screening tests using receiver operating characteristic (ROC) analysis. Medical Care, 27(6), 593-607.

Weinstein, M. C., Fineberg, H. V., Elstein, A. S., Neuhauser, D., Neutra, R. R., & McNeil, B. J. (1980). Clinical decision analysis. Philadelphia: W. B. Saunders.
Weissman, M. M., & Merikangas, K. R. (1986). The epidemiology of anxiety and panic disorder: An update. Journal of Clinical Psychiatry, 47, 11-17.
Weissman, M. M., Myers, J. K., & Thompson, W. D. (1981). Depression and its treatment in a U.S. urban community. Archives of General Psychiatry, 38, 417-421.
Wells, C. (1979). Pseudodementia. American Journal of Psychiatry, 136, 895-900.
Wells, K. B., Golding, J. M., & Burnam, M. A. (1988). Psychiatric disorders in a sample of the general population with and without chronic medical conditions. American Journal of Psychiatry, 145, 976-981.
Wells, V., Klerman, G., & Deykin, E. (1987). The prevalence of depressive symptoms in college students. Social Psychiatry, 22(1), 20-28.
West, R., Drummond, C., & Eames, K. (1990). Alcohol consumption, problem drinking and anti-social behaviour in a sample of college students. British Journal of Addiction, 85(4), 479-486.
Whitaker, A., Johnson, J., Shaffer, D., Rapoport, J., Kalikow, K., Walsh, B., Davies, M., Braiman, S., & Dolinsky, A. (1990). Uncommon trouble in young people: Prevalence estimates of selected psychiatric disorders in a nonreferred adolescent population. Archives of General Psychiatry, 47, 487-496.

Williams, J. B. (1988). A structured interview guide for the Hamilton Depression Rating Scale. Archives of General Psychiatry, 45(8), 742-747.
Wilson, J. M., & Jungner, F. (1968). Principles and practices of screening for diseases. Geneva: WHO.
Wise, M. G., & Taylor, S. E. (1990). Anxiety and mood disorders in mentally ill patients. Journal of Clinical Psychiatry, 51(1), 27-32.
Woodworth, R. S. (1918). Personal data sheet. Chicago: Stoelting.
Yopenic, P. A., Clark, C. A., & Aneshensel, C. S. (1983). Depression problem recognition and professional consultation. Journal of Nervous and Mental Disorders, 171, 15-23.
Zung, W. K. W. (1965). A self-rating depression scale. Archives of General Psychiatry, 12, 63-70.
Zung, W. K. W. (1972). The Depression Status Inventory: An adjunct to the Self-Rating Depression Scale. Journal of Clinical Psychology, 28, 539-543.
Zung, W. K. W. (1974). The measurement of affects: Depression and anxiety. In P. Pichot (Ed.), Psychological measurement in psychopharmacology (pp. 179-188). Basel: Karger.
Zung, W., Magill, M., Moore, J., & George, D. (1983). Recognition and treatment of depression in a family practice. Journal of Clinical Psychiatry, 44, 3-6.
Zung, W. K. W., & Zung, E. M. (1986). Use of the Zung Self-Rating Depression Scale in the elderly. Clinical Gerontologist, 5(1-2), 137-148.

Chapter 3

Use of Psychological Tests/Instruments for Treatment Planning

Larry E. Beutler
Phylis Wakefield
Rebecca E. Williams
University of California, Santa Barbara

N.K.¹ was a 38-year-old, married, professional man who traveled from a distant city to get help for a sleep disturbance. He attributed his problem to the residual effects of a benzodiazepine that initially had been prescribed for him over a year previously. This prescription had been given to him by a psychiatrist who had diagnosed him as having a “reactive depression” and had prescribed the high-potency benzodiazepine to reduce anxiety. Paradoxically, upon taking the medication, N.K. began having significant sleep disturbances and became very anxious. He terminated further therapy after two sessions, but it was 2 months before he was able to discontinue the medication. However, following his self-directed withdrawal from the medication, N.K. continued to experience extreme anxiety and sleep impairment. In the subsequent months, two neurological examinations and a two-night polysomnographic sleep evaluation produced negative results for organic conditions, and at least three doctors suggested that the problem was secondary to depression. At the time that he sought treatment from the first author, N.K. reported sleeping less than 2 hours per night, attributing the problems to the “additive effects” of the medication.

The clinician presented with a client like N.K. must deal with a number of important questions, all compounded by the fact that the patient was from out of town and requested help within the brief time span of 1 week. Is this condition treatable? Is psychotherapy an appropriate treatment modality? What about family therapy? Should the treatment focus on the symptom of sleep problems, those of depression and anxiety, or on an underlying, dynamic problem? Should he be hospitalized for further evaluation?

The immediate challenge to the clinician is to decide on the most productive intervention by which to commence treatment and engage the client. Simultaneously, the clinician must formulate a treatment plan that will be maximally effective in addressing the client’s needs. In pursuing these objectives, it is implicitly acknowledged that treatments that are effective for one client or problem may be ineffective for another. In recognition of this fact, health-care researchers have attempted to develop guidelines that will assist clinicians by identifying treatments that have the highest likelihood of success and those that might be either inappropriate or minimally effective. The prescription of effective treatments and the proscription of ineffective ones is called “differential therapeutics” (Frances, Clarkin, & Perry, 1984).

¹This case example was discussed originally by Beutler and Clarkin (1990). Identifying information has been changed to protect the identity of the patient.

Many mental health clinicians have come to rely on standardized psychological tests to develop differential treatment strategies. Because of their psychometric qualities, relative to unstructured interview methods, and because they are adaptable to complex statistical manipulations, psychological tests seem to be ideal for developing standardized procedures for differentially assigning or recommending psychosocial treatments (Butcher, 1990; Graham, 1987, 1990). However, most of the indicators and signs that are employed in making differential mental health treatment decisions from psychological tests are based on clinical experiences and conjectures, rather than on empirical evidence of improved treatment efficacy. Accordingly, this chapter is devoted to providing a selective overview of some of the limited research available, which suggests that test performance may predict both treatment outcome and, more importantly, a differential response to available treatments.

Predictive Dimensions in Differential Therapeutics

Psychological tests primarily are used in three ways: (a) to determine a clinical diagnosis, (b) to assess the frequency and intensity of transitory states, and (c) to assess enduring traits that predict future behaviors or symptoms. Unfortunately, not all of these uses are equally effective for differentially assigning patients to specific mental health treatments. For example, the use of psychological tests for obtaining information about symptoms and patterns that may then be checked against diagnostic criteria outlined by the DSM-III-R has decided limitations, largely arising from the fact that diagnostic labels reflect conceptual, rather than actual, entities. The criteria of disorders in DSM-III and its derivatives represent a consensual opinion of a committee of psychiatric experts as to whether a given pattern of symptoms should be accorded the status of a socially viable syndrome or disorder. The committee’s decision to recognize a given cluster of symptoms as a diagnosable condition traditionally has been based on: (a) the presence and frequency of the symptoms, (b) an analysis of the symptoms’ social significance and interpersonal effects, and, where the empirical evidence has warranted, (c) the specificity of the symptomatic response to various classes of drugs. However, the committees that have been responsible for the development of the various diagnostic and statistical manuals, although seeking to include specificity of drug response, largely have excluded or ignored empirical information about the patient dimensions (e.g., coping styles, resistance, conflicts, etc.) that have been associated with differential responses to various psychosocial treatments. Consequently, although a reliable diagnosis may be used to indicate some medical interventions, it provides little information from which to develop a differentially sensitive psychotherapeutic program.

For example, with the information available, we may reliably diagnose N.K. as having a major depressive disorder. Given this diagnosis and his history of response to benzodiazepines, we may logically predict that tricyclic antidepressants may be more effective than anxiolytics. Yet, we have little clue about whether he may respond better to cognitive therapy, behavior therapy, or psychodynamic therapy; or, for that matter, whether he should be seen individually, in group treatment, or in a treatment that includes other family members. To make

these latter treatment assignments differentially, information beyond the diagnosis is needed. We may find some clues on which to base decisions in the knowledge

that both N.K. and his wife reported that the anxiolytic medication initially was given at a time when the couple was experiencing marital conflict. At that time, their only son had been expelled from school on two occasions. The school personnel suspected that he was drinking quite heavily, but N.K. and his wife blamed the school officials for most of the difficulty, asserting that these officials simply did not understand teenage children. Arguments over how to handle the situation and whether to seek the family counseling recommended by the school authorities had taken their toll, and the couple had contemplated a separation. It also may be helpful to know that when N.K. saw the psychiatrist who prescribed the benzodiazepine and recommended individual psychotherapy, the patient was reluctantly acceding to his wife’s request to seek family treatment. However, in the 3 months following his rejection of individual psychotherapy and his discontinuance of medication, N.K.’s wife had become committed to helping him through this difficult time. The marital problems were set aside and his son began doing better at school, but N.K. had become quite unable to go to work. The entire family became focused on N.K.’s symptoms, his ensuing lawsuit against the drug company for failing to inform him of the addictive properties of the drug, and his ongoing quest for help from doctors and physicians throughout the country. Most clinicians will develop a larger and richer array of treatment possibilities when they are provided with the foregoing, extra-diagnostic information.

It may become obvious to most that whereas relying on variables that predict differential responses of patients to different classes of medication may have provided much needed reliability in the diagnostic system, doing so failed to address considerations of the relative value of different models (e.g., cognitive therapy, psychodynamic therapy, interpersonal therapy) and formats (i.e., family, individual, group) of psychosocial therapies. DSM diagnostic entities, with rare exceptions, do not embody the information that is needed to assign these treatments effectively and discriminately. This point is illustrated further in the observation that virtually any psychoactive drug in common use can be identified with a list of diagnostic conditions for which it is both indicated and contraindicated. In contrast, it is unlikely that an informed clinician would suggest that cognitive therapy was both indicated for a condition like major depression and was contraindicated for anxiety disorders, personality disorders, minor depressions, eating disorders, tics, sexual dysfunctions, sleep disorders, impulse control disorders, substance abuse disorders, adjustment disorders, or virtually any other condition in which thought and/or behavior is disrupted. Indeed, cognitive therapy, behavior therapy, psychodynamic therapy, interpersonal therapy, and others all have been advocated as treatments for a multitude of diagnostic conditions. Determinations of differential assignment of these therapies require an assessment of extra-diagnostic variables.

Many authors have attempted to define the extra-diagnostic dimensions that may allow discriminating application of therapeutic procedures. Most of these efforts have provided guidelines for the application of different procedures within a single theoretical framework. For example, Hollon and Beck (1986) suggested the conditions under which cognitive therapy might be directed to schematic change versus changes in automatic thoughts, and Strupp and Binder (1984) suggested guidelines within which the psychodynamic therapist may differentially offer interpretations or support. However, because any single theoretical framework is less than comprehensive of the many foci, procedures, and strategies that are advocated by the available array of psychotherapies, these mono-theoretical guidelines are necessarily incomplete. Recognizing the limitations that exist when only procedures advocated by a single theory are selected for use, in recent years there has emerged a strong movement toward technical

eclecticism among practitioners and researchers (Arkowitz, 1992). The several approaches that comprise this movement, although diverse in type, share the objective of developing


guidelines for the selection of maximally effective interventions from the broadest possible array of proven procedures, regardless of the theories that spawned them. These guidelines specify the characteristics of patients and circumstances that best fit or match with the therapeutic procedures that have been defined by various theories of intervention. For example, Lazarus (1981) suggested that an analysis of patients’ Behaviors, Affects, Sensory experiences, Imagery, Cognitions, Interpersonal relationships, and need for Drugs (BASIC I.D.) portends the selection of interventions that best address those areas of functioning in which problems exist. Other authors have taken the view that the application of effective differential treatments is at least partially dependent on the stage of readiness or phase of treatment (Beitman, 1987; Fuhriman, Paul, & Burlingame, 1986; Prochaska, 1984). However, these stage models vary from one to another by virtue of the degree of emphasis they place on patient variables (Prochaska, 1984) or intervening therapy goals (Beitman, 1987) as the stage indicators. Such differences have real implications for the use of psychological tests and the use of patient information in the planning of treatment.

Patient Predisposing Variables

Potentially, psychometrically stable measurements of treatment-relevant, pretreatment patient dimensions (i.e., predisposing variables) could be used to identify markers for the application of different interventions. Unavoidably, however, the number of patient, treatment, and matching dimensions that potentially affect treatment assignment is virtually limitless (Arkowitz, 1992; Beutler, 1991).

To bring some order to the many variables and diverse hypotheses associated with the several models of differential treatment assignment, and to place them in the perspective of empirical research, Beutler and Clarkin (1990) grouped patient characteristics presented by the different theories into a series of superordinate and subordinate categories. This classification included five relatively specific classes of patient variables, which are distinguished by their susceptibility to measurement using established psychological tests and their ability to predict differential responses to psychosocial treatment. These categories included: (a) symptom severity, (b) problem-solving phase achieved, (c) problem complexity, (d) potential

to resist therapeutic influences, and (e) style of coping with threat. These “Patient Predisposing” dimensions provide points of reference for organizing the topics of this chapter as we consider the use of psychological tests for treatment planning.

Because of the limited space available, we restrict our discussion of treatment planning to psychosocial interventions and initial patient predisposing variables. We give only cursory attention to how tests have been used for the selection of medical/somatic interventions (e.g., hospitalization, ECT, medication, etc.), for establishing a DSM diagnosis, or for the purposes of making treatment alterations in mid-course. Likewise, our discussion does not include differential treatment planning for children and adolescents, explorations of the relationship between treatment-initiated changes and subsequent modifications of treatment plans, or the relationship between psychotherapy process events and subsequent outcomes. The reader is referred to Beutler and Clarkin (1990) for a more extensive consideration of both patient and treatment variables within these latter classes of treatments. Table 3.1 summarizes some representative instruments that may be used for assessing these various dimensions.
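
As a reading aid only, the five patient predisposing dimensions named above can be pictured as one record per patient. The following is a minimal sketch, not part of Beutler and Clarkin's (1990) model: the class names, field names, the three-point `Level` rating, and the example values are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, fields
from enum import Enum


class Level(Enum):
    """Hypothetical three-point rating; the chapter does not prescribe a scale."""
    LOW = 1
    MODERATE = 2
    HIGH = 3


@dataclass(frozen=True)
class PredisposingProfile:
    """Illustrative record of the five dimensions, in the order (a) to (e)."""
    symptom_severity: Level
    problem_solving_phase: str  # assumed free-text label for the phase achieved
    problem_complexity: Level
    resistance_potential: Level
    coping_style: str  # assumed free-text label for style of coping with threat


def dimension_names(profile):
    """Return the dimension names in the order they are declared."""
    return [f.name for f in fields(profile)]


# Arbitrary placeholder values, not ratings of any real or described patient.
example = PredisposingProfile(
    symptom_severity=Level.HIGH,
    problem_solving_phase="unspecified",
    problem_complexity=Level.MODERATE,
    resistance_potential=Level.HIGH,
    coping_style="unspecified",
)
print(dimension_names(example))
```

Keeping the dimensions as separate fields, rather than collapsing them into one score, mirrors the chapter's point that each dimension is assessed by different instruments.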


TABLE 3.1
Representative Tests

Columns: Test; Symptom Severity; Stage of Change; Problem Complexity; Resistance Potential; Coping Style

Tests: BDI*, HRSD**, SCL-90-R*, STAI*, Stages of change*, CCRT**, IIP*, MMPI*, BPRS**, Therapeutic reactance*, CPI*

[The X marks assigning each test to the dimensions it assesses are not recoverable from the extracted layout.]

Note. * = self-report instruments; ** = observer report instruments.

SYMPTOM SEVERITY

Symptom severity is a variable that typically is considered to be a transitory or changeable symptom state, and, as such, is often used as a dependent variable in assessing the efficacy of treatment. However, in addition to their role as dependent measures for evaluating treatment efficacy, psychological measures of the severity of psychological symptoms have been used both as indices of treatment prognosis and as indicators for the differential application of medical and psychosocial treatments. In clinical research, the Beck Depression Inventory (BDI; Beck, Ward, Mendelson, Mock, & Erbaugh, 1961), the Hamilton Rating Scale for Depression (HRSD; Hamilton, 1967), the SCL-90-R (Derogatis, 1977), and the State-Trait Anxiety Inventory (STAI; Spielberger, Gorsuch, & Lushene, 1970) have been studied most often for these purposes.

The empirical literature on the relationship between symptom severity and treatment prognosis reveals an immediate paradox. On the one hand, some studies suggest that psychosocial interventions are most effective among patients with relatively low initial levels of psychiatric symptom severity. On the other hand, there are studies that suggest that treatment efficacy is associated positively with symptom severity. The observations of a negative relationship between initial severity and treatment outcome seem to be especially consistent among patient samples with somatic complaints. For example, Blanchard, Schwarz, Neff, and Gerardi (1988) determined that, among patients with irritable bowel syndrome, those whose STAI trait scores were in the mild and moderate ranges were most likely to benefit from behavioral and self-regulatory treatment. Similarly, using the BDI as a measure of depression severity, Jacob, Turner, Szekely, and Eidelman (1983) suggested that low scorers on the BDI were more likely than high scorers to benefit from relaxation as a treatment for headaches. In this latter study, those patients with moderately high scores were more inconsistently affected by treatment.

To complement this picture, there is evidence that patients whose scores suggest quite

severe symptomatology respond well to medical interventions. For example, the well-known National Institutes of Mental Health (NIMH) collaborative study of depression (Elkin et al., 1989), using composite measures of initial symptom severity, determined that those with

60

BEUTLER, WAKEFIELD, WILLIAMS

severe symptoms responded more rapidlyto tricyclic antidepressants than to psychotherapy. These contrasting findings suggest that psychiatric symptom severity may differentially predict the value of psychosocial and medical therapies. The picture is not quite as clear as the foregoing retrospective finding indicates, however. For example, several studies have attempted to use symptoms “endogeneity” (generally considered to represent a dimension of severity and nonreactivity) as an index for predicting the differential efficacy of medical and psychosocial interventions among depressed patients. Although these studies have confirmed the hypothesis that pharmacotherapy achieves its greatest efficacy among patients with endogenous symptoms, compared with those with less severe reactive depressions, they (see Simons & Thase, 1992) surprisingly found that psychosocial therapy (various forms of cognitive therapy) was equally effective among patients with either type of depression. Moreover, the overall success rates were at least equivalent for both types of treatment among the more serious depressions. Although supportive of our thesis, all of the studies suffered from a relatively restricted range of symptom severity, with the largest numbers of subjects being in the reactive and moderately depressed groups. Hence, it is uncertain if this equivalence would be manifest if samples were large enough to detect small differences. To complete the paradox, there also is evidence that responsiveness to psychotherapy is associated positively with symptom severity. This apparent contradiction is resolved only if one considers the range of severity represented by these various findings. 
Within the group of patients with no greater than moderate levels of mood and emotional disturbances, several studies suggested that psychosocial treatments achieve their greatest effects among those with relatively high levels of symptom severity (e.g., Klerman, 1986; Klerman, DiMascio, Weissman, Prusoff, & Paykel, 1974; Lambert & Bergin, 1983). Especially among those with ambulatory depressions, general anxiety, and diffuse medical symptoms, there is accumulating evidence that initial symptom severity is related positively to efficacy of psychosocial treatments. Using the BDI, for example, Parker, Holmes, and Manicavasager (1986) found that initial depression severity was correlated positively with treatment response among general medical patients. Likewise, Mohr et al. (1990) observed that the likelihood (although not the magnitude) of response to treatment was associated positively and linearly with general symptom severity on the SCL-90-R among patients with moderately severe depression. Indeed, among patients with more typical, mild, and moderate symptom severity, consensual evidence suggests that psychosocial interventions are as effective as antidepressant and antianxiety medications (Elkin et al., 1989; Nietzel, Russell, Hemmings, & Gretter, 1987; Robinson, Berman, & Neimeyer, 1990).

In addition to being predictors of the likelihood of a positive response to generic psychosocial treatments and medication, indicators of subjective symptomatic distress also have found suggestive value for predicting differential response to specific forms of psychotherapeutic treatment. For example, in the NIMH collaborative study of moderate depression, severity as measured by the BDI and the HRSD, on post hoc analysis, proved to differentiate the efficacy of the psychotherapeutic treatments (Imber et al., 1990). Those patients with the most severe symptoms were treated most effectively by interpersonal psychotherapy, compared with cognitive therapy.

When taken with the other findings reviewed, such findings suggest that a curvilinear relationship may exist between the severity of depressive and anxious symptoms on the one hand, and the effectiveness of psychosocial treatments on the other hand. Although very severe symptoms may serve as indicators for medical treatments and may contraindicate psychosocial interventions as primary treatments, mild and moderate levels of depression and anxiety may be predictive of a good response to psychotherapy. Hence, differential assignment of medication and psychosocial interventions may be dependent on the severity level achieved. The treatment of severe symptoms of anxiety and depression through psychosocial interventions may be especially difficult when psychotic symptoms and a history of impoverished sociodemographic and interpersonal resources complicate the picture. Likewise, such treatments may be ineffective when patients have little subjective distress to motivate them.

Collectively, these findings suggest that severe, incapacitating levels of depression and anxiety may be taken as indicators for the use of medication and environmentally restrictive or intensive interventions. This point is seen most clearly clinically when suicidal symptoms are observed and protective controls are required. However, in the absence of debilitating psychotic or suicidal symptoms, psychosocial interventions normally are indicated. Within the cluster of psychotherapy types, more severe symptoms seem to be most responsive to therapies that focus on interpersonal and insight objectives, rather than on interpersonal and symptomatic patterns of response and behavior.

PROBLEM-SOLVING PHASE

Prochaska (1984; Prochaska & DiClemente, 1986) suggested that the degree to which the therapeutic processes associated with different theoretical approaches are realized is dependent on the corresponding stage of problem resolution and readiness achieved by the patient or client. He and his colleagues have identified four stages through which an individual proceeds in the course of normal problem resolution: precontemplation, contemplation, action, and maintenance. At each of these four stages (later extended to five to include preparation), patients are considered to be maximally susceptible to the change processes that differentially characterize various psychotherapy models. For example, at the precontemplative phase, one may be most susceptible to procedures that raise consciousness and increase sensory arousal. At the contemplative stage, one may be more responsive to insight procedures. At the action stage, one may be maximally susceptible to stimulus control and counterconditioning procedures.

Prochaska’s work has stimulated more than a modest amount of research on how people prepare themselves for making changes and how these processes may be used for treatment planning (e.g., Prochaska, Rossi, & Wilcox, 1991). Prochaska, Velicer, DiClemente, and Fava (1988) developed the Stages of Change Questionnaire, an instrument designed to assess these stages of readiness and receptivity to different interventions. Accordingly, they have applied this instrument to the successful prediction of response to varieties of treatments designed to reduce smoking. Although this work is very promising, the application of the concepts to more serious disorders is still pending.

A different perspective on stages, used by Beitman (1987), identifies not the patient qualities, but the objectives to be accomplished at a given time in treatment and for which different procedures may be differentially useful. Stages from this perspective represent shifts in the focus of therapeutic interventions, rather than patient demarcating points for selecting techniques themselves. These foci include relationship building, pattern inspection, implementation of changes in these observed patterns, and preparation for termination. Beitman proposes that procedures from different theoretical perspectives are differentially effective at achieving the goals of these phases and should be selected accordingly. Because these are considered to be the same for all patients and are not associated with an established measure, Beitman’s formulations have less meaning to differential treatment planning than those of Prochaska et al. (1988). However, Beutler and Clarkin (1990) extended Beitman’s proposals to differential treatment planning by relating them to the phases and processes of change suggested by Prochaska. Accordingly, they offered the tentative hypothesis that the stages of personal problem solving described by Prochaska and the stages of therapy proposed by Beitman may be reciprocal: the stage of problem resolution, as defined by Prochaska, may dictate whether the short-term objectives of therapy should be on relationship enhancement, pattern search, pattern change, or termination, as defined by Beitman. However, at this point these proposals remain speculative and await empirical verification.

PROBLEM COMPLEXITY

In addition to the severity of symptomatic presentation and the stage of problem resolution achieved, problems also vary in their complexity (Beutler, 1983; Beutler & Clarkin, 1990). Complexity is distinguished from severity most directly by the pervasiveness of recurrent interpersonal patterns associated with problems and symptoms. These recurrent patterns are often responsive to symbolized, rather than direct or obvious, similarities among evoking events. Indeed, the presence of a common symbolic meaning among diverse problems is thought to underlie the enactment of recurrent interpersonal themes. Hence, these symbolic themes often are expressed through disruption in a large number and variety of social systems and transcend specific events and situations. The definition of these dimensions requires the measurement of a combination of symptomatic and trait-like qualities.

As the relationship between the conditions that originally provoked the symptoms and behaviors and their subsequent form and manifestation becomes less easily specified, and as problematic behavior becomes more easily classifiable as a pervasive, recurrent pattern that is interwoven within an individual’s social behaviors, the resulting complexity of the problem may serve as a differential indicator for using treatments whose objectives are on thematic changes, rather than on symptom removal alone. The importance of this latter point is seen by the observation that psychosocial interventions, as a rule, are aimed at more broadly based changes than medical ones (DeRubeis et al., 1990; Simons, Garfield, & Murphy, 1984); and that, within this former domain, insight-oriented approaches are aimed at broader changes than behavioral ones (Strupp & Binder, 1984).

A body of additional literature (see review by Beutler, 1979) extended these findings to differences among a wide variety of psychotherapies, and suggested that differential efficacy of these approaches may be observed as a function of the degree of complexity of the problem and the breadth of the focus of the therapeutic interventions used. Support for the suggestion that pervasive interpersonal disturbances may serve as differential predictors for the use of psychosocial interventions came from a study by Mohr et al. (1990). Using the Inventory of Interpersonal Problems (Horowitz et al., 1988) as an indicator of pervasive interpersonal disturbance, Mohr et al. (1990) found that high scores on this latter measure, when taken along with an index of symptom severity from the SCL-90-R, predicted both those who would respond positively and those who would respond negatively to psychosocial interventions. Three interventions (cognitive therapy, experiential therapy, and self-directed therapy) all proved to have higher success rates among patients who had both moderate symptoms of depression and high levels of interpersonal distress. These findings suggest that more and less broadly focused treatments may be differentially effective as a function of the degree to which pervasive interpersonal patterns and themes are characteristic of the problem presented. As a very broadly focused set of treatments, interpersonal and dynamic therapies may be indicated for complex problems of moderate severity, whereas progressively more narrowly focused cognitive and psychopharmacological therapies may be indicated as the targeted problems become less complex.

Some measurement instruments are currently under development that hold promise both for identifying pervasive themes and for assisting the clinician in the application of specific theme-focused procedures. For example, the Core Conflictual Relationship Theme (CCRT) method (Crits-Christoph, Luborsky, et al., 1988; Luborsky, Crits-Christoph, & Mellon, 1986) relies on a structured clinical interview for assessing the basic themes that underlie complex, dynamically oriented problems. The method relies on an assessment of basic relationships in an individual’s life and identifies three sequential aspects of recurring interpersonal behavioral patterns: (a) the organizing wishes served by the pattern, (b) the responses anticipated from others if these wishes are expressed, and (c) the consequential self-reflections made by the patient. Using the CCRT method, recurrent patterns among these three variables can be identified reliably and a determination can be made of the pervasiveness with which a finite number of themes characterize interpersonal relationships. However, the CCRT is extremely time consuming and labor intensive, requiring up to 22 hours to complete. Thus, recent work is moving toward defining a standard set of thematic categories (Crits-Christoph & Demorest, 1988) and a self-report measure (Barber, 1989).

From the standpoint of predicting responses to treatment, the presence of pervasive and recurrent themes may indicate the relative value of symptom and conflict-focused therapies. For example, dynamic therapies assume that a small number of themes are reenacted in one’s important relationships, reflecting transference from earlier experiences. Accordingly, a major focus of treatment from this standpoint is the interpretation of these pervasive patterns as they emerge in psychotherapy, both with the therapist (the transference neurosis) and with others in the patient’s life. If different themes are manifest in an individual’s different relationships, it indicates that there is not a pervasiveness to any finite number of patterns, and that situational discrimination of interpersonal expectations exists. Hence, among those with multiple or nonpervasive themes, dynamic therapy may be expected to be less powerful. Conversely, it is logical to expect that, in the latter case, a treatment may be of more value if it sequentially addresses each of the independent problem areas (i.e., is more symptom than conflict focused).

Although this latter hypothesis awaits empirical test, recent evidence suggests that the value of psychodynamic interpretations of thematic content is greater when the therapy is based on psychodynamic theoretical constructs than when the interpretations and associated constructs are not part of this theoretical framework (Goldfried, 1991). Evidence also is emerging to indicate that, once identified, pervasive recurrent themes are useful as guides for constructing interpretative interventions. Indeed, the more accurate the interpretation, as assessed by its correspondence with the identified pattern, the greater the likelihood of a successful therapy outcome (Crits-Christoph, Cooper, & Luborsky, 1988).

Although limited by some of the concerns highlighted in the foregoing paragraphs, the available data support several conclusions. Specifically, psychosocial interventions are effective for a variety of depressive conditions varying in complexity. Combined with emerging research on psychodynamic themes and response to therapy, this means that both symptom-focused psychopharmacological interventions and symptom-focused psychological interventions may be indicated most clearly among patients whose conditions are relatively uncomplicated by social and interpersonal precipitators and symbolic transformations. In turn, the degree of efficacy of broad-band, or conflict-focused, psychosocial intervention will be manifest most directly when problems are reflective of a pervasively recurrent dynamic or interpersonal process. In other words, treatment efficacy appears to be greatest when the breadth of the interventions used corresponds with the complexity of the problem presented.

REACTANT/RESISTANCE TENDENCIES

Several investigations have been undertaken to find indicators of patients’ resistance to psychosocial interventions. These indicators usually are considered to be moderately reactive states, rather than traits, and are thus differentially influenceable by therapeutic interventions. However, some efforts to explore these qualities have assumed them to have some trait-like qualities and to predict general prognosis as well as serving as differential indicators for various therapeutic interventions. For example, Arkowitz (1991) determined that resistance traits both retarded overall outcomes and served as differential predictors of response to different psychotherapies. To better define the nature of resistance, Khavin (1985) compared the psychological characteristics of 50 young adult males who were being treated for stuttering with psychosocial interventions. Those whose MMPI profiles were characterized by anxiety, impulsivity, and conflicts over gender role attitudes were found to be resistant to intervention. In a more global fashion, Knight-Law, Sugerman, and Pettinati (1988) applied a personality classification system from the MMPI to patients being treated for alcoholism. They found that those classified as situationally “reactive” and “essential” were responsive to treatment. Similar classification systems have been applied to the behavioral treatment of individuals with somatic symptoms (LaCroix, Clarke, Bock, & Doxey, 1986), alcohol use (Sheppard, Smith, & Rosenbaum, 1988), eating disorders (Edwin, Anderson, & Rosell, 1988), and patients with chronic back pain (Trief & Yuan, 1983).

Both the Ego Strength (ES) subscale of the MMPI (Barron, 1953) and the Therapeutic Reactance Scale (Dowd, Milne, & Wise, 1991) were designed specifically as trait-like measures of prognosis to be used to predict resistance to treatment. However, the success of the ES subscale has been mixed (Graham, 1987), and the Therapeutic Reactance Scale has yet to be tested in clinical populations. Overall, efforts to develop and use state-like measures of resistance potential have proven to be more successful than general trait predictors of patient prognosis or motivation when applied to the prediction of differential responses to treatment styles. For example, Kolb, Beutler, Davis, Crago, and Shanfield (1985) demonstrated that an SCL-90-R measure of paranoid defensiveness predicted the differential value of the therapist’s empathetic communication and value similarity to the client. Highly defensive individuals resisted empathic statements and resisted assimilating the therapist’s value and attitudinal stances, responding better to less empathic therapists with whom they were able to maintain value differentiation.

In recent years, several investigators have begun to look for qualities of clients that portend the differential use of directive and evocative interventions based on these resistance traits and states. For example, Shoham-Salomon and Hannah (1991) proposed that client reactance level portended differential response to directive interventions. Similarly, Beutler (1983, 1991) suggested that reactance or resistance propensities were indicative of the use of nondirective interventions. Shoham-Salomon, Avner, and Neeman (1989) extended this proposal to include the use of paradoxical directives among high resistance-prone (reactant) clients.

A prospective test of the hypothesis that clients who varied on resistance potential also would vary in response to directive and nondirective therapies was undertaken by Beutler et al. (1991), using a combination of MMPI social desirability and anxiety research scales as an index of defensive anxiety. This study demonstrated that two different manualized therapies that utilized different types, but similar amounts, of therapist directiveness were more effective for reducing depressive symptoms than a nondirective therapy among subjects who were predetermined to be resistance-prone. Conversely, among low resistance-prone depressed subjects, the nondirective therapy surpassed the directive ones in effecting change in depressive symptoms. This result was cross-validated at a 1-year follow-up, in which relapse rates were studied (Machado, Beutler, Engle, & Mohr, 1993), and also was extended to a cross-cultural sample of several alternative measures of resistance (Beutler, Mohr, Grawe, Engle, & MacDonald, 1991). Overall, these results suggest that resistance on the part of patients may be countered best with nondirective nonresistance on the part of the therapist. The possibility of a complementary patient-therapist relationship on factors such as dominance and reactivity is intriguing.

COPING STYLES

Coping styles are conceptualized most frequently as trait-like qualities and are measured by way of omnibus personality tests (Butcher, 1991). The instruments used most to assess coping styles are the MMPI and the Eysenck Personality Inventory. Implications for treatment planning also have been explored using other instruments, such as the Rorschach, the Millon Clinical Multiaxial Inventory (MCMI), and the more diagnostically oriented Brief Psychiatric Rating Scale (BPRS), but this literature is either too meager or too contradictory to provide many clear guidelines. Nonetheless, research is beginning to suggest that behaviorally and insight-oriented psychotherapies are differentially effective for patients who vary on the dimension of coping style. For example, in a well-controlled study of interpersonal and behavioral therapies, Kadden, Cooney, Getter, and Litt (1990) determined that alcoholic subjects who were high and low on the California Psychological Inventory (CPI) Socialization subscale, a measure of sociopathy, were differentially responsive to treatments that were based on behavioral and interpersonal insight models, respectively. Continued improvement over a 2-year follow-up period also was found to be greatest among these compatibly matched client-therapy dyads (Cooney, Kadden, Litt, & Getter, 1991).

A similar client dimension based on the MMPI also has been differentially predictive of response to different psychotherapies among anxious and depressed outpatients. Depressed clients who scored high on MMPI externalization subscales (Pd + Pa) were found to be relatively more responsive to manualized cognitive-behavioral therapy than to manualized insight-oriented psychotherapies, a pattern that was reversed among those who scored low on this acting out dimension (Beutler, Engle et al., 1991). This finding was cross-culturally validated on other measures of coping style as well (Beutler, Mohr et al., 1991). Similarly, Beutler and Mitchell (1981) compared the outcomes of anxious and depressed patients who were dichotomized on the basis of Welsh’s (1952) MMPI-internalization ratio and whose therapists also varied in their preference for psychoanalytic and experiential orientations. The authors found a main effect favoring experiential therapy and a therapist-type by client-type interaction effect. The latter effect indicated that analytically oriented trainee therapists exerted their strongest effects on clients whose MMPIs indicated the presence of internalizing (in contrast to externalizing or impulsive) coping styles. This latter finding was later cross-validated by Calvert, Beutler, and Crago (1988) in a larger sample of trainee therapists as well as therapies representing both insight and behavioral approaches. In this latter study, it was determined that trainee therapists whose preferred orientations emphasized behavioral contingencies exerted the most positive effects on psychiatric clients who presented with externalizing, rather than internalizing, coping styles. Conversely, therapists whose approaches were designed to facilitate insight and internal change produced the most positive results among internalizing patients.

In contrast to the complementary pattern that emerged between resistance tendencies and therapist directiveness, coping style research suggests that a different pattern is conducive to differential assignment of treatment methods. The evidence reviewed here suggests that similarity (rather than complementarity) of focus between externalizing patients and externally oriented treatments, and, conversely, between internalizing patients and internally focused treatments, enhances positive outcomes.

COMBINATIONS OF MATCHING DIMENSIONS

Although not yet reflected in empirical research, theoretical literature suggests that matching patients to treatments by utilizing a number of dimensions at once may enhance outcome more than by using any single dimension (Beutler & Clarkin, 1990). This may be especially true if the matching dimensions are related to, and well grounded in, a guiding theory. To the degree that it is available, empirical literature supports this suggestion and indicates that theory-driven, multidimensional matching may allow for a relatively efficient selection of coping and therapy dimensions, rendering the findings potentially more generalizable.

The most visible research of this type has been based on a circumplex model of personality, originally associated with Jungian theory. This model incorporates concepts that appear conceptually similar to both coping style and defensive dimensions as described in the previous paragraphs, hence its inclusion in this discussion. Over the course of years, a large number of matching studies found the Myers-Briggs Type Indicator (MBTI), based on Jungian concepts, to be promising for predicting efficacious client-therapist matches (Mendelsohn & Geller, 1963). A review of literature (Carlson, 1985) on the MBTI, however, revealed a large number of published studies, but relatively few systematic, well-controlled investigations on actual therapy populations over the past decade.

Contemporary researchers on the circumplex model are studying therapeutic processes as objective states, rather than as subjective traits. This more recent work is represented best by the theoretical and empirical work of Kiesler (1988) and Benjamin (1974). The common methodological roots of Kiesler’s and Benjamin’s work are in the early writings of Leary (1957). Leary maintained that social behavior can be understood as the interplay between two orthogonally defined interpersonal motives: love and dominance. More recent modifications have broadened the definition of love to encompass a dimension of affiliation or friendliness, with the opposing end of this dimension represented as externalized hostility. Dominance, similarly, is conceptualized as a dimension ranging from striving for interpersonal control to submitting to the control of others. As related to treatment recommendations, these dimensions roughly correspond to coping style and resistance qualities, respectively.

Theoretical formulations express these motives as existing along one or more circumplex surfaces. Behaviors within the dominance to submission continuum (i.e., resistance) elicit an opposite response from others (e.g., dominance elicits submission and vice versa). In contrast, behaviors within the active affiliation dimension (e.g., active and passive coping styles) elicit responses in kind from others (e.g., friendliness begets friendliness and anger begets anger). The principal distinction among the various measurement methods within this interpersonal model is one of complexity. For example, Benjamin (1974) proposed a complex of three circumplex surfaces, whereas Kiesler (1988) proposed a single, two-dimensional circumplex. As applied to psychotherapy research, the simpler model of Kiesler and Watkins (1989) has lent itself to use both as a method of matching clients to compatible therapist styles and as a measure of the quality of the therapeutic relationship. The more complex model of Benjamin has been applied to evaluating theory-relevant changes during therapy (Henry, Schacht, & Strupp, 1990), and to assessing therapist-client interactions (Henry & Strupp, 1991; Rudy, McLemore, & Gorsuch, 1985). Generally, at least moderate levels of complementarity (i.e., similar levels of friendliness and contrasting levels of expressed control) in therapeutic communication have been found to be associated with positive outcomes (Dietzel & Abeles, 1975; Kiesler & Watkins, 1989; Tracey & Hays, 1989).
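As an illustration only, the notion of complementarity described above (similar friendliness, contrasting control) can be expressed as a simple numerical index. The following Python sketch assumes that each participant's behavior has already been scored on two axes, affiliation and control, each scaled from -1 to 1. The function name, the scaling, and the index itself are hypothetical conveniences; they are not the scoring procedures of Kiesler or Benjamin.

```python
import math


def complementarity(a, b):
    """Hypothetical index in [0, 1]; 1 means fully complementary.

    a, b: (affiliation, control) tuples, each coordinate in [-1, 1].
    Complementary pairs match on affiliation and oppose on control.
    """
    for value in (*a, *b):
        if not -1.0 <= value <= 1.0:
            raise ValueError("coordinates must lie in [-1, 1]")
    affiliation_gap = a[0] - b[0]  # zero when friendliness matches
    control_gap = a[1] + b[1]      # zero when control is reciprocal
    deviation = math.hypot(affiliation_gap, control_gap)
    max_deviation = math.hypot(2.0, 2.0)  # largest possible gap on both axes
    return 1.0 - deviation / max_deviation


# Made-up scores, for illustration only.
friendly_dominant = (1.0, 1.0)
friendly_submissive = (1.0, -1.0)
hostile_dominant = (-1.0, 1.0)

print(complementarity(friendly_dominant, friendly_submissive))
print(complementarity(friendly_dominant, hostile_dominant))
```

Under these assumptions, a friendly-dominant statement met by a friendly-submissive one scores at the top of the index, whereas two dominant participants who differ in friendliness score at the bottom. Any real application would depend on how the circumplex instrument in use defines and scales its axes.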

Selection of Appropriate Instruments

The selection of appropriate instruments to measure the patient’s presenting symptoms, personality traits, and transitional states is an important concern for the clinician. In each of the previous sections, representative instruments were presented whose use has been explored empirically. There are numerous other instruments available, however, and those presented are not the only ones of value. Hence, the clinician must keep in mind several important considerations when selecting and using instruments for treatment planning purposes (Cattell & Johnson, 1986; Goldstein & Hersen, 1990).

First, the clinician must select instruments that, together, measure a variety of dimensions, such as those discussed in the foregoing, which have been selected because of their potential importance in making the treatment decisions required. Some instruments measure more than one dimension, but few measure all of the qualities recommended here. Even if they did, one also must consider the advantages and costs of including multiple instruments, some of which reflect different points of view or embody different perspectives. Both observer ratings and self-report measures should be used whenever possible to prevent the unchecked influence of a single perspective from biasing the results.

Additionally, whenever possible, one should select instruments that have been referenced to relevant norm groups. Special emphasis should be given to ensuring that the normative data includes groups that are similar to those on which the instrument will be applied in planning treatment. Normative data allows one to compare an individual’s scores to representative, larger groups. This reliance on normative data, rather than clinical experience, avoids potential bias that may occur if one’s clinical experience is not representative. The concern with comparing patient scores to normative values is especially important when dimensions are being assessed that do not provide a specific means for classifying patients into discrete groups (e.g., diagnoses), but provide a continuous measure of some normally distributed trait. The measurement of most personality traits and states usually is accomplished best by tests that provide continuous scores, rather than classifying patients into discrete categories. In such cases, it is only when one can compare a given patient with normative expectation that one can make meaningful treatment decisions.

Beyond these concerns, special attention should be given to selecting instruments that embody stable psychometric properties. Four general psychometric criteria should be considered. Whenever possible, each instrument should embody the following qualities: (a) reliability, (b) sensitivity, (c) specificity, and (d) concurrent or predictive validity.
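The comparison of a patient's score with normative expectation, discussed above, can be illustrated with a short sketch. It assumes the common linear T-score convention (norm-group mean set to 50, standard deviation set to 10); the function names and the norm values in the usage lines are hypothetical and are not drawn from any published test manual.

```python
def z_score(raw, norm_mean, norm_sd):
    """Distance of a raw score from the norm-group mean, in SD units."""
    if norm_sd <= 0:
        raise ValueError("norm_sd must be positive")
    return (raw - norm_mean) / norm_sd


def t_score(raw, norm_mean, norm_sd):
    """Linear T-score: rescales the z-score to mean 50 and SD 10."""
    return 50.0 + 10.0 * z_score(raw, norm_mean, norm_sd)


# Hypothetical norm group: mean 20, SD 5 (illustrative values only).
patient_raw = 30
print(z_score(patient_raw, 20, 5))   # two SDs above the norm-group mean
print(t_score(patient_raw, 20, 5))   # the same score expressed as a T-score
```

A score expressed this way is only as meaningful as the norm group behind it, which is why the similarity of the norm group to the patient population matters.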

RELIABILITY

The first standard that one should seek in selecting instruments for planning treatment is reliability. Information on this aspect of an instrument’s psychometric value is that which is published most often in the manuals that describe the use of various tests. Reliability defines the degree to which the scores obtained from a given person are stable indicators of some characteristic. Reliability information indicates that some quality actually is being measured, but it does not indicate directly whether it is the quality identified as significant by the clinician. Our discussions of validity address the importance of these latter determinations. At this point, it is important to observe that there are several different types of reliability, each suited to different measurement needs. Stability over time (test-retest reliability) is important to establish when measuring trait-like qualities; interrater reliability is necessary when assessing changeable and relatively abstract states; and internal consistency is necessary when one is assessing unidimensional qualities. It is in the clinician’s best interest to become familiar with the various types of reliability and the literature on this topic that is relevant to each instrument used. It is especially important to determine if the instrument under consideration has been used reliably with the client group of interest previously.
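Two of the reliability types named above are conventionally summarized with simple statistics. The sketch below assumes that test-retest reliability is estimated with a Pearson correlation between two administrations and that internal consistency is estimated with Cronbach's alpha; these are standard choices, not ones the chapter prescribes, and the function names and data are hypothetical. Interrater reliability is not covered here.

```python
from math import sqrt
from statistics import mean, variance


def pearson_r(x, y):
    """Pearson correlation, e.g. between time-1 and time-2 scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists of at least 2 scores")
    mx, my = mean(x), mean(y)
    cross = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return cross / sqrt(ss_x * ss_y)


def cronbach_alpha(items):
    """Cronbach's alpha; `items` holds one list of respondent scores per item."""
    k = len(items)
    if k < 2:
        raise ValueError("need at least two items")
    totals = [sum(scores) for scores in zip(*items)]  # total score per respondent
    item_variance_sum = sum(variance(col) for col in items)
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))


# Hypothetical data: five patients tested twice, and a three-item scale.
time1 = [12, 18, 25, 9, 30]
time2 = [14, 17, 27, 10, 28]
print(pearson_r(time1, time2))
print(cronbach_alpha([[1, 2, 3, 4], [2, 2, 3, 5], [1, 3, 3, 4]]))
```

Which statistic is relevant depends on the measurement need, as the text notes: stability over time for trait-like qualities, internal consistency for unidimensional ones.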

SENSITIVITY

The second and third psychometric criteria to be considered reflect aspects of concurrent validity. A test’s sensitivity provides an indication of the accuracy with which patients who actually have a condition or quality are classified. Unfortunately, all measurement devices produce a certain proportion of “false positive” classifications (i.e., they misidentify some patients who do not have a condition as being in the group who do). Two types of errors are included in false positive rates. First, the test may identify those who do not have a condition as being high on the dimension. For example, to be sensitive as a measure of symptom severity, the BDI must distinguish accurately those with high levels of severity from those with low levels on this quality. Second, the test may be unable to distinguish those who have a given characteristic from those who have other, similar qualities. Specificity of this type requires a demonstration of discriminant validity: the test must be able to detect the specific quality under study from among a variety of other qualities. In other words, the question the clinician must ask is, “Can the instrument distinguish the specific variable it purports to measure?” This criterion of measurement is especially important when the instrument is being used to select a differential diagnosis or to design a specific treatment plan for the patient. Most tests of depression are able to distinguish individuals with high symptom severity or distress from those with low levels of symptom severity, but often find it difficult to distinguish between those with symptoms of anxiety and those with depression. This is because anxiety and depression are overlapping qualities that have many symptoms in common.

SPECIFICITY

Even good test sensitivity does not mean that those who do not have a given condition will be identified accurately. Tests produce a certain percentage of “false positive” errors (i.e., they misidentify people who do not have a given quality as having it). Therefore, specificity of a test refers to the degree to which the instrument is able to identify accurately those individuals who are not characterized by the targeted quality. This is the “true negative” hit rate, and indicates that the instrument is useful for identifying patients who are inappropriate for different aspects of treatment. Assurance of specificity allows the clinician to make reliable decisions about whether the patient’s symptomatic severity is too low to support psychosocial intervention. The relationship between sensitivity and specificity is easily seen. For example, an instrument like the Beck Depression Inventory (BDI; Beck et al., 1961) is valuable for measuring symptom severity only if it can identify those who have high levels of severity (sensitivity) and those who have low levels (specificity). As one might imagine, it would hamper treatment planning if a measure used to assess such concepts as symptomatic distress or symptom severity yielded scores that were indistinguishable from scores on tests of resistance potential or coping style. Such lack of discriminant validity may lead one to make erroneous and unproductive treatment plans. Thus, the MMPI/MMPI-2, because it is more sensitive to detecting patient personality traits (Butcher, 1990), may be more serviceable for determining trait-like coping styles and resistance potential than in assessing symptom severity.
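The relationship between sensitivity and specificity can be made concrete with a small calculation. The sketch below uses the conventional definitions (sensitivity is the proportion of people with the condition who score at or above a cutoff; specificity is the proportion without the condition who score below it). The cutoff, the scores, and the criterion classifications are hypothetical and do not represent BDI norms or any recommended cutting score.

```python
def sensitivity_specificity(scores, has_condition, cutoff):
    """Return (sensitivity, specificity) for a 'score >= cutoff' decision rule."""
    tp = fn = tn = fp = 0
    for score, condition in zip(scores, has_condition):
        flagged = score >= cutoff
        if condition and flagged:
            tp += 1          # true positive: has the condition, flagged
        elif condition:
            fn += 1          # false negative: has the condition, missed
        elif flagged:
            fp += 1          # false positive: no condition, flagged
        else:
            tn += 1          # true negative: no condition, not flagged
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one case with and one without the condition")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical scores and independent criterion classifications.
scores = [25, 12, 9, 30, 18, 5]
has_condition = [True, True, False, True, False, False]
sens, spec = sensitivity_specificity(scores, has_condition, cutoff=16)
print(sens, spec)
```

Raising the cutoff generally trades sensitivity for specificity, which is one reason a single instrument rarely serves every decision equally well.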

PREDICTIVE VALIDITY

A fourth factor to consider in selecting specific instruments is whether there is empirical support for the use of a particular instrument to predict treatment outcome. For example, the Stages of Change Questionnaire, suggested for assessing the problem-solving phase in which the patient presents himself or herself, appears to be a valid instrument for predicting smoking cessation (Prochaska et al., 1988). Moreover, MMPI measures of coping style have demonstrated validity for predicting differential effects of different therapies (Beutler et al., 1991).

In conclusion, although clinical instruments have the potential to assist the clinician significantly in treatment planning, there is much to consider when selecting a particular instrument. It behooves the clinician to remain current in assessment literature, and to interface with other clinicians utilizing personality measurement tools for case conceptualization.

Summary and Conclusions

Psychological tests have been used widely in the prediction of response to treatment. The types of tests used to assess these various dimensions generally have been classifiable as state measures of symptoms, state and trait measures of responsivity to treatment demands, traitlike measures of coping style, or some combination of these. Symptom measures of intensity, severity, or complexity are most valuable for assessing outcome or predicting prognosis to treatment generally. Their specificity for assigning treatments differentially is weak.

In contrast, both state-like measures of resistance and readiness for change, and trait measures of resistance potential, thematic conflict, and coping styles appear promising for selecting treatments that vary in directiveness and insight foci, respectively. These measures generally are drawn from omnibus personality measures and, in the case of coping style measures, may reflect complex processes that encompass both unconscious and conscious experience. These processes are reflected in symptomatic measures of severity, in indicators of emotional states, and in indicators of trait-like qualities.

Generally, research on these various uses of psychological tests suggests the following conclusions:

1. Symptomatic severity may indicate both the use of somatic therapies and the use of intensive or combinations of therapies. Although not consistently demonstrated, psychosocial interventions tend to be less effective than medical and somatic interventions among those with severe symptoms. However, among the mild and moderate levels of symptom severity, a positive relationship between severity and benefit of psychosocial interventions may be expected. Apparently, some level of distress is valuable for the effective use of psychosocial interventions, although extreme disturbance, especially if complicated by psychotic symptoms and demographic instability, may contraindicate psychosocial interventions. In the case of N.K., whose social impairment was quite severe, but whose symptoms of sleep disturbance and depression were moderately severe, his previous response to medication swung the decision toward psychosocial interventions. We elected a brief course of psychotherapy to correspond to the limited time that he was available to the treatment center. The observed, positive influence of his symptoms on the level of family disturbance further suggested the advisability of including his wife in the treatment sessions.

2. Stage of patient problem resolution may be an indicator of the type and focus of psychotherapeutic procedures. As a general rule, these relationships have not been studied among patients with personality, anxiety, and mood disorders, but they suggest that matching the action level of the intervention with the level of problem resolution achieved may be productive of treatment efficacy. N.K. presented at the active stage of problem resolution. He was seeking and trying solutions actively, although many of these were ineffective and aborted because they failed to address his beliefs about the nature of the symptoms and their causes. This observation indicated the value of treatment procedures that were active and that emphasized stimulus control or contingency management.

3. When the collection of patient problems is characterized by the presence of a smaller number of pervasive and recurrent interpersonal themes, treatments with broad areas of focus that include thematic patterns and relationships may be indicated. On the other hand, narrowly focused treatments, including somatic treatments that focus only on symptomatic behaviors and behaviorally oriented interventions, may be indicated most clearly when problems are responses to specific situations. In the case of N.K., there was no history of prior problems of this type. The onset of symptoms appeared to be quite specific to a precipitating situation that involved both a poorly managed medication regimen and a contemporary, reinforcing family dynamic. Although medication effects may have accounted for symptom onset, the pattern of response suggested that the reduction of family distress that accompanied symptom reduction may have served as an external source of reinforcement. The situational specificity of the symptoms suggested the potential value of a symptom-focused intervention, but the possibility of preexisting depression and family problems raised the possibility that more extended and conflict-focused interventions would be needed. We elected to treat the pressing symptoms of sleep disturbance and refer the patient and his wife for further marital treatment in their home city.

4. Patient resistance measures usually are extracted from omnibus personality measures and reflect state-like qualities that respond to differences in interpersonal contexts. The presence of these indicators of resistance seems to be indicative of the use of both directive and paradoxical interventions. Low levels of resistance indicators suggest a person who is responsive to a variety of interventions representing both directive and nondirective therapist styles. N.K.’s MMPI profile was characterized by peak scores on indices of anxiety, suspiciousness, and somatic reactivity. Coupled with his history of resistance to treatment recommendations, these indices suggested high levels of potential resistance, and even the likelihood that he might respond in an oppositional fashion to therapeutic directives. Accordingly, we elected a paradoxical intervention. Then we confirmed his beliefs by suggesting that his sleep disturbance may have been caused by the medication. We suggested that medication and medication withdrawal may have sped up his biological clock, making it difficult or impossible to sleep. He was instructed to go to bed and rest, but to “avoid” falling asleep, with the rationale that this would both provide the needed rest and reset his biological clock. On the following day, when he apologized for having fallen asleep, we scolded him and sent him to try it again. This was repeated on the third and fourth nights, which also were characterized as failures to do the homework assignment.

5. Patient coping styles vary widely, but generally have been measured as extensions of a dimension from intropunitive to extrapunitive. Current research is active in the area of defining the relationships between these coping styles and psychosocial interventions, which vary in the degree to which they attempt to achieve insight versus behavioral change. As a general rule, patients whose behavioral coping emphasizes acting out and projecting blame are most responsive to behaviorally focused interventions; those whose coping styles are characterized by self-inspection and self-criticism are more responsive to insight-oriented interventions. N.K.’s externalizing coping style confirmed our decision to focus on behavior change and to deemphasize insight-oriented interventions. After three nights of “the best sleep I’ve had in months,” for example, we sent him home with the observation that we had failed in our treatment efforts, because we had been unable to keep him awake long enough at night to reset his biological clock. We expressed hope that his sleep would continue to improve, but cautioned that if (or when) sleep problems returned, he should attempt our assignment of staying awake at night again. We also recommended that he see a psychotherapist, along with his wife, so that she could learn to support his efforts to maintain his sleep pattern. A 6-month follow-up revealed that, although he had not taken the latter recommendation, the patient was still symptom-free.

Taken together, the research reported in this brief and selective review suggests that various combinations of dimensions allow discrimination among treatment variables and may point to directions in which the development and applications of treatments may evolve in clinical practice.

Acknowledgments

The authors express appreciation to Edwin Desax for his assistance on this chapter. This work was partially supported by NIAAA Grant No. AA08970.

References

American Psychiatric Association. (1980). Diagnostic and statistical manual of mental disorders (3rd ed.). Washington, DC: Author.
Arkowitz, H. (1992). Integrative theories of therapy. In D. K. Freedheim (Ed.), History of psychotherapy: A century of change (pp. 261-303). Washington, DC: American Psychological Association.
Arkowitz, H. (1991, August). Psychotherapy integration: Bringing psychotherapy back to psychology. A paper presented at the annual meeting of the American Psychological Association, San Francisco, CA.
Barber, J. P. (1989). The Central Relationship Questionnaire (version 1.0). Unpublished manuscript, University of Pennsylvania, School of Medicine, Philadelphia, PA.
Barron, F. (1953). An ego strength scale which predicts response to psychotherapy. Journal of Consulting Psychology, 17, 327-333.

Beck, A. T., Ward, C. H., Mendelson, M., Mock, J., & Erbaugh, J. (1961). An inventory for measuring depression. Archives of General Psychiatry, 4, 561-569.
Beitman, B. D. (1987). The structure of individual psychotherapy. New York: Guilford.
Benjamin, L. S. (1974). Structural analysis of social behavior. Psychological Review, 81, 392-445.
Beutler, L. E. (1979). Toward specific psychological therapies for specific conditions. Journal of Consulting and Clinical Psychology, 47, 882-897.
Beutler, L. E. (1983). Eclectic psychotherapy: A systematic approach. New York: Pergamon.
Beutler, L. E. (1991). Have all won and must all have prizes? Revisiting Luborsky, et al.’s verdict. Journal of Consulting and Clinical Psychology, 59, 226-232.
Beutler, L. E., & Clarkin, J. (1990). Systematic treatment selection: Toward targeted therapeutic interventions. New York: Brunner/Mazel.
Beutler, L. E., Engle, D., Mohr, D., Daldrup, R. J., Bergan, J., Meredith, K., & Merry, W. (1991). Predictors of differential and self-directed psychotherapeutic procedures. Journal of Consulting and Clinical Psychology, 59, 333-340.
Beutler, L. E., & Mitchell, R. (1981). Psychotherapy outcome in depressed and impulsive patients as a function of analytic and experiential treatment procedures. Psychiatry, 44, 297-306.
Beutler, L. E., Mohr, D. C., Grawe, K., Engle, D., & MacDonald, R. (1991). Looking for differential effects: Cross-cultural predictors of differential psychotherapy efficiency. Journal of Psychotherapy Integration, 1, 121-142.
Blanchard, E. B., Schwarz, S. P., Neff, D. F., & Gerardi, M. A. (1988). Prediction of outcome from the self-regulatory treatment of irritable bowel syndrome. Behaviour Research and Therapy, 26, 187-190.
Butcher, J. N. (1990). The MMPI-2 in psychological treatment. New York: Oxford University Press.
Calvert, S. J., Beutler, L. E., & Crago, M. (1988). Psychotherapy outcome as a function of therapist-patient matching on selected variables. Journal of Social and Clinical Psychology, 6, 104-117.
Carlson, J. G. (1985). Recent assessments of the Myers-Briggs Type Indicator. Journal of Personality Assessment, 49, 356-365.
Crits-Christoph, P., Cooper, A., & Luborsky, L. (1988). The accuracy of therapists’ interpretations and the outcome of dynamic psychotherapy. Journal of Consulting and Clinical Psychology, 56, 490-495.
Crits-Christoph, P., & Demorest, A. (1988, June). The development of standard categories for the CCRT method. Paper presented at the Society for Psychotherapy Research, Santa Fe, NM.
Cooney, N. L., Kadden, R. M., Litt, M. D., & Getter, H. (1991). Matching alcoholics to coping skills or interactional therapies: Two-year follow-up results. Journal of Consulting and Clinical Psychology, 59, 598-601.
Crits-Christoph, P., Luborsky, L., Dahl, L., Popp, C., Mellon, J., & Mark, D. (1988). Clinicians can agree in assessing relationship patterns in psychotherapy. Archives of General Psychiatry, 45, 1001-1004.
Derogatis, L. R. (1977). SCL-90: Administration, scoring & procedures manual for the revised version. Baltimore: Clinical Psychometric Research.
DeRubeis, R. J., Evans, M. D., Hollon, S. D., Garvey, M. J., Grove, W. M., & Tuason, V. B. (1990). How does cognitive therapy work? Cognitive change and symptom change in cognitive therapy and pharmacotherapy for depression. Journal of Consulting and Clinical Psychology, 58, 862-869.
Dietzel, C. S., & Abeles, N. (1975). Client-therapist complementarity and therapeutic outcome. Journal of Counseling Psychology, 22, 264-272.
Dowd, E. T., Milne, C. R., & Wise, S. L. (1991). The Therapeutic Reactance Scale: A measure of psychological reactance. Journal of Counseling and Development, 69, 541-545.
Edwin, D., Anderson, A. E., & Rosell, F. (1988). Outcome prediction of MMPI in subtypes of anorexia nervosa. Psychosomatics, 29, 273-282.
Elkin, I., Shea, T., Watkins, J. T., Imber, S. D., Sotsky, S. M., Collins, J. F., Glass, D. R., Pilkonis, P. A., Leber, W. R., Docherty, J. P., Feister, S. J., & Parloff, M. B. (1989). National Institute of Mental Health treatment of depression collaborative research program. Archives of General Psychiatry, 46, 971-982.

Frances, A., Clarkin, J., & Perry, S. (1984). Differential therapeutics in psychiatry. New York: Brunner/Mazel.
Fuhriman, A., Paul, S. C., & Burlingame, G. M. (1986). Eclectic time-limited therapy. In J. C. Norcross (Ed.), Handbook of eclectic psychotherapy (pp. 226-259). New York: Brunner/Mazel.
Goldfried, M. R. (1991). Research issues in psychotherapy integration. Journal of Psychotherapy Integration, 1, 5-25.
Goldstein, G., & Hersen, M. (Eds.). (1990). Handbook of psychological assessment (2nd ed.). New York: Pergamon.
Graham, J. R. (1987). The MMPI: A practical guide (2nd ed.). New York: Oxford University Press.
Graham, J. R. (1990). The MMPI-2: Assessing personality and psychopathology. New York: Oxford University Press.
Hamilton, M. (1967). Development of a rating scale for primary depressive illness. British Journal of Social and Clinical Psychology, 6, 278-296.
Henry, W. P., Schacht, T. E., & Strupp, H. H. (1990). Patient and therapist introject, interpersonal process, and differential psychotherapy outcome. Journal of Consulting and Clinical Psychology, 58, 768-774.
Henry, W. P., & Strupp, H. H. (1991). Vanderbilt University: The Vanderbilt Center for Psychotherapy Research. In L. E. Beutler & M. Crago (Eds.), Psychotherapy research: An international review of programmatic studies (pp. 166-174). Washington, DC: American Psychological Association.
Hollon, S. D., & Beck, A. T. (1986). Research on cognitive therapies. In S. L. Garfield & A. E. Bergin (Eds.), Handbook of psychotherapy and behavior change (3rd ed., pp. 443-482). New York: Wiley.
Horowitz, L. M., Rosenberg, S. E., Baer, B. A., Ureno, G., & Villasenor, V. A. (1988). Inventory of interpersonal problems: Psychometric properties and clinical applications. Journal of Consulting and Clinical Psychology, 56, 885-892.
Imber, S. D., Pilkonis, P. A., Sotsky, S. M., Elkin, I., Watkins, J. T., Collins, J. F., Shea, M. T., Leber, W. R., & Glass, D. R. (1990). Mode-specific effects among three treatments for depression. Journal of Consulting and Clinical Psychology, 58, 352-359.
Jacob, R. G., Turner, S. M., Szekely, B. C., & Eidelman, B. H. (1983). Predicting outcome of relaxation therapy in headaches: The role of “depression.” Behavior Therapy, 14, 457-465.
Kadden, R. M., Cooney, N. L., Getter, H., & Litt, M. D. (1990). Matching alcoholics to coping skills or interactional therapies: Posttreatment results. Journal of Consulting and Clinical Psychology, 57, 698-704.
Khavin, A. B. (1985). Individual-psychological factors in prediction of help to stutterers. Voprosy-Psikhologii, 2, 133-135.
Kiesler, D. J. (1988). Therapeutic metacommunication: Therapist impact disclosure as feedback in psychotherapy. Palo Alto, CA: Consulting Psychologists Press.
Kiesler, D. J., & Watkins, L. M. (1989). Interpersonal complementarity and the therapeutic alliance: A study of relationship in psychotherapy. Psychotherapy, 26, 183-194.
Klerman, G. L. (1986). Drugs and psychotherapy. In S. L. Garfield & A. E. Bergin (Eds.), Handbook of psychotherapy and behavior change (3rd ed., pp. 777-818). New York: Wiley.
Klerman, G. L., DiMascio, A., Weissman, M. M., Prusoff, B., & Paykel, E. S. (1974). Treatment of depression by drugs and psychotherapy. American Journal of Psychiatry, 131, 186-191.
Knight-Law, A., Sugerman, A. A., & Pettinati, H. M. (1988). An application of an MMPI classification system for predicting outcome in a small clinical sample of alcoholics. American Journal of Drug and Alcohol Abuse, 14, 325-334.
Kolb, D. L., Beutler, L. E., Davis, C. S., Crago, M., & Shanfield, S. (1985). Patient personality, locus of control, involvement, therapy relationship, drop-out and change in psychotherapy. Psychotherapy: Theory, Research and Practice, 22, 702-710.
LaCroix, J. M., Clarke, M. A., Bock, J. C., & Doxey, N. C. (1986). Predictors of biofeedback and relaxation success in multiple-pain patients: Negative findings. International Journal of Rehabilitation Research, 9, 376-378.
Lambert, M. J., & Bergin, A. E. (1983). Therapist characteristics and their contribution to psychotherapy outcome. In C. E. Walker (Ed.), The handbook of clinical psychology (Vol. 1, pp. 205-241). Homewood, IL: Dow Jones-Irwin.
Lazarus, A. A. (1981). The practice of multimodal therapy. New York: McGraw-Hill.
Leary, T. (1957). Interpersonal diagnosis in personality. New York: Ronald Press.
Luborsky, L., Crits-Christoph, P., & Mellon, J. (1986). The advent of objective measures of the transference concept. Journal of Consulting and Clinical Psychology, 54, 39-47.
Machado, P. P. P., Beutler, L. E., Engle, D., & Mohr, D. (1983). Differential patient X treatment maintenance among cognitive, experiential, and self-directed psychotherapies. Journal of Psychotherapy Integration, 3, 15-32.
Mendelsohn, G. A., & Geller, M. H. (1963). Effects of counselor-client similarity on the outcome of counseling. Journal of Counseling Psychology, 10, 71-77.
Mohr, D. C., Beutler, L. E., Engle, D., Shoham-Salomon, V., Bergan, J., Kaszniak, A. W., & Yost, E. (1990). Identification of patients at risk for non-response and negative outcome in psychotherapy. Journal of Consulting and Clinical Psychology, 58, 622-628.
Nietzel, M. T., Russell, R. L., Hemmings, K. A., & Gretter, M. L. (1987). Clinical significance of psychotherapy for unipolar depression: A meta-analytic approach to social comparison. Journal of Consulting and Clinical Psychology, 55, 156-161.
Parker, G., Holmes, S., & Manicavasagar, V. (1986). Depression in general practice attenders: “caseness,” natural history and predictors of outcomes. Journal of Affective Disorders, 10, 27-35.

Prochaska, J. O. (1984). Systems of psychotherapy: A transtheoretical analysis (2nd ed.). Homewood, IL: Dorsey Press.
Prochaska, J. O., & DiClemente, C. C. (1986). The transtheoretical approach. In J. C. Norcross (Ed.), Handbook of eclectic psychotherapy (pp. 163-200). New York: Brunner/Mazel.
Prochaska, J. O., Rossi, J. S., & Wilcox, N. S. (1991). Change processes and psychotherapy outcome in integrative case research. Journal of Psychotherapy Integration, 1, 103-120.
Prochaska, J. O., Velicer, W. F., DiClemente, C. C., & Fava, J. (1988). Measuring process of change: Applications to the cessation of smoking. Journal of Consulting and Clinical Psychology, 56, 520-528.
Robinson, L. A., Berman, J. S., & Neimeyer, R. A. (1990). Psychotherapy for the treatment of depression: A comprehensive review of controlled outcome research. Psychological Bulletin, 108, 30-49.
Rudy, J. P., McLemore, C. W., & Gorsuch, R. L. (1985). Interpersonal behavior and therapeutic progress: Therapists and clients rate themselves and each other. Psychiatry, 48, 264-281.
Sheppard, D., Smith, G. T., & Rosenbaum, G. (1988). Use of MMPI subtypes in predicting completion of a residential alcoholism treatment program. Journal of Consulting and Clinical Psychology, 56, 590-596.
Shoham-Salomon, V., Avner, R., & Neeman, K. (1989). “You are changed if you do and changed if you don’t”: Mechanisms underlying paradoxical interventions. Journal of Consulting and Clinical Psychology, 57, 590-598.
Shoham-Salomon, V., & Hannah, M. T. (1991). Client-treatment interactions in the study of differential change processes. Journal of Consulting and Clinical Psychology, 59, 217-225.
Simons, A. D., Garfield, S. L., & Murphy, G. E. (1984). The process of change in cognitive therapy and pharmacotherapy for depression. Archives of General Psychiatry, 41, 45.
Simons, A. D., & Thase, M. E. (1992). Biological markers, treatment outcome, and 1-year follow-up in endogenous depression: Electroencephalographic sleep studies and response to cognitive therapy. Journal of Consulting and Clinical Psychology, 60, 392-401.
Spielberger, C. D., Gorsuch, R. L., & Lushene, R. E. (1970). The State-Trait Anxiety Inventory (STAI) test manual for Form X. Palo Alto, CA: Consulting Psychologists Press.
Strupp, H. H., & Binder, J. L. (1984). Psychotherapy in a new key. New York: Basic Books.
Tracey, T. J., & Hays, K. (1989). Therapist complementarity as a function of experience and client stimuli. Psychotherapy, 26, 462-468.
Trief, P. M., & Yuan, H. A. (1983). The use of the MMPI in a chronic back pain rehabilitation program. Journal of Clinical Psychology, 39, 46-53.
Welsh, G. S. (1952). An anxiety index and an internalization ratio for the MMPI. Journal of Consulting Psychology, 16, 65-72.

Chapter 4

Use of Psychological Tests for Outcome Assessment

Michael J. Lambert
Brigham Young University

Outcome assessment is a branch of applied psychology that illuminates the strength of the effects of psychological interventions on patient functioning. In psychotherapy research, this assessment is done in the context of specific research designs aimed at answering specific questions of theoretical importance. The broad questions that are addressed in outcome assessment require an equally broad range of research designs to draw conclusions. Although assessment of outcome occurs only within the context of a particular research strategy (single case design, program evaluation, comparative outcome study, etc.), a limited set of procedures and principles guides the selection of outcome measures. In this chapter, a brief history of outcome assessment is offered. This is followed by some guidelines for selecting outcome measures, and then by recommendations for assessing outcome in psychotherapy. This chapter serves as a guide in selecting outcome measures for use in clinical practice and research. A great deal of psychotherapy research has been undertaken over the past 50 years. Therefore, much is known about outcome measurement. This knowledge can aid the interested practitioner in selecting outcome measures in this age of accountability.

Brief History of Outcome Assessment

The problems associated with assessing the changing psychological status of patients are, as Luborsky (1971) suggested, a “hardy perennial” in the field of psychotherapy. Historically, psychotherapists have devoted themselves to defining and perfecting treatments, not assessing the consequences of these treatments systematically. Likewise, social or personality psychologists historically have developed assessment devices in contexts devoid of interest in personality change or symptomatic improvement. Personality psychologists have been more interested in static traits and stability than in change per se. Outcome assessment is the neglected domain between these two fields of study.

Although measurement and quantification are central properties of empirical science, the earliest attempts at quantifying treatment gains lacked scientific rigor. But the field gradually has moved from complete reliance on therapist ratings of gross and general improvement to the use of outcome indices of specific symptoms that are quantified from a variety of viewpoints, including the patient, outside observers, relatives, physiological indices, and environmental data such as employment records. The data generated from these viewpoints are always subject to the limitations inherent in the methodology relied upon; none is “objective” or most authoritative, but together they represent an improvement over previous measurement methods, which were difficult to replicate because of their lack of clear operational definitions and systematic means of data collection.

In the past, attempts at measuring change have reflected the fashionable theoretical positions of the day. Early studies relied on devices developed out of Freudian dynamic psychology. These devices (e.g., Rorschach and TAT) largely have been discarded as measures of outcome, because of their poor psychometric qualities, reliance on inference, and the fact that they mainly reflected the interest of orientations that emphasized unconscious processes. Even if scoring systems such as Exner’s for the Rorschach have overcome some of the psychometric problems associated with projective testing, these devices are not used in outcome studies because of practical constraints (they are time intensive). The use of these measures was followed by the use of devices consistent with client-centered theory (e.g., the Q-sort technique), behaviorism (behavioral monitoring), and, more recently, cognitive theories with their emphasis on automatic thoughts.

Although outcome assessment always will be guided by “in-vogue” theoretical positions, the field as a whole has moved a long way from its early theoretical foundations. Nonetheless, there are important lessons to be learned from past attempts at measuring change. It would be unfortunate if we did not take something with us from the past to guide our current attempts to measure patient gains. Contemporary research reflects some significant lessons from early scientific efforts. The most important developments in assessing outcome have been the tendencies to: (a) clearly specify what is being measured, so that replication is possible; (b) measure change from multiple perspectives; (c) employ different types of rating scales and methods; (d) employ symptom-based atheoretical measures; and (e) examine, to some extent, patterns of change over time. These practices are an improvement over the past, and they are highlighted further in the sections that follow.

The Current State of Outcome Assessment: Diversity if Not Chaos

COMMON MEASURES OF OUTCOME

All measures of outcome have weaknesses. But using measures that have a history of frequent use will provide advantages that are not available with new or little-known measures. Primary among these advantages is easy comparison across studies for comparable levels of pathology and for comparable changes following treatment. Several surveys have summarized measures that occur frequently in studies of psychotherapy. Lambert (1983) reported that the following self-report scales were the most commonly used outcome measures in the Journal of Consulting and Clinical Psychology from 1976 to 1980: State-Trait Anxiety Inventory (STAI), Minnesota Multiphasic Personality Inventory (MMPI), Rotter Internal-External Locus of Control, S-R Inventory of Anxiousness, and the Beck Depression Inventory (BDI). In their review of 21 separate American journals published from 1983 through 1988, Froyd and Lambert (1989) summarized usage data from 334 outcome studies. The most frequently used self-report scales were the Beck Depression Inventory (BDI), State-Trait Anxiety Inventory (STAI), Symptom Checklist-90 (SCL-90), Locke-Wallace Marital Adjustment Inventory, and the Minnesota Multiphasic Personality Inventory

(MMPI). In a recent review of articles in the Journal of Consulting and Clinical Psychology (years 1986-1991), Lambert and McRoberts (1993) reviewed 116 studies of psychotherapy with adults. The frequency of outcome measures categorized by source is presented in Table 4.1. As can be seen, the measures employed continue to be similar within the category of self-report methodology. Clearly, the BDI, STAI, SCL-90, and MMPI remain the most popular measures used across a broad sampling of disorders. As one moves within a specific disorder, the listing of most frequent scales can be expected to change, although it appears that the scales mentioned earlier, by virtue of their focus on anxiety or depression, remain relevant and popular.

Beyond self-report methodology, there is less consensus within categories of usage. The Hamilton Rating Scale for Depression (Hamilton, 1967) was used frequently in the studies reviewed by Lambert (1983) and Froyd and Lambert (1989). In the more recent survey, it remains relatively popular, either in the hands of the therapist or through the use of expert raters. The Locke-Wallace Marital Adjustment Inventory (Locke & Wallace, 1959) is the most frequently used specific scale employed with significant others.

Despite the fact that some measures are used repeatedly, their frequency is still not high. For example, of the 384 uses of self-report scales reported in Table 4.1, the MMPI made up only 1.6% of the total. It is a commonly used measure only in the relative sense of the word. It is startling to discover the seemingly endless number of measures used to objectify outcome. The journals surveyed in the Froyd and Lambert (1989) review of outcome studies were representative of a broad range of therapy as practiced and reported in contemporary professional literature. A total of 1,430 outcome measures were identified, representing a wide variety of patient diagnoses, treatment modalities, and therapy types. Of this rather large number, 840 different measures were used just once. A second review, of a more homogeneous set of studies on agoraphobia outcome published during the 1980s (Ogles, Lambert, Weight, & Payne, 1990), located 106 studies that used 98 unique outcome measures. This occurred in a well-defined, limited disorder, treated with an equally narrow range of interventions, mainly behavioral and cognitive-behavioral therapies. Similar conclusions have been drawn by Wells, Hawkins, and Catalano (1988), who reported more than 25 ways to measure drug usage in addiction outcome research.

The proliferation of outcome measures (a sizable portion of which were unstandardized scales) is overwhelming. Those who assess change have not agreed on a standard battery of tests and procedures even within homogeneous patient populations. This seeming disarray of instruments is partly a function of the complex and multifaceted nature of psychotherapy outcome as reflected in the divergence in clients and their problems, treatments and their underlying assumptions and techniques, and the multidimensionality of the change process. But it also represents the struggle (failure) of scientists and practitioners to agree on valued outcomes. Indeed, measuring the outcomes of psychotherapy promises to be a hardy perennial for years to come.
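The frequency figures quoted in this section can be put on a common footing with a little arithmetic. The sketch below is only illustrative: the variable names are invented here, and it assumes that the 840 single-use measures are counted against the 1,430 measures identified in the Froyd and Lambert (1989) review.

```python
# Figures quoted in the text; variable names are illustrative only.
self_report_uses = 384   # uses of self-report scales reported in Table 4.1
mmpi_share = 0.016       # MMPI share of those uses (1.6%)
total_measures = 1430    # outcome measures identified by Froyd and Lambert (1989)
used_once = 840          # measures that were used just once
agoraphobia_studies = 106
agoraphobia_measures = 98

# Approximate number of MMPI uses implied by a 1.6% share of 384 uses.
mmpi_uses = round(self_report_uses * mmpi_share)

# Assumption: the single-use measures are a subset of the 1,430 identified.
single_use_pct = 100 * used_once / total_measures

# Unique measures per study in the agoraphobia review.
measures_per_study = agoraphobia_measures / agoraphobia_studies

print(f"MMPI uses: about {mmpi_uses}")
print(f"Measures used only once: {single_use_pct:.1f}%")
print(f"Unique measures per agoraphobia study: {measures_per_study:.2f}")
```

Under these assumptions, even one of the most popular instruments accounts for only a handful of uses, and well over half of the identified measures never reappear in a second study.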

TABLE 4.1. Commonly Used Inventories and Methods of Assessment [table body not recoverable from the scan; measures were grouped by source of data, with self-report (N = 384) the largest category]

CHANGE IS COMPLEX

Although most current outcome research studies focus on seemingly homogeneous samples of patients (e.g., unipolar depression, agoraphobia), it is clear that each patient is unique and brings unique problems to treatment. For example, although the major complaint of a person is summed up as “depression” and this person meets diagnostic criteria for major depression, this same patient can have serious interpersonal problems, somatic concerns, evidence of anxiety, financial difficulties, problems at work, problems parenting children, substance abuse, and so on. These diverse problems may be addressed in therapy, and proper assessment of outcome may require that changes in all these problems be measured. Obviously this is a demanding task, one that cannot be accomplished fully in any particular study or by the practitioner. The complexity of human behavior and the complexity of theories and conceptions of human behavior invite incredible complexity in operationalizing the changes that occur as a result of psychotherapy. For example, Williams (1985) documented considerable evidence that, even within the

seemingly limited diagnosis of agoraphobia, there is considerable diversity among patients. He notes that there is considerable diversity in the situations that provoke panic across patients, including numerous phobias that often appear as simple phobias (e.g., fear of flying, heights). At the same time, the most frequent panic provoking situation (driving on freeways) was rated as “no problem” by nearly 30% of agoraphobics. The typical agoraphobic usually is severely handicapped in some situations, moderately handicapped in others, and not at all restricted in other situations. Furthermore, agoraphobics have many fears that are common to social phobia (e.g., fear of causing a public disturbance, being stared at) as a primary or secondary fear. They also have many somatic complaints for which they often and persistently seek medical consultation for a physical diagnosis even after agoraphobia is diagnosed. These fears overlap with both hypochondriacal and hysterical disorders. The configuration of fears in agoraphobics is so highly idiosyncratic that it is substantially true that no two agoraphobics have exactly the same pattern of phobias, and that two people with virtually no

overlapping areas of phobia disability can both be called agoraphobic. (Williams, 1985, p. 112) It is clear that however specific a diagnosis may seem, the term does not denote only a precise set of symptoms that occur independently of other symptoms. Thus, a single measure cannot hope to capture the complexity of psychological functioning nor adequately evaluate therapeutic change, because no single measure of disability can routinely capture the complexity of the individual patient. Given the great complexity in the persons to be treated, I can only suggest that researchers begin studying outcome by identifying major targets of treatment, while accepting that the resulting picture of change will be far from complete. One cannot change this fact, but one

can recognize its implications. One can also deal with the complexities of outcome assessment by employing a conceptual scheme that helps organize issues and procedures.

Conceptualizing Measures and Methods

Figure 4.1 presents a conceptual scheme that organizes several important issues in measuring outcome. These issues involve the content of outcome assessment, the methods used in outcome assessment, and the sources that generate outcome data.

FIG. 4.1. Scheme for organizing and selecting outcome measures. [Figure: three columns. Content: intrapersonal, interpersonal, social role performance. Technology: evaluation, description, observation, status. Source: self-report, trained observers, relevant other, therapist rating, institutional.]

The first concept in Fig. 4.1 is entitled content. Content areas covered by outcome measures can be divided into intrapersonal, interpersonal, and social role performance. Thus, content can be seen as a dimension that reflects the need to assess changes that occur within the client, in the client’s intimate relationships, and, more broadly, in the client’s participation in community and social roles. This dimension can be considered a continuum that represents the degree to which an instrument measures subjective discomfort, intrapsychic attributes, and bodily experiences, versus characteristics of the client’s participation in the interpersonal world. It is a matter of intellectual curiosity and values, if not empirical importance, to know about the content of the changes that are targeted and modified in treatment efforts. Empirically, the results of outcome studies are more impressive when content is measured broadly, considering more than a single content area, because interventions can have side effects as well as more and less extensive effects. Certainly these content

areas reflect the values and interests of clients, of mental health providers, third-party payors, government agencies, and society at large. In the Froyd and Lambert (1989) review, 74% of outcome measures focused on intrapersonal content, whereas 17% and 9% focused on interpersonal and social role performance, respectively. To date, changes in those last two areas have been underrepresented in outcome research. Several issues other than content are represented in the conceptual scheme. One dimension that is of central importance in outcome assessment is the source from which data are generated.

CHANGE SHOULD BE MEASURED FROM MULTIPLE PERSPECTIVES

In the ideal study of change, all the parties involved who have information about change might be represented. This would include the client, therapist, relevant (significant) others, trained judges (or observers), and societal agencies that store information, such as employment and educational records. Unlike measurement in the physical sciences, measurement in psychotherapy is highly affected by the politics and biases of those providing the data. Seldom is one able to merely observe phenomena of interest without seeing them through some filtering lens.

In my review of 116 outcome studies found in the Journal of Consulting and Clinical Psychology (JCCP) between 1986 and 1991, I examined research practices related to assessment source (Lambert & McRoberts, 1992). Specific outcome measures were classified into five source categories: self-report, trained observer, significant other, therapist, or instrumental (a category that included societal records or instruments, such as physiological recording devices). Frequency data were then computed on the usage of specific instruments and instrument sources across studies. As may be expected, the most popular source for outcome data was the client. In fact, 25% of the studies used client self-report data as the sole source for evaluation. Of the studies that relied solely on this type of self-report scale, three fourths used more than a single self-report scale. The next most frequent procedure employed two data sources simultaneously (self-report and observer ratings). This combination occurred in 20% of the studies, followed

by self-report and therapist ratings (15%) and by self-report and instrumental sources (8%). A self-report scale was used alone or in combination in over 90% of the studies. Significant other ratings rarely were employed. They were utilized alone or in combination with some other data sources in about 9% of the studies reported. The therapist rated outcome alone or in combination with other measures in about 25% of the studies. Impressively, 30% of the studies used six or more instruments to reflect changes in patients. The most ambitious effort had a combination of 12 distinct measures to assess changes following psychotherapy.

Clearly, one of the most important conclusions to be drawn from past psychotherapy outcome research is that the results of studies can be misunderstood easily and even misrepresented through failure to appreciate the effects that different perspectives can have in reflecting the degree of change that results from therapy. The necessary and, to some degree, common practice of applying multiple criterion measures in research studies (Lambert, 1983) has made it obvious that multiple measures from different sources do not yield unitary results. For example, in studies using multiple criterion measures, one finds that a specific treatment used to reduce seemingly simple fears may result in a decrease in behavioral avoidance of the feared object (provided by observers) while not affecting the self-reported level of discomfort associated with the feared object (Mylar & Clement, 1972; Ross & Proctor, 1973; Wilson & Thomas, 1973). Likewise, a physiological indicator of fear may show no change in response to a feared object as a result of treatment, whereas improvement in subjective self-report will be marked (Ogles et al., 1990).
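The source tabulation described above distinguishes the share of studies using an exact combination of sources from the share using a source alone or in combination. A minimal sketch of that bookkeeping follows; the five coded studies and both function names are hypothetical and are not data or procedures from the review.

```python
from collections import Counter

# Hypothetical coding of five studies: each set lists the sources that
# supplied at least one outcome measure. Illustrative only, not review data.
studies = [
    {"self-report"},
    {"self-report", "trained observer"},
    {"self-report", "therapist"},
    {"self-report", "instrumental"},
    {"therapist", "significant other"},
]


def combination_shares(coded):
    """Percentage of studies using each exact combination of sources."""
    counts = Counter(frozenset(sources) for sources in coded)
    return {combo: 100 * n / len(coded) for combo, n in counts.items()}


def any_use_share(coded, source):
    """Percentage of studies using `source` alone or in combination."""
    return 100 * sum(source in sources for sources in coded) / len(coded)


for combo, share in combination_shares(studies).items():
    print(sorted(combo), f"{share:.0f}%")
print("self-report, alone or combined:",
      f"{any_use_share(studies, 'self-report'):.0f}%")
```

Because the two kinds of percentage answer different questions, a source can be rare as a sole criterion yet nearly universal when combinations are counted, which is the pattern reported for self-report scales.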

An example of the impact on conclusions that relying on different sources of assessment can have is provided by Glaister (1982). In a review of the effects of relaxation training, he

found that relaxation, in contrast to other procedures (mainly exposure), had its principal impact on physiological indices of change. Indeed, it was superior to other treatments in 11 of 12 comparisons, whereas the other (exposure) conditions were superior in 28 of 38 comparisons that used verbal reports of improvement by the patients. On behavioral measures (including assessor ratings), neither exposure nor relaxation appeared superior.

Farrell, Curran, Zwick, and Monti (1983), although showing that raters can discriminate social skill deficits from anxiety level on the Simulated Social Skills Test, also found that there was poor correspondence between self-ratings and behavior ratings of these variables. This lack of convergence between measurement methods also was apparent when physiological measures were added (Monti et al., 1983). Little convergent validity was found for measurement method. It appears that different measures of the same target problem often disagree (e.g., self-report of sexual arousal and physiological measures; Sabalis, 1983).

This conclusion is supported further by factor analytic studies that have combined a variety of outcome measures. The main factors derived from some “older” studies that employed factor analytic data tend to be associated closely with the measurement method or the source of observation used in collecting data, rather than being identified by some theoretical or conceptual variable that would be expected to cut across techniques of measurement (Cartwright, Kirtner, & Fiske, 1963; Forsyth & Fairweather, 1961; Gibson, Snyder, & Ray, 1955; Shore, Massimo, & Ricks, 1965). A more recent example was reported by Pilkonis, Imber,

Lewis, and Rubinsky (1984), who factor analyzed 15 scales representing a variety of traits and symptoms from the client, therapist, expert judges, and significant others. These scales were reduced to three factors that most clearly represented the source of data, rather than the content of the scale. Beutler and Hamblin (1986) reported similar results.
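The pattern these factor analytic studies describe, with scales grouping by who supplies the rating rather than by what is rated, can be illustrated with a small simulation. The sketch below is not a factor analysis and does not reproduce any of the cited studies. It assumes, purely for illustration, that each score is a weighted sum of a source-specific viewpoint, a content-specific true change, and random noise; the weights, sample size, and all names are hypothetical.

```python
import random
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def simulate_change_scores(n_patients=500, source_weight=0.8,
                           content_weight=0.3, noise_weight=0.5, seed=0):
    """Simulate one change score per (source, content) pair per patient."""
    rng = random.Random(seed)
    sources = ["client", "therapist", "observer"]
    contents = ["symptoms", "adjustment"]
    scores = {(s, c): [] for s in sources for c in contents}
    for _ in range(n_patients):
        # Assumed components: each rater's general view of the patient and
        # the patient's true change in each content area.
        source_view = {s: rng.gauss(0, 1) for s in sources}
        true_change = {c: rng.gauss(0, 1) for c in contents}
        for s in sources:
            for c in contents:
                scores[(s, c)].append(
                    source_weight * source_view[s]
                    + content_weight * true_change[c]
                    + noise_weight * rng.gauss(0, 1)
                )
    return scores


def mean_correlations(scores):
    """Mean correlation for same-source pairs and for same-content pairs."""
    same_source, same_content = [], []
    for key_a, key_b in combinations(scores, 2):
        r = pearson(scores[key_a], scores[key_b])
        if key_a[0] == key_b[0]:
            same_source.append(r)    # same rater, different content
        elif key_a[1] == key_b[1]:
            same_content.append(r)   # same content, different rater
    return (sum(same_source) / len(same_source),
            sum(same_content) / len(same_content))


scores = simulate_change_scores()
same_source_r, same_content_r = mean_correlations(scores)
print(f"mean r, same source / different content: {same_source_r:.2f}")
print(f"mean r, same content / different source: {same_content_r:.2f}")
```

When the source weight dominates, measures from one rater correlate more strongly with each other than with other raters' measures of the same content, so any analysis of the correlation matrix will tend to recover source groupings first.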

However, few studies have recognized or adequately dealt with the complexities that result from divergence between sources, although creative efforts and some progress have been made. Berzins, Bednar, and Severy (1975) directly addressed the issue of consensus among criterion measures. They studied the relationship among outcome measures in 79 client-therapist dyads, using the MMPI, the Psychiatric Status Schedule, and the Current Adjustment Rating Scale. Sources of outcome measurement involved the client, therapist, and trained outside observers. Data from all three sources and a variety of outcome measures showed generally positive outcomes for the treated group as a whole at termination. There was the usual lack of consensus between criterion measures. However, the primary thesis of Berzins et al. was that problems of intersource consensus can be resolved through the application of alternatives to conventional methods of analysis. The principal components analysis showed four components: (a) changes in patients’ experienced distress as reported by clients on a variety of measures; (b) changes in observable maladjustments as noted by psychometrist, client, and therapist (an instance of intersource agreement); (c) changes in impulse expression (an instance of intersource disagreement between psychometrist and therapist); and (d) changes in self-acceptance (another type of client-perceived change). The practical implication of these results is that a single criterion might suffice for measuring changes in one area of interest such as maladjustment, whereas this practice would be misleading if a single criterion were employed in another area of functioning, such as impulse control. Mintz, Luborsky, and Christoph (1979) addressed the question of intersource consensus

by analyzing data in two large uncontrolled studies of psychotherapy: the Penn Psychotherapy Project and the Chicago study, reported by Cartwright et al. (1963). They reported that there was substantial agreement among the viewpoints of patient, therapist, and outside raters when outcome was defined broadly as posttherapy adjustment or overall benefit. They concluded that, contrary to common opinion, consensus measures of psychotherapy outcome could be defined meaningfully. Despite this consensus, they noted that “distinct viewpoints do exist” (p. 331). In fact, if one considers the effect sizes reported by Mintz et al. (1979), it is clear that the amount of improvement varied, from a minimum of 0.52 to a maximum of 0.93 on pre- to post-changes. The lowest effect size came from the MMPI Hypochondriasis scale and the highest from the Inventory of Social and Psychological Functioning, an observer rating of social adjustment. In addition, although correlations between viewpoints were statistically significant, they were often low. For example, in the Chicago data (N = 93), correlations

between viewpoints on ratings of adjustment ranged from 0.39 to 0.59. The lack of consensus across sources of outcome evaluation, especially when each source

presumably is assessing the same phenomena, has been viewed as a threat to the validity of data. Indeed, it appears that outcome data provide evidence about changes made by the individual, as well as information about the differing value orientations and motivations of

the individuals providing outcome data. This issue has been dealt with in several ways, ranging from discussion of “biasing motivations” and ways to minimize bias to discussions of the value orientation of those involved (Strupp & Hadley, 1977). The consistency of findings illuminating factors associated with the source of ratings, rather than the content of patient problems, highlights the need to pay careful attention to divergence of changes that


follow psychological interventions, and the way information from different perspectives is analyzed and reported in outcome studies. The fact that source factors have been replicated across a variety of scales, patient populations, and three or four decades also suggests that these findings are robust. It is clear that outcome studies need to collect outcome data from a variety of sources. Finding ways to combine these data to estimate overall change remains a task for future research.

TECHNOLOGY OF CHANGE MEASURES

In addition to selecting different sources to reflect change, the technology used in devising scales can have an impact on the final index of change. Smith, Glass, and Miller (1980) suggested that several factors associated with rating scales affect estimates of psychotherapy outcome. When summed, these factors were labeled “reactivity.” The variables of importance were: (a) the degree to which a measure could be influenced by either the client or therapist, (b) the similarity between therapy goals and the measure, and (c) the degree of blinding in the assessment process. The correlation between these dimensions (ratings of reactivity) and effect size was .18. This was a statistically significant and substantial relationship. The type of outcome measure was categorized. Those measures showing the highest effect sizes were ratings of fear and anxiety, vocational or personal development, emotional-somatic complaints, and measures of global adjustment. Those with the smallest effect sizes were personality traits, life indicators of adjustment, work, and school achievement.

Figure 4.1 lists several different technologies that have been employed in outcome measurement. These include evaluation (global ratings including measures of client satisfaction), description (specific symptom indexes), observation (behavioral counts), and status (physiological and institutional measures). Unfortunately, these procedures (or technologies) for collecting outcome data on patient change vary simultaneously on several dimensions, making it difficult to isolate the aspect of the measurement method that may be most important. A broad dimension on which they vary appears to be a direct-indirect dimension. Here, the data are seen as possibly reflecting a bias determined by the propensity of subjects to produce effects consciously. Thus, global ratings of outcome and client satisfaction measures call for (either implicitly or explicitly) raters (usually clients) to directly evaluate outcome. Their attention is drawn to the question, “Did I get better in therapy?” In contrast, specific symptom indices focus the raters’ attention (before and after treatment) on the status of specific symptoms and signs at the time the rating is made without explicit references to the outcome of therapy.
Although there is still knowledge at posttesting that the therapy (or even the therapist) is being evaluated directly, the tendency to rate change is diminished compared with global ratings. Observer ratings in the form of behavioral counts can be even more objective if enough attention is devoted to the procedures that are used. Ideally, these observer ratings call for counting behaviors in real-life circumstances, in which the patients do not know they are being observed or have plenty to focus on besides the impression they are making on the observer. Physiological monitoring usually is not under the conscious control of the patient, or, at the very least, presents a real and serious challenge to conscious distortion. Institutional measures, such as grade point average (GPA), are usually the culmination of a host of complex behaviors influenced by a wide variety of factors, and usually this type of behavior is produced without reference to the research project. Therefore, this type of data may be the least reactive data that can be collected. Green, Gleser, Stone, and Siefert (1975) compared final status scores, pretreatment to


posttreatment difference scores, and direct ratings of global improvement in 50 patients seen in brief crisis-oriented psychotherapy. The Hopkins Symptom Checklist was filled out by the patient, while a research psychiatrist rated the patient on the Psychiatric Evaluation Form and the Hamilton Depression Rating Scale. Ratings of global improvement were made by the patient and the therapist. Green et al. (1975) concluded that the type of rating scale used has a great deal to do with the percentage of patients considered improved, more so, in fact, than improvement per se. They also suggested that outcome scores have more to do with the finesse of rating scales than whether ratings are objective. Global improvement ratings by therapists and patients showed very high rates of improvement, with no patients claiming to do worse. When patients had to rate their symptoms more specifically, however, as with the Hopkins Symptom Checklist, they were likely to indicate actual intensification of some symptoms and to provide more conservative data than gross estimates of change (see also Garfield, Prager, & Bergin, 1971).

Ratings of client satisfaction are valuable as indices of outcome and often produce results that are correlated with other technologies and methods of assessing outcome (Berger, 1983). But because they usually do not provide the kind of theoretically important information that is desired in outcome research (e.g., the specific kind of symptoms or problems that are changing during therapy), they are not valued highly in theoretically oriented work. Nevertheless, they can provide important data about satisfaction and improvement, albeit softer data than that which usually is sought in formal research (Lambert, Christensen, & DeJulio, 1983).

OUTCOME ASSESSMENTS MUST BE SENSITIVE TO CHANGE

As the preceding discussion suggests, a central issue in outcome assessment is the degree to which different measures and measurement methods are likely to reflect changes that actually occur as a result of participation in therapy. For example, if the Beck Depression Inventory (a self-report instrument) is chosen as an outcome measure, will it reflect the same degree of

change as the Hamilton Rating Scale for Depression (a clinician rating)? Will gross ratings of overall change provided by the patient show larger or smaller amounts of improvement than a scale that measures change on specific symptoms? To what extent are the conclusions drawn in comparative outcome studies determined by the specific measures selected by researchers? Do the techniques of meta-analysis actually allow us to summarize across the different outcome measures that are employed in different studies (essentially combining them), and thereby facilitate accurate conclusions about differential treatment effects?

There is a growing body of evidence to suggest that there are reliable differences in the sensitivity of instruments to change. In fact, the differences between measures are not trivial, but large enough to raise questions about the interpretation of research studies. Two examples of such differences will make the importance of instrument selection clear. Table 4.2 presents data from the Ogles et al. (1990) review of agoraphobia outcome studies published in the 1980s. The effect sizes presented (based on pretest-posttest differences) show remarkable disparity in estimates of improvement as a function of the outcome instrument or method of measurement selected for a study. The two extremes based on measuring scales, Fear Survey Schedule (mean = .99) and Phobic Anxiety and Avoidance Scale (mean = 2.66), suggest different conclusions. The average patient taking the Fear Survey Schedule moved from the mean (50th percentile) of the pretest group to the 16th percentile after treatment. In contrast, the average patient being assessed with measures of Phobic Anxiety/Avoidance moved from the 50th percentile of the pretest group to the .00 percentile of the pretest group following treatment.

TABLE 4.2
Overall Effect Size (ES) Means and Standard Deviations by Scale*

Scale                           N^a    M_ES    SD_ES
Phobic anxiety and avoidance    65     2.66    1.83
Global Assessment Scale         31     2.30    1.14
Self-rating severity            52     2.12    1.55
Fear Questionnaire              56     1.93    1.30
Anxiety during BAT^b            48     1.36     .85
Behavioral Approach Test        54     1.15    1.07
Depression measures             60     1.11     .72
Fear Survey Schedule            26      .99     .47
Heart rate                      21      .44     .56

*Based on Ogles, Lambert, Weight, and Payne (1990).
^a N = the number of treatments whose effects were measured by each scale.
^b BAT = Behavioral Avoidance Test.

Comparisons between the measures depicted in Table 4.2 are confounded somewhat by the fact that the data were aggregated across all studies that used either measure. But similar results can be found when only studies that give both measures to a patient sample are aggregated. Table 4.3 presents data from a comparison of three frequently employed measures of depression: the Beck Depression Inventory (BDI) and Zung Self-Rating Scale for Depression (ZSRS), self-report inventories; and the Hamilton Rating Scale for Depression (HRS-D), an expert judge rating (Lambert, Hatch, Kingston, & Edwards, 1988). Meta-analytic results suggest that the most popular dependent measures used to assess depression following treatment provide reliably different pictures of change. It appears that the HRS-D, as employed by trained professional interviewers, provides a significantly larger index of change than the BDI and ZSRS. Because the amount of actual improvement that patients experience after treatment is never known, these findings are subject to several different interpretations. It may mean that the HRS-D overestimates patient improvement, but it could be argued just as easily that the HRS-D accurately reflects improvement and that the BDI and

TABLE 4.3
Matched Pairs of Mean Effect Size (ES) Values

Scale Pair     N(a)          M(ES)(b)       SD(ES)       t
HRS-D/ZSRS     [illegible]   0.94*/0.62     0.61/0.30    1.88
BDI/HRS-D      49            [illegible]    0.86/1.08    [illegible]
ZSRS/BDI       13            0.70/1.03      0.46/0.52    1.65

Note. HRS-D = Hamilton Rating Scale for Depression; ZSRS = Zung Self-Rating Scale; BDI = Beck Depression Inventory.
(a) N = the number of treatments whose effects were measured by each pair of depression scales.
(b) Values derived from studies in which subjects' depression was measured on two scales at a time. Effect size represents within-study comparisons.
*p < .05. [Second significance level illegible in source.]


ZSRS underestimate the amount of actual improvement. Both over- and underestimation also may be suggested, with true change falling somewhere in between the HRS-D estimate and those provided by the BDI and ZSRS. However, it does appear that there are reliable differences between measures and that these differences need to be explored and understood. Do self-report scales generally produce smaller effects than expert judge ratings? Is the difference due to the fact that the content of the scales is not identical or that different sources are providing the data?

Additional meta-analytic data suggest further differences between the size of treatment effects produced by different outcome measures (cf. Miller & Berman, 1983; Ogles, Lambert, Weight, & Payne, 1990; Shapiro & Shapiro, 1982; Smith et al., 1980). Abstracting from these and related studies, the following conclusions tentatively can be drawn:

1. Therapist and expert judge-based data, in which judges are aware of the treatment status of clients, produce larger effect sizes than self-report data, data produced by significant/relevant others, institutional data, or instrumental data.

2. Gross ratings of change produce larger estimates of change than ratings on specific dimensions or symptoms.

3. Change measures based on the specific targets of therapy (such as individualized goals or anxiety-based measures taken in specific situations) produce larger effect sizes than more distal measures, including tests of personality.

4. Life adjustment measures that tap social role performance in the natural setting (e.g., GPA) produce smaller effect sizes than more laboratory-based measures.

5. Measures collected soon after therapy show larger effect sizes than measures collected at a later date.

6. Physiological measures such as heart rate usually show relatively small treatment effects.

These tentative conclusions are worthy of continued exploration in future research. This research should replicate past research in at least one regard: in instructing those who provide data (such as clients) to report their status or judgments as honestly as possible. Past research has been interested in discovering the truth about outcomes, not in providing inflated estimates of treatment effects. Many measures are very susceptible to the instructional set given to those who are providing the data. Further research is needed to clarify the various factors that inflate and deflate estimates of change. For now, however, it is clear that dependent measures are not equivalent in their tendencies to reflect change and that meta-analysis, because it typically is used to combine different measures, cannot overcome the differences between measures. The thoughtful researcher and consumer of research will give careful attention to the way in which the technology, source, and content of measurement affect estimates of improvement and the meaning of the results of outcome studies.
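The percentile statements attached to the effect sizes in Table 4.2 can be illustrated with a short sketch. It assumes, as an illustration only, that scores are normally distributed, that lower scores mean less disturbance, and that the effect size is expressed in pretest standard deviation units; the function names are hypothetical and not taken from the studies cited.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def posttreatment_percentile(effect_size: float) -> float:
    """Percentile of the pretest distribution occupied by the average
    treated patient (assumption: normal scores, lower score = better)."""
    return 100.0 * normal_cdf(-effect_size)


# Effect sizes taken from Table 4.2.
for scale, es in [("Phobic anxiety and avoidance", 2.66),
                  ("Depression measures", 1.11)]:
    pct = posttreatment_percentile(es)
    print(f"{scale}: ES = {es:.2f}, posttreatment percentile = {pct:.1f}")
```

Under these assumptions, the same average patient appears far more improved on a measure with a large effect size than on one with a small effect size, which is the point made in the text about the nonequivalence of dependent measures.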

Practical Advances That May Close The Gap Between Research and Practice

Outcome assessment, as part of an applied science of psychotherapy research, should have a direct impact on clinical practice. Two developments in outcome assessment may play an important role in bridging the gap between research and practice: individualizing outcome assessment and assessing clinically significant outcome.


INDIVIDUALIZING OUTCOME ASSESSMENT

As has already been pointed out in this chapter, state-of-the-art assessment of outcome relies heavily on the application of atheoretical, mono-trait, standardized scales applied with homogeneous patient samples. This practice can be contrasted with the practice of relying on a careful analysis of the unique goals of an individual patient. The possibility of tailoring change criteria to each individual in therapy was mentioned frequently in the 1970s and seemed to offer intriguing alternatives for resolving several recalcitrant dilemmas in measuring change. In the 1980s and early 1990s, there has been a new surge of interest in making change measures more idiographic. This interest has been bolstered by the influx of general articles on qualitative research methods (Polkinghorne, 1991) and the desire to make psychotherapy research more responsive to the needs of the clinician and general clinical practice.

Typical of these approaches is the case-formulation method advocated by Persons (1991). She criticized outcome research for being incompatible with psychotherapy as it actually is practiced. Among her criticisms was the overreliance of research on standardized measures of outcome. She noted that even patients with a homogeneous disorder have a wide range of problems, including work problems; social isolation; financial stresses; medical problems; and tension in relationships with parents, spouse, children, or friends, to name a few. She argued that the typical standardized assessment procedure ignores most of these difficulties, whereas the therapist does not. She further noted that these assessment procedures in psychotherapy, as guided by theory, are idiographic and multifaceted, not standardized and limited to a single problem or related set of symptoms. Her suggestions for improving psychotherapy research called for individualization of outcome: each patient will have a different set of problems assessed with a different set of measures. Her suggestions have not gone unchallenged (Garfield, 1991; Herbert & Mueser, 1991; Messer, 1991; Schacht, 1991; Silverman, 1991). But Persons (1991) was hardly the first to make such recommendations. For example, Strupp, Schacht, and Henry (1988) argued for the principle of problem-treatment-outcome congruence. Unfortunately, their proposals and those of Persons have yet to face the forbidding task of empirical application.

Similar, if not more practical, approaches were undertaken in the 1970s and 1980s with mixed success. One method that has received widespread attention and use is Goal Attainment Scaling (GAS; Kiresuk & Sherman, 1968). Goal Attainment Scaling requires that a number of treatment goals be set up prior to intervention. These goals are formulated by an individual or a combination of clinicians, client, and/or a committee assigned to the task. For each goal specified, a scale with a graded series of likely outcomes, ranging from least to most favorable, is devised. These goals are formulated and specified with sufficient precision that an independent observer can determine the point at which the patient is functioning at a given time. The procedure also allows for transformation of the overall attainment of specific goals into a standard score. In using this method for the treatment of obesity, for example, one goal could be the specification and measurement of weight loss. A second goal could be reduction of depressive symptoms as measured by a single symptom scale, such as the BDI. Marital satisfaction could be assessed if the patient has serious marital problems. The particular scales and behaviors examined could be varied from patient to patient, and, of course, one may include other specific types of diverse measures from additional points of view. Several methodological issues need to be attended to while using GAS or similar methodology in controlled research (Cytrynbaum, Ginath, Birdwell, & Brandt, 1979).
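The transformation of goal attainment into a standard score can be sketched as follows. The chapter does not give the formula; the version below is the commonly cited Kiresuk and Sherman T-score, in which each goal is rated from -2 (least favorable outcome) to +2 (most favorable), and the assumed average intercorrelation among goals, rho, is conventionally set to 0.30. The function name, the goal weights, and the example ratings are hypothetical.

```python
import math
from typing import Sequence


def gas_t_score(attainment: Sequence[float],
                weights: Sequence[float],
                rho: float = 0.30) -> float:
    """Goal Attainment Scaling summary T-score.

    attainment: ratings per goal, conventionally -2..+2 (0 = expected outcome).
    weights:    relative importance assigned to each goal.
    rho:        assumed average intercorrelation among goals (an assumption,
                not a value estimated from the patient's data).
    """
    if len(attainment) != len(weights) or len(attainment) == 0:
        raise ValueError("attainment and weights must be non-empty and equal in length")
    weighted_sum = sum(w * x for w, x in zip(weights, attainment))
    denominator = math.sqrt((1.0 - rho) * sum(w * w for w in weights)
                            + rho * sum(weights) ** 2)
    return 50.0 + 10.0 * weighted_sum / denominator


# Hypothetical obesity case: weight loss, depressive symptoms, marital satisfaction.
ratings = [1, 0, -1]   # better than expected, as expected, worse than expected
weights = [3, 2, 1]    # hypothetical importance weights
print(round(gas_t_score(ratings, weights), 1))
```

A score of 50 corresponds to attaining the expected outcome on every goal. Because rho is assumed rather than estimated, a sample in which goals correlate more strongly than 0.30 would yield T-scores whose spread departs from the nominal standard deviation of 10.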


Goal Attainment Scaling has been applied within a variety of settings with varied success. Woodward, Santa-Barbara, Levin, and Epstein (1978) examined the role of GAS in studying family therapy outcome. The authors used content analysis to analyze the nature of the goals that were set and the kind of goals set by therapists of different types. In their study, which focused on termination and 6-month follow-up goals, 270 families were considered. This resulted in an analysis of 1,005 goals. The authors, advocates of GAS, reported reliable ratings that reflected diverse changes in the families studied. They also noted that GAS correlated with other measures of outcome, and thus seemed to be valid.

This study also suggests an advantage of GAS: it not only is applicable with individuals, but can be used to express change in larger systems. Thus, it has been recommended for use in marital and family therapy (Russell, Olson, Sprenkle, & Atilano, 1983). It continues to be applied with families as a way to express changes in the family as a whole, rather than limiting assessment to the identified patient (Fleuridas, Rosenthal, Leigh, & Leigh, 1990).

More critical analyses show that GAS suffers from many of the same difficulties as other individualized goal-setting procedures. The correlations between goals seem to be around 0.65, raising the question of their independence. Goals judged either too easy or too hard to obtain are often included for analysis, but, most important, goal attainment is judged on a relative, rather than an absolute, basis so that behavior change is confounded with expectations as well as importance (Clark & Caudrey, 1986). Further, the choice and attainment of goals are related to client as well as therapist characteristics that affect goal setting as well as change rating. Calsyn and Davidson (1978) reviewed and assessed GAS as an evaluative procedure. These authors suggested that GAS has poor reliability, because there is insufficient agreement between raters on the applicability of predefined content categories to particular patients. In addition, the interrater agreement for goal attainment ranged from r = 0.51 to 0.85, indicating variability between those making ratings (e.g., therapist, client, expert judge). In general, studies that have correlated GAS improvement ratings with other ratings of improvement, such as MMPI scores, client satisfaction, and therapist improvement ratings, have failed to show substantial agreement, and coefficients frequently have been below 0.30 (Fleuridas et al., 1990). In addition, Calsyn and Davidson (1978) pointed out that the use of GAS also frequently eliminates the use of statistical procedures, such as covariance, that could otherwise correct for sampling errors. Because of this problem, as well as the unknown effects of low reliability, it is suggested that, if GAS is used, it should be used only in conjunction with standard scales applied to all patients.

Suggestions for the use of GAS in psychotherapy research have been made by Mintz and Kiesler (1982), and the interested researcher may wish to review their recommendations or the review by Calsyn and Davidson (1978). Since these reviews, Lewis, Spencer, Haas, and DiVittis (1987) have described methods of data gathering and scale construction that they feel increase the reliability and validity of GAS. They applied GAS in conjunction with family-based interventions with inpatients. Specific procedures for goal creation and later evaluation increased reliability and validity, without reducing the advantages of individualized goals. Among the innovations suggested by these researchers was the use of GAS ratings only at follow-up, with evaluations of the pattern of adjustment built into goal expectations and evaluations. GAS still is being applied in a variety of settings such as inpatient and school (Maher & Barbrack, 1984), with a variety of patient groups and treatment methods such as group therapy (Flowers & Booarem, 1990) or with clients with mental retardation (Bailey & Simeonsson, 1988). However, examination of these studies reveals widespread modification in its use, so that it is misleading to consider it a single method: GAS is a variety of different methods for recording


and evaluating client goal attainment. It is not possible to compare goal attainment scores accurately from one study to the next. In addition to the previously stated problems, the status of individually tailored goals is tenuous for several reasons: (a) individualized goals may be little more than poorly defined subjective decisions by patient or clinician; (b) units of change derived from individually tailored goals are unequal and therefore hardly comparable; (c) different goals are differentially susceptible to psychotherapy influence; (d) goals tend to change early in therapy, which requires their revision; and (e) some therapies have a unitary goal or set of goals. Effective individualization of goals for the purpose of empirically assessing patient change remains an ideal, rather than a reality. The clinician may be better off having a wide range of standardized scales available. Among these scales would be those itemized in this book. At the very least, a clinician should have available a measure of depression (BDI), a measure of anxiety (STAI), and a measure of marital relationship (Marital Adjustment Inventory) for adult patients.

CLINICAL VERSUS STATISTICAL SIGNIFICANCE

Most psychotherapy research is aimed at questions of theoretical interest. Is dynamic therapy more effective than cognitive therapy? Is exposure in vivo necessary for fear reduction? These and a host of similar questions give rise to the research designs that have been used in outcome research. The data acquired in outcome studies are submitted to statistical tests of significance. Group means are compared, the within-group and between-group variability are considered, and the resulting numerical figure is compared with a preset critical value. When the magnitude of the distance between groups is sufficiently large, it is agreed that the results are not likely to be the result of chance fluctuations in sampling; thus, statistical significance has been demonstrated. This is the standard for most research and is an important part of the scientific process.

However, a common criticism of outcome research is that the results of studies, because they typically are reported in terms of statistical significance, obscure both the clinical relevance of the findings and the impact of the treatment on specific individuals. Unfortunately, statistically significant improvements do not necessarily equal practically important improvements for the individual client. Therefore, statistically significant findings may be of limited practical value. This fact raises questions about the real contributions of empirical studies for the practice of psychotherapy. It is conceivable that, in a well-designed study, small differences between large groups after treatment could produce findings that reach statistical significance, whereas the real-life difference between patients receiving different treatments is trivial in terms of the reduction of painful symptoms. For example, a behavioral method of treatment for obesity may create a statistically significant difference between treated and untreated groups if all treated subjects lost 10 pounds and all untreated subjects lost 5 pounds. However, the clinical utility of an extra 5-pound weight loss is debatable, especially in the clinically obese patient. This dilemma goes to the core of outcome assessment: adequate definitions and quantification of improvement.

Numerous attempts have been aimed at translating reports of treatment effects into metrics that reflect the importance of the changes that are made: (a) In the earliest studies of therapy outcome, patients were categorized “posttherapy” with gross ratings of “improved,” “cured,” and the like, implying meaningful change. The lack of precision in such ratings, however, resulted in their waning use (Lambert, 1983). (b) Those interested in operant conditioning and single-subject designs developed concepts such as “social validity” to describe practically important improvement (Kazdin, 1977; Wolf, 1978). However, decisions about the importance of change remained subjective and unquantified. (c) Some disorders easily lend themselves to analysis of important changes, because improvement can be defined as the absence of a behavior, such as cessation of drinking, smoking, or drug use. Unfortunately, most symptoms targeted in psychotherapy cannot be defined and measured so easily.

There is growing recognition that the concept of clinical significance is important and that many different approaches can be used to operationalize it (Jacobson, 1988). Several related methods have been compared in a special issue of Behavioral Assessment (e.g., Kendall & Grove, 1988). A discussion and illustration of some of these methods clarify their contemporary use.

Jacobson, Follette, and Revenstorf (1984) brought clinical significance into prominence by proposing statistical methods that would illuminate the degree to which individual clients recovered at the end of therapy. Recovery was proposed to be a posttest score that was more likely to belong to the functional than to the dysfunctional population of interest. Estimating clinical significance requires norms for the functional sample and presumes that certain assumptions about the test scores have been met. For change to be clinically significant, a patient must change enough so that one can be confident that the change exceeds measurement error. This is assessed with a statistic called the Reliable Change Index, which is calculated by dividing the absolute magnitude of change by the standard error of measurement. When a patient moves from one distribution (dysfunctional) to another (functional), and the change reliably exceeds measurement error, change is viewed as clinically significant. The patient is more likely functional than dysfunctional. Jacobson et al. (1984) applied their criteria of clinical significance to past studies of behavioral marital therapy. They were able to develop cutoff scores based on the normative data of functional and dysfunctional couples who had taken the Locke-Wallace Marital Adjustment Inventory. A growing number of studies have employed these techniques with various treatment samples with considerable success (Lacks & Powlishta, 1989; Mavissakalian, 1986; Perry, Shapiro, & Firth, 1986; Schmaling & Jacobson, 1987).
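The two ingredients of this approach, a cutoff separating the functional and dysfunctional distributions and an index of reliable change, can be sketched as follows. The sketch follows the description in the text (change divided by the standard error of measurement). Two points are added as labeled assumptions: the cutoff is computed as the point weighted by the two standard deviations, the usual form of the criterion attributed to Jacobson and colleagues, and an optional variant divides by the standard error of the difference, sqrt(2) times the standard error of measurement, which later formulations recommend. All numeric values are hypothetical.

```python
import math


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """SEM = SD * sqrt(1 - r_xx), where r_xx is the test's reliability."""
    return sd * math.sqrt(1.0 - reliability)


def reliable_change_index(pre: float, post: float, sem: float,
                          use_difference_error: bool = False) -> float:
    """Change divided by the standard error of measurement, as in the text.

    If use_difference_error is True, the denominator is the standard error
    of the difference, sqrt(2) * SEM (an assumption drawn from later
    formulations, not from this chapter).
    """
    denominator = math.sqrt(2.0) * sem if use_difference_error else sem
    return (post - pre) / denominator


def cutoff_between_populations(mean_dys: float, sd_dys: float,
                               mean_fun: float, sd_fun: float) -> float:
    """Score at which membership in the functional and dysfunctional
    populations is equally likely, assuming both are normally distributed."""
    return (sd_fun * mean_dys + sd_dys * mean_fun) / (sd_fun + sd_dys)


# Hypothetical marital adjustment data (higher score = better adjustment).
sem = standard_error_of_measurement(sd=20.0, reliability=0.84)
rc = reliable_change_index(pre=80.0, post=104.0, sem=sem)
cutoff = cutoff_between_populations(mean_dys=75.0, sd_dys=20.0,
                                    mean_fun=115.0, sd_fun=20.0)
crossed = 80.0 < cutoff <= 104.0
print(f"SEM = {sem:.2f}, RC = {rc:.2f}, cutoff = {cutoff:.1f}, crossed = {crossed}")
```

With an index beyond 1.96 in absolute value, change of that size would be expected from measurement error alone fewer than 5 times in 100; clinical significance additionally requires that the posttest score fall on the functional side of the cutoff.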

These same procedures were applied to the Symptom Checklist-90-R (SCL-90-R; Derogatis & Melisaratos, 1983) by Tingey, Burlingame, Lambert, and Barlow (1990). The SCL-90-R was chosen because it is a frequently applied psychotherapy outcome measure that taps a variety of commonly encountered complaints including depression and anxiety. Figure 4.2 presents a graph of the General Symptom Index (GSI; total symptom score of the SCL-90-R) for the purpose of demonstrating how cutoff scores on the SCL-90-R can be used to define clinically significant change. To produce Fig. 4.2, it was necessary to find normative data. The normative data for the severely symptomatic sample were found in existing literature on the SCL-90-R as applied to inpatients (Derogatis & Melisaratos, 1983), outpatients (Burlingame & Barlow, in press), and the original normative standardization sample (general population) collected by Derogatis and Melisaratos (1983). The asymptomatic sample was collected by Tingey et al. (1990) for the purpose of identifying a group of subjects that was nominated and carefully screened to exclude persons who were not well adjusted (in contrast to the typical normative sample, which is based on a random sample of persons, some of whom may evidence psychopathology). Plotted points (“A” and “B”) would indicate subject A’s and subject B’s pre- and posttreatment GSI scores: pretreatment along the horizontal axis and posttreatment along the vertical. The continuous diagonal line signifies no change between pre- and posttreatment scores; a subject receiving identical pre- and posttreatment scores would fall on this line. The area above this line denotes an increase in the GSI score from pretreatment indicating greater

[Figure 4.2: plot of posttreatment GSI score (vertical axis) against pretreatment GSI score (horizontal axis), with horizontal cutoff lines separating the asymptomatic, mildly symptomatic, moderately symptomatic, and severely symptomatic regions, a diagonal no-change line, and plotted points A and B.]

FIG. 4.2. Sample figure illustrating SCL-90-R GSI cutoffs between the normative samples. Continuous diagonal line indicates points of no change between pre- and posttreatment.

symptomatology (subject “A”), whereas the area below denotes a decrease and less symptomatology (subject “B”). The three horizontal lines signify the cutoff points between adjacent samples’ distributions, and are used to determine one aspect of clinical significance (i.e., the movement of subjects from one sample to the next). In Fig. 4.2, the cutoff lines were established empirically by Tingey et al. (1990). The subjects (A and B) are hypothetical. The areas separated by the cutoff lines indicate the four normative samples: asymptomatic, mildly, moderately, and severely symptomatic. A plotted point on the graph indicates a subject’s placement (in reference to the normative samples) at posttreatment. Unless there is no change pre- to posttreatment, this point will fall some place other than on the diagonal line. By drawing an imaginary vertical line from this point to the no-change diagonal line, the subject’s placement at pretreatment can be determined. For example, subject A’s posttreatment plotted point places him or her in the severely symptomatic sample, and the vertical line drawn down from this point places him or her at pretreatment in the moderately symptomatic sample (it intersects the no-change line within this sample’s area). According to GSI scores, this indicates that, during the course of therapy, this subject deteriorated and moved from the moderately symptomatic to the severely symptomatic sample. Subject B, however, improved and moved from the severely symptomatic sample to the mildly symptomatic sample.

For change to be considered clinically significant, in addition to movement from one normative sample to another, the degree of change attained by a patient also must exceed change that could be due to chance. This degree of change can be based on calculation of the Reliable Change Index (RC). Calculation of this index is based on the standard error of the GSI score. Because the GSI standard error varies from normative sample to normative sample, the amount of change necessary to exceed chance change varies according to the normative sample the patient is in at pretesting. In fact, for GSI scores, RC is larger for the more severely disturbed patients and smaller


for the asymptomatic sample. Thus, a patient starting out as severely disturbed must change more for the change to be considered reliable than a patient who begins as mildly disturbed or asymptomatic. Figure 4.2 illustrates the situation in which subject A started therapy with a GSI score of 0.7, which fell within the asymptomatic sample. A deteriorated to a posttest status of 1.3, now within the severely symptomatic sample. The pre- to posttreatment change had a magnitude of 0.60. Because A not only moved from the asymptomatic to the severely symptomatic sample, but also changed by more than the RC cutoff of 0.16 (calculated by Tingey et al. [1990]), A’s change is considered clinically significant. The illustration concludes that the odds of A deteriorating more than 0.16 from pre- to posttesting by chance are less than 5 in 100, because the RC cutoff of 0.16 was set at the .05 level of confidence. The chances that a change of the magnitude experienced by A could be due to chance (or measurement error) are small indeed. Because the change is not random, and A is now clearly within the severely symptomatic distribution, clinically meaningful change, albeit deterioration, has occurred.
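The two-part decision rule applied to subject A can be sketched as a small classifier. The GSI scores (0.7 and 1.3) and the RC cutoff of 0.16 come from the text; the three band cutoffs below are hypothetical placeholders chosen only so that 0.7 falls in the asymptomatic band and 1.3 in the severely symptomatic band, and are not the empirical values of Tingey et al. (1990). Using a single RC cutoff is also a simplification, because the text notes that the required change varies with the pretreatment sample.

```python
from bisect import bisect_right
from typing import Sequence, Tuple

BANDS = ["asymptomatic", "mildly symptomatic",
         "moderately symptomatic", "severely symptomatic"]


def band(score: float, cutoffs: Sequence[float]) -> str:
    """Normative band for a GSI score, given three ascending cutoffs."""
    return BANDS[bisect_right(list(cutoffs), score)]


def classify_change(pre: float, post: float, cutoffs: Sequence[float],
                    rc_cutoff: float) -> Tuple[str, str, str]:
    """Return (pretreatment band, posttreatment band, verdict).

    Change is clinically significant only if the patient crosses into a
    different band AND the change exceeds the reliable-change cutoff.
    """
    pre_band, post_band = band(pre, cutoffs), band(post, cutoffs)
    change = post - pre            # higher GSI = more symptoms
    reliable = abs(change) > rc_cutoff
    crossed = pre_band != post_band
    if reliable and crossed:
        direction = "deterioration" if change > 0 else "improvement"
        verdict = f"clinically significant {direction}"
    elif reliable:
        verdict = "reliable change without crossing a cutoff"
    else:
        verdict = "no reliable change"
    return pre_band, post_band, verdict


HYPOTHETICAL_CUTOFFS = (0.8, 1.0, 1.2)   # placeholders, not published values
print(classify_change(0.7, 1.3, HYPOTHETICAL_CUTOFFS, rc_cutoff=0.16))
print(classify_change(0.79, 0.81, HYPOTHETICAL_CUTOFFS, rc_cutoff=0.16))
```

The second call shows the case the RC criterion is meant to screen out: a patient who starts very near a cutoff and crosses it with a change too small to distinguish from measurement error.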

The addition of the RC criterion significantly reduces the possibility that a patient could start very near the cutoff between two normative groups, change very little, but enough to cross the threshold, and have that change be considered clinically significant. Through these procedures, the patient’s status in relation to patients and nonpatients was defined carefully and could be observed along the dimension of mental health with severe pathology at one end and ideal mental health (being asymptomatic) at the other. Efforts such as this have not been carried out on many standardized measures, but some important analyses are emerging that show great promise for making the results of outcome studies more meaningful to clinicians and researchers (cf. Kazdin, Bass, Siegel, Thomas, & Christopher, 1989).

Additional examples of estimating clinically significant change have been published in recent years. These methods have emphasized the use of normative comparison. Examples include the use of social drinking behaviors as criteria for outcome in the treatment of problem drinking, or the use of definitions of adequate sexual performance (e.g., ratio of orgasms to attempts at sex, or time to orgasm following penetration; Sabalis, 1983). These criteria are based on data about the normative functioning of individuals and can be applied easily and meaningfully with a number of disorders where normal or ideal functioning is readily apparent and easily measured (e.g., obesity). But normative comparison also can be used to quantify clinical significance. This strategy involves comparing the behavior of clients before and after treatment to that of a sample of nondisturbed “normal” peers. An advantage of this method is that comparisons can be based directly on the psychological tests that commonly are used to measure therapy outcome, if a standardization sample of nonpatients is also available. Usually the procedure involves comparing the end state functioning of treated clients to various control groups. Thus, standards of clinical improvement can be based on normative data and posttreatment status gathered through meta-analysis of multiple samples of patients, instead of the magnitude of change of specific individual patients.

For example, Trull, Nietzel, and Main (1988) reported a meta-analytical review of 19 studies of agoraphobia that used the Fear Questionnaire. Self-reported posttreatment adjustment of agoraphobics was compared with two normative samples. The normative samples were based on college students (at two universities) or a community sample drawn randomly from the phone directory. Both samples included subjects who had never received treatment for a phobic condition. As might be expected, the community sample was more disturbed than the college sample, probably because agoraphobia prohibits or inhibits attendance in college classes. As a consequence, in this study estimates of clinically significant change via normative comparison turned out to be a function of which normative group was used for comparison.
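The dependence of normative comparison on the reference group can be illustrated with a brief sketch. It locates the same posttreatment mean within two normative distributions, assuming normality. The means and standard deviations are hypothetical and are not the values reported by Trull, Nietzel, and Main (1988).

```python
import math
from typing import Dict, Tuple


def percentile_in_norm(score: float, norm_mean: float, norm_sd: float) -> float:
    """Percentile rank of a score within a normally distributed norm group."""
    z = (score - norm_mean) / norm_sd
    return 100.0 * 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Hypothetical norms on a symptom measure (higher score = more disturbed).
norms: Dict[str, Tuple[float, float]] = {
    "college sample": (10.0, 5.0),      # (mean, SD), illustrative only
    "community sample": (15.0, 10.0),   # more disturbed, more variable
}
posttreatment_mean = 22.0               # hypothetical treated-group mean

for name, (mean, sd) in norms.items():
    pct = percentile_in_norm(posttreatment_mean, mean, sd)
    print(f"{name}: posttreatment mean falls at percentile {pct:.1f}")
```

The same treated group looks nearly indistinguishable from untreated patients against the less disturbed norm and considerably closer to typical functioning against the more disturbed one, which mirrors the pattern described for the college and community samples.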


Agoraphobics, treated mainly with exposure, improved during treatment. The average agoraphobic started at the 99th percentile of the college norms and improved to the 98.7th percentile at the end of treatment. The average agoraphobic also started at the 97th percentile of the community norm and progressed to the 68th percentile at posttreatment and to the 65.5th percentile at follow-up.

Using similar methodology, Nietzel, Russell, Hemmings, and Gretter (1987) studied the clinical significance of psychotherapy for unipolar depression. They compared the posttherapy adjustment of depressed and nondepressed adults who took the BDI. In all, 28 published studies were used to calculate composite BDI norms; these were compared with outcomes from 31 outcome studies that yielded 60 effect sizes. Three normative groups could be identified: a nondistressed group, a general population group (consisting mostly of collegiate subjects), and a situationally distressed group (e.g., pregnant women), which turned out to be very similar to the general population samples. Comparisons contrasting the depressed patients with the normative samples suggested that the various treatments (all of which appeared similar in their effectiveness) produced clinically significant changes in relation to the general population. In fact, the average depressed patient moved from the 99th percentile of the general population norms to the 76th percentile of this reference sample. These gains were maintained at follow-up. In reference to the nondistressed group, the same improvements were much less remarkable. The average patient only moved from the 99th percentile to the 95th percentile. The authors concluded that clinically significant improvement depends on the nature of the normative sample. Obviously, selection of normative samples has a high impact on estimates of meaningful improvement.

A recent study combining various methods of calculating clinical significance illustrates the potential of using more than one procedure. Scott and Stradling (1990) studied and contrasted the effects of cognitive therapy offered in either an individual or group format to patients who were depressed. They reported results from an analysis of BDI scores. Patients were assigned to either of the treatment groups or a wait-list control, while still receiving their customary treatment from their general practitioner (which included tricyclic medication in about one half of the patients). Besides the usual group comparisons based on inferential statistics, the authors reported clinically significant improvements as well. These authors reported the percentage of patients reaching various cutoff scores on the BDI. Using Kendall, Hollon, Beck, Hammen, and Ingram’s (1987) criteria for nondepression, mild depression, moderate depression, and severe depression, the authors were able to show obvious differences between wait-list and psychotherapy outcome over the 12 weeks of treatment, and for 1-year follow-up. They also applied the RC (Jacobson, Follette, & Revenstorf, 1984) as a primary criterion, showing that patient change was of a great enough magnitude so that patients could reasonably be considered to have left the ranks of the dysfunctional. In fact, using the RC, they estimated that 100% of those in the group treatment and 84% of those in individual treatment manifested clinically significant improvement. Fifty-three percent on the wait-list showed similar improvement. In addition, 5% of the wait-list subjects deteriorated, whereas none of the treatment subjects did.

Numerous problems remain. These include the complexities that are created by the fact that researchers use multiple outcome measures, each one possibly providing different information about the individual and the group as a whole. One measure may show clinically significant change for the group or a specific individual, whereas another does not. Other problems include the use of discrete cutoff points and their derivation, the problems that result from score distributions that are not normal, and the limitations of floor and ceiling effects in many of the most frequently used tests. This latter problem is especially serious,


because many tests are weighted heavily toward pathology and not developed for use with people who represent the actualized end of the continuum of functioning. In some instances, it is this actualized end of the continuum that represents the patients’ nondisturbed peers. Finally, there is considerable controversy about procedural and statistical analyses (cf. Lacks & Powlishta, 1989) that have substantial impact on estimates of clinical significance. Some procedures provide more conservative criteria, whereas others are lenient. Thus, statistical methods do not eliminate the seemingly inevitable application of values to operationalizing outcome. These statistical methods make the judgments explicit and replicable, so that researchers can equate clinical significance across studies. The development of statistically defined clinically significant change, although not without controversy, should be applauded and encouraged. The clinician now has, at his or her disposal, normative data that allow for estimating clinically significant improvement on a few important outcome measures (e.g., BDI, SCL-90-R, Locke-Wallace Marital Adjustment Inventory, Fear Questionnaire, and Child Behavior Checklist). Data on clinical significance may stimulate research applications in private practice settings, as well as improve the translation of research findings into clinician-friendly facts.
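The logic of a reliable change criterion combined with a normative cutoff can be sketched in a few lines. The sketch below is an illustration under stated assumptions, not the procedure of any study cited here: it uses the standard-error-of-the-difference form of the index that is common in later work (which may differ in detail from the 1984 formulation), and every number and function name in it is hypothetical.

```python
import math


def reliable_change_index(pre, post, sd_pre, reliability):
    """Change score divided by the standard error of the difference."""
    # Standard error of measurement from the pretest SD and test reliability.
    se_measurement = sd_pre * math.sqrt(1.0 - reliability)
    # Standard error of the difference between two scores on the same test.
    se_difference = math.sqrt(2.0) * se_measurement
    return (post - pre) / se_difference


def is_reliable_change(pre, post, sd_pre, reliability, z_crit=1.96):
    """True when the change exceeds what measurement error alone would explain."""
    return abs(reliable_change_index(pre, post, sd_pre, reliability)) > z_crit


def functional_cutoff(mean_dys, sd_dys, mean_func, sd_func):
    """Cutoff between dysfunctional and functional distributions, weighted by SDs."""
    return (sd_dys * mean_func + sd_func * mean_dys) / (sd_dys + sd_func)


# Hypothetical depression-scale values, for illustration only.
pre, post = 28, 12
rc = reliable_change_index(pre, post, sd_pre=7.5, reliability=0.86)
cutoff = functional_cutoff(mean_dys=25, sd_dys=8, mean_func=6, sd_func=5)
# Lower scores mean less pathology here, so "recovered" is below the cutoff.
clinically_significant = is_reliable_change(pre, post, 7.5, 0.86) and post < cutoff
print(round(rc, 2), round(cutoff, 2), clinically_significant)
```

Changing the reliability estimate or the normative means moves both the index and the cutoff, which is one concrete way to see why different procedures yield more conservative or more lenient estimates of clinical significance.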

Summary and Conclusions

The assessment of psychotherapy outcome is an important endeavor, impacting both the science and practice of mental health services. Outcome assessment is based on a rich tradition of research, and has shown steady improvement as a scientific endeavor over the last five decades. The phenomenon of measuring change and improvement is a fascinating, although presently chaotic, topic of scientific inquiry. It is hoped that it will continue to be a fruitful ground for collaborative efforts and important discoveries. This exciting area of discovery awaits the excited and gifted student. Important discoveries await the determined and patient researcher.

References

Bailey, D. B., & Simeonsson, R. J. (1988). Investigation of use of goal attainment scaling to evaluate individual progress of clients with severe and profound mental retardation. Mental Retardation, 26, 289-295.

Berger, M. (1983). Toward maximizing the utility of consumer satisfaction as an outcome measure. In M. J. Lambert, E. R. Christensen, & S. S. DeJulio (Eds.), The assessment of psychotherapy outcome (pp. 56-80). New York: Wiley.

Berzins, J. I., Bednar, R. L., & Severy, L. J. (1975). The problem of intersource consensus in measuring therapeutic outcomes: New data and multivariate perspectives. Journal of Abnormal Psychology, 84, 10-19.

Beutler, L. E., & Hamblin, D. L. (1986). Individualized outcome measures of internal change: Methodological considerations. Journal of Consulting and Clinical Psychology, 54, 48-53.

Burlingame, G., & Barlow, S. (in press). Nonspecific and specific effects in time-limited group psychotherapy.

Calsyn, R. J., & Davidson, W. S. (1978). Do we really want a program evaluation strategy based on individualized goals? A critique of goal attainment scaling. Evaluation Studies: Review Annual, 1, 700-713.

Cartwright, D. S., Kirtner, W. L., & Fiske, D. W. (1963). Method factors in changes associated with psychotherapy. Journal of Abnormal and Social Psychology, 66, 164-175.

Clark, M. S., & Caudrey, D. J. (1986). Evaluation of rehabilitation services: The use of goal attainment scaling. International Rehabilitative Medicine, 5, 41-45.

Cytrynbaum, S., Ginath, Y., Birdwell, T., & Brandt, L. (1979). Goal attainment scaling: A critical review. Evaluation Quarterly, 3, 5-40.

Derogatis, L. R., & Melisaratos, N. (1983). The Brief Symptom Inventory: An introductory report. Psychological Medicine, 13, 595-605.

Farrell, A. D., Curran, J. P., Zwick, W. R., & Monti, P. M. (1983). Generalizability and discriminant validity of anxiety and social skills ratings in two populations. Behavioral Assessment, 6, 1-14.

Fleuridas, C., Rosenthal, D. M., Leigh, G. K., & Leigh, T. E. (1990). Family goal recording: An adaptation of goal attainment scaling for enhancing family therapy and assessment. Journal of Marital and Family Therapy, 16(4), 389-406.

Flowers, J. V., & Booraem, C. D. (1990). Four studies toward an empirical foundation for group therapy. Journal of Social Service Research, 13(2), 105-121.

Forsyth, R. P., & Fairweather, G. W. (1961). Psychotherapeutic and other hospital treatment criteria: The dilemma. Journal of Abnormal and Social Psychology, 62, 598-604.

Froyd, J., & Lambert, M. J. (1989, May). A survey of outcome research measures in psychotherapy research. Paper presented at the meeting of the Western Psychological Association, Reno, NV.

Garfield, S. L. (1991). Psychotherapy models and outcome research. American Psychologist, 46, 1350-1351.

Garfield, S. L., Prager, R. A., & Bergin, A. E. (1971). Evaluation of outcome in psychotherapy. Journal of Consulting and Clinical Psychology, 37, 307-313.

Gibson, R. L., Snyder, W. U., & Ray, W. S. (1955). A factor analysis of measures of change following client-centered psychotherapy. Journal of Counseling Psychology, 2, 83-90.

Glaister, B. (1982). Muscle relaxation training for fear reduction of patients with psychological problems: A review of controlled studies. Behavior Research and Therapy, 20, 493-504.

Green, B. C., Gleser, G. C., Stone, W. N., & Siefert, R. F. (1975). Relationships among diverse measures of psychotherapy outcome. Journal of Consulting and Clinical Psychology, 43, 689-699.

Hamilton, M. (1967). Development of a rating scale for primary depressive illness. British Journal of Social and Clinical Psychology, 6, 278-296.

Herbert, J. D., & Mueser, K. T. (1991). The proof is in the pudding: A commentary on Persons. American Psychologist, 46, 1347-1348.

Jacobson, N. S. (1988). Defining clinically significant change: An introduction. Behavioral Assessment, 10, 131-132.

Jacobson, N. S., Follette, W. C., & Revenstorf, D. (1984). Psychotherapy outcome research: Methods for reporting variability and evaluating clinical significance. Behavior Therapy, 15, 336-352.

Jacobson, N. S., Follette, W. C., Revenstorf, D., Baucom, D. H., Hahlweg, K., & Margolin, G. (1984). Variability in outcome and clinical significance of behavioral marital therapy: A reanalysis of outcome data. Journal of Consulting and Clinical Psychology, 52, 497-504.

Kazdin, A. E. (1977). Assessing the clinical or applied importance of behavior change through social validation. Behavior Modification, 1, 427-452.

Kazdin, A. E., Bass, D., Siegel, T., & Christopher, T. (1989). Cognitive-behavioral therapy and relationship therapy in the treatment of children referred for anti-social behavior. Journal of Consulting and Clinical Psychology, 57, 522-535.

Kendall, P. C., Hollon, S., Beck, A. T., Hammen, C., & Ingram, R. E. (1987). Issues and recommendations regarding use of the Beck Depression Inventory. Cognitive Therapy and Research, 11, 289-300.

Kendall, P. C., & Grove, W. M. (1988). Normative comparisons in therapy outcome. Behavioral Assessment, 10, 147-158.

Kiresuk, T. J., & Sherman, R. E. (1968). Goal attainment scaling: A general method for evaluating comprehensive community mental health programs. Community Mental Health Journal, 4, 443-453.

Lacks, P., & Powlishta, K. (1989). Improvement following behavioral treatment for insomnia: Clinical significance, long-term maintenance, and predictors of outcome. Behavior Therapy, 20, 117-134.

Lambert, M. J. (1983). Introduction to assessment of psychotherapy outcome: Historical perspective and current issues. In M. J. Lambert, E. R. Christensen, & S. S. DeJulio (Eds.), The assessment of psychotherapy outcome (pp. 3-32). New York: Wiley Interscience.

Lambert, M. J., Christensen, E. R., & DeJulio, S. S. (Eds.). (1983). The assessment of psychotherapy outcome. New York: Wiley.

Lambert, M. J., Hatch, D. R., Kingston, M. D., & Edwards, B. C. (1986). Zung, Beck, and Hamilton rating scales as measures of treatment outcome: A meta-analytic comparison. Journal of Consulting and Clinical Psychology, 54, 54-59.

Lambert, M. J., & McRoberts, C. H. (1993, April). Outcome measurement in JCCP: 1986-1991. Paper presented at the meeting of the Western Psychological Association, Phoenix, Arizona.

Lewis, A. B., Spencer, J. H., Haas, G. L., & DiVittis, A. (1987). Goal attainment scaling: Relevance and replicability in follow-up of inpatients. The Journal of Nervous and Mental Disease, 175, 408-418.

Locke, H. J., & Wallace, K. M. (1959). Short-term marital adjustment and prediction tests: Their reliability and validity. Marriage and Family Living, 21, 251-255.

Luborsky, L. (1971). Perennial mystery of poor agreement among criteria for psychotherapy outcome. Journal of Consulting and Clinical Psychology, 37, 316-319.

Maher, C. A., & Barbrack, C. R. (1984). Evaluating the individual counseling of conduct problem adolescents: The goal attainment scaling method. Journal of School Psychology, 22, 285-297.

Mavissakalian, M. (1986). Clinically significant improvement in agoraphobia research. Behavior Research and Therapy, 24, 369-370.

Messer, S. B. (1991). The case formulation approach: Issues of reliability and validity. American Psychologist, 46, 1348-1350.

Miller, R. C., & Berman, J. S. (1983). The efficacy of cognitive behavior therapies: A quantitative review of the research evidence. Psychological Bulletin, 94, 39-53.

Mintz, J., & Kiesler, D. J. (1982). Individualized measures of psychotherapy outcome. In P. C. Kendall & J. N. Butcher (Eds.), Handbook of research methods in clinical psychology (pp. 491-534). New York: Wiley.

Mintz, J., Luborsky, L., & Christoph, P. (1979). Measuring the outcomes of psychotherapy: Findings of the Penn Psychotherapy Project. Journal of Consulting and Clinical Psychology, 47, 319-334.

Monti, P. M., Wallander, J. L., Ahern, D. K., Abrams, D. B., & Munroe, S. M. (1983). Multi-modal measurement of anxiety and social skills in a behavioral role-play test: Generalizability and discriminant validity. Behavioral Assessment, 6, 15-25.

Mylar, J. L., & Clement, P. W. (1972). Prediction and comparison of outcome in systematic desensitization and implosion. Behavior Research and Therapy, 10, 235-246.

Nietzel, M. T., Russell, R. L., Hemmings, K. A., & Gretter, M. L. (1987). Clinical significance of psychotherapy for unipolar depression: A meta-analytic approach to social comparison. Journal of Consulting and Clinical Psychology, 55, 156-161.

Ogles, B. M., Lambert, M. J., Weight, D. G., & Payne, I. R. (1990). Agoraphobia outcome measurement: A review and meta-analysis. Psychological Assessment: A Journal of Consulting and Clinical Psychology, 2, 317-325.

Perry, G., Shapiro, D. A., & Firth, J. (1986). The case of the anxious executive: A study from the research clinic. British Journal of Medical Psychology, 59, 221-233.

Persons, J. B. (1991). Psychotherapy outcome studies do not accurately represent current models of psychotherapy: A proposed remedy. American Psychologist, 46, 99-106.

Pilkonis, P. A., Imber, S. D., Lewis, P., & Rubinsky, P. (1984). A comparative outcome study of individual, group, and conjoint psychotherapy. Archives of General Psychiatry, 41, 431-437.

Polkinghorne, D. E. (1991). Two conflicting calls for methodological reform. The Consulting Psychologist, 19, 103-114.

Ross, S. M., & Proctor, S. (1973). Frequency and duration of hierarchy item exposure in a systematic desensitization analogue. Behavior Research and Therapy, 11, 303-312.

Russell, C. S., Olson, D. H., Sprenkle, D. H., & Atilano, R. B. (1983). From family system to family system: Review of family therapy research. The American Journal of Family Therapy, 11, 3-14.

Sabalis, R. F. (1983). Assessing outcome in patients with sexual dysfunctions and sexual deviations. In M. J. Lambert, E. R. Christensen, & S. S. DeJulio (Eds.), The assessment of psychotherapy outcome (pp. 205-262). New York: Wiley.

Schacht, T. E. (1991). Formulation-based psychotherapy research: Some further considerations. American Psychologist, 46, 1346-1347.

Schmaling, K. B., & Jacobson, N. S. (1987, November). The clinical significance of treatment gains resulting from parent-training interventions for children with conduct problems: A reanalysis of outcome data. Paper presented at the annual meeting of the Association for the Advancement of Behavior Therapy, Boston.

Scott, M. J., & Stradling, S. G. (1990). Group cognitive therapy for depression produces clinically significant reliable change in community-based settings. Behavioral Psychotherapy, 18, 1-19.

Shapiro, D. A., & Shapiro, D. (1982). Meta-analysis of comparative therapy outcome studies: A replication and refinement. Psychological Bulletin, 92, 581-604.

Shore, M. F., Massimo, J. L., & Ricks, D. F. (1965). A factor analytic study of psychotherapeutic change in delinquent boys. Journal of Clinical Psychology, 21, 208-212.

Silverman, W. K. (1991). Persons’s description of psychotherapy outcome studies does not accurately represent psychotherapy outcome studies. American Psychologist, 46, 1351-1352.

Smith, M. L., Glass, G. V., & Miller, T. I. (1980). The benefits of psychotherapy. Baltimore: Johns Hopkins University Press.

Strupp, H. H., & Hadley, S. W. (1977). A tripartite model of mental health and therapeutic outcomes: With special reference to negative effects in psychotherapy. American Psychologist, 32, 187-196.

Strupp, H. H., Schacht, T. E., & Henry, W. P. (1988). Problem-treatment-outcome congruence: A principle whose time has come. In H. Dahl, H. Kaechele, & H. Thomas (Eds.), Psychoanalytic process research strategies (pp. 1-14). Berlin: Springer.

Tingey, R., Burlingame, G., Lambert, M. J., & Barlow, S. H. (1990). Assessing clinical significance: Extensions and applications. Paper presented at the Society for Psychotherapy Research, Wintergreen, Virginia.

Trull, T. J., Nietzel, M. T., & Main, A. (1988). The use of meta-analysis to assess the clinical significance of behavior therapy for agoraphobia. Behavior Therapy, 19, 527-538.

Wells, E. A., Hawkins, J. D., & Catalano, R. F. (1988). Choosing drug use measures for treatment outcome studies. 1. The influence of measurement approach on treatment results. International Journal of Addictions, 23, 851-873.

Williams, S. L. (1985). On the nature and measurement of agoraphobia. Progress in Behavior Modification, 19, 109-144.

Wilson, G. T., & Thomas, M. G. (1973). Self versus drug-produced relaxation and the effects of instructional set in standardized systematic desensitization. Behavior Research and Therapy, 11, 279-288.

Wolf, M. M. (1978). Social validity: The case for subjective measurement or how applied behavior analysis is finding its heart. Journal of Applied Behavior Analysis, 11, 203-214.

Woodward, C. A., Santa-Barbara, J., Levin, S., & Epstein, N. B. (1978). The roles of goal attainment scaling in evaluating family therapy outcome. American Journal of Orthopsychiatry, 48, 464.


Chapter 5

Criteria for Selecting Psychological Instruments for Treatment Outcome Assessment

Frederick L. Newman
Florida International University

James A. Ciarlo
University of Denver

What is the current psychological and community functioning of those we wish to assess? What are the behaviors that the therapeutic intervention(s) are expected to impact? What clinical or program decisions will be supported by an assessment of the client’s psychological state or functional status? What is the most cost-effective means to perform these assessments? These are among a number of questions that one might consider addressing when selecting a psychological instrument for screening clients, treatment planning, or evaluating client progress and outcome. The current chapter is designed to assist the reader in creating criteria for developing, selecting, and using an instrument. The 11 criteria discussed in this chapter were identified originally by a panel of experts assembled by the National Institute of Mental Health (NIMH; Ciarlo, Brown, Edwards, Kiresuk, & Newman, 1986).¹ In this chapter, we have attempted to update that discussion of the 11 criteria given our current understanding of the research literature and the demands on the use of psychological testing. The 11 criteria are summarized in Table 5.1 and are organized under five groupings:

• Applications of Measures
• Methods and Procedures
• Psychometric Features
• Cost Considerations
• Utility Considerations

Through this discussion, it becomes obvious that the criteria are not independent of each other. Yet each criterion focuses on unique concerns that help the reader consider the demands of his or her own situation, the literature, and the relationship of that criterion to the others.

¹Members of the expert panel were A. Broskowski, G. B. Cox, H. H. Goldman, W. A. Hargreaves, J. Mintz, I. Elkins, and J. W. Zinober.


TABLE 5.1
Criteria for Development, Selection, and/or Use of Progress-Outcome Measures

Applications
• Relevance to target group and independent of treatment provided

Methods and Procedures
• Simple, teachable methods
• Use of measures with objective referents
• Use of multiple respondents
• More process-identifying outcome measures

Psychometric Features
• Psychometric strength: reliable, valid, sensitive to treatment-related change, and nonreactive

Cost Considerations
• Low costs

Utility Considerations
• Understanding by nonprofessional audiences
• Easy feedback and uncomplicated interpretation
• Useful in clinical services
• Compatibility with clinical theories and practices

Criterion One: Relevance to Target Group

An outcome measure or set of measures should be relevant and appropriate to the target group(s) whose treatment is being studied; that is, the most important and frequently observed symptoms, problems, goals, or other domains of change for the group(s) should be addressed by the measure(s). . . . Other factors being equal, use of a measure appropriate to a wider range of client groups is preferred. . . . Measures (should be). . . independent of the type of treatment service provided are to be preferred. . . . (Ciarlo et al., 1986, p. 26)

There is a common wisdom that treatment selection and the person’s probable response to treatment should be based on a set of clinical and demographic characteristics (Beutler & Clarkin, 1990). A target group can be described as a cluster of clients with similar clinical-demographic characteristics who are expected to have similar treatment needs and course. Given this as a starting premise, a first criterion should confront issues relevant to common problems in functioning, diagnosis, socioeconomic setting, demographics, or history that would make a difference in treatment selection, course, and outcome. In fact, the NIMH panel identified this criterion as the most important, primarily because it is the logic that sets the stage for each of the ensuing criteria. “Clients of different types can be expected to differ in their response to treatment; therefore it is only reasonable to study clients of the same type when comparing different treatments or service programs” (Ciarlo et al., 1986, p. 27).

As homogeneity of client characteristics decreases, the complexity of the methods and procedures required to administer and score the instrument often increases. Differences in client age, ethnicity (as related to language and meaning), comorbidity with a physical illness, and history are obvious reasons for modifying an instrument’s administrative procedures. Once these modifications are introduced, the measure may change its character. Thus, the first criterion, “relevance to a target group,” addresses whether consumers’ characteristics will force a modification in the procedure’s administration.

The second issue is the influence of heterogeneous client characteristics on the measure’s psychometric features. Increased differences among client characteristics may add variation that otherwise would not systematically covary with treatment. This heterogeneity could diminish the measure’s psychometric accuracy.

Criterion Two: Simple, Teachable Methods

Ciarlo et al. (1986) pointed out that the second criterion was agreed on readily by all of the panelists, yet the development of training manuals and methods for assuring the quality of instrument administration is a recent and rare phenomenon. The criteria and technology for developing training materials and for controlling administrative quality are not always applied, although they have been understood in the abstract for many years (Cronbach, 1970; Nunnally, 1978).

Self-report measures (e.g., Symptom Checklist 90-R [SCL-90-R], Beck Depression Inventory [BDI], Minnesota Multiphasic Personality Inventory [MMPI-2]) or measures completed by a significant other (the parents in the Child Behavior Checklist; Achenbach & Edelbrock, 1983) that have survived scrutiny and are considered to have adequate psychometric quality usually have good instructions and administration manuals. But even in these cases, the recommended guidelines for administration may be ignored with potentially disastrous effects. For example, the instructions for most self-report instruments strongly recommend completion independent of guidance or advice from others, preferably in isolation. From our own editorial experience, this requirement has not always been adhered to adequately.

Measures completed by an independent clinical observer or by the treating clinician can be very useful, but often the instructions on the instrument’s use, training, and quality-control procedures are poorly developed. On the one hand, such measures seek to make use of the professional’s trained observations. On the other hand, such scales tend to be more reactive to clinician judgment bias (Newman, 1983; Patterson & Sechrest, 1983). Procedures for surfacing judgment biases in a staff training format are discussed in Newman (1983) and detailed in Newman and Sorensen (1985).

Criterion Three: Use of Measures with Objective Referents

An objective referent is one for which concrete examples are given for each level of a measure or at least at key points on the rating scale. A major asset of objective referents is the potential to develop reliable and usable norms for an instrument. If the instrument is sensitive to differences in client psychological status and functioning, then it has potential for justifying third-party payment or for treatment/service outcome evaluation.

Clinicians often proclaim the attractiveness of instruments that are individualized to the client. The most attractive features of these instruments are that: (a) the measures can be linked more directly to the consumer’s own behaviors and life situation; and (b) treatment selection, course, and outcome can be individualized to the consumer. In fact, a consistent finding in the literature is that when the client and the clinician have clearly identified the client’s unique problems and the goals of treatment, there is a significant improvement in outcome (Mintz & Kiesler, 1981). Measures of this sort include target complaints (severity of), goal-attainment scaling, problem-oriented records, and global improvement ratings.


The major argument against such measures is the issue of generalizability. Specifically, is the change in the severity of one person’s complaint problem comparable to a like degree of change in another person’s complaint problem? Although the issue of generalizability plagues all measures, without objective referents, the distribution of outcomes becomes free-floating across settings or clinical groups (Cytrynbaum, Ginath, Birdwell, & Brandt, 1979), thereby limiting the utility of the measures.

There are arguments on the other side of this issue. Several meta-analytic studies, where effect size was standardized, have been very informative without specifically identifying the behaviors that have been modified (Clum, Clum, & Surls, 1993; Robinson, Berman, & Neimeyer, 1990). Howard, Kopta, Krause, and Orlinsky (1986) studied the relationship of dosage (number of visits) to outcome across studies, where the measure of outcome was simply whether improvement was observed. They documented that there were significant between-group differences in improvement as a function of the number of sessions. Newman (1980) also argued that, even with objective referents, local conditions, including state-wide funding practices or community standards of “normal functioning,” will transform the distributions of any measure, with or without objective referents. Thus, studies that identify local norms should become standard practice for any measure intended to set funding guidelines, to set standards for treatment review, or to conduct evaluation research (Newman, 1980; Newman, Kopta, McGovern, Howard, & McNeilly, 1988).
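To make the idea of a standardized effect size concrete, the following is a minimal sketch that divides the difference between two group means by their pooled standard deviation. This particular definition is an assumption chosen for illustration; the meta-analyses cited above may have standardized effects differently, and the scores shown are invented.

```python
import math
from statistics import mean, stdev


def standardized_effect_size(treated, control):
    """Mean difference between groups divided by the pooled standard deviation."""
    n_t, n_c = len(treated), len(control)
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = (
        (n_t - 1) * stdev(treated) ** 2 + (n_c - 1) * stdev(control) ** 2
    ) / (n_t + n_c - 2)
    return (mean(treated) - mean(control)) / math.sqrt(pooled_var)


# Hypothetical improvement scores on two unrelated instruments.
d_study_a = standardized_effect_size([2, 4, 6], [1, 3, 5])
d_study_b = standardized_effect_size([20, 40, 60], [10, 30, 50])
print(d_study_a, d_study_b)
```

Because the result is expressed in standard deviation units, the two hypothetical studies yield the same value even though their raw scales differ by a factor of 10, which is what allows outcomes to be compared without naming the specific behaviors measured.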

Because individualized problem identification and goal setting do have beneficial outcomes, it may be possible to have the best of both worlds. This can be accomplished by using both an individualized instrument and a measure that has objective referents.

Criterion Four: Use of Multiple Respondents

A number of theorists and researchers have noted that measures from the principal stakeholders (client, therapist, significant other-collateral, research evaluator) should be obtained, because each views the process and outcomes of treatment differently (Ciarlo et al., 1986; Ellsworth, 1975; Lambert, Christensen, & DeJulio, 1983; Strupp & Hadley, 1977). The importance of this criterion varies by target group. For example, parents of psychologically troubled children were considered as primary observers by Achenbach and Edelbrock (1983) in their development of the Child Behavior Checklist, whereas teachers were considered secondary. Similar arguments are being made in scale development for the elderly, where the adult children of the frail elderly are considered as major stakeholders whose assessments are considered primary over self-reports (Kane & Kane, 1981; Mangen & Peterson, 1984).

A number of researchers have contrasted the views of the four major respondents: client, treating clinician, significant other, and independent clinical observer. Turner, McGovern, and Sandrock (1982) found that there is a high level of agreement across different scales originally designed for use by one of the respondent groups, as evidenced through high canonical correlations across respondents (e.g., the SCL-90-R for clients, the Colorado Clinical Rating Scale and the Global Assessment Scale for clinicians, and the Personal Adjustment and Role Skills Scale for significant others) when instructions were modified to fit each of the respondents. High coefficients were obtained when observers described specific behaviors (where the scale had objective referents), whereas lower coefficients were obtained when observers described how another person felt (e.g., he or she felt happy or sad).

The major advantages achieved by obtaining measures from multiple observers are: (a) each observer’s experiences result in a unique view of the client (although the Turner et al. [1982] study suggested that these views can be highly similar); (b) concurrent validation of the client’s behavioral status and changes can be obtained; and (c) responses are likely to be more honest if all of the respondents are aware that there are multiple respondents. A major disadvantage is higher costs, particularly in terms of the time and effort for data collection and analysis. There also is the added logistical problem of attempting to collect the functional status data from multiple respondents at the same time, such that the same states are being observed. The time and effort costs are becoming more manageable with the use of computer-assisted testing and scoring procedures; however, there is an associated increase in hardware and software costs.

Criterion Five: More Process-Identifying Outcome Measures

“Measure(s) that provide information regarding the means or processes by which treatments may produce positive effects are preferred to those that do not” (Ciarlo et al., 1986, p. 28).

The basic concept here is at least controversial. On one side of the issue, Orlinsky and Howard (1986) argued that there should be a relationship between process and outcomes. Behavioral and cognitive behavioral treatments employing self-management, homework assignments, and self-help group feedback often use measures with objective behavioral referents as both process and outcome measures. On the other side, Stiles and Shapiro (in press) argued that most of the important interpersonal and relationship ingredients (processes) that occur during psychotherapy (possibly other psychosocial intervention sessions as well) are not expected to correlate with outcome. Adequate empirical support for either side of the argument is still lacking, and the different sides of the arguments appear to be theory-related.

A strong argument can be made that behavioral markers of progress or risk level should be taken regularly during the course of treatment. Examples include level of depression, anxiety, substance abuse, interpersonal functioning, or community functioning (Carter & Newman, 1975; Newman, 1980; Newman, Hunter, & Irving, 1987). These markers are not necessarily describing the actual therapeutic process. Instead, they are global indicators describing whether the person is functioning adequately to consider continuing versus altering the planned treatment. Certainly, programs serving consumers with serious and persistent illnesses should adopt a strategy of regularly collecting such progress measures.

Criterion Six: Psychometric Strengths

The measure used should meet minimum criteria of psychometric adequacy, including: a) reliability (test-retest, internal consistency, or inter-rater agreement where appropriate); b) validity (content, concurrent, and construct validity); c) demonstrated sensitivity to treatment-related change; and d) freedom from response bias and non-reactivity (insensitivity) to extraneous situational factors that may exist (including physical settings, client expectation, staff behavior, and accountability pressures). The measure should be difficult to intentionally fake, either positively or negatively. (Ciarlo et al., 1986, p. 27)
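Of the reliability types named in this criterion, internal consistency is the simplest to estimate from a single administration. The sketch below computes coefficient alpha, a widely used internal-consistency estimate that the source does not itself name; it is offered only as one plausible way to operationalize the term, and the item responses are invented.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Coefficient alpha for a list of respondents, each a list of k item scores."""
    k = len(responses[0])
    # Transpose so that each tuple holds every respondent's score on one item.
    items = list(zip(*responses))
    sum_item_variances = sum(variance(item) for item in items)
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum_item_variances / total_variance)


# Hypothetical responses: four clients answering a three-item scale.
responses = [
    [1, 2, 1],
    [2, 2, 3],
    [3, 4, 3],
    [4, 3, 4],
]
alpha = cronbach_alpha(responses)
print(round(alpha, 3))
```

The function assumes at least two items and some variation in total scores; a production version would need to guard against a zero total variance.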


Two issues are discussed under the topic of psychometric features. The first is obvious: “It is important to use measures of high psychometric quality.” The second might seem bold: “The psychometric quality of the local application of an instrument is related to the quality of services.”

1. The importance of having measures of high psychometric quality. On the surface, no one should argue to lower the standards for an instrument’s psychometric qualities. Yet the more reactive, less psychometrically rigorous global measures (e.g., global improvement ratings, global level of functioning ratings) tend to be more popular with upper level decision makers (e.g., program managers, legislators). Although it is possible to exert reasonable control over the application of these measures to assure psychometric quality (Newman, 1980), such control is not necessarily enforced; thus, psychometric quality suffers (Green, Nguyen, & Attkisson, 1979).

2. The relationship of psychometric quality of the local application of an instrument to the quality of the service program. Although this concept may appear to be too bold at first, it nevertheless is a concept that has considerable empirical support. The focus of discussion here is on clinical aspects of psychometric strengths and the relationships of these strengths to the qualities of the delivery of therapeutic services.

A complete articulation of this assertion is as follows: If local data collection produces low reliability and validity estimates, and there exists evidence that the instrument has demonstrated adequate reliability and validity in another context, then it is highly likely that the quality and effectiveness of clinical services are questionable. The argument relating quality of instrument use to quality of care follows from three assumptions. First, the psychological service should have defined clearly the target groups it can treat (see Criterion One) in terms of clinical and demographic characteristics. Second, the psychological service should have provided a sufficient operational definition regarding the goals of the services, such that consumer progress-outcomes are observable and measurable. Third, the leadership and staff of a psychological service should have identified one or more instruments that have sufficient content validity relative to service program goals for each target group.

Each of the three assumptions has face and content validity with respect to program quality. It is difficult to envision how a program could claim to provide quality or effective services if it does not define the treatment population and the service objectives. The third assumption, of course, is the most problematic to realize. It requires service program leaders and staff to agree that there are measurable progress and outcome instruments that span the domains of service program objectives. Many still claim that such measures do not exist. Yet, all of the instruments described in this text have been used successfully to demonstrate treatment efficacy in more than one service program. Others argue that if such measures do exist, they are either too expensive to use or too reactive to adequately control the quality of the data. This circles the discussion back to the assertion that if the quality of the data is low in a particular service setting, although quite satisfactory elsewhere, the quality of the service might be at fault. Thus, a local program that uses an instrument for screening, treatment planning, or progress review might consider obtaining annual estimates of the instrument’s reliability and validity. These results then could be used to index the quality of service, regardless of the actual level of treatment outcome obtained.

To further illustrate this point, consider the issue of reliability as it might relate to program quality. If reliability is low between raters (between the client and therapist, or among two or more treatment staff), it is likely that there is inconsistent communication or understanding about the client’s psychological and functional status, the service-treatment’s intention, and/or the client’s progress-outcome. If there is inconsistent communication or understanding regarding these aspects between client and therapist or among clinical staff, then a poor outcome would be the most likely result (Mintz & Kiesler, 1981).
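The annual reliability check suggested above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the ratings are invented, the 0.70 cutoff is a hypothetical local convention rather than a value given in this chapter, and the Pearson correlation stands in for whichever agreement index (e.g., an intraclass correlation) a program actually adopts.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of ratings."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 ratings")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("ratings must vary for a correlation to be defined")
    return cov / sqrt(ss_x * ss_y)


def flag_low_agreement(r, threshold=0.70):
    """True when agreement falls below the (hypothetical) local cutoff."""
    return r < threshold


# Hypothetical functioning ratings of the same six clients.
client = [3, 5, 2, 6, 4, 7]
therapist_a = [3, 5, 3, 6, 4, 7]   # largely agrees with the clients
therapist_b = [6, 2, 5, 3, 7, 4]   # sees the same clients very differently

r_a = pearson_r(client, therapist_a)
r_b = pearson_r(client, therapist_b)
print(f"client vs. therapist A: r = {r_a:.2f}, flagged = {flag_low_agreement(r_a)}")
print(f"client vs. therapist B: r = {r_b:.2f}, flagged = {flag_low_agreement(r_b)}")
```

In the spirit of the argument above, a flagged pair would prompt a review of how client status and treatment goals are being communicated, not merely a note about the instrument.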


How does the use of standardized measures fit into the picture of increasing the level of communication? There are two points to be made here. First, under the assumptions given earlier, careful selection of the progress-outcome measures provides a clear statement of program purpose and goals. Second, the language describing the functional domains (factor structure) covered by the instruments represents an agreed-on vocabulary for staff to use when communicating about clients. The language should be useful in the communication between clients and staff, as well as among staff. If that language is related to the language of the instruments, any inconsistency in use of the instruments would be reflected in an inconsistency in the communication with clients and among staff when discussing clients.

Another consideration is that the validity of the test instrument, when estimated locally, relates to the quality of services. If locally established estimates of instrument validity among services or within a service deviate from established norms, the service staff’s concept of normal needs to be investigated. Classical examples of such differences are those found in estimates of community functioning between inpatient and outpatient staff (Newman, Heverly, Rosen, Kopta, & Bedell, 1983). Kopta, Newman, McGovern, and Sandrock (1986) found that when there were multiple frames of reference among clinicians of different theoretical orientations, there were different syntheses of the clinical material within a session and different intervention strategies and treatment plans proposed. McGovern, Newman, and Kopta (1986) found that differences in attributions of problem causality and treatment outcome responsibility were related to judgments regarding the clinicians’ choices of treatment strategies. These differences in frames of reference influence (i.e., probably reduce) the estimates of concurrent validity of measures in use as well as measure interrater reliability. However, reduced coefficients of reliability and validity are not as serious an issue as the potential negative impact on services when purpose, language, and meaning lack clarity among service staff.

Following the line of reasoning offered previously, a two-part recommendation might be considered. First, the leadership of a service program could implement operations that satisfy the three assumptions identified earlier: (a) obtain a clear target group definition by service staff, (b) provide operational definitions of treatment goals-objectives, and (c) work toward selection of instruments whose structure and language reflect the first two assumptions. Second, the program should incorporate staff supervision and development procedures that will identify when and how differences in frames of reference and language meaning are occurring. Chapter 6 of this volume provides a discussion on the design and analysis of reliability and validity studies.

[Footnote: Work with several colleagues has focused on the methods and results of studies identifying factors influencing differences in clinicians’ perceptions. The theoretical arguments and historical research basis for this line of work are discussed in Newman (1983). The procedures for conducting these studies as staff development sessions are detailed in Newman and Sorensen (1985) and in Heverly, Fitt, and Newman (1984). Examples of studies on factors influencing clinical assessment and treatment decisions are: Newman, Kopta, McGovern, Howard, and McNeilly (1988); Kopta, Newman, McGovern, and Angle (1989); Kopta, Newman, McGovern, and Sandrock (1986); McGovern, Newman, and Kopta (1986); Newman, Fitt, and Heverly (1987); Heverly and Newman (1984); and Newman, Heverly, Rosen, Kopta, and Bedell (1983).]

Criterion Seven: Low Measure Costs Relative to Its Uses

How much should be spent on collecting, editing, storing, processing, and analyzing progress-outcome information? The answer to this question must be considered in terms of the five important functions that the data support: screening-treatment planning, quality assurance, program evaluation, cost containment (utilization review), and revenue generation. Given these functions, a better question might be, “What is the investment needed to assure a positive return on these functions?”

There are several pressures on mental health (and physical health) services that indicate that an investment in the use of progress-assessment instruments can be cost-beneficial. The first force is the requirement of an initial assessment to justify entry into services and development of a treatment plan for reimbursable clients. For the seriously and persistently mentally ill, funded placement (e.g., by Medicaid in most states) in extended community services (waivered services) requires a diagnostic and functional assessment. Most other third-party payors also will reimburse such activities and the affiliated resource costs if it can be shown that the testing is a cost-effective means of making screening (utilization review) decisions. This is not to imply that there is an open door to perform and charge for an unlimited amount of evaluative testing and assessment, but its judicious use is covered by many third-party payors. Justification for continued care also is required by both public and private third-party payors. Again, the cost of the assessment often can be underwritten by the cost containment-quality assurance agreement with the third-party payors.

The second force is the emerging litigious culture that requires increasing levels of accountability for treatment interventions. Our experience is that the legal profession is divided on this issue. One view says that the less hard data a service program has, the less liability it would have for its actions. The credo here appears to be, “Do not put anything in writing unless required to do so in writing by an authority that will assume responsibility.” The other view says that a service program increases its liability if it does not have any hard evidence to justify its actions. There is little doubt that the former view has been the most popular view until recently.
With increased legal actions by consumer groups on the “right to treatment,” there is likely to be an increased need for data that can justify the types and levels of treatment provided.

A parallel force is exerted by the increased budgetary constraints by both private and public sources of revenues for mental health services. Pressures to enforce application of cost containment-utilization review criteria appear to be far stronger than pressures for assuring quality of care. Although the literature has indicated the efficacy of many mental health interventions, empirical literature supporting the cost-effectiveness and cost-benefits of these services still lags (Newman & Howard, 1986; Yates & Newman, 1980).

When Ciarlo assembled the panel of experts for NIMH, a cost estimate of 0.5% of an agency’s total budget was considered to be a fair estimate of affordable costs for collecting and processing progress-outcome data. This was to include the costs of test materials, training of personnel, and collecting and processing the data. This estimate was made at a time when the public laws governing the disbursements of federal block grant funds required that 5% of the agency’s budget go toward evaluations of needs and program effectiveness.

There have been two notable changes in service delivery that have occurred since the time when the panel of experts met. One is that the Health Care Financing Administration (HCFA) and other third-party payors now require an assessment procedure that will deflect those who do not require care or will identify the level of care required for clients applying for service. They offer limited reimbursement for such assessment activities. The second change focuses on the use of assertive case management or continuous treatment team approaches for the seriously and persistently mentally ill or substance abusers. Here, the client tracking procedures can be part of the reimbursed overhead costs.

Assessment and client tracking procedures are logically compatible activities. The requirement for initial and updating assessments to justify levels of care can be integrated with the client tracking system requirement for case management or treatment team approaches. If a cost-effective technique for integrating the assessment and the client tracking procedures is instituted, the costs for testing become part of the costs of coordinating and providing services. It is possible that if the costs considered here were restricted to the costs of purchasing the instrument and the capacity to process the instrument’s data (and not the professionals’ time), the costs might not exceed the 0.5% estimate. Proper cost estimation studies need to be done to provide an empirical basis to identify the appropriate levels of costs.
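The arithmetic behind the 0.5% guideline is simple enough to sketch. In the Python fragment below, the agency budget and the itemized costs are purely hypothetical figures chosen for illustration; only the 0.5% share comes from the panel estimate described above, and the function names are assumptions of this sketch.

```python
def affordable_assessment_budget(total_budget, share=0.005):
    """Dollar amount corresponding to the panel's 0.5% guideline."""
    return total_budget * share


def within_estimate(total_budget, itemized_costs, share=0.005):
    """True if the summed assessment costs do not exceed the guideline."""
    return sum(itemized_costs.values()) <= affordable_assessment_budget(total_budget, share)


# Hypothetical agency: $2,000,000 annual budget.
agency_budget = 2_000_000
costs = {
    "test materials": 4_000,
    "staff training": 2_500,
    "data collection and processing": 3_000,
}

ceiling = affordable_assessment_budget(agency_budget)
print(f"0.5% ceiling: ${ceiling:,.0f}; planned costs: ${sum(costs.values()):,.0f}")
print("within estimate:", within_estimate(agency_budget, costs))
```

Whether professionals’ time belongs in the itemized costs is exactly the open question raised above; the sketch leaves it out, as the text does when suggesting the estimate might hold.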

Criterion Eight: Understanding by Nonprofessional Audiences

The scoring procedures and presentation of the results should be understandable to stakeholders at all levels. These stakeholders include the consumer and his or her significant others; third-party payors; and administrative and legislative policymakers at the local, state, and federal levels.

The analysis and interpretation of the results should be understandable at the individual consumer level. Two lines of reasoning support this. The first is the increased belief in and legal support for the consumer’s right to know about the assessment’s results and the associated selection of treatment and services. An understandable descriptive profile of the client can be used in a therapeutically positive fashion. Examples for the client or family member’s consideration might include the following: (a) Does the assessment score(s) indicate my need for, progress in, or success with treatment; or the need for continued treatment? (b) Does a view of my assessment score(s) over time describe how I functioned in the past relative to how I am doing now? (c) Does the assessment score(s) help me communicate how I feel or function to those who are trying to serve, treat, or assist me (including my family)? (d) Does the assessment help me understand what I can expect in the future?

A second aspect of this criterion is the advantage of being able to aggregate understandable test results over groups of consumers to communicate evaluation research results to influential stakeholders (e.g., regulators, third-party payors, legislators, citizens, or consumer groups). This includes needs assessment for program and budget planning (Newman, Griffin, Black, & Page, 1989; Uehara, Smukler, & Newman, in press). It also includes evaluating program effectiveness and/or cost-effectiveness among service alternatives for policy analysis and decisions (Newman & Howard, 1986; Yates & Newman, 1980). The budget planners and the policy decisionmakers require easily understood data. They often are reluctant to rely solely on expert opinion to interpret the data, with some even preferring to do it themselves. Examples of questions that the data ideally should be able to address include: (a) Do the scores show whether a client has improved functioning to a level where he or she either requires less restrictive care or no longer requires care? (b) Do the measures assess and understandably describe the consumer’s functioning in socially significant areas, for example, independent living, vocational productivity, and appropriate interpersonal and community behaviors? (c) Would the measures permit comparisons of relative program effectiveness among similar programs that serve similar clients?

In summary, it is important to ensure that the test results are understandable to those at the front-line level (consumers, their families, and service staff) and that the aggregate data are understandable to budget planners and policymakers.

Criterion Nine: Easy Feedback and Uncomplicated Interpretation

The discussion under Criterion Eight is also relevant here, but the focus is on presentation. Do the instrument and its scoring procedures provide reports that are easily interpreted? Does the report stand on its own without further explanations or training? For example, complex “look-up” tables are less desirable than a graphic display describing the characteristics of a client or a group of clients relative to a recognizable norm. Computerized scoring and profile printouts in both narrative and graphic form are becoming more common, which is to be commended. This trend reiterates the importance of Criterion Nine. The only cautionary note here is that the presentation should not be so “user friendly” that it misrepresents the data. The language used to label figures and tables must be developed carefully such that the validity of the instrument’s underlying constructs is not violated.
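One way to picture a display “relative to a recognizable norm” is to convert raw domain scores to a standard metric and draw a simple profile. The Python sketch below assumes the familiar T-score convention (mean 50, standard deviation 10 in the norm group); the domain names, norm means, and standard deviations are hypothetical and do not come from any instrument discussed in this volume.

```python
def t_score(raw, norm_mean, norm_sd):
    """Convert a raw score to a T score (norm mean = 50, norm SD = 10)."""
    return 50 + 10 * (raw - norm_mean) / norm_sd


def profile_lines(raw_scores, norms):
    """Build one text line per domain: label, T score, and a bar of '#'."""
    lines = []
    for domain, raw in raw_scores.items():
        mean, sd = norms[domain]
        t = t_score(raw, mean, sd)
        bar = "#" * max(0, round(t / 2))   # one '#' per two T-score points
        lines.append(f"{domain:<14} T={t:5.1f} {bar}")
    return lines


# Hypothetical norms (mean, SD) and one client's raw scores.
norms = {"Self-care": (20, 4), "Social": (30, 5)}
raw_scores = {"Self-care": 24, "Social": 20}

for line in profile_lines(raw_scores, norms):
    print(line)
```

Even in a sketch like this, the cautionary note above applies: the domain labels on the display should match the constructs the instrument actually measures.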

Criterion Ten: Useful in Clinical Services

The assessment instrument(s) used should support the clinical processes of a service with minimum interference. An important selection criterion is whether the instrument’s language, scoring, and presentation of results support clinical decisions and communication. Those who need to communicate with each other include the clinical and service staff working with the client, and the clients and their collateral-significant others. These clinically relevant questions might be considered when discussing the instrument’s utility: (a) Will the test results describe the likelihood that the client needs services and will be appropriately responsive to available services? (b) Do the test results help in planning the array and levels of services, treatments, and intervention styles that might best meet service goals? (c) Do the test results provide sufficient justification for the planned treatment to be reimbursed by third-party payors? (d) Is the client responding to treatment as planned, and, if not, what areas of functioning are or are not responding as expected?

An ideal instrument for this criterion would be sufficiently supportive of these processes, such that the effort required to collect and process the data would not be seen as a burden. The logic here is complementary to Criterion Seven, that is, the measure should have low costs relative to uses in screening-treatment planning, quality assurance, cost control, and revenue generation. Here, however, the emphasis is on utilization of the measure’s results. The more the instrument is seen as supporting these functions, the less expensive and interfering the instrument will be perceived to be by clinical staff.

Criterion Eleven: Compatibility with Clinical Theories and Practices

An instrument that is compatible with a variety of clinical theories and practices should have wider interest and acceptance by a broad range of clinicians and stakeholders than one based on only one concept of treatment improvement. The former would provide a base for evaluative research by contrasting the relative effectiveness of different treatment approaches or strategies.

How does one evaluate the level of compatibility? The first step is to inquire about the context in which it was developed and the samples used in developing norms. For example, if the normative sample was clients on inpatient units, then it probably would be too limited, because inpatient care is seen as the most restrictive and least frequently used level of a continuum of care. The broader the initial sampling population used in the measure’s development, the more generalizable the instrument. Ideally, one would want to have available norms for both clinical and normal populations. For example, if an instrument is intended for a population with a chronic physical disability (e.g., wheelchair-bound), then, for sampling purposes, the definition of a normal functioning population might change to persons with the chronic physical disability who function well in the community (Saunders, Howard, & Newman, 1988).

Another indicator of measure compatibility is whether there is evidence that its use in treatment/service planning and review matches the research results published in refereed journals. This is especially important when the data are used to contrast the outcomes of two or more therapeutic (or service) interventions. In reviewing this type of research, one first should review the types of clients served, the setting, and the type of diagnoses and problems treated. One also should note the differences in standard deviations among the groups in this literature. Evidence of compatibility would be indicated by similar (homogeneous) variations among the treatment groups. Homogeneity would indicate that errors of measurement (and/or individual differences and/or item difficulty) were not biased by the therapeutic intervention that was employed. One note of caution is that it is possible for a measure to have homogeneity of variance within and across treatment groups, and to still lack equal sensitivity to the respective treatment effects. If a measure is not sensitive to treatment effects, its use as a progress or outcome assessment instrument is invalid. Methods for assessing these features are discussed in Chapter 6 of this volume.
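A first, purely descriptive look at homogeneity of variation among treatment groups can be sketched as follows. The Python fragment computes each group’s sample variance and the ratio of the largest to the smallest variance; the outcome scores and group names are hypothetical, and this ratio is offered only as an assumed screening device, not as the formal procedure described in Chapter 6 (a formal test such as Levene’s would be the usual next step).

```python
from statistics import variance


def variance_ratio(groups):
    """Return (largest/smallest sample variance, per-group variances)."""
    variances = {name: variance(scores) for name, scores in groups.items()}
    smallest = min(variances.values())
    if smallest == 0:
        raise ValueError("a group with zero variance makes the ratio undefined")
    return max(variances.values()) / smallest, variances


# Hypothetical outcome scores under two treatments.
groups = {
    "treatment_a": [10, 12, 14, 16, 18],
    "treatment_b": [11, 13, 14, 15, 17],
}

ratio, per_group = variance_ratio(groups)
print("sample variances:", per_group)
print(f"largest/smallest variance ratio: {ratio:.2f}")
```

As the caution above makes clear, a ratio near 1 would say nothing about whether the measure is equally sensitive to each treatment’s effects; that has to be examined separately.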

Final Comments

The 11 criteria discussed here should serve as guides for evaluating an assessment instrument; they are not presented as firm rules of conduct. Few, if any, instruments can meet all the criteria fully. But if the criteria are used as a means of drawing together available information on an instrument, they will decrease the number of unpleasant surprises in the adaptation and use of the measure.

The application of the 11 criteria has its own costs. Although a master’s degree level of psychometric training is sufficient background to assemble the basic information on an instrument’s ability to meet these criteria, a full explication of the criteria requires broader input. Some of the criteria require clinical supervisors and managers to review clinical standards, program procedures, and policies. Other criteria will require an interchange among clinical supervisory and fiscal management personnel in areas in which they are inexperienced. However, the authors contend that the ultimate benefits to clients and stakeholders of applying these criteria are well worth the costs.

Acknowledgments

Several colleagues provided useful suggestions on this project: Betty Blythe, Siobhan Gonzalez, Patricia Michael, and Herbert Steinler.

References

Achenbach, T. M., & Edelbrock, C. S. (1983). Manual for the Child Behavior Checklist and Revised Behavior Profile. Burlington, VT: Department of Psychology, University of Vermont.
Beutler, L. E., & Clarkin, J. (1990). Systematic treatment selection: Toward targeted therapeutic interventions. New York: Brunner/Mazel.
Carter, D. E., & Newman, F. L. (1975). A client oriented system of mental health service delivery and program management: A workbook and guide (Series FN No. 4, DHHS No. 80-307). Rockville, MD: Mental Health Service System Reports.
Ciarlo, J. A., Brown, T. R., Edwards, D. W., Kiresuk, T. J., & Newman, F. L. (1986). Assessing mental health treatment outcome measurement techniques. DHHS Pub. No. (ADM)86-1301. Washington, DC: U.S. Government Printing Office.
Clum, G. A., Clum, G. A., & Surls, R. (1993). A meta-analytic comparison of treatments for panic disorder. Journal of Consulting and Clinical Psychology, 61, 317-326.
Cronbach, L. J. (1970). Essentials of psychological testing (3rd ed.). New York: Harper & Row.
Cytrynbaum, S., Ginath, T., Birdwell, J., & Brandt, L. (1979). Goal attainment scaling: A critical review. Evaluation Quarterly, 3, 5-40.
Ellsworth, R. B. (1975). Consumer feedback in measuring the effectiveness of mental health programs. In M. Guttentag & E. L. Struening (Eds.), Handbook of evaluation research: Volume 2 (pp. 239-274). Beverly Hills, CA: Sage.
Green, R. S., Nguyen, T. D., & Attkisson, C. C. (1979). Harnessing the reliability of outcome measures. Evaluation and Program Planning, 2, 137-142.
Heverly, M. A., Fitt, D. X., & Newman, F. L. (1984). Constructing case vignettes for evaluating clinical judgement. Evaluation and Program Planning, 7, 45-55.
Heverly, M. A., & Newman, F. L. (1984). Evaluating the influence of day treatment program orientation on clinicians’ judgments. International Journal of Partial Hospitalization, 2, 239-250.
Howard, K. I., Kopta, S. M., Krause, M. S., & Orlinsky, D. E. (1986). The dose-effect relationship in psychotherapy. American Psychologist, 41, 159-164.
Kane, R. A., & Kane, R. L. (1981). Assessing the elderly: A practical guide to measurement. Lexington, MA: Lexington Books.
Kopta, A. M., Newman, F. L., McGovern, M. P., & Angle, R. S. (1989). The relationship between years of psychotherapy experience and conceptualizations, interventions, and treatment plan costs. Professional Psychology, 29, 59-61.
Kopta, S. M., Newman, F. L., McGovern, M. P., & Sandrock, D. (1986). Psychotherapeutic orientations: A comparison of conceptualizations, interventions and recommendations for a treatment plan. Journal of Consulting and Clinical Psychology, 54, 369-374.
Lambert, M., Christensen, E., & DeJulio, R. (Eds.). (1983). The assessment of psychotherapy outcome. New York: Wiley.
Mangen, D. J., & Peterson, W. A. (1984). Health, program evaluation, and demography. Minneapolis: University of Minnesota Press.
McGovern, M. P., Newman, F. L., & Kopta, S. M. (1986). Meta-theoretical assumptions and psychotherapy orientation: Clinician attributions of patients’ problem causality and responsibility for treatment outcome. Journal of Consulting and Clinical Psychology, 54, 476-481.
Mintz, J., & Kiesler, D. J. (1981). Individualized measure of psychotherapy outcome. In P. Kendall & J. N. Butcher (Eds.), Handbook of research methods in clinical psychology. New York: Wiley.
Newman, F. L. (1980). Global scales: Strengths, uses and problems of global scales as an evaluation instrument. Evaluation and Program Planning, 3, 257-268.
Newman, F. L. (1983). Therapists’ evaluations of psychotherapy. In M. Lambert, E. Christensen, & R. DeJulio (Eds.), The assessment of psychotherapy outcome (pp. 498-536). New York: Wiley.
Newman, F. L., Fitt, D., & Heverly, M. A. (1987). Influences of patient, service program and clinician characteristics on judgments of functioning and treatment recommendations. Evaluation and Program Planning, 10, 260-267.
Newman, F. L., Griffin, B. P., Black, R. W., & Page, S. E. (1989). Linking level of care to level of need: Assessing the need for mental health care for nursing home residents. American Psychologist, 44, 1315-1324.
Newman, F. L., Heverly, M. A., Rosen, M., Kopta, S. M., & Bedell, R. (1983). Influences on internal evaluation data dependability: Clinicians as a source of variance. In A. J. Love (Ed.), Developing effective internal evaluation: New directions for program evaluation (No. 20, pp. 71-92). San Francisco: Jossey-Bass.
Newman, F. L., & Howard, K. I. (1986). Therapeutic effort, outcome and policy. American Psychologist, 41, 181-187.
Newman, F. L., Hunter, R. H., & Irving, D. (1987). Simple measures of progress and outcome in the evaluation of mental health services. Evaluation and Program Planning, 10, 209-218.
Newman, F. L., Kopta, S. M., McGovern, M. P., Howard, K. I., & McNeilly, C. (1988). Evaluating the conceptualizations and treatment plans of interns and supervisors during a psychology internship. Journal of Consulting and Clinical Psychology, 56, 659-665.
Newman, F. L., & Sorensen, J. E. (1985). Integrated clinical and fiscal management in mental health: A guidebook. Norwood, NJ: Ablex.
Nunnally, J. C. (1978). Psychometric theory (2nd ed.). New York: McGraw-Hill.
Orlinsky, D. E., & Howard, K. I. (1986). Process and outcome in psychotherapy. In S. L. Garfield & A. E. Bergin (Eds.), Handbook of psychotherapy and behavior change (3rd ed., pp. 311-381). New York: Wiley.
Patterson, D. R., & Sechrest, L. (1983). Nonreactive measures in psychotherapy outcome research. Clinical Psychology Review, 3, 391-416.
Robinson, L. A., Berman, J. S., & Neimeyer, R. A. (1990). Psychotherapy for the treatment of depression: A comprehensive review of controlled outcome research. Psychological Bulletin, 108, 30-49.
Saunders, S. M., Howard, K. I., & Newman, F. L. (1988). Evaluating the clinical significance of treatment effects: Norms and normality. Behavioral Assessment, 10, 207-218.
Stiles, W. B., & Shapiro, D. A. (in press). Disabuse of the drug metaphor: Psychotherapy process-outcome correlations. Journal of Consulting and Clinical Psychology.
Strupp, H. H., & Hadley, S. W. (1977). A tripartite model of mental health and therapeutic outcome. American Psychologist, 32, 187-196.
Turner, R. M., McGovern, M. P., & Sandrock, D. (1982). A multiple perspective analysis of schizophrenic symptomatology and community functioning. American Journal of Community Psychology, 11, 593-607.
Uehara, E., Smukler, M., & Newman, F. L. (in press). Linking resources use to consumer level of need in a local mental health system: Field test of the “LONCA” case mix method. Journal of Consulting and Clinical Psychology.
Yates, B. T., & Newman, F. L. (1980). Findings of cost-effectiveness and cost-benefit analyses of psychotherapy. In G. VandenBos (Ed.), Psychotherapy: From practice to research to policy (pp. 163-185). Beverly Hills, CA: Sage.

Chapter 6

Selection of Design and Statistical Procedures for Progress and Outcome Assessment

Frederick L. Newman
Florida International University

The selection of appropriate outcome research design and statistical procedures in the analysis of psychological test results must be driven by their application (i.e., by the clinical and decision environment in which the procedures are to be used). This chapter offers recommendations and guidelines for selecting research design and statistical procedures useful in progress and outcome assessment. Each application has unique demands warranting different, but not necessarily independent, designs or statistical approaches. There is, of course, a concern regarding the measure’s psychometric qualities and its relationship with outcome. For progress and outcome applications, there are the additional requirements of sensitivity to the rate and direction of the change relative to treatment goals. The discussion in this chapter focuses on issues of analysis that should be addressed and guidelines for evaluating and selecting research designs and statistical procedures for outcome investigation in psychological treatment.

Approach to Presenting the Statistical Material

The logic of the chapter’s presentation is first to discuss a specific clinical or mental health service issue and then to recommend one or more statistical procedures that can address the issue. The logic of the clinical issue is bridged with the logic of the statistical procedure by presenting the language of the mathematical expression that underlies the statistical procedure. The expressions are presented here for three reasons. First, the clinical focus of the discussion is designed to help the reader understand the logical link between the clinical issue and the statistical procedure. Second, the discussion is designed to provide the reader with a sufficient understanding of the statistical logic and vocabulary to read and use a statistical computer package manual and related texts, or to converse with her or his resident statistician. Third, the discussion should help the reader understand the strengths and weaknesses in the link between the clinical issue and the statistical procedure. References are provided for each technique along with computational details and examples of how the technique is used in clinical research applications.

A Preliminary Step: Defining

a Measurement Model

to Describe Treatment Evaluation-Process-Outcome Data Discussion on selecting statistical procedures follows from two baselines. One is a formal conceptualization of measurement (i.e., what does the instrument seek to measure and what

are the potential sources of error in this estimation?). The second is the clinical or service management question that is being asked of the measure. What follows is a definition of the general model and notation that is used throughout the chapter. This model and notation are introduced for descriptive purposes. It is not the only model, nor necessarily the best model for all situations. However, it is a model that lends itself to a discussion of outcome assessment in mental health services. Collins and Horn’s (1991) text Best Methods of the

Analysis of Change provided a good review of alternative models. Suppose that at a specific time (¢) one is interested in obtaining a measure Y;,,, that proposes to describe a particular domain of human functioning (8) on the ith individual who belongs to a particular target group (8;,) and is receiving a specific treatment (a,). In the measurement model, one can describe the influence of the person belonging to the kth target group and being under jth treatment at time ¢, on the observed behavior (Y;;,,) of the domain called 6, as follows: NG he 8 2 + BSing + ABS sxe + jx

(1)

The term $\alpha\beta\delta_{ijkt}$ is the interaction of the $j$th treatment and the $k$th target group that influences the functional domain ($\delta$) for the $i$th subject at the time $t$, when the measure was taken.

There are two features of the model offered here that are different from that offered in standard texts. One is that the time that the measure is obtained ($t$) is included as a subscript in each term, and as such appears to be a constant. Time, of course, is a constant, but $t$ is included here to remind us that all measures, particularly clinical measures of functional status (i.e., state rather than trait measures), are time-dependent. A more formal statement of the model could have treated time as an additional element of the model, adding complexity to the presentation. Another tack could have left $t$ out completely. However, the temporal nature of the measure of functional status is important in most clinical service applications. Thus, a simple subscripted $t$ is used to indicate the temporal status of the measurement model. The clinical and statistical issues involved in measuring changes in functional status over time (progress and outcome) are discussed in detail later in the chapter.

The second feature of the measurement model that differs from the more standard presentations is the addition of the parameter, $\delta$, to represent a particular domain of functioning. As with the use of the time element in the expression, $\delta$, a specific functional domain, is added for emphasis. Each of the elements in the expression (treatment or target population characteristic in this case) should be seen as interacting with the measure of the individual on a specific functional domain. The inclusion of $\delta$ reminds one that the model may not hold if the observation made ($Y_{ijkt}$) does not actually represent the functional domain of interest.

6

DESIGN AND STATISTICS FOR OUTCOME ASSESSMENT


Prior to treatment the model reduces to:

$$Y_{ikt} = \beta\delta_{ikt} + \varepsilon_{ikt} \tag{2}$$

where the term $\beta\delta_{ikt}$ is the true value of the domain for the $i$th person belonging to the $\beta_k$ target population, at a given time ($t$). The last term, $\varepsilon_{ikt}$, is an error term for the $i$th person that combines potential differences (error) due to at least three (potentially interacting) features: (a) item (measure) difficulty ($b_{kt}$) at the time; (b) error of measurement at that time ($m_{kt}$); and (c) individual differences introduced by that person (the individual’s state) at that time ($d_{it}$). The three potentially confounding components of the error term are discussed later in the chapter.

As additional target population characteristics are considered, the potential for them to interact with the other terms will only add to the model’s complexity. Because most scoring procedures attempt to derive a composite score, adding sources of variation, such as those due to consumer characteristics, typically compounds error (i.e., combined with $b$, $m$, and $d$). Because these sources of error are nonsystematic, they cannot simply be subtracted from aggregate scores. The number of sources of variance expands with the addition of just one more consumer characteristic ($\Gamma$). This would add two fixed interactions with treatment effects ($\alpha\Gamma$ and $\alpha\beta\Gamma$) and the potential for up to seven random interactions with the three confounded components of error (item difficulty $b$, error of measurement $m$, and individual differences $d$).

Before one becomes overwhelmed with the potential for intractable complexity, note that there are various measures available that have shown sufficient psychometric quality, with sufficiently strong treatment effects and consumer effects, but small random error effects. These instruments may be applied to a fairly wide range of consumer characteristics. This is good news, because, as the expert NIMH panel on outcome assessment recommended (see chapter 5), an ideal measure should serve a wide range of consumer groups. The wide applicability of these measures also decreases the costs of providing different measures for each of the consumer groups and increases the measure’s utility (e.g., in planning or evaluating a program of services).
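As an illustration only, the pretreatment model in Equation 2 can be simulated to see how the three error components combine in an observed score. The sketch below assumes, purely for demonstration, that the components are independent, normally distributed, and additive; the chapter does not specify their form, and every name and numeric value here is hypothetical.

```python
import random
import statistics

def simulate_pretreatment_scores(true_values, sd_item=1.0, sd_measurement=1.5,
                                 sd_state=2.0, seed=0):
    """Simulate Y = true value + error, with error = b + m + d.

    Assumption (not from the chapter): the three error components are
    independent, zero-mean normal deviates that simply add together.
    """
    rng = random.Random(seed)
    observed = []
    for true_value in true_values:
        b = rng.gauss(0.0, sd_item)         # item (measure) difficulty
        m = rng.gauss(0.0, sd_measurement)  # error of measurement
        d = rng.gauss(0.0, sd_state)        # the individual's state
        observed.append(true_value + b + m + d)
    return observed

true_values = [50.0] * 2000  # hypothetical: everyone shares one true value
observed = simulate_pretreatment_scores(true_values)
print(round(statistics.mean(observed), 1))   # close to the true value of 50
print(round(statistics.stdev(observed), 1))  # close to sqrt(1 + 2.25 + 4)
```

In this toy setting the group mean recovers the true value, but no individual score can be corrected, because only the sum of the three components is ever observed.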

Assessment of Treatment: Progress and Outcome

Prior to a discussion of measuring treatment progress and outcome, it is noteworthy that the clinician should have evidence of the reliability and the validity of the instrument being used. One should conduct reliability and validity studies on the instrument so that there are hard data to support the decision to influence the life of another person, and to justify the use of clinical and economic resources in doing so. This section begins with a description of four assumptions about defining treatment progress and outcome goals that need to be addressed prior to selecting a statistical procedure for assessing questions regarding consumer progress and outcome. The remainder of the chapter focuses on a sequence of five clinical service questions that set the stage for selecting statistical questions. Recommendations for selecting a statistical procedure are given under each of the five questions. The questions are sequenced from a macro to a micro level of analysis: (a) Did change occur, by how much, and was it sustained? (b) What effort (dose) was expended? (c) For whom did it work? (d) What was the nature of the change? (e) Did the state(s) or level(s) of functioning stabilize?


SPECIFYING TREATMENT-SERVICE GOALS

The selection of a statistical approach to describe treatment progress or outcome should depend on the anticipated goal(s) of treatment. There are four assumptions that must be explicit when specifying a treatment goal and selecting a statistical approach. The first is that the consumers¹ (clients, patients) are initially at a clinically unsatisfactory psychological state and/or level of functioning, and that there is a reasonable probability of change or stabilization by a given therapeutic intervention. Second, it is assumed that an agreed-on satisfactory psychological state or level of functioning, observably different from the initial state, can be defined for that individual or for that clinical population. Third, it is assumed that a measure, scale, or instrument is available that can describe reliably and validly the status of the person in that target population at any designated time. The fourth assumption is that the instrument’s score(s) describing an individual at an unsatisfactory state is reliably different from the score(s) describing that same individual at a satisfactory state, and the scores are not limited by ceiling and floor effects.

If each of the assumptions is met, one can proceed to specifying treatment goals and selecting the statistical approach to estimating the relative effectiveness of consumer progress or outcome as described in the specific goals. To support this approach, the remainder of this chapter organizes the discussion of the statistical procedures around five generic questions regarding the achievement of specific treatment goal(s) for specific therapeutic intervention(s). The five questions and related statistical approaches are ordered from a macro level to a micro level of investigation. This is done to give a context for determining what should be studied and how the data should be analyzed. The theme is that a statistical procedure must fit the context of the question being asked.

The statistical literature has a rich history of controversy and discussion on which aspects of change should be investigated and how each aspect should be analyzed (Collins & Horn, 1991; Cronbach & Furby, 1970; Francis, Fletcher, Stuebing, Davidson, & Thompson, 1991; Lord, 1963; Rogosa, Bryant, & Zimowski, 1982; Rogosa & Willett, 1983, 1985; Willett, 1988; Zimmerman & Williams, 1982a, 1982b). Historical controversies can be avoided by carefully articulating the research or evaluation question. The criterion for a well-developed question is that it frames the appropriate unit and level of analysis relevant to the question. To use the following section, the reader should first formulate an initial draft of the question(s) and the unit of analysis. Next, the investigator should check that the four assumptions (discussed earlier) have been made explicit, modifying the question(s) if necessary. Then, the investigator can match his or her research question(s) to those given next, and finally consider the recommendations offered. Although there is no perfect method, the discussion should provide the reader with guidelines to identify the best method for his or her situation.

QUESTION ONE: DID CHANGE OCCUR, BY HOW MUCH, AND WAS IT SUSTAINED?

Did specific domains of consumer functioning or states change by the end of therapy, by how much, and were the changes sustained over time? This is often considered to be the first question that those developing a therapeutic innovation seek to address: Does the therapy make a difference? There are two general approaches that have been employed when addressing this question. One is to investigate the magnitude of the difference between the pretreatment and the posttreatment scores on the selected measure across subjects. The second is to contrast the trends on the status measures taken on subjects over time (pre-, during-, post-, and following treatment). Each approach has its strengths and limitations, but they both have the potential to provide a gross estimation as to whether the therapeutic intervention makes a difference.

¹Consumer, rather than client or patient, is used to identify the recipient of psychological services in this chapter.

The difference score, $D_i$, is often considered the most basic unit of analysis, with $D_i = X_{i1} - X_{i2}$, where $X_{i1}$ and $X_{i2}$ are the observations recorded at time 1 (usually prior to treatment) and at time 2 (usually at the end of treatment) for the $i$th person. The mean values of $D_i$ can be contrasted between groups or within a single group against an expected outcome of no difference ($D = 0.0$, or an equal number of positive and negative values of $D$). There are two issues to be addressed here. One is to decide whether to use $D_i$ as the basic unit of analysis. The second is the research design used to address the question. There is an extensive and controversial body of literature on whether to use $D_i$. Discussion has focused on two features of $D_i$: (a) the reliability of the difference score is inversely related to the correlation between the pre- and posttreatment measures, and (b) the potential for a correlation between initial (pretreatment) status and the magnitude of the difference score. The potential bias introduced by these two features led Cronbach and Furby (1970) to recommend that the difference score not be used at all. They recommended that researchers instead concentrate on between-group outcome, posttreatment measures. Others have argued for the use of alternatives such as a residualized gain score, where the difference score is adjusted for initial, pretest differences (Webster & Bereiter, 1963).
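A minimal sketch of the two single-group contrasts just described: the mean of the difference scores tested against zero, and the count of positive versus negative differences. The scores are invented, higher scores are assumed to mean better functioning, and the difference is taken as time 1 minus time 2 to follow the order in which the two observations are named; reversing the subtraction only flips the sign.

```python
import math
import statistics

def difference_scores(time1, time2):
    """Return D_i = X_i1 - X_i2 for paired observations."""
    if len(time1) != len(time2):
        raise ValueError("each person needs a score at both times")
    return [x1 - x2 for x1, x2 in zip(time1, time2)]

def one_sample_t(differences):
    """t statistic for H0: mean(D) = 0, with n - 1 degrees of freedom."""
    n = len(differences)
    mean_d = statistics.mean(differences)
    se = statistics.stdev(differences) / math.sqrt(n)
    return mean_d / se, n - 1

# Hypothetical pre- and posttreatment scores for eight consumers.
pre = [40, 35, 50, 42, 38, 45, 30, 48]
post = [46, 41, 49, 50, 44, 47, 39, 52]

d = difference_scores(pre, post)
t_stat, df = one_sample_t(d)
n_negative = sum(1 for value in d if value < 0)  # improved, under this sign
n_positive = sum(1 for value in d if value > 0)
print(d)
print(round(t_stat, 2), df)
print(n_negative, n_positive)
```

The t statistic would be referred to a t distribution with the printed degrees of freedom; the sign counts correspond to the "equal number of positive and negative values" form of the no-difference expectation.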

However, there is an opposing point of view. Rogosa et al. (1982), Rogosa and Willett (1983, 1985), Willett (1988, 1989), and Zimmerman and Williams (1982a, 1982b) collectively have developed arguments with sufficient empirical support to conclude that difference scores were being damned for the wrong reasons. These authors also provided strong evidence that some of the most popular solutions (e.g., the residual gain score) have worse side effects than the problems they set out to solve. The potential inverse relationship between the reliability of $D$ and the correlation between pre- and posttreatment scores is not necessarily a problem. The difference score is an unbiased estimator of change (a process), and scores at each of the other times (pre and post) are estimators of status (not a process) at those two respective times. When there is low reliability in $D$, it is to be interpreted as no consistent change. But as Zimmerman and Williams (1982a, 1982b) showed, it is possible for the reliability of the difference score to exceed the reliability of the pretreatment score or the reliability of the posttreatment score. When this occurs, one still may conclude validly that there is reliable change for persons, even though there were unreliable measurements obtained at times 1 and 2. Although there is a problem of measurement error at each of these times, one still can conclude that there is a consistent change (a reliable process) among the inconsistent measures of status at each time.

The second problem regarding $D$ pertains to the correlation between the magnitude of a difference score and the value of the pretreatment score. It is intuitively obvious that a person with a low pretreatment score, indicative of more severe psychological maladjustment, would appear to have a greater probability of obtaining a higher score on the second occasion (posttreatment). Despite this obvious relationship, Rogosa and Willett (1983) showed that there is a negative bias in the estimate of the correlation between the initial score and the difference score. Negative bias must not be misinterpreted as a negative correlation (Francis et al., 1991). Rogosa and Willett (1983) amply demonstrated that a raw difference score is not the best statistic for estimating the correlation between initial status and change.

When investigating the relationship(s) between initial status and change, the research


question should be refined to focus on the consumer characteristic that predicts different rates of change. The refined question considers the characteristic related to initial level as a moderator variable (i.e., a consumer attribute that moderates the effects of the treatment variable). The logic here focuses on the interaction between the moderator variable and the treatment variable. For the simplest case, the values of $D_{jk}$ would be predicted by the interaction between the consumer’s initial level of the characteristic, $\beta_k$, in moderating the potential impact of the treatment, $\alpha_j$, and this influence would be observed as an interaction effect, $\alpha\beta_{jk}$. This conceptualization results in an expression that can be evaluated by either a regression or variance analysis, using the standard computer statistical packages:

$$D_{jk} = \text{constant} + \alpha_j + \beta_k + \alpha\beta_{jk} + \varepsilon_{jk} \tag{3}$$
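For a balanced two-by-two layout, the terms of Equation 3 can be read directly from the cell means of the difference scores. The sketch below uses effect (sum-to-zero) coding, which is one common parameterization and an assumption here; the cell means are invented, equal cell sizes are assumed, and a statistical package would add the significance tests.

```python
def decompose_cell_means(cell_means):
    """Split 2x2 cell means into constant, treatment, characteristic,
    and interaction effects under sum-to-zero (effect) coding.

    cell_means[j][k] is the mean difference score for treatment j and
    consumer-characteristic level k. Equal cell sizes are assumed.
    """
    grand = sum(sum(row) for row in cell_means) / 4.0
    row_means = [sum(row) / 2.0 for row in cell_means]
    col_means = [(cell_means[0][k] + cell_means[1][k]) / 2.0 for k in range(2)]
    alpha = [r - grand for r in row_means]   # treatment effects
    beta = [c - grand for c in col_means]    # characteristic effects
    alpha_beta = [[cell_means[j][k] - grand - alpha[j] - beta[k]
                   for k in range(2)] for j in range(2)]
    return grand, alpha, beta, alpha_beta

# Hypothetical mean change: rows = treatment (new, usual),
# columns = initial level of the characteristic (low, high).
means = [[10.0, 4.0],
         [3.0, 3.0]]
constant, alpha, beta, alpha_beta = decompose_cell_means(means)
print(constant, alpha, beta, alpha_beta)
```

Nonzero interaction terms indicate that the treatment effect depends on the level of the consumer characteristic, which is the moderator pattern the refined question looks for.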

There is a strong logical argument for incorporating a well-conceived consumer characteristic as a moderator variable in most therapeutic intervention studies. The field has claimed that individual therapeutic approaches are not all things to all people. But it also is becoming apparent that researchers need to refine the question, “For whom does the therapy work most effectively?” A consumer characteristic moderator variable is just that: a variable that potentially moderates the degree to which the therapeutic intervention will have impact, because of a characteristic brought into therapy by the consumer. One also could argue that if the theoretical construct underlying the therapeutic intervention is adequately developed, the moderator variable(s) should be identified easily. It also is possible that the variable might be a mediating variable, rather than a moderating variable. Here, a mediating variable is one whose refinement, presence, or absence in the design is required for the therapeutic intervention to be observed. Shadish and Sweeney (1991) showed that effect sizes in psychotherapy studies were related directly to moderator variables, such as the outcome measure selected, standardization of the therapeutic interventions, and setting in which the study was conducted. By incorporating an appropriate consumer characteristic as a moderator variable in the research design, the investigator will be able to obtain estimates of relationships that can then be employed in a testable structural equation describing how the various variables come together to produce a therapeutic outcome (see Question Four).

A number of texts recommend using a residual gain score to estimate the differences. The residual gain score is calculated by adjusting the difference score by the correlation between initial level and either the posttest score or the difference score. This author joins Rogosa and Willett (1985) and Francis et al. (1991) in arguing that the residual gain score is a poor choice and should be avoided. There are two critical flaws with using the residual gain score. First, when used in the context of clinical psychology, it describes a state of affairs that does not exist in reality (adjusting subjects to be at the same initial level, which is seldom true). Second, it adjusts a measure of change (a process measure) with a status measure (pretreatment scores). The resulting statistic is no longer an unbiased measure of the change process, because it was adjusted by a measure of status, which contains its own unique sources of error.
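To make concrete what is being adjusted, the sketch below computes one common form of residualized gain: the residual from a least-squares regression of posttest on pretest. It is shown only to clarify the statistic under discussion, not as a recommended procedure; the data are hypothetical, and other definitions of the residual gain score exist.

```python
import statistics

def residual_gain(pre, post):
    """Residuals from the least-squares regression of post on pre."""
    mean_pre = statistics.mean(pre)
    mean_post = statistics.mean(post)
    sxy = sum((x - mean_pre) * (y - mean_post) for x, y in zip(pre, post))
    sxx = sum((x - mean_pre) ** 2 for x in pre)
    slope = sxy / sxx
    intercept = mean_post - slope * mean_pre
    return [y - (intercept + slope * x) for x, y in zip(pre, post)]

# Hypothetical pre- and posttreatment scores for eight consumers.
pre = [40, 35, 50, 42, 38, 45, 30, 48]
post = [46, 41, 49, 50, 44, 47, 39, 52]
residuals = residual_gain(pre, post)

# By construction the residuals average zero and are uncorrelated with
# the pretest: everyone is treated as if starting from the same level.
mean_pre = statistics.mean(pre)
cross_product = sum((x - mean_pre) * r for x, r in zip(pre, residuals))
print([round(r, 2) for r in residuals])
print(abs(statistics.mean(residuals)) < 1e-9, abs(cross_product) < 1e-9)
```

The forced zero relationship with the pretest is the "same initial level" adjustment that the first flaw refers to.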

Should one use a difference score? The answer is yes, if the research question is simply stated, such as: Is there change related to treatment? Unfortunately, the issues related to a treatment intervention are often more complex. At a minimum, the investigator typically questions the treatment’s differential effects with regard to one or more consumer characteristics over the course of treatment. Recently, investigators have become interested in the sustenance of the change after treatment has formally stopped. Some of these concerns can be assessed at the level of this first question (i.e., Did change occur, by how much, and was it sustained?). In these instances, the issues when addressing these refined questions pertain to


design: What groups need to be contrasted? When does one need to sample consumer behaviors? In other instances, the refined question requires going to another level of focus (see Questions Two to Five).

For those question refinements that can still be articulated as “Did change occur, by how much, and was it sustained?”, issues of design and sampling time frames need to be identified. One design issue that can be dealt with easily is whether a Solomon four-group design is required in the evaluation of a therapeutic intervention. The Solomon four-group design attempts to control for or estimate the effects of carryover from pretest to posttest. Half of the subjects in the treatment group and half of the subjects in the control group are randomly assigned to a no pretreatment test condition and half to both a pre- and a posttreatment test condition. The carryover effects of testing that are estimated via a Solomon four-group design may not be a factor in most treatment research. Most consumers who enter therapy are not naive as to what their problems are, nor are they naive as to the general purpose of the intervention. This is particularly common in the experience of those working with persons with a severe mental illness or with those who are substance abusers. Moreover, Saunders’ (1991) work clearly showed that the majority of those who have entered psychotherapy and go beyond two sessions had prior experience with at least the intake process for psychotherapy. Given this, serious researchers studying a particular therapy or therapies should, if the opportunity presents itself, run a pilot study with a Solomon four-group design to ensure that such carryover effects are not a significant source of variance.

There are two remaining design issues: What groups should be sampled? When does one collect data? The quick answer to the first is to sample those groups that satisfy the question and eliminate alternative explanations. As discussed earlier, it is best to partition consumers by levels of any characteristic expected to modify the impact of the treatment. Enough has been written about the difficulty of interpreting single-group results that most researchers will understand the need to develop either a “waiting-list” control or a “treatment as usual” control to contrast with an innovative treatment. The quick answer to the second question on how often to collect data is: If possible, collect data two or more times during treatment and two or more times after treatment. An optimal minimum is to collect data at four times: pretreatment, half-way through treatment, posttreatment, and at least once in follow-up (e.g., 6 months following treatment).

A typical design would be a mixed between-group repeated measures design with two between-group variables (treatment variable, $\alpha_j$, and an initial consumer characteristic identified as a moderator variable, $\beta_k$) and one within-group (time) variable of two, three, or four levels. Here, an analysis of variance or multivariate analysis of variance of the linear and quadratic trends can describe between-group differences in the direction and rates of change over time. Between-group contrasts of the linear trends within each group offer a test of whether direction and magnitudes of change vary as a function of groups. Between-group contrasts of the quadratic trends within each group would describe whether there are significant differences among groups in how their initial changes were modified over time. For the analysis of functional status over the three times from pretreatment, midtreatment, to immediately posttreatment, the between-group contrast of quadratic trends would describe the course of change over the course of treatment. When considering the changes between pretreatment and follow-up, evidence of regressive or sustenance trends could be tested.
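The linear and quadratic trends can be scored for each person with orthogonal polynomial coefficients and then compared between groups. The sketch below assumes three equally spaced occasions (pre-, mid-, and posttreatment), for which the standard coefficients are (-1, 0, 1) and (1, -2, 1); unequally spaced follow-up occasions would need different coefficients. The scores and group labels are hypothetical, and a statistical package would supply the significance tests.

```python
import statistics

LINEAR = (-1, 0, 1)      # orthogonal polynomial coefficients for
QUADRATIC = (1, -2, 1)   # three equally spaced occasions

def trend_score(scores, coefficients):
    """Weighted sum of one person's repeated scores."""
    return sum(c * y for c, y in zip(coefficients, scores))

def mean_trends(group):
    """Mean linear and quadratic trend scores for a list of subjects."""
    linear = statistics.mean([trend_score(s, LINEAR) for s in group])
    quadratic = statistics.mean([trend_score(s, QUADRATIC) for s in group])
    return linear, quadratic

# Hypothetical (pre, mid, post) scores; higher = better functioning.
innovative = [(40, 50, 52), (35, 46, 47), (42, 51, 54)]
usual_care = [(41, 43, 46), (36, 39, 41), (44, 46, 49)]

print(mean_trends(innovative))
print(mean_trends(usual_care))
```

With higher scores meaning better functioning, a positive linear score paired with a negative quadratic score describes early gains that level off, whereas a quadratic score near zero describes steady change.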

As was true of other forms of univariate and multivariate analysis of variance, the standard statistical packages have programs that can perform these analyses. One problem with these forms of analysis is that they do not tolerate missing data. They discard all subjects with data missing at any one point in time. But if the research questions are at the macro level


of group trends and effects, these designs will be adequate. From experience, most investigators using these designs are satisfied initially, but then wish to understand some of the differences among subjects within the groups. Here, the analysis of variance methods are limited and the micro-level questions discussed next are found to be more satisfying.

QUESTION TWO: WHAT EFFORT (DOSAGE) WAS EXPENDED?

What was the amount of time or effort expended for a consumer to achieve a satisfactory psychological state or level of functioning? Although the issues underlying this question have been proclaimed for some time (Carter & Newman, 1975; Fishman, 1975; Yates, 1980; Yates & Newman, 1980), it was not until the late 1980s that this question began to be recognized as part of a fundamental issue of outcome research (Howard, Kopta, Krause, & Orlinsky, 1986; Newman & Howard, 1986; Newman & Sorenson, 1985). Recent interest appears to be centered on the economic concern regarding the worth of the investment in mental health care, rather than a scientific concern on how much is enough. The measures of effort often are easy to develop and are readily available to the researcher if there is a plan to collect the effort data. There are three major classes of effort measures that have served as either predictor or dependent variables in psychotherapy and mental health services research. They are: (a) dosage (i.e., the number of therapeutic events provided over the period of a clinical service episode); (b) the level of treatment restrictiveness (i.e., the use of environmental manipulations to control the person’s behavior); and (c) the cumulative costs of the resources invested in treatment (i.e., the type of staff, staff time, and material resources consumed) (Newman & Howard, 1986; Newman & Sorenson, 1985).
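As a rough illustration, the three classes of effort measure can be computed from a log of service events. The record layout, the unit costs, and the 1-2-3 restrictiveness weights below are all hypothetical choices made for this sketch; they are not a published scoring rule.

```python
from collections import Counter

# Hypothetical weights and unit costs, for illustration only.
RESTRICTIVENESS = {"outpatient": 1, "day_treatment": 2, "inpatient": 3}
UNIT_COST = {"outpatient": 90.0, "day_treatment": 150.0, "inpatient": 600.0}

def effort_measures(events):
    """Summarize one clinical service episode.

    events: list of service-event types, one entry per event.
    Returns dosage, restrictiveness-weighted dosage, and cumulative cost.
    """
    counts = Counter(events)
    dosage = sum(counts.values())
    weighted = sum(RESTRICTIVENESS[kind] * n for kind, n in counts.items())
    cost = sum(UNIT_COST[kind] * n for kind, n in counts.items())
    return dosage, weighted, cost

episode = ["outpatient"] * 10 + ["day_treatment"] * 4 + ["inpatient"] * 2
print(effort_measures(episode))
```

Two episodes with the same dosage can differ sharply on the weighted and cost measures, which is why the three classes are kept distinct.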

Dosage is the measure employed most frequently when only a single modality is considered (e.g., number of days of inpatient or nursing home treatment). Restrictiveness measures can be developed at a sophisticated or a simple level. Hargreaves and his colleagues (Hargreaves, Gaynor, Ransohoff, & Attkisson, 1984; Ransohoff, Zackary, Gaynor, & Hargreaves, 1982) had panels of clinical experts, employing a magnitude estimation technique, scale the levels of restrictiveness for interventions designed to serve the seriously mentally ill. Newman, Heverly, Rosen, Kopta, and Bedell (1983) used a simpler approach to quantify level of restrictiveness by giving a value of 1 to an outpatient visit, 2 to day treatment, and 3 to inpatient care in the treatment plans proposed by 174 clinicians for a standardized set of 18 cases. To create a dependent measure that combines dosage with restrictiveness of effort, these dosage and restrictiveness scores were cross-multiplied. Significant relationships were found between this dependent measure and three predictor variables: (a) levels of functioning at intake, (b) level of social support, and (c) level of cooperativeness at the start of treatment.

A measure of the costs of resources consumed combines the concepts of dosage and restrictiveness, because the costs of staff time and the resources used to exert environmental control during the clinical service episode are summed to calculate the costs. However, the concept of employing costs as an empirical measure of therapeutic effort is still sufficiently new to the field that there appear to be some misconceptions that inhibit its use in research. Newman and Howard (1986) described three popular incorrect perceptions: (a) Confusion of costs with revenues. Revenues are the monies that come to the service from many different sources (payment of fees charged, grants, gifts, interest on cash in the bank). (b) Confusion of costs with fees charged. Fees charged may or may not cover costs of services


provided. Profits accrue when fees collected are greater than costs and deficits accrue when they are less. If all consumers have similar diagnoses and problems, if they receive the same type and amount of treatment, and if the fees charged equal the costs of the resources used, then the costs equal the fees. However, for mental health programs and private practice, the costs of the clinical efforts vary across consumers and therapeutic goals. (c) Confusion of costs and fees in private practice. This is being recognized as a myth by more and more private practice clinicians. Unfortunately, this myth is being perpetuated by third-party payor reimbursement practices, where a single reimbursement rate is being set for a broad spectrum of diagnoses, independent of consumers’ levels of psychosocial functioning, social circumstances, and therapeutic goals. It is intuitively obvious to most private practice clinicians that not all consumers require the same levels of care to achieve a satisfactory psychological or functioning level. It is also obvious that it is more profitable to restrict one’s practice to those consumers who can be treated profitably within the limits set by reimbursable fees, rather than by the treatment goals achieved. Unfortunately, unprofitable consumers might be referred elsewhere.

There are three statistical approaches that can be applied usefully to measures of effort: (a) probit analysis, focusing on the cumulative proportion of the sample that has achieved a criterion of success (or failure) at each level (dose) of the intervention’s events; (b) log-linear analysis, using a multidimensional test of independence of two or more variables (each having two or more levels) in predicting two or more classes of outcomes; and (c) univariate and multivariate regression and variance analysis, focusing on the unique characteristics and limitations of applying these traditional approaches to analyzing effort data.

The shape of the distributions of measures of effort is of some concern. They typically are skewed positively, with as much as 3% to 10% of the subjects having effort measures three or more standard deviations above the median. The first two approaches are affected less by the precise shape of the distribution of the measure of effort than the last approach. The focus of questions addressed by probit and log-linear analyses is on the relative frequency of observations that fall within given classes or ranges of outcome. The distribution of the measures of effort is more important for univariate and multivariate regression and variance analyses that have the usual assumptions of parametric statistics (e.g., normality, independence between within-group error variance, and group assignment). The difficulty of analyzing extremely skewed distributions typically is dealt with by one of two methods. One is to drop those subjects with extreme scores (e.g., the top 10% or 20%), the outliers, from the analysis. Another approach is to transform the values to produce a more normal-appearing distribution. The arcsin and the log transformations are two popular transformations. These approaches may have the negative consequence of either dropping data that should be considered or transforming the conceptual base to mean something other than it was conceptualized to mean originally. It is the author’s experience that investigators will invoke either of these approaches to deal with the statistical issues, without considering what the implications are for the clinical aspects.

Although probit and log-linear analyses do not require throwing away data, they do use a log transformation of the data during the analytic process. For the examples I have reviewed, the probit analyses appear to have preserved the conceptual basis of the studies as they were designed by the investigators. It also can be argued that both probit and log-linear analyses have their own set of negatives. The principal negative aspect is that relatively large samples are required to assure that observed differences in relative frequencies are stable. With these notes of caution, I proceed to describe the conceptual basis and the applications of the three approaches.
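The trade-off between the two remedies can be seen on a small, invented set of effort scores with a long right tail. The sketch computes a simple moment-based skewness for the raw scores, after a natural-log transformation, and after dropping the top 10% of cases; the data and the 10% cut are assumptions made for illustration.

```python
import math
import statistics

def skewness(values):
    """Moment-based sample skewness (population form, no bias correction)."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    n = len(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / n

# Hypothetical number of sessions per consumer; a few very long episodes.
sessions = [2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 8, 8, 9, 10, 12, 14, 16, 20, 45, 90]

logged = [math.log(v) for v in sessions]
cutoff = len(sessions) - len(sessions) // 10   # drop the top 10% of cases
trimmed = sorted(sessions)[:cutoff]

print(round(skewness(sessions), 2))
print(round(skewness(logged), 2))
print(round(skewness(trimmed), 2), len(sessions) - len(trimmed))
```

Both remedies reduce the skew, but trimming removes the two consumers who used the most service, and the log scale no longer reads as a count of sessions; those are the clinical costs the text warns about.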


Probit Analysis. In probit analysis, the basic unit of analysis is the proportion of subjects within a specific group to achieve a satisfactory level of psychological or social functioning after a given dosage of treatment has been provided. Howard et al. (1986) described this relationship as “. . . the amount of treatment (dose) needed to achieve a specific percentage of patient improvement (effect)” (p. 160). A probit model is created that uses the observed proportions of subjects in the $j$th group to achieve a satisfactory outcome at the $i$th dosage level,

$$P_{ij} = A_j + B_j (\log \text{ of dosage } X_i). \tag{4}$$

To estimate the values of $A_j$ and $B_j$ in the model for the $j$th group, the observed values (proportions of subjects to achieve a satisfactory criterion at each dosage) are entered into a maximum likelihood procedure. Once estimated for a group, a model is created that generates a function describing the expected relationships between dose and proportions to achieve a satisfactory outcome. The probit analysis provided in most statistical packages will generate a set of probit values and the estimated standardized proportions of persons expected to achieve a measured satisfactory state or level of functioning for each successive dose level, along with the 95% confidence intervals about each probit value within a given group. Thus, the analysis provides dosage by success rate functions, along with the envelope of 95% confidence intervals about each function for each group. The extent of nonoverlap between the 95% interval envelopes for two or more groups will describe the statistical significance of between-group differences. Howard et al. (1986) found significantly different dose-effect functions for three diagnostic groups receiving psychotherapy: depression, anxiety, and borderline psychotic.

There are three major limitations of probit analysis in addressing the question of “How much is enough?” One is that relatively large samples are needed to refine the dosage levels, probably more than 50 subjects per treatment group. The second is that only between-group main effects can be tested. However, it is possible to contrast the overlap in the 95% confidence interval across dose levels (or “confidence interval envelope”) among any two or more groups. This results in a test of simple effects among groups in a design with two or more between-group variables. Experiment-wise error rates are an important concern. The third limitation is that probit analysis is only applicable to the dose-effect relationships. It cannot be applied to restrictiveness and cumulative cost classes of effort measures. The next two sets of approaches can be used for all three classes of effort measures.
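The dose-effect logic of Equation 4 can be sketched by converting each observed proportion to a probit (the standard normal quantile) and fitting a straight line on the log of dosage. Note the substitution: this is an unweighted least-squares approximation used only for illustration, not the maximum likelihood procedure described above, and it yields no confidence intervals. The session counts and proportions are invented, and the function names are hypothetical.

```python
import math
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()

def fit_probit_line(doses, proportions):
    """Least-squares fit of probit(P_i) = A + B * log(dose_i)."""
    x = [math.log(d) for d in doses]
    z = [STANDARD_NORMAL.inv_cdf(p) for p in proportions]
    mean_x = sum(x) / len(x)
    mean_z = sum(z) / len(z)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxz = sum((xi - mean_x) * (zi - mean_z) for xi, zi in zip(x, z))
    b = sxz / sxx
    a = mean_z - b * mean_x
    return a, b

def expected_proportion(a, b, dose):
    """Proportion expected to reach the criterion at a given dose."""
    return STANDARD_NORMAL.cdf(a + b * math.log(dose))

def dose_for_proportion(a, b, target):
    """Dose at which the fitted curve reaches the target proportion."""
    return math.exp((STANDARD_NORMAL.inv_cdf(target) - a) / b)

# Hypothetical cumulative proportions improved after 2, 4, ... sessions.
doses = [2, 4, 8, 16, 32]
improved = [0.20, 0.35, 0.52, 0.68, 0.80]

a, b = fit_probit_line(doses, improved)
print(round(expected_proportion(a, b, 8), 2))
print(round(dose_for_proportion(a, b, 0.50), 1))
```

Fitting one such line per group gives the dosage by success rate functions that would then be compared; observed proportions of exactly 0 or 1 have no finite probit and would need separate handling.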

Log-Linear Analysis. The basic unit of measure when applying a log-linear analytic approach is the rank order of the magnitude of the effort measure when considering all subjects across all groups. For example, if there are 320 subjects in four groups, with 80 per group, the rank-order values on the effort measure can vary from 1 to 320. Once each subject receives his or her rank-order score, any between-group rank-order (nonparametric) statistical approach can be applied. A researcher should consider the log-linear analysis because of its ability to analyze higher level designs, with at least one treatment and one consumer characteristic variable. Table 6.1 provides an example of the general form of an analysis that can be considered by log-linear analysis.

Consider that one is working with seriously and persistently mentally ill persons who are entering a community support program. At admission, they are first evaluated for their levels of interpersonal (including communication) skills, along with other characteristics. Half of those who score at a low level of interpersonal skills (Consumer Group B-1) are assigned randomly to a program that focuses on social and community functioning skills, where the treatment team works out of an office adjacent to a consumer-run drop-in center (Group A-1, B-1). The remaining half of the consumers with low interpersonal skills are assigned to a program whose treatment team interventions focus on symptom control, with a case manager who works out of a community mental health center (Group A-2 in combination with B-1). The same random assignment to Groups A-1 and A-2 is made for those consumers scoring at the moderate to high levels of interpersonal skills (i.e., assigned to Groups A-1 in combination with B-2, and A-2 with B-2, respectively). Thus, in this example, there are two treatment groups and two levels of a consumer characteristic (e.g., entry level of interpersonal skills at low or moderately-high). The consumer characteristic is expected to interact with the effects of the therapeutic intervention.

TABLE 6.1
The Frequencies of Subjects in Each Group for the Successive Quartiles When Ranked by the “Cumulative Costs of the Clinical Service Episode”

Quartile Ranking on the “Cumulative Costs of the Service Episode”

Treatment Group                            Consumer Group               Q-4              Q-3           Q-2           Q-1
                                                                        (Highest Costs)                              (Lowest Cost)
                                                                        (76% - 99%)      (51% - 75%)   (26% - 50%)   (1% - 25%)
A-1 Social rehabilitation treatment team   B-1 Low interpersonal        f (111)          f (112)       f (113)       f (114)
                                           B-2 Mod-High interpersonal   f (121)          f (122)       f (123)       f (124)
A-2 Symptom control, case manager at CMHC  B-1 Low interpersonal        f (211)          f (212)       f (213)       f (214)
                                           B-2 Mod-High interpersonal   f (221)          f (222)       f (223)       f (224)
Sum of Columns                                                          25% of total     25% of total  25% of total  25% of total

Note. The cell frequencies are described as f (ijk) for the ith quartile, in the jth treatment group, and the kth level of the consumer characteristic.

Suppose that there were 320 subjects involved in the study, 80 in each combination of treatment by consumer characteristic level. The cumulative costs of treatment for each consumer can be calculated for the first 6 months of treatment. These include the costs of personnel time and the materials consumed by agency personnel while serving the consumer over his or her first 6 months in the community support program. If cumulative costs of serving these consumers over the 6-month period were independent of either treatment or interpersonal communication skills, then one could expect the 320 subjects to be distributed evenly across the cells of Table 6.1 (i.e., 20 subjects per cell). Because the columns represent the four respective quartiles, the columns always will sum to 25% of the sample. Based on the null hypothesis for all effects, there would be 20 consumers in each cell, indicating that the distribution of cumulative service costs is independent of either treatment or consumer characteristic. When the observed frequencies depart from this pattern, the most cost-efficient group would be the group row with the largest observed cell frequencies in the lower quartile cells (Q-1 and Q-2) and the smallest observed cell frequencies in the higher quartile cells (Q-3 and Q-4).
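The construction of the cell frequencies can be sketched as follows: rank all subjects on cumulative cost, cut the ranks into quartiles, and count subjects per group within each quartile. This is a minimal illustration only; the function name, the group labels passed in, and the tie-breaking rule (input order) are assumptions of the sketch, and the sample data are hypothetical.

```python
from collections import Counter

def quartile_table(costs, groups):
    """Rank subjects by cost, cut the ranks into quartiles, count per group.

    costs  -- cumulative episode cost for each subject
    groups -- group label for each subject, in the same order
    Returns a Counter keyed by (group, quartile), with quartile 1 = lowest cost.
    Ties in cost are broken by input order, a simplification.
    """
    n = len(costs)
    order = sorted(range(n), key=lambda s: costs[s])  # stable sort on cost
    table = Counter()
    for rank0, subject in enumerate(order):           # rank0 runs 0 .. n-1
        quartile = rank0 * 4 // n + 1                 # 1 .. 4
        table[(groups[subject], quartile)] += 1
    return table

# Hypothetical data: eight subjects in two treatment groups.
costs = [100, 900, 300, 700, 200, 800, 400, 600]
groups = ["A-1", "A-2", "A-1", "A-2", "A-1", "A-2", "A-1", "A-2"]
table = quartile_table(costs, groups)
```

In this toy data every A-1 subject falls in the two lowest quartiles, which is the pattern the text describes as the most cost-efficient row.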


Although the logic here is that of a chi-square test of independence (testing whether row assignments are independent of column outcomes), the multidimensional classification (treatment group by consumer characteristic) nullifies the simple test of independence provided by the ordinary chi-square test. Log-linear analysis can provide a test of a cell frequency’s independence of the association with the combinations of column and multiple row classifications that define the cell. As with the classical test of independence (chi-square), observed cell values are contrasted with expected cell values. Given that the cells are embedded in a multivariate classification scheme (three or more classes), the expected cell values need to be adjusted for main- and first-order interaction effects. The mathematical technique employed is to model each cell frequency using a natural log transformation of the observed frequencies. This permits the development of additive, rather than exponential, models to describe the relationships among classifications.

Considering the example, the researcher is interested in testing the independence of each of the two main effects (treatment type and consumer characteristic) and the interaction with quartile ranking of cumulative service costs. The researcher is assessing the likelihood that the magnitude of service costs (level of $Q_i$) is associated with type of treatment ($A_j$) or consumer characteristic ($B_k$), or both. The natural log of the expected cell frequency, $f$, for the $ijk$th cell is

$$\ln(f_{ijk}) = \omega + \omega_{Q_i} + \omega_{A_j} + \omega_{B_k} + \omega_{QA_{ij}} + \omega_{QB_{ik}} + \omega_{AB_{jk}} + \omega_{QAB_{ijk}}, \qquad (5)$$

where $\omega$ is the average of all of the natural log frequencies within the table. Each of the omega terms is a parameter estimated for one of the marginal effects. Each of the marginal effects is obtained in a fashion similar to a univariate analysis of variance. For example, this can be seen in computing the parameter for $A_j$, $\omega_{A_j} = (\mu_{A_j} - \omega)$, where $\mu_{A_j}$ is the average of the natural log of the cell frequencies contained within $A_j$. The test statistic for the interaction of costs by treatment (Q by A) would be derived from the ordinary chi-square test for independence,

$$X^2 = \sum_{ij} \frac{[(\text{Observed } f_{ij}) - (\text{Expected } f_{ij})]^2}{(\text{Expected } f_{ij})} \qquad (6)$$

but employing the natural log values, an alternative statistic called the likelihood-ratio chi-square, $L^2$, is computed as

$$L^2 = 2 \sum_{ij} (\text{Observed } f_{ij}) \ln\left(\frac{\text{Observed } f_{ij}}{\text{Expected } f_{ij}}\right) \qquad (7)$$
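The two statistics can be compared on a small two-way table. The sketch below assumes the three-way table has been collapsed to quartile by treatment (Q by A), takes expected counts from the row and column totals under independence, and uses hypothetical frequencies; the function names are illustrative and do not come from any statistical package.

```python
import math

def expected_counts(observed):
    """Expected counts under independence for a two-way table (list of rows)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def pearson_chi_square(observed):
    """Ordinary chi-square: sum of (observed - expected)^2 / expected."""
    expected = expected_counts(observed)
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(observed, expected)
               for o, e in zip(obs_row, exp_row))

def likelihood_ratio_chi_square(observed):
    """Likelihood-ratio chi-square: 2 * sum of observed * ln(observed / expected)."""
    expected = expected_counts(observed)
    return 2.0 * sum(o * math.log(o / e)
                     for obs_row, exp_row in zip(observed, expected)
                     for o, e in zip(obs_row, exp_row)
                     if o > 0)  # an empty cell contributes nothing to the sum

# Hypothetical frequencies: rows are treatments A-1 and A-2 (160 subjects
# each), columns are the four cost quartiles (80 subjects each).
q_by_a = [[50, 45, 35, 30],
          [30, 35, 45, 50]]
x2 = pearson_chi_square(q_by_a)
l2 = likelihood_ratio_chi_square(q_by_a)
```

With every expected count equal to 40, the two statistics come out close to each other, as they usually do when cell counts are not small. A full log-linear analysis of the three-way design would instead fit the hierarchical model with a statistical package.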

Most statistical packages require that the user identify a design describing the interactions of interest and set a hierarchical order to the effects of interest. For my example, the highest order interaction is Q by A by B, but there are only two other terms of interest: Q × A and Q × B. As is true of any hierarchical model, if the full (second-order) interaction of Q × A × B is significant, then follow-up tests must be contingency tables investigating simple main effects and the interactions. Because the follow-up tests could inflate Type I error, two precautions are recommended. First, plan the follow-up tests in advance, restricting the number of planned comparisons to the degrees of freedom available (three in the example used here). Second, adjust the Type I error level of the follow-up tests to be more conservative (e.g., by dividing the Type I error rate used to test the interaction by the number of degrees of freedom [0.05 ÷ 3 = 0.017]).

The two major limitations of the log-linear approach are (a) relatively large samples are required to obtain stable results, and (b) the analysis leads to conclusions regarding treatment cost-efficiency and not cost-effectiveness or benefit. The issue of sample size sometimes can be handled by careful consideration of expected outcomes. As with most chi-square techniques, expected cell frequencies should be greater than or equal to 5 for the highest order of classification (the ijkth cell in the current example). It is possible to establish a model that excludes certain cells from the analysis, provided that the assertion that those cells should contain a count approaching zero can be defended logically. This could happen when contrasting an inexpensive procedure with an expensive procedure, where the investigator is interested in the middle-level cost values and the interactions with two or more predictor variables. In this case, it is quite possible for the expected cell frequencies for the lowest quartile for the expensive procedure to have expected values under 5. Most computer packages permit the user to specify the cell structure and the model to be tested. If there is a good rationale for zero frequency cells, this option should be used to analyze the data.

The issue that this design only attends to cost-efficiency and not cost-effectiveness is handled best by treating this analysis as part of a larger analysis, where tests of treatment effectiveness will be considered alongside the test of treatment costs as illustrated by Fishman (1975). He used a strategy that considers the results of a treatment effectiveness study along with a cost-efficiency analysis (see Table 6.2). Fishman recommended a two-dimensional array contrasting the results of the cost study (the columns in Table 6.2) with the results of the effectiveness study (the rows in Table 6.2). For seven of the nine combinations of dual outcome-cost analyses, an investigator or policymaker would be able to decide on the most cost-effective choice. The issue of the need to consider an outcome (effectiveness) study along with a cost-efficiency study also is to be considered when performing the variance and regression analyses of costs (see the discussion that follows).

Multivariate and Univariate Regression and Analysis of Variance. There are some interesting possibilities when considering regression or variance analysis with cumulative costs, dosage, or restrictiveness measures. One possibility is to use an effort measure alongside progress or outcome measures in a multivariate regression or variance analysis. This would address the research issue of whether the independent (treatment or consumer characteristic) variables produce differences in consumer outcome profiles alongside differences in the cumulative costs of serving them. It is obvious that when consumers improve quickly, less long-term effort is needed; and when they are slow to react to treatment and slow to change, more or extended effort is required. Thus, it is defensible to include an effort measure along with the progress or outcome measure in the prediction equation that defines the regression or the variance analysis.

Another possibility is to investigate the covariance structures of effort with consumer characteristics as they relate to outcome in a multiple regression analysis. Here, the interaction (covariance) of the level of effort with the level of consumer characteristic is treated as a

TABLE 6.2
Decisions Possible When Cumulative Costs and Outcome Effectiveness Are Compared Jointly

                 Cumulative Episode Costs
Outcome          A > B          A = B          A < B


Chapter 7
Minnesota Multiphasic Personality Inventory-2

Roger L. Greene
Pacific Graduate School of Psychology

James R. Clopton
Texas Tech University

Overview

SUMMARY OF DEVELOPMENT

The Minnesota Multiphasic Personality Inventory (MMPI) was developed to provide an objective means to diagnose psychopathology (Hathaway & McKinley, 1940), and it quickly became the most widely used and researched objective, abnormal personality inventory. The restandardization of the MMPI resulted in the MMPI-2 (Butcher, Dahlstrom, Graham, Tellegen, & Kaemmer, 1989), which is the focus of this chapter. The reader who is interested in the earlier work on the MMPI should consult any of the standard references (Dahlstrom, Welsh, & Dahlstrom, 1960, 1972, 1975; Duckworth & Anderson, 1986; Friedman, Webb, & Lewak, 1989; Gilberstadt & Duker, 1965; Graham, 1977, 1987; Greene, 1980, 1988; Hathaway & Meehl, 1951; Lachar, 1974; Marks & Seeman, 1963; Marks, Seeman, & Haller, 1974; Swenson, Pearson, & Osborne, 1973).

The standard validity and clinical scales for the MMPI were developed empirically (Hathaway & McKinley, 1940), but the advent of the Wiggins (1966) content scales provided rationally derived scales that could be used for clinical interpretation. The only change in the items of the standard validity and clinical scales of the MMPI-2 was the deletion of 13 items with objectionable or outdated content (Butcher et al., 1989). New content scales (Butcher, Graham, Williams, & Ben-Porath, 1989) were developed for the MMPI-2 so that clinicians still have empirically and rationally derived scales available for interpretation.

The MMPI-2 consists of 567 true/false items. Scoring proceeds by counting the client’s deviant responses to each of the items on a particular scale. The items are not weighted in the scoring process (i.e., each deviant response simply is counted).
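The unweighted counting rule can be sketched in a few lines. The item numbers and keyed directions below are hypothetical placeholders, not actual MMPI-2 item content or scoring keys, and the function name is an assumption of this sketch.

```python
def raw_score(responses, keyed_direction):
    """Count unweighted deviant responses for one scale.

    responses       -- dict mapping item number to the True/False answer given
    keyed_direction -- dict mapping each scale item to its deviant answer
    Items the client left unanswered are simply not counted.
    """
    return sum(1 for item, deviant in keyed_direction.items()
               if responses.get(item) == deviant)

# Hypothetical key and answers: NOT actual MMPI-2 items or keying.
example_key = {3: True, 10: False, 27: True, 41: False}
example_responses = {3: True, 10: True, 27: True, 41: False, 99: True}
score = raw_score(example_responses, example_key)
```

Each matching answer adds exactly one point, so the raw score is just the number of scale items answered in the deviant direction; conversion to T scores relies on the normative tables and is outside this sketch.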

The normative group for the MMPI-2 consisted of 2,600 individuals who were selected to be representative of the United States (Butcher et al., 1989). This group matched U.S. census

data for age, ethnicity, and marital status, but was above census data on education and occupation.


TYPES OF NORMS AVAILABLE

The MMPI-2 normative group consisted of adults who ranged in age from 18 to 89. The person’s scores on the MMPI-2 scales are compared to either men or women in the normative group, because there are effects of gender on a number of scales. However, specific norms are not provided by age, ethnicity, and so on. The potential effects of demographic variables, such as level of education or occupation, have not been investigated in any systematic manner either on the MMPI or the MMPI-2, and such research clearly is needed. Colligan, Osborne, Swenson, and Offord (1983, 1989) demonstrated that there are substantial effects of age on MMPI performance, and it probably is safe to assume that age effects also will be found on the MMPI-2. Dahlstrom, Lachar, and Dahlstrom (1986) and Greene (1987) reviewed the effects of ethnicity on MMPI performance, and concluded that there is not a consistent pattern of scale differences between any two ethnic groups. Again, research is needed to examine potential effects of ethnicity on the MMPI-2, although similar results to the MMPI research would be expected.

BASIC RELIABILITY AND VALIDITY INFORMATION

Test-retest reliability coefficients for the individual validity and clinical scales on the MMPI-2 range from .68 to .92 for a 2-week interval (Butcher et al., 1989). Test-retest reliability coefficients for the MMPI-2 content scales are comparable, ranging from .78 to .91 (Butcher et al., 1989). The research on the validity of the original MMPI is so prolific that it almost defies summarization, because it has been estimated that there are over 10,000 studies on the MMPI (Dahlstrom et al., 1975). A number of general MMPI references can provide an overview of this research (cf. Dahlstrom et al., 1960, 1972, 1975; Duckworth & Anderson, 1986; Friedman et al., 1989; Gilberstadt & Duker, 1965; Graham, 1977, 1987; Greene, 1980, 1988; Hathaway & Meehl, 1951; Lachar, 1974; Marks & Seeman, 1963; Marks et al., 1974).

One of the present questions is whether the validity research on the original MMPI can be generalized directly to the MMPI-2, but there are few data at this point to address this issue. It probably is safe to conclude that most MMPI research will generalize directly to the MMPI-2, particularly for well-defined and commonly occurring codetypes such as 2-7/7-2, 4-9/9-4, and so on, but it is not known which specific details will need to be modified slightly.

BASIC INTERPRETIVE STRATEGY

Interpretation of the MMPI-2 is based on codetypes (i.e., the two highest clinical scales elevated to a T score of 65 or higher). This interpretation is then supplemented by the examination of specific scales, such as the content scales as well as individual critical items. Graham (1990) and Greene (1991) provided examples of the general interpretive strategy for the MMPI-2. Computer interpretations of the MMPI-2, which can provide the clinician with the general dimensions of the patient’s psychopathology, are available from several commercial sources (Butcher, 1989; Caldwell, 1991; Greene & Brown, 1990). Butcher (1987) provided an overview of the issues that arise in computerized psychological assessment.
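The codetype rule stated above (the two highest clinical scales at a T score of 65 or higher) can be sketched as a small function. Which scales are passed in, the tie-breaking rule, and the profile values are assumptions made for illustration; the sketch is not a substitute for the interpretive strategies in the cited texts.

```python
def codetype(t_scores, cutoff=65):
    """Return the two highest scales at or above the cutoff, highest first.

    t_scores -- dict mapping a clinical scale number (as a string) to a T score
    Returns a string such as "2-7", or None when fewer than two scales reach
    the cutoff. Ties keep the order in which scales were supplied, which is
    an arbitrary choice made for this sketch.
    """
    elevated = [(scale, t) for scale, t in t_scores.items() if t >= cutoff]
    if len(elevated) < 2:
        return None
    elevated.sort(key=lambda pair: pair[1], reverse=True)  # stable sort
    return f"{elevated[0][0]}-{elevated[1][0]}"

# Hypothetical profile of T scores.
profile = {"1": 58, "2": 82, "3": 60, "4": 55,
           "6": 52, "7": 74, "8": 66, "9": 45}
code = codetype(profile)
```

For this hypothetical profile the function yields the 2-7 codetype, one of the commonly occurring codetypes mentioned earlier in the chapter.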

At present, there is relatively limited interpretive material that is specific to the MMPI-2 (Butcher, 1990; Butcher et al., 1989; Butcher, Graham, Williams, & Ben-Porath, 1989; Graham, 1990; Greene, 1991; Keller & Butcher, 1991). Most standard references on the MMPI (cf. Dahlstrom et al., 1960, 1972, 1975; Duckworth & Anderson, 1986; Friedman et al., 1989; Gilberstadt & Duker, 1965; Graham, 1977, 1987; Greene, 1980, 1988; Hathaway & Meehl, 1951; Lachar, 1974; Marks & Seeman, 1963; Marks et al., 1974) provided valuable information that can be used in the interpretation of the MMPI-2, if it can be assumed that MMPI clinical and research data will generalize to the MMPI-2, which overall is probably a safe assumption for well-defined and commonly occurring codetypes.

Use of the MMPI-2 for Treatment Planning

GENERAL ISSUES

The MMPI-2, as an overall measure of psychopathology, identifies the patient’s specific psychopathology that can be used to plan treatment interventions and techniques. Thus, if the MMPI-2 indicates that the patient is depressed, treatment planning will be different than if the patient has a number of physical symptoms and is not depressed, or the patient has numerous bizarre symptoms. This level of analysis of the MMPI-2 is not explored here, because it is self-evident to most clinicians.

RESEARCH APPLICATIONS AND FINDINGS

A number of specific studies that have examined the relationship between the original MMPI and treatment planning or outcome are reviewed next. This research, by necessity, has focused almost exclusively on the MMPI, because the MMPI-2 has not been available long enough to generate such data. Dahlstrom et al. (1975) devoted an entire chapter to the MMPI and treatment evaluation that covers the literature through 1973. That chapter should be consulted by the interested reader. Butcher (1989) provided a recent summary of this research. In summarizing this MMPI literature, which is very prolific, no attempts have been made to review all studies within a given area. Instead, major review articles and MMPI texts were used in developing this overview. The need for research on the MMPI-2 to facilitate the generalization of the data from the MMPI should be evident. Virtually any MMPI study is worthy of replication on the MMPI-2.

CLINICAL APPLICATIONS

The validity, clinical, and content scales of the MMPI-2 are examined in turn as they affect treatment planning. Rather than review all of the specific statements that could be made within each set of scales, only summary statements about major issues are made. The interested reader can refer to the various MMPI-2 sources described previously for more specific statements about treatment planning, with a given scale or set of scales, and the research that validates it.

The issues for treatment planning that relate to the assessment of the validity of a specific administration of the MMPI-2 are outlined in Table 7.1. The actual scales/indexes used by the clinician to identify consistency of item endorsement, for example, are not specified, because consistency can be assessed in a number of ways (F scale, Variable Response Inconsistency Scale [VRIN], the difference between F and F_B, and so on), and the focus in Table 7.1 is on the implications of accuracy of item endorsement for treatment planning.

TABLE 7.1
Use of MMPI-2 Validity Scales/Indexes for Treatment Planning

Scale: Potential Issues

L Scale (T > 64): Patients are likely to be naive, psychologically unsophisticated, defensive, and controlled. In an inpatient setting, patients who have a Within-Normal Limit profile (all clinical scales below a T score of 65) are likely to be psychotic or seriously emotionally disturbed.

L Scale (T < 50): No specific interpretations can be made.

F Scale (T > 80): Patients are experiencing rather severe psychopathology, which should be readily apparent, if they are not overreporting psychopathology. It may be necessary to lower their level of distress before making any specific treatment interventions.

F Scale (T < 50): Patients are not reporting and/or experiencing any form of discomfort or psychological distress. They probably are underreporting.

K Scale (T > 60): Patients are very defensive and guarded. They are reluctant to acknowledge that they have any psychological problems. They are resistant to any type of treatment intervention.

K Scale (T < 40): Patients see themselves as having few resources for coping with their problems and are fearful of being overwhelmed by them. Supportive interventions are needed initially.

Even a brief perusal of Table 7.1 should reveal that the specific issues within a given section have a number of important implications for treatment planning. For example, if a patient is unwilling to comply with completing the MMPI-2 in a consistent and accurate manner, there is little reason to anticipate that the patient will be more compliant with other tasks in treatment. Consequently, clinicians should address such issues directly with the patient, rather than assume that the reasons for giving the MMPI-2 are no longer important. If the clinician believes that the information to be provided by the MMPI-2 warrants administration of the test, the patient’s noncompliance should not be overlooked or dismissed lightly. (There are some aspects of the medical model that clinicians might keep in mind, because physicians rarely omit collecting a blood sample or a urine specimen simply because patients do not want to provide them.)

Patients who are able and willing to endorse the items consistently should be compliant with other treatment requests, unless other validity indicators are questionable. However, patients who are unable or unwilling to endorse the items consistently are unlikely to be compliant with other treatment requests, which indicates the necessity of addressing this issue directly with them. It is important to determine the reason that patients have endorsed the items inconsistently. For example, if patients have limited intellectual ability or educational opportunity, taped administration should be used to avoid these problems. These factors also will have to be considered in planning treatment. If patients are too toxic or


neuropsychologically or psychiatrically impaired to complete the MMPI-2 consistently, further assessment or any educational or psychotherapy interventions will need to be delayed until they can function appropriately.

There are a number of general issues that arise based on the accuracy of item endorsement. Patients who are willing and able to provide an accurate self-description have good insight into their behavior and can share it openly, which augurs well for any therapeutic intervention. Content-based interpretations of the MMPI-2 should reflect accurately the patients’ current psychological status in these circumstances. Patients who overreport psychopathology are at high risk of terminating treatment prematurely, despite the seemingly pervasive psychopathology that is being reported; or they may have some reason for exaggerating the severity of their psychopathology, which will interfere with their ability to participate in treatment in a meaningful manner. Patients who underreport psychopathology have little internal motivation for treatment because they are not experiencing emotional distress. These patients’ psychopathology and deviant behaviors are very chronic and egosyntonic (not distressing to them), which is reflected by the lack of elevation in the MMPI-2 profile, and consequently their behaviors are very difficult to change in short-term treatment, if they can be changed at all. Also, if the persons taking the MMPI-2 are not the identified patient, they are not bothered by the presence of psychopathology in significant others, which does not bode well for any intervention or treatment.

The use of the clinical scales of the MMPI-2 in treatment planning is outlined in Table 7.2. Probably the most important caveat is that clinicians should note carefully clinical scales that are not elevated, because the primary emphasis in MMPI-2 interpretation is on elevated scales. Clinicians are unlikely to overlook a T score of 85 on Scale 2 (Depression). However, clinicians frequently conclude that a T score of 40 on Scale 2 is in the normal range without considering the implications of such a score. It is unusual for patients who are requesting treatment to have such low scores on Scales 2 and 7 (Psychasthenia), and the clinician would need to inquire as to the patients’ motivation for treatment. Similarly, patients who have experienced a significant loss recently should not have low scores on Scales 2 and 7.

Elevation of the clinical scales indicates that patients are distressed over the existence of behaviors and/or symptoms of psychopathology, not whether patients actually have psychopathology. That is, patients with chronic and/or egosyntonic behaviors and symptomatology may not elevate any of the clinical scales above a T score of 64, which makes it difficult to distinguish between a normal individual and a severely disturbed patient on the MMPI-2 without access to additional information.

Scales 5 (Masculinity-Femininity) and 0 (Social Introversion) moderate how patients express the psychopathology that is being tapped by a specific clinical scale. For example, a patient with a T score of 80 on Scale 4 (Psychopathic Deviate), who also elevates Scales 5 and 0 above T scores of 64, is very unlikely to act out, compared with a patient who has T scores below 40 on these scales.

The emphasis on the specific clinical scales that are elevated can lead clinicians to ignore low-point scales or scales within the normal range. For example, patients who have T scores at or below 50 on Scales 2 (Depression) and 7 (Psychasthenia) are not reporting or experiencing any distress over whatever behaviors/symptoms brought them to treatment. Similarly, patients who have low scores on Scales 1 (Hypochondriasis), 2 (Depression), and 3 (Hysteria) have few psychological defenses preventing their behaviors/symptoms from being expressed overtly. The implications of these low scores for treatment planning should be apparent.

Scales 5 (Masculinity-Femininity) and 0 (Social Introversion) moderate how patients will

141

142

GREENE AND CLOPTON TABLE 7.2 Use of the MMPI-2 Clinical Scales in Treatment Planning

Potential Issues

Scale

1 (HS) > 64

Patients focus on vague physical ailments. They are very resistant to considering that they might have psychological problems. They are pessimistic about being helped. They are argumentative with staff. Treatment needs to reassure them that their ailments will not be ignored. Conservative interventions should be used whenever possible.

1 (HS) 64

Patients are experiencing distress and likely to be depressed. Suicidal ideation and history should be evaluated carefully. Their depressive mood should be readily apparent. It is important to determine whether internal or external factors are producing the negative mood state, and to plan treatment accordingly.

2 (D) 64

Patients are naive, suggestible, and lack insight into their own and others’ behavior. They deny any type of psychological problems. Under stress, specific physical ailments are seen. They look for simplistic, concrete solutions to their problems. Treatment should focus on short-term goals, because there is limited motivation. They initially are enthusiastic about treatment, then later resist treatment or fail to cooperate.

3 (Hy) 64

Patients are in conflict either with family members and/or persons in positions of authority. They may make a good initial impression, but more long-term contact reveals that they are egocentric and have little concern for others. Any treatment should emphasize short-term goals, with emphasis on behavior change rather than their verbalized intent to change no matter how sincere it may sound. Low scores on Scales 2 (Depression) and 7 (Psychasthenia) make elevations on Scale 4 particularly pathognomonic.

4 (Pd) 64

Patients do not identify with their traditional gender roles. Male patients are passive and reflective, whereas female patients are outgoing, assertive, and frequently aggressive. These personality traits modify how they express their psychopathology.

5 (Mf)

Patients identify strongly with their traditional gender role. Male patients are active and outgoing, whereas female patients are passive and dependent.

< 40

6 (Pa) > 64

Patients are suspicious,

hostile, and overly sensitive, which is

readily apparent to everyone. Any treatment is problematic, because of the difficulty in developing a therapeutic relationship based on trust. Any intervention must be instituted slowly. reer

eee

ee

(Continued)

7

MMPI-2

TABLE 7.2 (Continued) ——

Scale

Potential Issues

a 6 (Pa)

ee 64

Patients are worried, tense, and indecisive, which is readily apparent to everyone. Ruminative and obsessive behaviors may be seen. It may be necessary to lower their level of anxiety before implementing treatment of other symptoms.

7 (Pt) < 45

Patients are secure and comfortable with themselves, which augurs poorly for any type of intervention in a clinical setting.

8 (Sc) > 64

¢

Patients feel alienated and remote from the environment and others. At higher elevations (>79), difficulties in logic and judgment may become evident. Interventions should be directive and supportive. Psychotropic medications may be needed.

8 (Sc) < 45

Patients are conventional, concrete, and unimaginative. Any intervention should be behavioral, directive, and focus on shortterm goals.

9 (Ma)

> 64

Patients are overactive, impulsive, emotionally labile, and euphoric, with occasional outbursts of anger. They may need to be evaluated for a manic mood disorder. Short-term behavioral goals should be pursued.

9 (Ma)

< 45

Patients have low energy and activity levels. They may have a serious depressive disorder that should be evaluated carefully. Suicide potential should be reviewed, particularly as they start to feel better.

O (Sc) > 64

Patients are introverted, shy, and socially insecure.

They

withdraw from and avoid significant others, which exacerbate their distress. Interventions need to address specifically their tendency to withdraw and avoid others. O (Sc) < 45

Patients are extraverted, gregarious, and socially poised. They may have difficulty forming intimate relationships with others at very low scores (T < 35). They are unlikely to have a thought disorder. The probability of acting out is increased. Group therapies are particularly useful with these patients.

express the psychopathology that is being tapped by a specific clinical scale. Male patients who elevate Scales 5 and 0 above T scores of 64 will be passive and introverted and will shy away from social interactions, which decreases the probability of their acting out and increases the probability of their obsessing, ruminating, and fantasizing. Conversely, male patients who have T scores below 40 on Scales 5 and 0 will be active, outgoing, and extraverted, which increases the probability of their acting out and decreases the probability of their obsessing, ruminating, and fantasizing. (These same statements will hold for women patients if their T score on Scale 5 is the opposite of what has been indicated for men.) For example, the treatment plan for a patient with a T score above 64 on Scale 0 should encourage the patient to interact with friends and small groups of acquaintances, and to avoid isolating and withdrawing from others. Group treatment may be particularly helpful for such patients if they are supported through the initial stages of becoming comfortable with others.

The clinician should look for consistency among the clinical scales that are elevated in

deciding the importance of particular areas in treatment planning. If Scale 1 (Hypochondriasis) is elevated, and it is to be interpreted as reflecting the presence of somatization, other clinical scales (3 [Hysteria] or 7 [Psychasthenia]) or content scales (HEA [Health Concerns]) suggestive of somatization should be elevated. If such concordance is not found among somatization scales, some other interpretation of Scale 1 that is consistent with the other elevations or lack thereof must be considered. The more concordance that is found among scales that have the same correlates and/or scale content, the more treatment planning should emphasize these particular areas.

The specific uses of the content scales of the MMPI-2 in treatment planning are outlined in Table 7.3. There are several caveats, however, when interpreting the content scales. First, the clinician must administer all 567 items so that these scales can be scored. Clinicians are well advised to use these scales routinely, because they provide valuable information about patients with little additional time required for administration. Second, it is mandatory that patients be able and willing to provide an accurate self-description, because the content scales are very susceptible to overreporting or underreporting of psychopathology, owing to the face-valid or obvious nature of these scales’ items. The implications of these two response styles for treatment planning were described earlier. Third, elevation of the content scales reflects that patients are aware of and willing to report the behaviors that are being assessed by the specific scale. When patients have insight into their behavior and are willing to report it accurately, these scales provide a quick overview of how patients are viewing and responding to their current circumstances. Finally, the absence of elevation of the content scales can

reflect either that the behaviors are not characteristic of the patient or that the patient is unaware of or unwilling to acknowledge these behaviors. When the content scales are not elevated, clinicians should determine which of these two alternative interpretations is more appropriate. However, clinicians are cautioned about making specific interpretations of low scores on the content scales because no research has validated their correlates.

The specific uses of the factor scales (Welsh A [Anxiety] and R [Repression]) of the MMPI-2 in treatment planning are outlined in Table 7.4. Clinicians should score and interpret the factor scales routinely because they provide valuable information for treatment planning. Low scores on A should be interpreted in a manner similar to low scores on Scales 2 (Depression) and 7 (Psychasthenia), and they have the same implications for treatment planning as were described previously. Low scores on both factors A and R are particularly significant, because patients’ psychopathology is well ingrained and not distressing to them, which limits motivation for any short-term treatment.

Finally, clinicians should check a number of the specific items on the MMPI-2, which are listed in Table 7.5. Clinicians should be cautious about attaching too much significance to the response to any single MMPI-2 item, because an item can be thought of as a scale with only one item, which obviously has limited psychometric qualities. However, when patients endorse a number of items within a specific area, clinicians would be well advised to review them to determine their implications for treatment planning. The items relating to dangerousness to self (150, 303, 505, 506, 520, 524, and 546) or others (150, 540, 542, and 548) must be examined every time the MMPI-2 is administered, because these areas are an integral part of any treatment plan. Clinicians should check the patient’s answer sheet to determine the responses to these specific items and decide whether they are worthy of being pursued via an interview. Clinicians also need to document that they have reviewed the patient’s responses to these items because they could be integral to any litigation that might arise.

Clinicians frequently are confronted with MMPI-2s in which the interpretive information for a given scale is or may seem to be contradictory to the information provided by another scale. There are several procedures that can be followed to resolve such inconsistencies. Probably the best method for resolving such discrepancies involves exploring the issue with

TABLE 7.3 Use of the MMPI-2 Content Scales in Treatment Planning

Scale

Potential Issues

ANX (Anxiety) > 64

Patients report general symptoms of anxiety, nervousness, worries, and sleep and concentration difficulties. Depending on the level of anxiety, psychotropic medication or other anxiety-reducing techniques may be needed before implementing other interventions.

FRS (Fears) > 64

Patients report a large number of specific fears. These specific fears respond well to systematic desensitization, if they are not part of a larger set of fear and anxiety symptoms.

OBS (Obsessiveness) > 64

Patients have great difficulty making decisions, ruminate excessively, worry excessively, and have intrusive thoughts. They are good candidates for most insight-oriented therapies.

DEP (Depression) > 64

Patients have depressive moods and thoughts. They feel blue and unhappy, and are likely to brood. Suicide potential should be evaluated. Their depression has an angry component that involves blaming others, particularly when DEP is higher (+10 T points) than Scale 2 (Depression).

HEA (Health Concerns) > 64

Patients report specific physical symptoms across several body systems, as well as general physical ailments such as nausea, vomiting, and pain. Their physical symptoms may be another manifestation of their emotional distress. They need to be reassured that their symptoms are being taken seriously.

BIZ (Bizarre Mentation) > 64

Patients report strange thoughts and experiences, paranoid ideation, and hallucinations. In short, they report psychotic thought processes. Psychotropic medications may be indicated, as well as hospitalization.

ANG (Anger) > 64

Patients report being irritable, grouchy, impatient, hotheaded, annoyed, and stubborn. Assertiveness training and/or anger control techniques should be implemented as part of treatment.

CYN (Cynicism) > 64

Patients expect other people to lie, cheat, and steal, and believe that when others do not engage in these behaviors, it is only because they fear being caught. Establishing a trusting relationship is imperative if any progress is to be made in therapy.

ASP (Antisocial Practices) > 64

Patients report stealing things, other problem behaviors, and antisocial practices during their school years. They have attitudes similar to individuals who break the law, even if not actually engaging in antisocial behavior. It is important to determine whether these behaviors are still being displayed. Group interventions with similar patients are most productive.

TPA (Type A) > 64

Patients are hard-driving, fast-moving, and work-oriented individuals, who frequently become impatient, grouchy, irritable, and annoyed. The possibility of a manic mood disorder should be considered.

LSE (Low Self-Esteem) > 64

Patients have very low opinions of themselves, and they are uncomfortable if people say nice things about them. Interventions need to be very supportive and allow ample time for change.

SOD (Social Discomfort) > 64

Patients are very uneasy around others and are happier by themselves. They see themselves as shy. They need to be supported and encouraged to participate in treatment until they are comfortable interacting with others.

FAM (Family Problems) > 64

Patients report considerable familial discord. Their families are reported to lack love, support, and companionship, and these patients want to leave home. Involvement of the family system in treatment may be important unless the patient needs to be emancipated from them.

WRK (Work Interference) > 64

Patients report that they are not as able to work as they once were and that they work under a great deal of tension. They are tired, lack energy, and are sick of what they have to do. It is important to determine specifically whether the reported symptoms and behaviors actually interfere with their work, because WRK is primarily a measure of general distress.

TRT (Negative Treatment Indicators) > 64

Patients dislike going to doctors and believe they should not discuss their personal problems with others. They prefer to take drugs or medicine, because talking about problems does not help them. Patients with depressive mood disorders will elevate TRT because it is primarily a measure of general distress, so clinicians need to be cautious about interpreting this scale in a characterologic manner.

TABLE 7.4 Use of the MMPI-2 Factor Scales in Treatment Planning

Scale

Potential Issues

A (Anxiety) > 69 and R (Repression) > 59

Patients are reporting general distress and maladjustment, which may be arising either from internal or external sources. They are aware that they are distressed, and they are trying to control its overt expression. They are motivated for most types of psychological intervention.

A (Anxiety) > 69 and R (Repression) < 40

Patients are reporting general distress and maladjustment. However, they are not particularly concerned about these problems and are likely to attribute them to causes outside themselves. Once the immediate distress has passed, these patients have little motivation for treatment. Consequently, treatment should focus on short-term goals.

A (Anxiety) < 50 and R (Repression) > 59

Patients are not reporting general distress, and are confident in their own abilities. They are denying and repressing any awareness that they might have problems, and are reluctant to examine their own behavior. Short-term, behaviorally oriented interventions are indicated.

A (Anxiety) < 50 and R (Repression) < 40

Patients are not reporting general distress, and see themselves as being confident in their own abilities. In a clinical setting, they have little awareness that they have any problems that need to be repressed and denied. They have very chronic, egosyntonic behaviors, which make any type of treatment intervention difficult.
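The four score patterns in Table 7.4 amount to a simple lookup on the A and R T scores. The following is a minimal sketch of that lookup, not a scoring program: the function and variable names are hypothetical, the descriptions are abbreviated paraphrases of the table rows, and scores falling between the table's cutoffs are deliberately left unclassified because the table gives no guidance for them.

```python
def _level(t_score, high_above, low_below):
    """Return "high", "low", or None for a T score, using the given cutoffs."""
    if t_score > high_above:
        return "high"
    if t_score < low_below:
        return "low"
    return None


# Abbreviated paraphrases of the four rows of Table 7.4 (keyed by A level, R level).
FACTOR_PATTERNS = {
    ("high", "high"): "distressed and controlling its expression; motivated for most interventions",
    ("high", "low"): "distressed but externalizing; treatment should focus on short-term goals",
    ("low", "high"): "denying and repressing problems; short-term, behaviorally oriented interventions",
    ("low", "low"): "chronic, egosyntonic behaviors; any type of intervention is difficult",
}


def factor_pattern(a_t, r_t):
    """Map Welsh A and R T scores onto a Table 7.4 pattern, or None if no row applies."""
    a_level = _level(a_t, high_above=69, low_below=50)  # A > 69 or A < 50
    r_level = _level(r_t, high_above=59, low_below=40)  # R > 59 or R < 40
    if a_level is None or r_level is None:
        return None  # intermediate scores are not covered by the table
    return FACTOR_PATTERNS[(a_level, r_level)]


print(factor_pattern(75, 62))
```

A return value of None signals only that the table is silent for that profile; it does not imply that the factor scales are uninformative.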

TABLE 7.5 Use of MMPI-2 Items in Treatment Planning

Content Area

Item Numbers

Anger

37(T), 134(T), 150(T), 372(F), 389(T), 478(T), 513(T), 540(T), 542(T), 548(T)

Depression

38(T), 56(T), 65(T), 95(F), 143(F), 234(T), 273(T), 388(F), 450(T), 463(T), 526(T)

Family Problems

21(T), 83(F), 379(T), 455(F), 478(T)

Hopelessness

22(T), 71(T), 75(F), 92(T), 130(T), 306(T), 454(T), 516(T), 539(T), 554(T)

Poor Impulse Control

23(T), 85(T), 240(T), 266(F), 530(T), 564(F)

Paranoia

99(T), 138(T), 144(T), 162(T), 216(T), 228(T), 259(T), 314(F), 333(T), 424(T)

Physical Ailments

18(T), 36(T), 40(T), 47(F), 117(F), 142(F), 295(F)

Psychoticism

24(T), 60(T), 72(T), 96(T), 198(T), 298(T), 319(T), 336(T), 355(T), 361(T), 551(T)

Sexuality

12(F), 34(F), 121(F), 268(T), 371(T), 470(T)

Sleep Disturbance

3(F), 39(T)

Substance Abuse

264(T), 387(T), 429(F), 487(T), 489(T), 511(T), 527(T), 544(T)

Suicidality

150(T), 303(T), 505(T), 506(T), 520(T), 524(T), 546(T)

Note. Clinicians need to consult the MMPI-2 booklet for the actual content of the indicated items. Clinicians also must realize that no critical item listing reproduces patients’ responses to all of these items. Consequently, clinicians need to check the patients’ answer sheets to determine the responses to these items.
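The routine check of the dangerousness items can be illustrated with a short sketch. It assumes a hypothetical answer-sheet representation (a dictionary from item number to True, False, or None for an omitted item); the item numbers are those given in the text, all of which are keyed True in Table 7.5. The function name is illustrative only, and the output is a prompt for interview follow-up and documentation, not an interpretation.

```python
# Item numbers listed in the text; all are keyed True in Table 7.5.
DANGER_TO_SELF = (150, 303, 505, 506, 520, 524, 546)
DANGER_TO_OTHERS = (150, 540, 542, 548)


def review_items(answers, item_numbers):
    """Return (endorsed, unanswered) item numbers for one content area.

    `answers` is a hypothetical answer-sheet mapping:
    item number -> True, False, or None (item omitted).
    """
    endorsed = [n for n in item_numbers if answers.get(n) is True]
    unanswered = [n for n in item_numbers if answers.get(n) is None]
    return endorsed, unanswered


# Invented responses for illustration only.
answers = {150: True, 303: False, 505: False, 506: True, 520: False,
           524: False, 540: False, 542: False, 548: False}

print(review_items(answers, DANGER_TO_SELF))
print(review_items(answers, DANGER_TO_OTHERS))
```

Reporting unanswered items separately is a design choice made here so that an omitted item is not silently treated as a denial.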

the patient directly. If the patient is not available for some reason, and the patient has endorsed the items accurately, the MMPI-2 content scales and the specific items that are endorsed should provide a quick means of resolving any discrepancies that may exist with the empirically derived clinical scales, which have a number of correlates. Clinicians should realize that most, if not all, MMPI-2s will have some minor discrepancies among scales; hence, they should not expect perfect concordance.


If the MMPI-2 is to be used in repeated administrations to monitor change in the patient across treatment, clinicians should realize that a number of the scales (1 [Hypochondriasis], 4 [Psychopathic Deviate], 8 [Schizophrenia], 0 [Social Introversion], and MAC-R [MacAndrew Alcoholism Scale-Revised]) and items are designed to assess characterologic qualities and past behaviors that will not change over time. Other scales (2 [Depression], 7 [Psychasthenia], and A [Anxiety]) are more reactive and would be expected to reflect the patients’ changes. Consequently, clinicians should not expect to have consistent changes across all of the scales. Also, it should be remembered that the MMPI-2 is designed to be an initial screening instrument to assess the types of psychopathology that are being manifested in a particular patient, and the norms reflect the typical defensiveness that is to be expected on initial screening. If the MMPI-2 is to be used as a dependent variable to assess changes across the course of treatment, clinicians should examine the changes that the patient makes on those scales that are sensitive to the patient’s status, rather than reference them to the standard profile. That is, it probably is more accurate to say that the patient’s score on Scale 2 decreased 24 T points across treatment than to say that the patient’s T score of 62 is now within the normal range at the end of treatment.

These caveats about using the MMPI-2 to monitor change across treatment do not apply to those circumstances in which the MMPI-2 is used as an independent variable. Clinicians should find it very profitable to determine what codetypes and patterns of MMPI-2 scales at the initiation of treatment are related to outcome, particularly within very homogeneous subgroups of patients.
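The recommendation to report change in T points on the reactive scales, rather than position relative to the standard profile, can be sketched as follows. The dictionary layout and function name are assumptions made for illustration; the Scale 2 values reproduce the 24-point decrease to a T score of 62 used in the text, and the remaining scores are invented.

```python
# Scales the text describes as more reactive to the patient's current status.
REACTIVE_SCALES = ("2", "7", "A")


def t_score_changes(intake, follow_up, scales=REACTIVE_SCALES):
    """Return follow-up minus intake T scores for scales present in both profiles."""
    return {
        scale: follow_up[scale] - intake[scale]
        for scale in scales
        if scale in intake and scale in follow_up
    }


# Hypothetical profiles: Scale 2 falls from 86 to 62 (the text's example);
# the values for Scale 7 and A are invented for illustration.
intake = {"2": 86, "7": 74, "A": 70}
follow_up = {"2": 62, "7": 66, "A": 61}

changes = t_score_changes(intake, follow_up)
print(changes["2"])  # -24
```

Restricting the default comparison to Scales 2, 7, and A follows the text's point that characterologic scales would not be expected to move.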

USE WITH OTHER EVALUATION DATA

It is necessary to supplement the MMPI-2 with other evaluation data, such as a clinical interview, to enhance the accuracy of any clinical predictions that will be made. Because the MMPI was developed long before there was any widespread acceptance of the multitude of personality disorders, the MMPI-2 has limited success in this area. (The development of MMPI Personality Disorder scales [Morey, Waugh, & Blashfield, 1985], which are essentially intact on the MMPI-2, may provide additional information in this area, but to date the research has been too limited to provide much specific direction [cf. Morey & Smith, 1988].) Consequently, it is helpful to supplement the MMPI-2 with an instrument such as the Millon Clinical Multiaxial Inventory-II (MCMI-II: Millon, 1987), which is designed specifically to assess personality disorders, although there has been substantial debate over how well Millon’s characterization of personality disorders fits the DSM-III classification system (McCann, 1991; Widiger & Sanderson, 1987). Because the MCMI-II does not identify Axis I disorders as well as the MMPI-2 does, the routine use of both instruments would seem to be indicated any time there is reason to suspect that the patient may have both Axis I and Axis II disorders.

The MMPI-2 also has difficulty identifying patients who have “well-intact” psychotic or characterologic processes. In these cases, a Rorschach or some other projective technique can provide useful information on the intactness of the patient’s cognitive processes. Finally, it seems advisable that clinicians have some estimate of the patient’s level of intellectual functioning, because a substantial line of research has indicated that the correlates of specific MMPI codetypes or scales may change with intellectual level, particularly when trying to predict violent or acting-out behavior.

PROVISION OF FEEDBACK REGARDING ASSESSMENT FINDINGS

Patients should be provided with the results of their MMPI-2 routinely so that they understand how it is being used in planning treatment. Sharing information with patients ensures that they will take the MMPI-2 appropriately, without distorting their responses, and makes them meaningful participants in the treatment process. When patients have insight into their behavior and are willing to report it accurately, the MMPI-2 content scales provide a summary of how patients are viewing and responding to their current circumstances, which can be shared directly with them. It probably is better not to share the standard profile for the basic validity and clinical scales with patients, because of the attributions that they may make to the scale names. Lewak, Marks, and Nelson (1990) have devoted an entire book to providing feedback to patients, which the interested reader should consult.

LIMITATIONS/POTENTIAL PROBLEMS IN USE

In one sense, the greatest limitation of the MMPI-2 has been its success, which has created an impression that the instrument can be used in any setting to evaluate any type of problem. Frequently, clinicians’ expectations of the MMPI-2 far exceed reality for any psychometric instrument. The importance of ensuring that patients have sufficient intellectual ability and reading skills to complete the MMPI-2 appropriately cannot be overemphasized. One of the primary causes of invalid MMPI-2s is the inability to read and comprehend the items, which require an eighth-grade reading level. Standard, cassette-tape administrations of the MMPI-2 should be used any time there is reason to suspect that the patient’s intellectual or reading ability may be inadequate for standard administration in a paper-and-pencil format.

Use of the MMPI-2 for Treatment Outcome Assessment

GENERAL ISSUES

The MMPI-2 has been used less frequently to assess treatment outcome because the instrument is used primarily to provide an initial assessment for treatment planning. There have been two common themes in the use of the original MMPI to assess treatment outcome. The most frequent research examines the relationship between MMPI codetypes or scales assessed at the onset of treatment and whatever outcome measure is being used (i.e., the MMPI has been used as an independent variable). A smaller group of studies has used the MMPI as the dependent variable and examined the changes that occurred in MMPI scales as a result of treatment. In these latter studies, the sensitivity of the MMPI to changes in the patients’ status may be limited because the items frequently are worded in the past tense and ask about past, rather than current, behaviors (Scapinello & Blanchard, 1987). For example, patients would not be expected to change their response to the item, “I have used alcohol excessively,” regardless of how effective their alcohol treatment had been. It is interesting to speculate that the finding that the MacAndrew Alcoholism scale (MAC: MacAndrew, 1965) does not change as a result of treatment (Gallucci, Kay, & Thornby, 1989; Huber & Danahy, 1975; Rohan, Tatro, & Rotman, 1969) may reflect that its items are written predominantly in the past tense and ask about past behaviors. Research to evaluate this hypothesis seems warranted.

EVALUATION AGAINST CRITERIA FOR OUTCOME MEASURES

The MMPI-2 clearly meets most of the ideal criteria for outcome measures as indicated by Ciarlo, Brown, Edwards, Kiresuk, and Newman (1986), each of which is examined briefly in turn.

The MMPI-2 is relevant and appropriate for assessing treatment outcomes in patient samples where psychopathology is being evaluated, particularly if the emphasis is being placed on DSM-III-R Axis I disorders. The methodology for administering, scoring, and interpreting the MMPI-2 is straightforward and easily implemented across treatment settings. MMPI-2 scores on the various scales and codetypes have clear and objective referents that are consistent across clients. The MMPI-2 is not constructed so that clinicians and/or significant others can have their perspectives of the patient directly measured. However, clinicians and/or significant others can report the patient’s anticipated score on the various scales, such as depression, anxiety, and so on, to obtain such information.

The MMPI-2 has adequate to good psychometric characteristics, and it is particularly sensitive to any attempt by the patient to distort responses to items. The MMPI-2 is relatively inexpensive, with the cost primarily dependent on the degree of computer-based assistance desired in administration, scoring, and interpretation. The long history of usage of the MMPI, which will be extended with the MMPI-2, makes it easily understandable by most clinicians. The patient’s MMPI-2 profile can be plotted quickly and provides an easy basis for providing feedback to the patient, other clinicians, and significant others. The MMPI-2 content scales are particularly good for direct feedback, because they provide a description of how the patient reports his or her psychopathology. The MMPI-2 is very useful in making clinical diagnoses, assessments, and treatment recommendations for a broad range of patients. Because the MMPI was developed in an empirical manner, the MMPI-2 and MMPI scales are compatible with a wide range of theories of psychopathology and the goals and procedures of various treatment approaches. In many respects, the MMPI-2 will be the standard against which other tests are evaluated in meeting these criteria.

RESEARCH APPLICATIONS AND FINDINGS

Before the MMPI-2 is used to predict treatment outcome, several conclusions from the research with the original MMPI in this area should be noted. First, the original MMPI is not related to treatment outcome in any setting when the patients are examined as a single heterogeneous group. Researchers frequently assume that there is a typical patient within a given diagnostic group or setting, and seem not to consider that there may be an interaction between type of patient and the outcome of treatment. Second, background and demographic variables contribute more variance than personality variables, such as the original MMPI, when they are examined within the same study (Hoffmann & Jansen, 1973; Nathan & Skinstad, 1987), so it is important not to attribute too much significance to those studies that only examined MMPI variables. Third, the original MMPI may be related to treatment outcome when specific subgroups are identified within a particular diagnostic group, but these findings are replicated inconsistently across studies. Finally, a number of these studies have used cluster analyses of the original MMPI data seemingly with little awareness of the multitude of problems that exist with these sets of procedures (cf. Blashfield, 1980).

The original MMPI research findings are summarized within two primary diagnostic groups (alcohol/drug/substance abuse and chronic pain), because they encompass most of the systematic data, and the results are germane to a number of different clinical groups. There are several reviews of this literature that should be consulted by the interested reader. Graham and Strenger (1988) and Greene and Garvin (1988) reviewed the MMPI research in alcoholism, and Stark (1992) reviewed the entire literature on attrition from substance abuse treatment. Nathan and Skinstad (1987) reviewed the problems of assessing outcomes of treatment in alcoholics, which should be read by anyone who is interested in doing research on this topic. Love and Peck (1987), Snyder (1990), and Keller and Butcher (1991) reviewed the MMPI research in chronic pain. Keller and Butcher (1991) also provided specific MMPI-2 data on a large sample of chronic pain patients that should be read by clinicians working in this area.

Several studies found that alcoholics and drug addicts who have codetypes involving Scales 4 (Psychopathic Deviate) and 9 (Hypomania) are more likely to drop out of treatment or to have poorer outcomes than alcoholics with other codetypes (Huber & Danahy, 1975; Pekarik, Jones, & Blodgett, 1986; Pettinati, Sugerman, & Maurer, 1982; Rounsaville, Dolinsky, Babor, & Meyer, 1987; Sheppard, Smith, & Rosenbaum, 1988). However, numerous studies have not been able to replicate these findings in alcoholics (Filstead, Drachman, Rossi, & Getsinger, 1983; McWilliams & Brown, 1977; Wilkinson, Prado, Williams, & Schnadt, 1971) or drug addicts (Craig, 1984). Other studies have reported that alcoholics who are characterized by denial and minimization on the MMPI are more prone to drop out of treatment (Hoffmann & Jansen, 1973; Mozdzierz, Macchitelli, & Conway, 1973), whereas others have not been able to replicate these results (Krasnoff, 1977). Finally, a number of investigators have reported that alcoholics who have the highest profile elevations on the MMPI are more likely to drop out of treatment or have poorer outcomes (Albott, 1982; Pettinati et al., 1982; Zuckerman, Sola, Masterson, & Angelone, 1975). It is not clear in these profiles whether the higher elevations reflected the presence of more psychopathology and/or the overreporting of psychopathology. It is important to delineate which of these alternative explanations is accurate, because they would have different implications for treatment.

One group of investigators (Hoffmann, Loper, & Kammeier, 1974; Kammeier, Hoffmann, & Loper, 1973; Loper, Kammeier, & Hoffmann, 1973) examined the MMPI scores of male college students, in whom an average of 13 years elapsed between college admission and entrance into an alcoholism treatment program. These investigators compared the alcoholics’ MAC scale scores upon admission to college and at entrance into treatment with the scores of a control group of students who were admitted to college at the same time. The alcoholics had higher MAC scale scores both at college admission and at entrance into treatment than the control group. Using a cutting score of 26, the MAC scale correctly classified 72% of the alcoholic sample both at college admission and at entrance into treatment. The consistency of classification by the MAC scale across such an extensive time interval suggests that the MAC scale is tapping a dimension of behavior that is resistant to change. This conclusion also is supported by the finding that MAC scores in alcoholics remain elevated after treatment (Gallucci et al., 1989; Huber & Danahy, 1975; Rohan et al., 1969), as was noted earlier. However, Allen (1991) suggested that patients who have high versus low scores on the MAC may need different types of treatment.

The research on chronic pain patients and treatment outcome is very similar to that cited previously on substance abuse. Various investigators have reported that 1-3/3-1 codetypes and their variants (Herron & Pheasant, 1982; Long, 1981; McCreary, Turner, & Dawson, 1977; Sternbach, Wolf, Murphy, & Akeson, 1973), or more elevated profiles in general (Costello, Hulsey, Schoenfeld, & Ramamurthy, 1987; Naliboff, McCreary, McArthur, Cohen, & Gottlieb, 1988), have poorer outcomes, whereas others have been unable to replicate these findings (Guck, Meilman, Skultety, & Poloni, 1988; McArthur, Cohen, Gottlieb, Naliboff, & Schandler, 1987; Moore, Armentrout, Parker, & Kivlahan, 1986). Still other investigators have reported that pain patients with normal-limit profiles (no clinical scale at or above a T score of 70 on the MMPI) have better outcomes (Costello et al., 1987; Long, 1981; Strassberg, Reimherr, Ward, Russell, & Cole, 1981).
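The cutting-score classification reported above for the MAC scale (a cutting score of 26) can be illustrated with a brief sketch. Two points are assumptions rather than facts from the source: that a score at or above the cutting score counts as a positive classification, and the sample scores themselves, which are invented and are not data from the cited studies.

```python
MAC_CUTTING_SCORE = 26  # cutting score cited in the text


def classified_as_alcoholic(mac_raw_score, cutting_score=MAC_CUTTING_SCORE):
    """Assumption: scores at or above the cutting score are classified positive."""
    return mac_raw_score >= cutting_score


def hit_rate(mac_raw_scores, cutting_score=MAC_CUTTING_SCORE):
    """Proportion of a known-alcoholic sample that the cutting score classifies correctly."""
    hits = sum(classified_as_alcoholic(s, cutting_score) for s in mac_raw_scores)
    return hits / len(mac_raw_scores)


# Invented raw scores for illustration only.
sample = [31, 24, 28, 26, 22, 30, 27, 25, 29, 33]
print(hit_rate(sample))
```

Computing the same proportion at two time points for the same individuals would parallel the comparison of classification at college admission and at entrance into treatment.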

Some researchers have investigated the ability of the MMPI to predict treatment outcome in patients with sleep disturbances (Edinger, Stout, & Hoelscher, 1988; Klonoff, Fleetham, Taylor, & Clark, 1987), headaches (Onorato & Tsushima, 1983; Williams, Thompson, Haber, & Raczynski, 1986), and anorexia nervosa (Edwin, Andersen, & Rosell, 1988). No clear pattern of results was found within or across these various groups of patients.

CLINICAL APPLICATIONS

In substance abuse settings, clinicians should be aware that patients who display psychopathic tendencies or who are characterized by denial and minimization may be more prone to drop out of treatment, and should confront these issues directly. High scorers on the MAC are more likely to be risk takers who are extraverted and impulsive, and they may have better treatment outcomes in a group-oriented and confrontational program, whereas low scorers are more likely to be risk avoiders who are introverted, withdrawn, and depressed, and they may have better treatment outcomes in a less confrontational and more supportive program. It appears that patients who have the most elevated MMPI-2 profiles are likely to have poorer outcomes, regardless of the setting. It is important to assess whether these elevated profiles are reflecting more severe psychopathology or the overreporting of psychopathology, and then plan the course of treatment accordingly.

USE WITH OTHER EVALUATION DATA

As was noted previously, the better predictors of treatment outcome tend to be background and demographic variables, such as social support systems, employment status, and so on. Thus, it is important to consider the role of these variables in conjunction with the MMPI-2 when assessing the outcome of treatment. It would be particularly important to see what additional variance is accounted for by the MMPI-2 in such assessments.
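One common way to ask what additional variance a measure accounts for is to compare the R-squared of a regression on the background variables alone with the R-squared after the MMPI-2 score is added. The sketch below is one possible approach under stated assumptions, not a method prescribed by the source: it uses ordinary least squares written in plain Python, hypothetical function names, and a tiny synthetic data set constructed so that the outcome depends on both predictors.

```python
def _solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def r_squared(predictors, outcome):
    """R-squared of an ordinary least squares fit that includes an intercept."""
    rows = [[1.0] + list(p) for p in predictors]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, outcome)) for i in range(k)]
    beta = _solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(outcome) / len(outcome)
    ss_res = sum((y - f) ** 2 for y, f in zip(outcome, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in outcome)
    return 1.0 - ss_res / ss_tot


def incremental_r2(background, mmpi, outcome):
    """Return (R-squared for background alone, gain in R-squared from adding MMPI-2)."""
    base = r_squared(background, outcome)
    full = r_squared([list(b) + list(m) for b, m in zip(background, mmpi)], outcome)
    return base, full - base


# Synthetic illustration only: outcome = 2 * background + 3 * mmpi exactly.
background = [[0], [1], [2], [3], [4]]
mmpi = [[1], [0], [2], [1], [3]]
outcome = [3, 2, 10, 9, 17]

base, gain = incremental_r2(background, mmpi, outcome)
print(base, gain)
```

With real samples, the gain would need to be tested for significance and cross-validated before any conclusion is drawn from it.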

PROVISION OF FEEDBACK REGARDING ASSESSMENT FINDINGS

Because the comments about feedback of the assessments of outcome are the same as those about planning treatment, they are not repeated here. It is important to inoculate patients against specific negative outcomes, such as a high probability of dropping out, so that they are aware of these issues prior to their occurrence.

LIMITATIONS/POTENTIAL PROBLEMS IN USE

The primary problem in using the MMPI-2 in assessing outcome of treatment is that background and demographic variables tend to be better predictors. Consequently, clinicians need to be cautious about relying too heavily on the MMPI-2 and not giving adequate weight to such variables.

CASE STUDY

The patient is a 20-year-old white woman who has a 5-year history of insulin-dependent diabetes. She was referred for a psychiatric evaluation of depression by her family physician. She separated from her husband about 6 months earlier and moved across country to attend nursing school. She reported a depressed mood related to her separation and move, with crying spells and decreased energy when home alone. She has felt lonely since she moved and she misses her husband, although neither of them has instituted any attempt toward reconciliation. She had some sleep problems while she was working a night shift as a nurse’s aide. These sleep problems abated once she changed to a daytime shift. She performs well on her job, which she likes. She is well liked by her supervisor and colleagues. She has not lost any weight. She did not report suicidal ideation in the initial assessment with the psychiatry resident, whose diagnostic impression was Adjustment Disorder with Depressed Mood.

Figures 7.1-7.3 provide the standard validity and clinical scales, supplementary scales, and content scales for this patient, respectively. The standard profile (Fig. 7.1) is consistent with the history reported earlier. The patient took the MMPI-2 in a consistent (VRIN = 7) and accurate (F = 6; K = 12; Total T-score Difference = 95; Ds = 60T) manner, which indicates that she is well motivated and likely to be compliant with suggested treatment plans. The 2-7 codetype, along with the low score on Scale 9 (Hypomania), reflects her depressive mood with little energy. The high score on Scale 0 (Social Introversion) and Si, reflects her

Name

ta “ltiphasic

Mis. Latt

Address

Riley Mavertory -2™

Lubbock

Occupation Nurse's Aide

Minnesota Multiphasic Personality Inventory-2 Copyright © by THE REGENTS OF THE UNIVERSITY OF MINNESOTA 1942, 1943 (renewed 1970), 1989. This Profile Form 1988. ee are All rights reserved. Distributed exclusively by NATIONA

Age

14

Education

Profile for Basic Scales

Referred py___Dr.

ING

=

MMPI-2

under license from The University of Minnesota.

Code

2



Printed in the United States of America

Torte

Lb

F

K

Hes5K 1

0 2

ty ei

Pas 4K 4

4/18/90 __

Bender

e

"70'1-6348/5:94

L F/K:

oon

-MMPI-2" and “Minnesota Multiphasic Personality Inventory-2” are trademarks owned by The University of Minnesota.

Date Tested_

Marital status Separated

20

Scorer's

wt 5

Pa 6

Pik 7

ScriK 8

Mae 2k 9

Initials

RG

Ss 0 Tor Tc

(ae = FEMALE ©

GPe Se. Serdees Bs

CC OC NO OEY EN TT NT SN UC

Torte

Rawiscore

BL ete

oO

ob

F

K

2) -Glen12

Ha 5K

0

ty

Pais 4K

13

39

25

19

K to be Added_© Raw Score with k19

2 24

ut

Pa

38

13

PIIK

30

16

12 12 42

28

ll

SO

2 13

FIG. 7.1. MMPI-2 Profile for Basic Scales. Reproduced by permission of the University of Minnesota Press.

NATIONAL COMPUTER

sySTe

NCS] 24001

154

GREENE AND CLOPTON Name_Mis, Pate SK

‘ree oho Multighasic ae

Address

Inwerttory-2

2

Education

Profile for Supplementary Scales Minnesota oe

Eee

TPy aa toedewed 1970), 1989. This Profile Form 1989, 199 All rights reserved. Distributed exclusively by NATIONAL COMPUTER SYSTEMS. INC

under license from The University of Minneso

and “Minnesota Multiphasic Personality Inventory-2" are trademarks MMPI-2" The Criv versity of Minnesota. Printed in the United States of America

MAC-R

AAS

APS

MDS.

Date Tested 4/18/90

2O _ Marital Status

Age

14

Separated

__

Referred py_Dr. Bender

Inventory-2

REGENTSOF THE UNIVERSITY Ca MONTES

Copyrightt © by THE

Lubbock

Occupation Nurse's Aide

ES

a

a

Ee

Latt

wd.LC. McKinley

O-H

owned

Do

Scorer's Initials RG

by

Re

nance Social Responsibility College Maladjustment Gender Role-Masculine Gender Role-Feminine Post Traum,-Keane Post Traum,-Schlenger

T

A

R

Es

MAC-R

AAS

APS

MDS

0-H

Do

Re

Mt

GM

GF

PK

Rw 25 23 27 15 2 20 3 13 14 21 33 22 37 2

Siz

Sis

37 14 5

8

PS

Si

Fy

&B PRINTED WITH AGRI-BASE INK 8CD987654321

FIG. 7.2. MMPI-2 Profile for Supplementary Scales. Reproduced University of Minnesota Press.

8

VRIN

7

T/F TRIN

T

NATIONAL COMPUTER SYSTEMS

AS

eas 24004

by permission of the

introversion, shyness, isolation, and tendency to withdraw from others, which only serve to

exacerbate her loneliness and depression. It will be important for her treatment plan to incorporate procedures for getting her involved with others and counteracting her isolation. Her elevated score on the L (Lie) scale suggests that she is not very psychologically minded, which is the only indicator that she will not be a good candidate for insight-oriented psychotherapies that are suggested by her codetype. Because Scale 1 (Hypochondriasis) is not elevated above a T score of 64, she does not emphasize her physical symptoms associated with diabetes, so these issues essentially can be ignored in her treatment. The Supplementary Scales (Fig. 7.2) also fit her clinical picture very well. Her simultaneous elevation of both the first factor (A) and the second factor (R) indicates that she is experiencing general emotional distress and she is trying to control or deal with it to the best

of her abilities. Her score of 15 on the MAC is somewhat lower than would be expected for her codetype (see Greene, 1991), and essentially indicates that she is depressed, introverted, inhibited, and overcontrolled. The potential of misusing the MAC to predict substance abuse in this type of patient should be kept in mind once her responses to the MMPI-2 specific substance abuse items are described next. The other scales (Fz, Mt, PK, and PS)

that are elevated on Fig. 7.2 are also first factor scales that correlate highly (> .80) with A. The Content Scales (Fig. 7.3) are generally consistent with her clinical picture, although there are some notable exceptions. The elevations on ANX (Anxiety), OBS (Obsessionality),

159 rs pate Hhatharsay and

Mekintos

Name

ht atiphoersic

ca

es

eee

Occupation Nurse's Aide

Profile for Content Scales Butcher, Graham, Williams and Ben-Porath (1989) Minnesota

Multiphasic Personality

ij Education

Inventor

Copyright © by THE REGENTS OF THE University OF MINNESOTA 1942. 1943 (renewed 1970), 1989. This Profile Form 1989. All ree) reserved Distributed exclusiva LA NATIONAL

under license from The University of Min

COMPUTER

ANX

FRS

oss

DEP

HEA

Biz

ANG

4

Age

2Q

Date Tested_ 4/18/90 Marital StatusSeparated

Referred By_ Bender Dr. SYSTEMS,

INC

*MMPI-2' and “Minnesota Multiphasic Personality Inventorpee'ZareSaree AD SK pwned The University of Minnesota Printed in the United States T

Mis, Latt

Address Lubbock

Pinas: Mwentory -2

CYN

ASP

4

ae

Scorer's Initials

_RG

by

TPA

se

sop

FAM

WRK

HF

LEGEND ANX

Anxiety

FRS

Fears

OBS

Obsessivencss

DEP

Depression

HEA

Health Concerns

BIZ

Bizarre

Anger

CYN

Cynicism

ASP

Antisocial TypeA Low

SOD

Social Discomfort

WRK TRT

T

ANX

soe I &

FRS

oss

DEP

HEA

BIZ

ANG

23 12 18 14 110

RINTEO WITH AGRI-BASE INK

BCD98765432

CcYN

ASP

TPA

LSE

soD

FAM

wrk

TRT

Practices

TPA LSE

FAM

Raw

Mentation

ANG

Self-Esteem

Family Work

eee Interference

= Negati Wiel

Indicators

OT

7 4 7 15 17 11 24 18

ee? systems” PROFESSIONAL ASSESSMENT SERVICES j 24002

FIG. 7.3. MMPI-2 Profile for Content Scales. Reproduced by permission of the University of Minnesota Press.

and DEP (Depression), and LSE (Low Self-esteem) and SOD (Social Discomfort) are redun-

dant with her scores that were seen on the standard clinical scales and the supplementary scales, and provide a solid indication that she is depressed, worried, guilty, introverted, and

uncomfortable in social situations. Her mild elevation of HEA (Health Concerns), similar to Scale ] (Hypochondriasis), again suggests that she does not emphasize physical symptoms even though she is diabetic. However, the significant elevations on WRK (Work Interference)

and TRT (Negative Treatment Indicators) are directly contradictory to her clinical history and

outcome, which reflects that these two scales are better measures of the first factor of general distress than their intended content (Greene, 1991). The mild elevation of ANG (Anger) also

is somewhat unexpected until it is recalled that this scale has two sets of items: the first set indicates that the person is moody, irritable, and grouchy, whereas the second set indicates that the person physically expresses anger. Clearly, this patient endorsed the first set of items within ANG. The patient endorsed only a small number of specific items outlined in Table 5 that warranted attention. First of all, she endorsed item 524, “No one knows it but I have tried to kill myself.” (True), even though she did not report suicidal ideation or attempts in the psychiatric interview. She also endorsed several items (429, 511) specific to substance abuse,

“Once a week or more I get high or drunk.” (True), “Except by doctor’s orders I never take

drugs or sleeping pills.” (False), which could have been overlooked if the clinician relied solely on the MAC.
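The value of reviewing individual items alongside scale scores can be illustrated with a minimal sketch. The three statements and their endorsed directions come from the case description; the dictionary keys, the function name `endorsed_critical_items`, and the data layout are hypothetical and are not part of any official MMPI-2 scoring procedure.

```python
# Hypothetical sketch: flag critical items answered in the keyed direction,
# independently of any scale-level score such as the MAC.

# Illustrative label -> keyed response. The labels are invented shorthand for
# the three statements quoted in the case study; the keyed directions
# (True, True, False) are the ones reported there.
CRITICAL_ITEMS = {
    "tried_to_kill_myself": True,
    "high_or_drunk_weekly": True,
    "never_take_drugs_without_orders": False,
}


def endorsed_critical_items(responses):
    """Return the labels of critical items answered in the keyed direction.

    `responses` maps label -> True/False as answered by the patient.
    Items absent from `responses` are skipped rather than guessed.
    """
    return [
        label
        for label, keyed in CRITICAL_ITEMS.items()
        if label in responses and responses[label] == keyed
    ]


# A scale-level raw score carries no item-level detail on its own.
mac_raw_score = 15

# The three responses reported for this patient.
patient_responses = {
    "tried_to_kill_myself": True,
    "high_or_drunk_weekly": True,
    "never_take_drugs_without_orders": False,
}

flagged = endorsed_critical_items(patient_responses)
print(f"MAC raw score: {mac_raw_score}; critical items flagged: {len(flagged)}")
for label in flagged:
    print(" -", label)
```

The sketch deliberately applies no cutoff to the MAC score: the point is only that item-level review surfaces responses that a single scale score cannot show.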

Summary and Conclusions

The MMPI-2 can provide valuable information for the clinician both in planning treatment and in assessing the outcome of treatment. Clinicians need to recognize the complexity of the questions they are asking and not expect simple answers. They must start looking for significant subgroups within the patients in a given setting, rather than assuming that all patients are alike. In addition, studies of such subgroups must consider how these groupings compare with those found in other settings and must evaluate the role of background and demographic variables in the pattern of scores that are found.

References

Albott, W. L. (1982). Dropouts from an inpatient treatment program for alcoholics. International Journal of the Addictions, 17, 199-204.
Allen, J. P. (1991). Personality correlates of the MacAndrew alcoholism scale: A review of the literature. Psychology of Addictive Behaviors, 5, 59-65.
Blashfield, R. K. (1980). Propositions regarding the use of cluster analysis in clinical research. Journal of Consulting and Clinical Psychology, 48, 456-459.
Butcher, J. N. (Ed.). (1987). Computerized psychological assessment: A practitioner’s guide. New York: Basic.
Butcher, J. N. (1989). Adult clinical system user’s guide for the MMPI-2. Minneapolis: University of Minnesota Press.
Butcher, J. N. (1990). MMPI-2 in psychological treatment. New York: Oxford.
Butcher, J. N., Dahlstrom, W. G., Graham, J. R., Tellegen, A. M., & Kaemmer, B. (1989). MMPI-2: Manual for administration and scoring. Minneapolis: University of Minnesota Press.
Butcher, J. N., Graham, J. R., Williams, C. L., & Ben-Porath, Y. (1989). Development and use of the MMPI-2 content scales. Minneapolis: University of Minnesota Press.
Caldwell, A. B. (1991). Caldwell Report. Los Angeles: Author.
Ciarlo, J. A., Brown, T. R., Edwards, D. W., Kiresuk, T. J., & Newman, F. L. (1986). Assessing mental health treatment outcome measurement techniques. DHHS Pub. No. (ADM)86-1301. Washington, DC: U.S. Government Printing Office.
Colligan, R. C., Osborne, D., Swenson, W. M., & Offord, K. P. (1983). The MMPI: A contemporary normative study. New York: Praeger.
Colligan, R. C., Osborne, D., Swenson, W. M., & Offord, K. P. (1989). The MMPI: A contemporary normative study of adults (2nd ed.). Odessa, FL: Psychological Assessment Resources.
Costello, R. M., Hulsey, T. L., Schoenfeld, L. S., & Ramamurthy, S. (1987). P-A-I-N: A four cluster MMPI typology for chronic pain. Pain, 29, 1-11.
Craig, R. J. (1984). Personality dimensions related to premature termination from an inpatient drug abuse treatment program. Journal of Clinical Psychology, 40, 351-355.
Dahlstrom, W. G., Lachar, D., & Dahlstrom, L. E. (1986). MMPI patterns of American minorities. Minneapolis: University of Minnesota Press.
Dahlstrom, W. G., Welsh, G. S., & Dahlstrom, L. E. (1960). An MMPI handbook: A guide to use in clinical practice and research. Minneapolis: University of Minnesota Press.
Dahlstrom, W. G., Welsh, G. S., & Dahlstrom, L. E. (1972). An MMPI handbook: Vol. I. Clinical interpretation (rev. ed.). Minneapolis: University of Minnesota Press.
Dahlstrom, W. G., Welsh, G. S., & Dahlstrom, L. E. (1975). An MMPI handbook: Vol. II. Research applications (rev. ed.). Minneapolis: University of Minnesota Press.
Duckworth, J. C., & Anderson, W. (1986). MMPI interpretation manual for counselors and clinicians (3rd ed.). Muncie, IN: Accelerated Development.
Edinger, J. D., Stout, A. L., & Hoelscher, T. J. (1988). Cluster analysis of insomniacs’ MMPI figures: Relation of subtypes to sleep history and treatment outcome. Psychosomatic Medicine, 50, 77-87.
Edwin, D., Andersen, A. E., & Rosell, F. (1988). Outcome prediction by MMPI in subtypes of anorexia nervosa. Psychosomatics, 29, 273-282.
Filstead, W. J., Drachman, D. A., Rossi, J. J., & Getsinger, S. H. (1983). The relationship of MMPI subtype membership to demographic variables and treatment outcome among substance misusers. Journal of Studies on Alcohol, 44, 917-922.
Friedman, A. F., Webb, J. T., & Lewak, R. (1989). Psychological assessment with the MMPI. Hillsdale, NJ: Lawrence Erlbaum Associates.
Gallucci, N. T., Kay, D. C., & Thornby, J. I. (1989). The sensitivity of 11 substance abuse scales from the MMPI to change in clinical status. Psychology of Addictive Behaviors, 3, 29-33.
Gilberstadt, H., & Duker, J. (1965). A handbook for clinical and actuarial MMPI interpretation. Philadelphia: Saunders.
Graham, J. R. (1977). The MMPI: A practical guide. New York: Oxford.
Graham, J. R. (1987). The MMPI: A practical guide (2nd ed.). New York: Oxford.
Graham, J. R. (1990). MMPI-2: Assessing personality and psychopathology. New York: Oxford.
Graham, J. R., & Strenger, V. E. (1988). MMPI characteristics of alcoholics: A review. Journal of Consulting and Clinical Psychology, 56, 197-205.
Greene, R. L. (1980). The MMPI: An interpretive manual. New York: Grune & Stratton.
Greene, R. L. (1987). Ethnicity and MMPI performance: A review. Journal of Consulting and Clinical Psychology, 55, 497-512.
Greene, R. L. (Ed.). (1988). The MMPI: Use in specific populations. Philadelphia: Grune & Stratton.
Greene, R. L. (1991). The MMPI-2/MMPI: An interpretive manual. Boston: Allyn & Bacon.
Greene, R. L., & Brown, R. C. (1990). MMPI-2 adult interpretive system. Odessa, FL: Psychological Assessment Resources.
Greene, R. L., & Garvin, R. D. (1988). Substance abuse/dependence. In R. L. Greene (Ed.), The MMPI: Use with specific populations (pp. 159-197). Philadelphia: Grune & Stratton.
Guck, T. P., Meilman, P. W., Skultety, F. M., & Poloni, L. D. (1988). Pain-patient MMPI subgroups: Evaluation of long-term treatment outcome. Journal of Behavioral Medicine, 11, 159-169.
Hathaway, S. R., & McKinley, J. C. (1940). A multiphasic personality schedule (Minnesota): I. Construction of the schedule. Journal of Psychology, 10, 249-254.
Hathaway, S. R., & Meehl, P. E. (1951). An atlas for the clinical use of the MMPI. Minneapolis: University of Minnesota Press.
Herron, L. D., & Pheasant, H. C. (1982). Changes in MMPI figures after low-back surgery. Spine, 7, 591-597.
Hoffmann, H., & Jansen, D. G. (1973). Relationships among discharge variables and MMPI scale scores of hospitalized alcoholics. Journal of Clinical Psychology, 29, 475-477.
Hoffman, N. H., Loper, R. G., & Kammeier, M. L. (1974). Identifying future alcoholics with MMPI alcoholism scales. Quarterly Journal of Studies on Alcohol, 35, 490-498.
Huber, N. A., & Danahy, S. (1975). Use of the MMPI in predicting completion and evaluating changes in a long-term alcoholism treatment program. Journal of Studies on Alcohol, 36, 1230-1237.
Kammeier, M. L., Hoffmann, H., & Loper, R. G. (1973). Personality characteristics of alcoholics as college freshmen and at time of treatment. Quarterly Journal of Studies on Alcohol, 34, 390-399.
Keller, L. S., & Butcher, J. N. (1991). Assessment of chronic pain with the MMPI-2. Minneapolis: University of Minnesota Press.
Klonoff, H., Fleetham, J., Taylor, D. R., & Clark, C. (1987). Treatment outcome of obstructive sleep apnea: Physiological and neuropsychological concomitants. Journal of Nervous and Mental Disease, 175, 208-212.
Krasnoff, A. (1977). Failure of MMPI scales to predict treatment completion. Journal of Studies on Alcohol, 38, 1440-1442.
Lachar, D. (1974). The MMPI: Clinical assessment and automated interpretation. Los Angeles: Western Psychological Services.
Lewak, R. W., Marks, P. A., & Nelson, G. E. (1990). Therapist guide to the MMPI and MMPI-2: Providing feedback and treatment. Muncie, IN: Accelerated Development.
Long, C. J. (1981). The relationship between surgical outcome and MMPI figures in chronic pain patients. Journal of Clinical Psychology, 37, 744-749.
Loper, R. G., Kammeier, M. L., & Hoffmann, H. (1973). MMPI characteristics of college freshman males who later became alcoholics. Journal of Abnormal Psychology, 82, 159-162.
Love, A. W., & Peck, C. L. (1987). The MMPI and psychological factors in chronic low back pain: A review. Pain, 28, 1-12.
MacAndrew, C. (1965). The differentiation of male alcoholic outpatients from nonalcoholic psychiatric outpatients by means of the MMPI. Quarterly Journal of Studies on Alcohol, 26, 238-246.
Marks, P. A., & Seeman, W. (1963). The actuarial description of personality: An atlas for use with the MMPI. Baltimore: Williams & Wilkins.
Marks, P. A., Seeman, W., & Haller, D. L. (1974). The actuarial use of the MMPI with adolescents and adults. Baltimore: Williams & Wilkins.
McArthur, D. L., Cohen, M. J., Gottlieb, H. J., Naliboff, B. D., & Schandler, S. L. (1987). Treating chronic low back pain. II. Long-term follow-up. Pain, 29, 23-38.
McCann, J. T. (1991). Convergent and discriminant validity of the MCMI-II and MMPI personality disorder scales. Psychological Assessment: A Journal of Consulting and Clinical Psychology, 3, 9-18.
McCreary, C., Turner, J., & Dawson, E. (1977). Differences between functional versus organic low back pain patients. Pain, 4, 73-78.
McWilliams, J., & Brown, C. C. (1977). Treatment termination variables, MMPI scores and frequencies of relapse in alcoholics. Journal of Studies on Alcohol, 38, 477-486.
Millon, T. (1987). Manual for the Millon Clinical Multiaxial Inventory-II (MCMI-II). Minneapolis: National Computer Systems.
Moore, J. E., Armentrout, D. P., Parker, J. C., & Kivlahan, D. R. (1986). Empirically derived pain-patient MMPI subgroups: Prediction of treatment outcome. Journal of Behavioral Medicine, 9, 51-63.
Morey, L. C., & Smith, M. R. (1988). Personality disorders. In R. L. Greene (Ed.), The MMPI: Use with specific populations (pp. 110-158). Philadelphia: Grune & Stratton.
Morey, L. C., Waugh, M. H., & Blashfield, R. K. (1985). MMPI scales for DSM-III personality disorders: Their derivation and correlates. Journal of Personality Assessment, 49, 245-251.
Mozdzierz, G. J., Macchitelli, F. J., & Conway, J. A. (1973). Personality characteristic differences between alcoholics who leave treatment against medical advice and those who don’t. Journal of Clinical Psychology, 29, 78-82.
Naliboff, B. D., McCreary, C. P., McArthur, D. L., Cohen, M. J., & Gottlieb, H. J. (1988). MMPI changes following behavioral treatment of chronic low back pain. Pain, 35, 271-277.
Nathan, P. E., & Skinstad, A. (1987). Outcomes of treatment for alcohol problems: Current methods, problems, and results. Journal of Consulting and Clinical Psychology, 55, 332-340.
Onorato, V. A., & Tsushima, W. T. (1983). EMG, MMPI, and treatment outcome in the biofeedback therapy of tension headache and posttraumatic pain. American Journal of Clinical Biofeedback, 6, 71-81.
Pekarik, G., Jones, D. L., & Blodgett, C. (1986). Personality and demographic characteristics of dropouts and completers in a nonhospital residential alcohol treatment program. The International Journal of the Addictions, 21, 131-137.
Pettinati, H. M., Sugerman, A. A., & Maurer, H. S. (1982). Four year MMPI changes in abstinent and drinking alcoholics. Alcoholism: Clinical and Experimental Research, 6, 487-494.
Rohan, W. P., Tatro, R. L., & Rotman, S. R. (1969). MMPI changes in alcoholics during hospitalization. Quarterly Journal of Studies on Alcohol, 30, 389-400.
Rounsaville, B. J., Dolinsky, Z. S., Babor, T. E., & Meyer, R. E. (1987). Psychopathology as a predictor of treatment outcome in alcoholics. Archives of General Psychiatry, 44, 505-513.
Scapinello, K. F., & Blanchard, R. (1987). Historical items in the MMPI: Note on evaluating treatment outcomes for a criminal population. Psychological Reports, 61, 775-778.
Sheppard, D., Smith, G. T., & Rosenbaum, G. (1988). Use of MMPI subtypes in predicting completion of a residential alcoholism treatment program. Journal of Consulting and Clinical Psychology, 56, 590-596.
Snyder, D. K. (1990). Assessing chronic pain with the MMPI. In T. W. Miller (Ed.), Chronic pain (pp. 215-257). Madison, CT: International Universities Press.
Stark, M. J. (1992). Dropping out of substance abuse treatment: A clinically oriented review. Clinical Psychology Review, 12, 93-116.
Sternbach, R. A., Wolf, S. R., Murphy, R. W., & Akeson, W. H. (1973). Traits of pain patients: The low-back “loser.” Psychosomatics, 14, 226-229.
Strassberg, D. S., Reimherr, F., Ward, M., Russell, S., & Cole, A. (1981). The MMPI and chronic pain. Journal of Consulting and Clinical Psychology, 49, 220-226.
Swenson, W. M., Pearson, J. S., & Osborne, D. (1973). An MMPI source book: Basic item, scale, and pattern data on 50,000 medical patients. Minneapolis: University of Minnesota Press.
Widiger, T. A., & Sanderson, C. (1987). The convergent and divergent validity of the MCMI as a measure of DSM-III personality disorders. Journal of Personality Assessment, 51, 228-242.
Wiggins, J. S. (1966). Substantive dimensions of self-report in the MMPI item pool. Psychological Monographs, 80(22, Whole No. 630).
Wilkinson, A. E., Prado, W. M., Williams, W. O., & Schnadt, F. W. (1971). Psychological test characteristics and length of stay in alcoholism treatment. Quarterly Journal of Studies on Alcohol, 32, 1230-1237.
Williams, D. E., Thompson, J. K., Haber, J. D., & Raczynski, J. M. (1986). MMPI and headache: A special focus on differential diagnosis, prediction of treatment outcome, and patient-treatment matching. Pain, 24, 143-158.
Zuckerman, M., Sola, S., Masterson, J., & Angelone, J. V. (1975). MMPI patterns in drug abusers before and after treatment in therapeutic communities. Journal of Consulting and Clinical Psychology, 48, 286-296.


... 3
Yes .. Zd > +- 3.5
Yes .. es > EA
Yes .. CF+C > FC
No ... X+% < .70
Yes .. S > 3
No ... P < 3 or > 8
No ... Pure H < 2
No ... R < 17
6 .... Total

Special Scorings
         Lv1    Lv2
DV    =  0x1    0x2
INC   =  0x2    0x4
DR    =  0x3    0x6
FAB   =  0x4    0x7
ALOG  =  0x5
CON   =  0x7
SUM6  =  0
WSUM6 =  0

AB = 0    AG = 1    CFB = 0    COP = 2
CP = 0    MOR = 1   PER = 2    PSV = 0

Ratios, Percentages, and Derivations

R = 19    L = 0.06

EB = 6:3.5    EA = 9.5       EBPer = 1.7
eb = 8:9      es = 17        D = -2
              Adj es = 17    Adj D = -2

FM = 7    C’ = 4    T = 3
m = 1     V = 1     Y = 1

FC:CF+C = 1:3
Pure C = 0
Afr = 0.36
S = 7
Blends:R = 10:19
CP = 0

COP = 2    AG = 1
Food = 0
Isolate/R = 0.53
H:(H)+Hd+(Hd) = 4:2
(H)+(Hd):(A)+(Ad) = 1:1
H+A:Hd+Ad = 10:5

a:p = 5:9      Sum6 = 0
Ma:Mp = 3:3    Lv2 = 0
2AB+Art+Ay = 1    WSum6 = 0
M- = 1         Mnone = 0

P = 5
X+% = 0.79
F+% = 0.00
X-% = 0.11
S-% = 1.00
Xu% = 0.11

Zf = 15
Zd = +9.0
W:D:Dd = 9:6:4
W:M = 9:6
DQ+ = 12
DQv = 1

3r+(2)/R = 0.74
Fr+rF = 3
FD = 1
An+Xy = 0
MOR = 1

SCZI = 0    DEPI = 3    CDI = 4    S-CON = 3    HVI = No    OBS = Yes

FIG. 11.1. Location choices for Protocol 1.

INTERPRETATION OF FIRST PROTOCOL

Interpretation of Rorschach protocols in the Comprehensive System proceeds through a search strategy in which particular clusters of structural and thematic variables are examined in a specific sequence. The sequence for each individual protocol is determined by the most salient features of the record, as outlined in detail by Exner (1991).

Applying this strategy to Ms. A’s Rorschach yields, in approximate order of importance, the following 16 features of her protocol, which identify problems in personality functioning and could constitute treatment targets.

1. Coping Deficit Index (CDI) = 4. An elevated CDI indicates marked difficulty coping effectively with everyday demands of living, particularly with respect to capable and comfortable management of interpersonal relationships. Ms. A’s treatment accordingly should include a substantial, if not primary, focus on interpersonal coping skills, in the expectation that more effective social coping, reflected in a lower CDI, would be of considerable benefit to her.

2. Adj D of -2. This finding signifies persistent and long-standing distress related to inability to muster sufficient personality resources to meet the demands of her everyday life. This imbalance, which holds the key to the subjectively felt distress she reports, seems due not to deficient personality resources in general (her EA of 9.5 is average), but to her experiencing many more stressful demands than most people (es of 17). A search for the specific source of this excessive stress identifies the next three more specific treatment targets.

3. C’ = 4. This unusually frequent use of achromatic color signifies a heavy burden of painful and dysphoric internalized affect. She will feel and function much better if her therapy pays special attention to ways of lifting her spirits and thereby reducing her C’.

4. V = 1. The presence of even one Vista response identifies the presence of distressing self-critical attitudes. People in therapy who can be helped to look on themselves more favorably, and thereby to get Vista out of their record, are likely to feel that a great burden has been lifted from their shoulders.

5. T = 3. The presence of more than one T gives evidence of unmet needs for closeness to other people and usually signifies the kind of loneliness that people experience when a rupture of previously enjoyed relationships deprives them of love and affection they had come to expect. It will be important for Ms. A’s therapist to employ whatever strategies seem appropriate to help her establish some new loving and supportive relationships, at which point her T would be expected to diminish.

6. FC:CF + C = 1:3. This pattern of color use is much more typical of young children than mature adults. It suggests that she is an emotionally immature person who experiences and expresses feelings in an overly intense and dramatic fashion and is highly changeable in her moods. She would benefit from being helped to modulate her affects a bit more than has been customary for her. This would involve becoming more restrained emotionally and should be accompanied by more FC in her Rorschach.

7. Affective ratio (Afr) of .36. Probably because her lack of emotional restraint gets her into difficulties when she has to confront emotionally charged situations, she is, in light of this low Afr, inclined to back away from situations in which people are likely to exchange strong feelings. Accordingly, she needs help to feel more comfortable when affects emerge in social situations, in order for her to become more rewardingly engaged in interpersonal relationships, and a higher Afr is an important treatment target in her case.

8. S = 7. Although not apparent from the clinical history, a substantial amount of underlying anger and resentment is revealed by her unusually frequent use of white space. The interpersonal implications of these angry feelings will become more apparent in a moment, as this analysis proceeds. However, at this point, there is little doubt that she is troubled by a maladaptive extent of resentment toward others or toward her circumstances that needs to be eased in her treatment.

9/10. Reflections = 3 and egocentricity ratio of .74. The unusual elevations on these two variables, which typically are examined together, identify a stylistic pattern of self-centeredness involving an inflated sense of self-worth, a predilection to externalize blame, and feelings of entitlement. Such personality features often cause adjustment difficulties, because high-reflective, high-egocentric people tend to be seen by others as selfish, manipulative, and narcissistic. However, in Ms. A’s case, these self-aggrandizing features have to be understood in light of the distress, depression, and self-critical attitudes already documented by the Rorschach data. It is not unusual clinically for people who basically feel badly about themselves and their circumstances to ward off deep depression by mechanisms of denial, which produce superficial manifestations of cheerfulness, enthusiasm, optimism, and self-love. Efforts to disabuse these people of such hypomanic defenses can do more harm than good by precipitating depressive reaction and sometimes suicidal behavior. To the extent that Ms. A’s reflections and egocentricity ratio represent some necessary, although not entirely desirable, defenses, they probably should be regarded not as treatment targets, but as adjustment problems to be left alone for the time being.

11. Thematic content. Analysis of the content of those responses most likely to involve projection (i.e., the minus form, human movement, and embellished responses) reveals four themes related to probable underlying concerns and attitudes. The most dramatic response in the record is No. 3, which is a morbid M-. This response begins as “A face of someone who’s been hurt,” which appears to capture self-image concerns about suffering past and possible future hurts in interpersonal relationships. Later, however, she turns the response around so that the apparent object of the hurt is not herself, but instead a male figure. To leave little doubt that she harbors fantasies of retaliation against men who have hurt her, she says that the male in this percept who is hurt and has blood on his chin has the same kind of beard as someone “I used to go with.” The other minus response in the record (No. 8) captures a theme of figures (two ducks) looking in opposite directions, which may speak to her interpersonal concerns about people looking the other way instead of paying sufficient attention to her. Another of her M responses (No. 6, two waiters cleaning off a table) suggests a third theme: her concern about being placed in the role of cleaning up after other people, which she expressed clearly in complaining about the boyfriend who treated her as a maid. Finally, her remaining Ms (Nos. 5, 10, 13, and 17) capture as a fourth theme her self-centered concerns with making an attractive appearance and being admired for it. She pays considerable attention to details of clothing and reports percepts of a Las Vegas showgirl in a flowing costume (No. 10) and a girl primping in the mirror (No. 13).

12. a:p = 5:9. Continuing with the prominent interpersonal difficulties identified by Ms. A’s Rorschach protocol, this finding identifies a passive stance in relation to others. She prefers to follow rather than lead, to have other people make decisions for her, and to let others take on responsibilities that should be hers. This passivity is apparent in her interpersonal history, and it combines with her previously noted anger and resentment to indicate the likelihood of passive-aggressive, rather than effectively assertive, ways of coping with problems. Thus, an important treatment target is identified, namely, to help her become less passive and more assertive.

13. Isolation index of .53. This finding further identifies a currently barren interpersonal life, in which her previously noted feelings of loneliness are accompanied by a lifespace in which she has very few people in whom to confide or with whom to keep pleasant company. Thus, the need is emphasized to focus her treatment on expanding her interpersonal involvements.

14. Lambda = .06. This unusually low Lambda identifies limited capacity to deal with experience in a simple and objective manner. Instead, she is likely to do things the hard way, make a major production of minor events, and get wrapped up in thoughts and feelings that serve little purpose. She would be a more relaxed, less preoccupied, and less excitable person if therapy could get her Lambda up, which would involve helping her to deal with life experiences in a psychologically more economical manner.

15. Zd of +9.0. Ms. A, as one feature of her difficulty being economical, likes to examine situations carefully and consider options carefully before coming to conclusions or making decisions. Her excessive tendencies in this regard can lead to inefficiencies and delays in bringing projects to fruition. On the other hand, in some vocations, an unusual thoroughness and determination not to overlook anything can prove valuable, provided that there are no pressing deadlines to meet, and Ms. A, as a paralegal, may be in one of those vocations. Hence, this particular index of adjustment difficulty may signify, in her case, a personality orientation that should be left in place, rather than established as a treatment target.

16. Obsessive index is elevated. This indication of obsessive-compulsive style fits closely with the evidence of unusual thoroughness just noted. Although being obsessive can cause adjustment difficulties and constitutes, in some cases, a major treatment target, this seems not to be such a case. Ms. A’s obsessive style may be conducive to her functioning well in her job, and her work is an aspect of her life that is going well. Hence, this feature of her style probably should be left in place.

Finally, Ms. A’s Rorschach can be examined for indications of the obstacles to progress in treatment that were noted earlier. Her a:p of 5:9 does not demonstrate rigidity; her D of -2 does not demonstrate self-satisfaction; her FD of 1 does not demonstrate nonintrospectiveness; and her T of 3 does not demonstrate interpersonal distancing. To the contrary, she appears to be a reasonably flexible, self-concerned, introspective, and interpersonally interested individual with good prospects for becoming beneficially engaged in a treatment relationship.

The systematic manner in which the Rorschach Comprehensive System interpretive search strategy identifies treatment targets provides a detailed basis for giving feedback to prospective therapy patients concerning their psychological needs and the probable focus of their therapy. This was done in Ms. A’s case, and she was referred for psychotherapy, the nature and course of which are discussed next to illustrate treatment evaluation with the Rorschach.

Following her initial examination, Ms. A entered individual psychotherapy on a weekly basis. She was reexamined 11 months later, at which time her therapist indicated that, in accord with the pre-therapy test findings, the treatment focus had been on social interaction, with attention both to romantic relationships with men and friendships with women. The therapist reported good progress in both areas. She has been dating regularly without becoming involved in any problematic relationships, and she has become more at ease in forming friendships with her female acquaintances. She is more comfortable than before in making decisions, and the problem with the spastic colon has become much less severe. Ms. A’s second Rorschach protocol follows in Tables 11.5, 11.6, and 11.7 and Fig. 11.2.

TABLE 11.5 Rorschach Protocol 1: Ms. A Pre-Therapy

(For each numbered response, the Response is given first, followed by the Inquiry. E = examiner; S = subject.)

Card I

1. Response: I’ll say a butterfly, are these the same ones from before? (E: Yes.)
   Inquiry:
   E: (Repeats subject’s response)
   S: It has the wings and little antennae and the divided tail and it has white markings on it, the wings are spread like it is flying.

2. Response: Down here it looks like the outline of a woman, standing behind something, you only see her outline.
   Inquiry:
   E: (Repeats subject’s response)
   S: Well, it’s her outline, see the legs and her waist and her arms, her head isn’t obvious, it’s like she’s standing behind something that you can see through, maybe a curtain or something.
   E: I’m not sure where the curtain is.
   S: It’s this part here (Dd24), she’s behind it.

Card II

3. Response: This is the one with the blood, I remember it looks like two people fighting and they’ve got blood around their heads and legs.
   Inquiry:
   E: (Repeats subject’s response)
   S: It’s like two men in a fist fight, see they have their fists colliding here (D4), they’re big men and apparently they’re both hurt because of the blood on them, see blood on their heads, the red and down here around their legs.

4. Response: Just this part looks like a dog’s head, one on each side.
   Inquiry:
   E: (Repeats subject’s response)
   S: Just the sketch of the head of a dog, see the nose and the ear, it just has the shape of the head of a dog, not real, just a sketch.

Card III

5. Response: This is the dancers, like in a musical. They’re dressed the same.
   Inquiry:
   E: (Repeats subject’s response)
   S: One on each side (D9), two women I’d say, the breasts are pretty apparent, see the head and they have rather large noses and their hair is pulled tight, and here’s the leg and arm and they seem to have white belts or sashes and high-heels. They’re dressed identically, probably because they’re in the chorus.

6. Response: I don’t remember the red, these two remind me of guitars.
   Inquiry:
   E: (Repeats subject’s response)
   S: They’re shaped like those crazy electric guitars that the punkers use. I went to one of those concerts not long ago and thought I’d go deaf, they all had crazy looking guitars like these.

Card IV

7. Response: This looks like the monster from the lagoon all covered with mud, like he’s standing there and you’re looking up at him.
   Inquiry:
   E: (Repeats subject’s response)
   S: It’s like you’re looking up at him, he’s got big feet and his arms are smaller and his head up here and he’s covered with all that gooey mud, wet looking.
   E: Wet looking?
   S: He just looks wet, I suppose the shades give that impression, like mud, it creates a lot of irregularity to the outline too.

v8. Response: This way it’s like a badge.
   Inquiry:
   E: (Repeats subject’s response)
   S: Like a pilot’s badge or someone who works for the airline, see these wb wings and then some other symbol in the middle.

Card V

9. Response: Another butterfly, this one is flying too.
   Inquiry:
   E: (Repeats subject’s response)
   S: It has antenna, wings, and a tail, a split tail like some butterflies have those, the wings are spread out as if it is flying.

10. Response: I remember, a woman in a show, she has a long trailing costume.
   Inquiry:
   E: (Repeats subject’s response)
   S: Her legs, she has a headdress on, her arms up, like a showgirl in a casino show, she has a costume that flows down as she walks, huge plumes, big fancy things, like the showgirls wear.

Card VI

11. Response: It looks Indian, like an Indian robe, or maybe a blanket, all fur.
   Inquiry:
   E: (Repeats subject’s response)
   S: Not this top (D8), just the rest of it, it looks like some furry animal skin that has been cut so that it can be used as a blanket or robe.
   E: You said it’s all fur?
   S: Oh yes, the shades definitely give that impression.

v12. Response: This way it can be a fancy mirror, an antique one, I’m not counting this on the handle (Dd22), or these out here (Dd24).
   Inquiry:
   E: (Repeats subject’s response)
   S: This would be the handle (D6) and the rest is the frame of the mirror, you don’t see the glass, it’s like the back of it with the carving showing.
   E: The carving showing?
   S: It looks like it has grooves or designs carved into it, bumpy like, those fancy mirrors used to be like that on the back.

Card VII

13. Response: Oh, the little girl looking in the mirror, she’s sitting on a cushion.
   Inquiry:
   E: (Repeats subject’s response)
   S: She’s looking in the mirror, here’s her nose, kinda upturned, her forehead, and chin, she has a ponytail, she has a big bow on the back of her dress (Dd21) and she’s sitting on this cushion down here (Dd23) and over here (right side) is her image in the mirror, just the same.

>14. Response: This looks like a terrier when I hold it this way, a little one just standing there.
   Inquiry:
   E: (Repeats subject’s response)
   S: He has a long tail, see the flat nose and the little ear, that dark spot is his eye and here are his little feet, I have a friend who has one like this.

Card VIII

15. Response: It looks like two animals climbing up a tree.
   Inquiry:
   E: (Repeats subject’s response)
   S: The two animals on the sides, like beavers, I don’t know if they climb trees but I guess that they can, see their legs and the head and body and this is some kind of tree, see the top is up here and the branches go out, they’re grabbing a branch and this center is the other branches and down here is the ground, they’re just standing up to see if there is any food on top.

16. Response: The blue taken alone can be a kite.
   Inquiry:
   E: (Repeats subject’s response)
   S: It’s like rectangular wings, I’ve seen some kites like this, the wings spread out in flight and here you can see where the string connects to it, see this is the string and it goes down to here.

Card IX

17. Response: The pretty orange and white flower. I remember this one.
   Inquiry:
   E: (Repeats subject’s response)
   S: The flower is all of the top and center, it has petals in the center, orange and white petals (points), here is the stem going up through the center and these big green leaves and the pot is down here, this red, you just see the top of it.

18. Response: If I just use the orange it looks like two clowns in the circus, like they’re joshing with each other, doing an act.
   Inquiry:
   E: (Repeats subject’s response)
   S: They have pointed hats and it looks like they’re pointing these long sticks at each other, like pretending to duel, you know they sometimes carry those firecracker sticks (Dd34) chasing each other around, they’re dressed in orange suits.

Card X

19. Response: The two pixies in their jammies and nightcaps, looking at each other.
   Inquiry:
   E: (Repeats subject’s response)
   S: Here, the pink, the outline of their two heads is fairly precise, the forehead and nose and chin and their pointed nightcaps, but the rest of them is concealed like they’re in those Dr. Denton kind of one piece pajama suits, pink, I bought them for my sister’s little girls.

20. Response: I remember this part, it looks like the headset for the Walkman, that I use when I jog.
   Inquiry:
   E: (Repeats subject’s response)
   S: Right here, the earpieces and this is the part that connects them.

v21. Response: This part looks like a person waving two big green things, like doing a dragon dance, like for Chinese New Year.
   Inquiry:
   E: (Repeats subject’s response)
   S: Right here (D5), you can see his outline, the legs and the head and body and he’s holding up these two big green things like paper dragons, like doing a dance like they do when they celebrate the Chinese New Year, see the dragon heads here.

v22. Response: It all reminds me of a painting of a bouquet of flowers.
   Inquiry:
   E: (Repeats subject’s response)
   S: Actually more like a floral arrangement with all different colored flowers arranged very nicely, two big pink ones in the center and then all the others around them, blues and yellows.

TABLE 11.6 Sequence of Scores for Protocol 2

Card No.

Location No.

Determinant(s)

(2)

| Content(s)

|

1 2

WSo Dd+

1 24

FC’.FMao Mp. FVo

Il

3 4

W+ Ddo

1 21

MA.CFo FO

2

H,B1 Art.(Ad)

HT}

5 6

DS+ Do

9

Ma.FC’+ Fu

2 2

H,Cg Sc

7 W+

1.

Mp.FD-FTo

(H), Ls

8

Wo

1

Fo

Art

Vv

9 Wo 10 W+

1 1

FMao Ma.mpo

A H,Cg

©

Vi

11 12

Do Ddo

1 99

FTo FVu

Ad, Ay Ay, Hh

F

Vil

13 14

W+ Do

1 2

Mp.Fr+ FMpo

H,Cg,Hh A

>

15 16

W+ Dd+

1 99

FMao mpu

A,Bt Sc

P

17 18

WS+ D+

ft 3

CF+ Ma.FCo

19 20 21 22

D+ Do Dt W+

9 3 5 1

Mp.FCo Fu Ma.FCu Cfo

IV

Vill

A H,Hh

Pop

2

EB

Z

Special Scores

3:5) 4.0 45

AG,MOR

P

4.0

COP PER

P

40 2.0 1.0 2.5 MOR

2.5 PER 4.5 3.0

PER

5.5 4.5

COP

IX x

(oA 2 2 2

Bt,Hh (H).Cg.Sc (H),Cg SC H, Art, (A) Art,Bt

P

4.5 4.0 5.5

PER, DV PER

TABLE 11.7 Structural Summary for Protocol 2 Determinants Location Features Zf = 16 ZSum = 59.5 ZEst = 52.5 W

Blends

Single

Contents

S -Constellation

= 10

FC’.FM M.FV M.CF M.FC’ M.FD.FT

M FM m FC CF

=2 =2 =0 =] =1

(W = 0)

M.m

C

M.Fr M.FC

M.FC

Cn FC

CP

M.FC

Cc FT TF T FV

0 (0)

VF

=0

0

Vv =0 EY@us YF: =0 =O

cl Ex Fd Fi

=0,0 =0,0 =0,0 =0,0

DV INC

Lv1 = 1x1 = Ox2

Lv2 Ox2 Ox4

Fr rF

Ge Hh

=0,0 =0,4

DR FAB

= 0x3 = 0x4

0x6 Ox7

D =8 Dd=4

S =3

jateveeiers (FQ-) + = 13 (0) o = 9 (0)

w+

=

v=

(0)

Form Quality FQx += 3 o=14 u= § -= 0

none=O

FQf 0 2 2 0

MQal 2 6 1 0

-—

0

H (H) Hd (Hd) Hx

=6,0 =3,0 =1,0 =0,0 =0,0

=0

A

=4,0

NO..es > EA

=0 =0

=0

(A) Ad

(Ad)

=0,1 =1,0

=O, 1

NO..CF+C+Cn > FC NO..X+% < .70

NO..S >3

6-0 =O =1 =0 =O

An Art Ay BI Bt

=0,0 =3,1 =1,1 =0,1 =1,2

NO..P 8 NO..Pure H < 2 NO..R < 17

=O =0

=0,5

Special Scorings

FD

=O

Ls

=0,1

ALOG = 0x5

F

=1

Na

=0,0

CON

Sc Sx Xy Id

=3,1 =0,0 =0,0 =0,0

SQx 2 1 0 0

0

Cg

YES..FV+VF+V+FD>2 NO..Col-Shd B1>0 YES..Ego.44 NO..MOR > 3 YES..Zd > +- 3.5

= 0x7

SUM6 =1 WSUM6 =1

(2) =8

AB =0 AG=1

CO=0 MOR#=2

CFB=O COP=2

PER =5 PSV =0

Ratios, Percentages, and Derivations

ee

See

ee

R=22

L =0.22

EB =9: 4.5 eb =6:6

EA = 13.5 es = 12 es = 11

Adj FM =4 m =2 ap Ma:Mp

:C=#2 :V=2 = =

87 5:4

Sum6 Lv2

EBPer = 2.0 D = +0 Adj D= +4

T=2 Y=0 =1 =0

2AB+Art+Ay = 6

WSum6 = 1

M-

Mnone

= 1

ee

P =8 X+% = 0.77 F+% = 0.50

X-% = 0.00

=0

ee

ee

DEPI = 3

DCI = 1

ee

FC:CF+C =3:3 COP=2 AG =1 PureC = OQ _ Food =0 Afr =0.57 Isolate/R = 0.18 Ss =3 H:(H) HD(HD) =6: 3

Blends: R = 10:22

(HHd) : (AAd) =3: 2

CP

H+A:Hd+Ad

=0

Zf =16 Zd =+7.0 W:D:Dd = 10: 8:4

W:M = 10: 9

S-% = 0.00 DQ+ =13 Xu% = 0.23 DQv= 0 eee

SCZI = 0

oe

S-CON = 3

= 14:2

3r+(2)/R = 0.50 Fr+rF = 1 FD =1

An+Xy

=0

MOR

=2

HVI = No

OBS = Yes

FIG. 11.2. Location choices for Protocol 2.
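Several of the derived entries in Table 11.7 follow arithmetically from raw counts in Tables 11.6 and 11.7. The sketch below recomputes a few of them as a consistency check. The formulas are the ones commonly given for the Comprehensive System and are an assumption here, because the chapter reports the results without stating the formulas. The function names are hypothetical, and the pure F count of 4 is read off the sequence of scores (Responses 4, 6, 8, and 20).

```python
# Raw counts for Protocol 2, read from Tables 11.6 and 11.7.
R = 22                    # total responses
PURE_F = 4                # responses scored with form only (assumed from Table 11.6)
M = 9                     # human movement responses (left side of EB)
FC, CF, C = 3, 3, 0       # color determinants (FC:CF+C = 3:3)
REFLECTIONS = 1           # Fr + rF
PAIRS = 8                 # (2)
ZSUM, ZEST = 59.5, 52.5
BT, CL, GE, LS, NA = 3, 0, 0, 1, 0   # primary plus secondary contents
LAST_THREE_CARDS = 8      # responses given to Cards VIII, IX, and X


def lambda_index(pure_f, r):
    """Pure form responses relative to all other responses."""
    return pure_f / (r - pure_f)


def weighted_sum_color(fc, cf, c):
    """WSumC: FC weighted 0.5, CF weighted 1.0, pure C weighted 1.5."""
    return 0.5 * fc + 1.0 * cf + 1.5 * c


def egocentricity(reflections, pairs, r):
    """3r+(2)/R: reflections are weighted three times as heavily as pairs."""
    return (3 * reflections + pairs) / r


def affective_ratio(last_three, r):
    """Afr: responses to Cards VIII-X divided by responses to Cards I-VII."""
    return last_three / (r - last_three)


def isolation_index(bt, cl, ge, ls, na, r):
    """Isolate/R: clouds and nature contents are double weighted."""
    return (bt + 2 * cl + ge + ls + 2 * na) / r


derived = {
    "Lambda": round(lambda_index(PURE_F, R), 2),
    "WSumC": weighted_sum_color(FC, CF, C),
    "EA": M + weighted_sum_color(FC, CF, C),
    "Zd": ZSUM - ZEST,
    "3r+(2)/R": round(egocentricity(REFLECTIONS, PAIRS, R), 2),
    "Afr": round(affective_ratio(LAST_THREE_CARDS, R), 2),
    "Isolate/R": round(isolation_index(BT, CL, GE, LS, NA, R), 2),
}

for name, value in derived.items():
    print(f"{name} = {value}")
```

Under these assumptions the recomputed values agree with the corresponding printed entries in Table 11.7 (L = 0.22, EA = 13.5, Zd = +7.0, 3r+(2)/R = 0.50, Afr = 0.57, Isolate/R = 0.18).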

INTERPRETATION OF SECOND PROTOCOL

Utilization of these Rorschach findings in evaluating progress in treatment can be assessed in terms both of the 16 original treatment targets listed previously and the 27 adjustment indices studied by Weiner and Exner (1991). Beginning with the treatment targets, there are four, including the two that originally were considered first and second in order of importance, on which Ms. A is now functioning in the normal range, as defined by the nonpatient norms. Specifically, her CDI has gone from 4 to 1 (Target No. 1) and her AdjD has gone from -2 to 0 (Target No. 2), which, taken together, indicate that she is feeling less distressed than before and more capable of coping with daily life events, especially social interactions. Moreover, her a:p ratio has gone from 5:9 to an acceptable 8:7 (Target No. 12), and her isolation index has dropped from .53 to a clinically unremarkable .18 (Target No. 13). Hence, she now seems capable of avoiding maladaptive passivity in her interpersonal relationships, of making decisions on her own, and of populating her life with a sufficient number of friends and companions. The consistency among these dramatic Rorschach changes, the major goals of the therapy, and the therapist’s report of behavioral change serve to validate each other with respect to the accuracy of the personality assessment and the efficacy of the treatment.

On six other treatment targets, Ms. A shows some improvement, but is still responding in a clinically significant manner (Nos. 3, 5, 6, 7, 8, and 14). Thus, she is less dysphoric than before (C’ from 4 to 3); she is less lonely and emotionally deprived (T from 3 to 2); she is modulating her affect in a more mature way (FC:CF+C from 1:3 to 3:3); she feels more comfortable dealing with affective exchange (Afr from .36 to .57); she is less angry and resentful (S from 7 to 3); and she is more capable of dealing with situations in economical, uncomplicated ways (Lambda from .06 to .22). Whereas each of these changes indicates progress in treatment, her present levels of functioning still exceed normative expectation and identify needs for further treatment.

Ms. A also demonstrates some change on three structural variables that initially were regarded as potentially problematic, but also as features of her defensive style that might best be left in place (Target Nos. 9, 10, and 15). Thus, she is a little less self-centered and narcissistic than before (reflections from 3 to 1; egocentricity ratio from .74 to .55) and a little less overincorporative (Zd from +9.0 to +7.0), although, compared with adult nonpatients, she still elevates on all three variables. In this same vein, she still demonstrates the obsessiveness noted previously (Target No. 16). These slight changes suggest that some easing or relaxation of her defensive style may be taking place, but they are modest enough to be consistent with the initial impression of her need to retain these defenses and the fact that her therapist has not been challenging them.

Ms. A’s thematic content shows both changes and consistencies of interest over the 11-month retest interval. The most dramatic response in the first protocol, “A face of someone who’s been hurt,” has now become a response in which two men are in a fist fight, with blood around them, and both are hurt. The preoccupation with people hurting each other remains, but now both are suffering instead of just one party being hurt. The improved structural features of this response bear further witness to her improved capacity to deal comfortably with such interpersonal tensions: the previously inaccurate percept (minus form level) now has become a commonly seen percept (ordinary form level), and the partial object representation of just the face has been replaced by a percept involving whole people.

As for other responses in this second record that are likely to involve projection, there are no minus responses to examine, but there are eight human movement responses in addition to the men fighting. Five of these M responses continue the theme from the first record of admiring herself and showing herself off to others to receive their admiration. Thus, she sees dancers in a musical (No. 5), a woman in a show wearing a costume (No. 10), a girl looking at herself in a mirror (No. 13), clowns doing an act in a circus (No. 18), and a person doing a dance in celebration (No. 21). Like her elevated reflections and egocentricity ratio, this persistent thematic content appears to identify a characterological reliance on narcissistic or hypomanic mechanisms that is not likely to change much and perhaps should not be disrupted by her therapy.

The one treatment target on which there has been a change for the worse involves Vista, which has increased from 1 to 2. Although having 2 Vs in one’s record indicates an undesirable and discomfiting state of affairs, an increase in painful self-examination during the course of psychotherapy is not unexpected, and may even be an indication that patients are working hard to examine features of themselves or their circumstances that they would rather not have to confront. Although Weiner and Exner (1991) did not examine V in their study, they found that dimensionality (FD), an index of introspectiveness (see Table 11.4), increased among both of their samples during the first year of treatment before decreasing later on to its baseline frequency.


Finally, a brief examination of Ms. A’s two protocols with respect to the adjustment indices listed in Table 11.4 elucidates both her improvement and her need for continued treatment. Her pre-therapy Rorschach is positive for 12 of the 27 indices of adjustment difficulty (Nos. 1, 2, 4, 10, 11, 13, 14, 16, 19, 20, 23, and 25), with 7 of these relating to difficulties in managing stress and modulating affect. Her second protocol after 11 months of therapy is positive for only five of the indices (Nos. 10, 18, 19, 20, and 25), with none of these relating to stress management and only one to affect modulation. However, she does continue to show some noteworthy emotional distress (No. 10), the self-centeredness and self-glorification already noted (Nos. 19 and 20), and continuing unmet needs for closeness (No. 25). In addition, she now demonstrates some excessive reliance on intellectualization as a defense (No. 18), which constitutes an addition to the defensive repertoire she displayed initially.
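The comparison of positive adjustment indices across the two protocols amounts to simple set arithmetic. The following minimal sketch uses only the index numbers quoted in the preceding paragraph; the variable and function names are hypothetical, and nothing here encodes what each numbered index in Table 11.4 measures.

```python
# Index numbers (out of the 27 in Table 11.4) on which each protocol is positive,
# copied from the paragraph above.
PRE_THERAPY = {1, 2, 4, 10, 11, 13, 14, 16, 19, 20, 23, 25}
RETEST_11_MONTHS = {10, 18, 19, 20, 25}


def compare_protocols(pre, post):
    """Split indices into those resolved, those persisting, and those newly positive."""
    return {
        "resolved": sorted(pre - post),    # positive before, not at retest
        "persisting": sorted(pre & post),  # positive on both occasions
        "new": sorted(post - pre),         # positive only at retest
    }


result = compare_protocols(PRE_THERAPY, RETEST_11_MONTHS)
for label, indices in result.items():
    print(f"{label}: {indices}")
```

Run on these two sets, the comparison yields eight resolved indices, four persisting ones (Nos. 10, 19, 20, and 25), and one new one (No. 18), which matches the narrative account.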

Conclusion

A review of this chapter reveals that it is relatively long on conceptual underpinnings and clinical applications, and relatively short on empirical findings. This may be surprising, inasmuch as the Rorschach literature includes many thousands of research reports. However, as indicated in the discussion, much of this empirical work preceded the establishment of widely accepted, standardized procedures for coding and examining Rorschach protocols. In addition, as also noted in the chapter, the bulk of research concerned with Rorschach assessment in treatment planning and evaluation has focused nonspecifically on prognosis for change and improvement. Carefully designed studies concerned with differential selection of treatment approaches to meet individual patients’ needs are not yet available, and systematic longitudinal studies to assess treatment progress have only recently begun to appear. On the other hand, abundant data have demonstrated the psychometric soundness of the Rorschach and its validity as a measure of personality states and traits. To the extent that effective treatment planning and reliable assessment of treatment outcome can be based on aspects of personality functioning, there is every reason to believe that future research will document the validity of the clinical applications illustrated in this chapter and provide empirical confirmation of the utility of the Rorschach in treatment planning and evaluation.

References

Alpher, V. S., Perfetto, G. A., Henry, W. P., & Strupp, H. H. (1990). The relationship between the Rorschach and assessment of the capacity to engage in short-term dynamic psychotherapy. Psychotherapy, 27, 224-229.
Atkinson, L., Quarrington, B., Alp, I. E., & Cyr, J. J. (1986). Rorschach validity: An empirical approach to the literature. Journal of Clinical Psychology, 42, 360-362.
Blatt, S. J. (1975). The validity of projective techniques and their clinical and research contributions. Journal of Personality Assessment, 39, 327-343.
Cramer, P., & Blatt, S. J. (1990). Use of the TAT to measure change in defense mechanisms following intensive psychotherapy. Journal of Personality Assessment, 54, 236-251.
Ellenberger, H. F. (1954). The life and work of Hermann Rorschach (1884-1922). Bulletin of the Menninger Clinic, 18, 173-219.
Exner, J. E., Jr. (1969). The Rorschach systems. New York: Grune & Stratton.
Exner, J. E., Jr. (1974). The Rorschach: A comprehensive system. New York: Wiley.
Exner, J. E., Jr. (1978). The Rorschach: A comprehensive system. Vol. 2: Current research and advanced interpretation. New York: Wiley.
Exner, J. E., Jr. (1983). Rorschach assessment. In I. B. Weiner (Ed.), Clinical methods in psychology (2nd ed., pp. 58-99). New York: Wiley.
Exner, J. E., Jr. (1986). The Rorschach: A comprehensive system. Vol. 1: Basic foundations (2nd ed.). New York: Wiley.
Exner, J. E., Jr. (1989). Searching for projection in the Rorschach. Journal of Personality Assessment, 53, 520-536.
Exner, J. E., Jr. (1991). The Rorschach: A comprehensive system. Vol. 2: Interpretation (2nd ed.). New York: Wiley.
Exner, J. E., Jr., Thomas, E. A., & Mason, B. (1985). Children’s Rorschachs: Description and prediction. Journal of Personality Assessment, 49, 13-20.
Exner, J. E., Jr., & Weiner, I. B. (1982). The Rorschach: A comprehensive system. Vol. 3: Assessment of children and adolescents. New York: Wiley.
Fishman, D. B. (1973). Rorschach adaptive regression and change in psychotherapy. Journal of Personality Assessment, 37, 218-224.
Frank, L. K. (1939). Projective methods for the study of personality. Journal of Psychology, 8, 389-413.
Garwood, J. (1977). A guide to research on the Rorschach Prognostic Rating Scale. Journal of Personality Assessment, 41, 117-119.
Gerstle, R. M., Geary, D. C., Himmelstein, P., & Reller-Geary, L. (1988). Rorschach predictors of therapeutic outcome for inpatient treatment of children. Journal of Clinical Psychology, 44, 277-280.
Goldfried, M. R., Stricker, G., & Weiner, I. B. (1971). Rorschach handbook of clinical and research applications. Englewood Cliffs, NJ: Prentice-Hall.
Holt, R. R. (1967). Diagnostic testing: Present status and future prospects. Journal of Nervous and Mental Disease, 144, 444-465.
Holt, R. R. (1977). A method for assessing primary process manifestations and their control in Rorschach responses. In M. A. Rickers-Ovsiankina (Ed.), Rorschach psychology (2nd ed., pp. 375-420). Huntington, NY: Krieger.
Klopfer, B., Ainsworth, M. D., Klopfer, W. G., & Holt, R. R. (1954). Developments in the Rorschach technique. I: Theory and development. Yonkers-on-Hudson, NY: World Book.
Klopfer, B., Kirkner, F. J., Wisham, W., & Baker, G. (1951). Rorschach prognostic rating scale. Journal of Projective Techniques, 15, 425-428.
Lerner, P. M. (1991). Psychoanalytic theory and the Rorschach. Hillsdale, NJ: Analytic Press.
Parker, K. C. H. (1983). A meta-analysis of the reliability and validity of the Rorschach. Journal of Personality Assessment, 47, 227-231.
Parker, K. C. H., Hanson, R. K., & Hunsley, J. (1988). MMPI, Rorschach, and WAIS: A meta-analytic comparison of reliability, stability, and validity. Psychological Bulletin, 103, 367-373.
Rorschach, H. (1921/1942). Psychodiagnostics. Bern: Hans Huber.
Schachtel, E. G. (1966). Experiential foundations of Rorschach’s test. New York: Basic Books.
Schafer, R. (1954). Psychoanalytic interpretation in Rorschach testing. New York: Grune & Stratton.
Stricker, G., & Healey, B. J. (1990). Projective assessment of object relations: A review of the empirical literature. Psychological Assessment, 2, 219-230.
Weiner, I. B. (1977). Approaches to Rorschach validation. In M. A. Rickers-Ovsiankina (Ed.), Rorschach psychology (2nd ed., pp. 575-608). Huntington, NY: Krieger.
Weiner, I. B., & Exner, J. E., Jr. (1991). Rorschach changes in long-term and short-term psychotherapy. Journal of Personality Assessment, 56, 453-465.

Chapter 12

Beck Depression Inventory and Hopelessness Scale

Randy Katz
Joel Katz
Brian F. Shaw
University of Toronto

The Beck Depression Inventory (BDI)

The Beck Depression Inventory (BDI) is a self-administered inventory designed to assess the current severity of depression. It was developed from clinical observations of depressed and nondepressed psychiatric patients. Clinical observations of attitudes and symptoms characteristic of depressed patients are represented in a 21-item, multiple-choice style questionnaire. Each item consists of several statements varying in the degree to which they reflect specific depressive symptoms and attitudes. Each BDI item requires a rating response on an ordinal scale from 0 to 3, where 0 represents the total absence of the symptom or attitude and 3 indicates the most severe level. The following 21 symptoms and attitudes were established from clinical observation: (a) mood, (b) pessimism, (c) sense of failure, (d) lack of satisfaction, (e) guilt feelings, (f) sense of punishment, (g) self-dislike, (h) self-accusation, (i) suicidal wishes, (j) crying, (k) irritability, (l) social withdrawal, (m) indecisiveness, (n) distortion of body image, (o) work inhibition, (p) sleep disturbance, (q) fatigability, (r) loss of appetite, (s) weight loss, (t) somatic preoccupation, and (u) loss of libido.

SUMMARY OF DEVELOPMENT

The BDI was designed for use as a semistructured interview administered by trained interviewers. However, it was developed and refined further to be used as a self-rating instrument taking only 10-15 minutes for administration and scoring. When the BDI is self-administered, the individual selects, for each item, the one or more statements that best reflect how he or she feels. The BDI score is the sum, across the 21 items, of the rank value of the highest ranked statement endorsed for each item. The original BDI was developed by Beck and his colleagues in 1961 (Beck, Ward, Mendelson, Mock, & Erbaugh, 1961) and revised by Beck for publication in Beck, Rush, Shaw, and Emery (1979). In refining the psychometric characteristics of the instrument, modifications have been made, including reducing the number of response possibilities and rewording certain items. There currently are two paper-and-pencil forms of the BDI. One is a short 13-item format that mainly measures a cognitive dimension of depression, and the other is a longer 21-item format that measures noncognitive dimensions of depressive disorder, including somatic concerns (Beck, Steer, & Garbin, 1988). Validity coefficients between the two forms are acceptably high and range from 0.89 to 0.97 (Beck & Beck, 1972; Beck, Rial, & Rickels, 1974; Reynolds & Gould, 1981). Despite these minor differences between versions, the two instruments have been found to be comparable in psychiatric patients (Beck & Steer, 1984). A card format (May, Urquart, & Tarran, 1969) and a number of computer-administered forms have been developed, but there are no data on the reliability and validity of these methods of administration (Beck et al., 1988).
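The scoring rule described in this section (for each of the 21 items, take the highest-ranked statement endorsed, then sum across items) can be sketched as follows. This is a minimal illustration rather than an official scoring program: the function name `bdi_total` and the choice to reject a record containing an unanswered item are assumptions, because the text does not say how missing items are handled.

```python
from typing import Iterable, Sequence

N_ITEMS = 21                 # the long form of the BDI described in the text
VALID_RATINGS = (0, 1, 2, 3)


def bdi_total(endorsed: Sequence[Iterable[int]]) -> int:
    """Sum, over the 21 items, the highest rating endorsed on each item.

    `endorsed[i]` holds every rating (0-3) the respondent selected for item i,
    since respondents may endorse more than one statement per item.
    """
    if len(endorsed) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} items, got {len(endorsed)}")
    total = 0
    for number, choices in enumerate(endorsed, start=1):
        ratings = list(choices)
        if not ratings:
            # Assumption: an unanswered item invalidates the record.
            raise ValueError(f"item {number} has no endorsed statement")
        if any(r not in VALID_RATINGS for r in ratings):
            raise ValueError(f"item {number} has a rating outside 0-3")
        total += max(ratings)
    return total


# Hypothetical record: 18 items rated 0, then items endorsed as {1, 2}, {3}, and {0, 1}.
example = [[0]] * 18 + [[1, 2], [3], [0, 1]]
print(bdi_total(example))
```

For the hypothetical record shown, the three nonzero items contribute 2, 3, and 1, so the total is 6.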

BASIC VALIDITY AND RELIABILITY INFORMATION

Content Validity. Over the past 30 years, advances in the classification and diagnostic practices of psychiatric disorders have led to the development of the DSM in North America and the ICD system in Europe. Although both systems have progressed along similar paths, in recent years the DSM has received more international attention and is perhaps the more widely used and accepted diagnostic system. As noted earlier, the BDI originally was developed from an atheoretical model derived from observations by trained clinicians of patients suffering from depressive illness. Although the BDI is a useful tool for assessing many features of clinical depression, it does not provide enough information to establish a DSM-III-R diagnosis of major depressive episode, but must be supplemented with additional material. For instance, the BDI focuses on a 1-week period preceding administration, whereas DSM-III-R requires the presence of symptoms over a minimum of 2 weeks. The BDI does not assess symptoms relevant to weight gain, hypersomnia, psychomotor agitation, or retardation (Moran & Lambert, 1983; Vredenburg, Krames, & Flett, 1985). Finally, the BDI does not assess for change from a previous level of functioning, which is a critical criterion for the diagnosis of DSM-III-R major depressive disorder. Overall, the content validity is good for six of the nine DSM-III criteria for depressive episode (Moran & Lambert, 1983), but the BDI does not address satisfactorily the remaining three criteria.

Concurrent Validity. Beck et al. (1988) cited 35 studies where correlations were reported between the BDI and other well-established instruments that measure depression, including (a) Hamilton Psychiatric Rating Scale for Depression (HRSD; Hamilton, 1960), (b) Zung Self-Reported Depression Scale (Zung SDS; Zung, 1965), (c) Minnesota Multiphasic Personality Inventory Depression Scale (MMPI-D; Hathaway & McKinley, 1943), (d) Multiple Affect Adjective Checklist Depression Scale (MAACL-D; Zuckerman & Lubin, 1965), and (e) clinicians’ ratings of depth of depression (Beck et al., 1974; Salkind, 1969; Strober, Green, & Carlson, 1981; Wittig, Hanlon, & Kurland, 1963) (see Table 12.1). The correlation coefficients between the BDI and these measures ranged from a relatively modest .33 with DSM-III major depression (Hesselbrock, Hesselbrock, Tennen, Meyer, & Workman, 1983) to a more substantial .86 with the Zung SDS (Turner & Romano, 1984) and HRSD (Steer, McElroy, & Beck, 1982). However, the strongest relationship was found between clinicians’ ratings and the BDI, where the correlation coefficient was reported at .96 (Beck et al., 1974). This is not surprising, because the BDI was developed on the basis of clinical observation of patients suffering with depression. Taken together, the data show that the BDI correlates well with most other self-report measures of depression.


TABLE 12.1 Correlations Between the Beck Depression Inventory and Other Measures of Depression

References

Clinical

Hamilton

Zung

MMPI-D

MAACL-D

Psychiatric

Bailey and Coopen (1976) Beck et al. (1961)

.68 .66

Blatt et al. (1982)

81M

.44

Bloom and Brady (1968)

.66

Davies et al. (1975) Hesselbrock et al. (1983)

ates)

BS) 59

May et al. (1969)

.65

Mendels, Secunda, and Dyson (1972) Metcalfe and Goldman (1965)

.62

Reynolds and Gould (1981)

79

LO

.70

155

.83

41

.59

I/

Rounseville et al. (1979) Schnurr et al. (1976)

.60

71 61

Seitz (1970)

Steer et al. (1982) Strober et al. (1981)

.86 .67

Nonpsychiatric

Atkeson et al. (1982)

73

Campbell et al. (1984)

156

Clarke and Williams (1979)

.67

Coleman and Miller (1975)

a5

Giambra (1977)

.66

Hammen (1980) Hatzenbuchler et al. (1983) Marsella et al. (1975)

.80 78 .62

Salkind (1969)

Phe

Schwab et al. (1967) Scogin and Merbaum (1983)

We .63

Scott et al. (1982)

.63

Tanaka-Matsumi and Kameoka (1986)

.68

Turner and Romano (1984)

.86

he

Combined

Beck et al. (1974)

5)

Carroll et al. (1973) Schaeffer et al. (1985)

.65

81

59

.67

76

Sh

41

Reprinted from Clinical Psychology Review, 8, by A. T. Beck, R. A. Steer, and M. A. Garbin, “Psychometric properties of the Beck Depression Inventory: Twenty-five years of evaluation,” pp. 77-100, Copyright (1988), with kind permission from Pergamon Press Ltd, Headington Hill Hall, Oxford OX3 0BW, UK.

Discriminant Validity. Although the BDI was developed to assess the severity or depth of depression in psychiatric patients (Beck et al., 1961), a number of authors have investigated the discriminant validity of the BDI in relation to psychiatric and nonpsychiatric populations (Akiskal, Lemmi, Yetevanian, King, & Belluomini, 1982; Byerly & Carlson, 1982; Clark, Cavanaugh, & Gibbons, 1983; Gallagher, Nies, & Thompson, 1982). These studies demonstrated significantly lower scores on the BDI among nondepressed normals than among depressed psychiatric patients and patients with other nonpsychiatric clinical disorders.


Evidence of the ability of the BDI to discriminate between subtypes of depression is limited. Studies examining this question generally have failed to show any significant effect (Delay, Pachot, Lemperiere, & Mirouze, 1963; Schnurr, Hoaken, & Jarrett, 1976). However, Beck et al. (1988) reported that outpatients with a recurrent episode of major depression showed higher mean BDI scores than patients suffering from a dysthymic disorder.

Reliability. Beck et al. (1988) examined the reliability of the BDI by conducting a meta-analysis of 25 published papers using the BDI. The samples in these studies consisted of schizophrenics, substance abusers, college students, and depressed patients. Regardless of the population sampled, internal consistency estimates were high (ranging from .73 to .95). In addition, Beck et al. (1988) presented information on the stability of the BDI from 10 studies that administered the inventory to the same patients on two occasions. As expected, stability estimates were higher for nonpsychiatric patients (.60 to .83) than for psychiatric patients (.48 to .86), reflecting the sensitivity of the BDI to changes in psychiatric symptomatology.

RESEARCH APPLICATIONS AND FINDINGS

One of the most important applications of the BDI has to do with its sensitivity in measuring change in depressive symptoms and severity. The BDI has been used extensively in research studies designed to assess the efficacy of pharmacological interventions (Bellack & Rosenberg, 1966; Broadhurst, 1970; Burrows, Foenander, Davies, & Scoggins, 1976; Coppen, Whybrow, Nuguera, Maggs, & Prange, 1972; Lipsedge & Rees, 1971; Mendels, Secunda, & Dyson, 1972), electroconvulsive therapy (ECT; Green & Statduhat, 1966), psychotherapy (Blackburn, Bishop, Glen, Whalley, & Christie, 1981; Kovacs, Rush, Beck, & Hollon, 1981; Rush, Beck, Kovacs, & Hollon, 1977), and group therapy (Antonuccio, Lewinsohn, & Steinmetz, 1982). Overall, these studies have shown the BDI to be a sensitive and valuable instrument in detecting statistically significant changes in symptoms and their severity as a result of these various treatment approaches. The value of using symptom-based research tools such as the BDI recently was advocated by Costello (1992).

Recent studies have highlighted the importance of defining a significant change in BDI scores from a clinical as opposed to a statistical perspective. One approach, advocated by Jacobson, Follette, and Revenstorf (1984), aims to determine whether the observed changes exceed the measurement error of the particular psychometric instrument, taking correction factors into account. Alternatively, Steer, Beck, and Garrison (1986) suggested that a drop of at least 10 points in BDI scores from pre- to posttreatment would indicate a clinically significant change, but no studies have specifically examined this decision rule.
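The two ways of judging change described above can be sketched in a few lines. This is a minimal illustration and not the procedure used in the cited studies. It assumes the commonly used reliable-change formulation, in which the pre-to-post difference is divided by the standard error of the difference, alongside the 10-point drop suggested by Steer, Beck, and Garrison (1986). The function names, the standard deviation, and the reliability value are hypothetical choices made for the example.

```python
import math


def reliable_change_index(pre, post, sd, reliability):
    """Pre-to-post difference divided by the standard error of the difference.

    Assumes SE_diff = sd * sqrt(2) * sqrt(1 - reliability), a common
    formulation of the reliable change index (an assumption of this sketch).
    """
    if not 0.0 <= reliability < 1.0:
        raise ValueError("reliability must be in [0, 1)")
    se_measurement = sd * math.sqrt(1.0 - reliability)
    se_difference = math.sqrt(2.0) * se_measurement
    return (post - pre) / se_difference


def exceeds_measurement_error(pre, post, sd, reliability, critical=1.96):
    """True when the change is larger than expected from measurement error."""
    return abs(reliable_change_index(pre, post, sd, reliability)) > critical


def meets_ten_point_drop(pre, post, min_drop=10):
    """True when the score fell by at least min_drop points."""
    return (pre - post) >= min_drop


# Hypothetical patient and hypothetical norms, for illustration only.
pre_score, post_score = 28, 12
sample_sd, sample_reliability = 9.0, 0.86

print(exceeds_measurement_error(pre_score, post_score, sample_sd, sample_reliability))
print(meets_ten_point_drop(pre_score, post_score))
```

The two criteria can disagree: with a large standard deviation or low reliability, a 10-point drop may still fall within measurement error, which is one reason the choice of criterion matters.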

LIMITATIONS/POTENTIAL PROBLEMS IN USE

The BDI was developed as a symptom inventory, not as a diagnostic instrument. Therefore, inappropriate use of the BDI as a diagnostic instrument can yield misleading information, which may overestimate the prevalence of depressive illness. For instance, Ennis, Barnes, Kennedy, and Trachtenberg (1989) examined a series of 71 consecutive admissions to an inpatient psychiatric crisis service following the patients’ deliberate attempts at self-harm. Although 80% of those admitted to hospital scored within the moderate to severe ranges of depression as measured by the BDI, only 31% met DSM-III criteria for major depressive episode. Ennis and his colleagues reported a dramatic reduction in BDI scores within a few days following admission, even though these patients did not receive any significant treatment for depression. Similar findings were reported by Newson-Smith and Hirsch (1979), using the General Health Questionnaire (GHQ) and the Present State Examination (PSE), and by van Praag and Plutchik (1985), using subjective recollection of distress. These findings suggest that for patients in a current state of acute emotional distress, high BDI scores may not necessarily reflect clinical depression, but may instead reflect general psychological distress.

Beck Hopelessness Scale (BHS)

Two opposing views tended to dominate the literature on depression in the early 1960s. One view held that hopelessness represents an amorphous emotional experience that does not lend itself to measurement or systematic quantification. The opposing view proposed that, although the emotional component is prominent in the experience of hopelessness, the construct nevertheless can be defined, measured, and objectified in terms of a system of negative statements and attitudes concerning an individual’s current view of self and future expectations (Stotland, 1969). Although difficult to define, hopelessness may be seen as the degree to which an individual has a general negative expectancy about events in his or her future, and is one component of Beck’s (1967) cognitive triad of negative cognition (i.e., the depressed person’s experiences regarding the self, the world, and the future). The relationship between an individual’s specific goals and his or her expectations about the likelihood of achieving them plays a major role in determining the degree of hopelessness experienced (Melges & Bowlby, 1969).

The Beck Hopelessness Scale (BHS) was designed to define and quantify operationally the concept of hopelessness and to facilitate the study of negative expectations and their relationship to psychopathology. The BHS is a 20-item self-administered inventory constructed in a forced-choice (true/false) format to assess the respondent’s negative expectations and pessimistic outlook. Each of the 20 items is scored either 1 or 0. A score of 1 is assigned to 11 items for a true response and to the remaining 9 items for a false response. The total score is the sum of the scores on all 20 items (possible scores range from 0 to 20).
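The scoring rule just described can be expressed as a short sketch. The text states only that 11 items are keyed true and 9 are keyed false; it does not say which ones. The key used below is therefore a placeholder invented for illustration, and the function name is hypothetical. The published scoring key would have to be substituted before any real use.

```python
def score_bhs(responses, true_keyed_items):
    """Sum the keyed responses on a 20-item true/false scale.

    responses: list of 20 booleans (True = "true" response), item 1 first.
    true_keyed_items: set of the 11 item numbers scored 1 for a true response;
    every other item is scored 1 for a false response.
    """
    if len(responses) != 20:
        raise ValueError("expected 20 responses")
    keyed = set(true_keyed_items)
    if len(keyed) != 11 or not keyed <= set(range(1, 21)):
        raise ValueError("expected 11 distinct item numbers between 1 and 20")
    total = 0
    for item_number, answer in enumerate(responses, start=1):
        keyed_answer = item_number in keyed
        if bool(answer) == keyed_answer:
            total += 1
    return total


# Placeholder key: NOT the published key, only a stand-in for illustration.
PLACEHOLDER_TRUE_KEYED = frozenset(range(1, 12))

all_true = [True] * 20
print(score_bhs(all_true, PLACEHOLDER_TRUE_KEYED))  # 11 with this placeholder key
```

Passing the key as an argument keeps the scoring logic separate from the item content, so the same function works once the correct key is supplied.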

SUMMARY OF DEVELOPMENT

The BHS (Beck, Weissman, Lester, & Trexler, 1974) was developed to advance the study of those psychopathological states in which a pervasive sense of personal hopelessness dominated the clinical picture. For instance, hopelessness is a core characteristic of depressive disorder (Beck, 1963, 1967; Melges & Bowlby, 1969), a defining feature of suicidal intent (Beck, 1963; Beck, Brown, Berchick, Stewart, & Steer, 1990; Beck, Steer, Kovacs, & Garrison, 1985; Hill, Gallagher, Thompson, & Ishida, 1988), and strongly associated with certain physical illnesses (Schmale, 1958). In its development, items were selected from two main sources. Nine items were selected from Heimberg’s (1961) test regarding attitudes about the future, and 11 items were drawn from a series of statements made by psychiatric patients, reflecting the clinical characteristics of hopelessness or negative expectations about the future (Beck et al., 1974).

BASIC VALIDITY AND RELIABILITY INFORMATION

Content Validity. Content validity initially was assessed by several clinicians who reviewed the BHS for depressive content and comprehensibility (Beck et al., 1974). It subsequently was administered concurrently with the BDI. The BHS has a moderately high correlation with the BDI (e.g., r = .68; Minkoff, Bergman, Beck, & Beck, 1973) and with clinical ratings of hopelessness (Ammerman, 1988).

Concurrent Validity. Concurrent validity was assessed by comparing BHS scores with general clinical ratings of hopelessness, which included the negative expectancies and observable behaviors of (a) outpatients in a general medical practice and (b) patients who had been hospitalized for attempting suicide. Correlations between BHS scores and clinical ratings of hopelessness for general practice patients and the attempted suicide sample were .74 and .62, respectively (Beck et al., 1974); the BHS also correlated .60 with the Stuart Future Test. In addition, BHS ratings have been shown to be related significantly to expressed suicidal intent (Beck, Kovacs, & Weissman, 1975).

Predictive Validity. Beck et al. (1985) carried out a prospective study in which 165 patients initially hospitalized for significant suicidal ideation were followed up over a 10-year period. The data were analyzed to determine the cutoff score that maximized the predictive power of the BHS. Ninety-one percent of the patients who eventually committed suicide had obtained a BHS score of 10 or more, whereas only 9% (one patient) had a score under 10. More recently, Beck et al. (1990) confirmed the predictive power of the BHS in its ability to identify suicide completers from among a large sample (n = 1,958) of psychiatric outpatients. A scale cutoff score of 9 or higher identified 94% (n = 16) of the 17 patients who eventually committed suicide. The high-risk group identified by this cutoff score was 11 times more likely to commit suicide than low-risk patients with BHS scores under 9. These findings support the view that the BHS can be an important instrument in correctly identifying psychiatric patients who ultimately commit suicide. However, this sensitivity in detecting suicide risk occurs at the expense of incorrectly classifying a high proportion of patients who will not commit suicide (i.e., low specificity). Nevertheless, given the importance of correctly identifying high-risk patients, a high rate of false positives is acceptable.

Construct Validity. Perhaps the most convincing evidence for the construct validity of the BHS comes from its strong association with suicidal intent and actual suicide completion (Beck et al., 1985, 1990). Hopelessness as measured by the BHS has a stronger association with suicidal intent than do measures of clinical depression (Beck et al., 1985, 1990; Weissman, Beck, & Kovacs, 1979). Indeed, Beck et al. (1975) found that the relationship between depression and suicidality is reduced when the effect of hopelessness is partialled out statistically. Further evidence for the construct validity of the BHS comes from two factor analytic studies, in which three similar main factors emerged (Beck et al., 1974; Hill et al., 1988). These studies suggested that the three factors with the most clinical relevance represented affective, motivational, and cognitive aspects of hopelessness. Factor 1, labeled “feelings about the future” (Beck et al., 1974) or “hope” (Hill et al., 1988), loaded on affect-laden associations such as hope, enthusiasm, happiness, faith, and good times. Factor 2, labeled “loss of motivation” (Beck et al., 1974) or “giving up” (Hill et al., 1988), loaded heavily on constructs associated with giving up and deliberate self-denial. Factor 3, labeled “future expectations” (Beck et al., 1974) or “plans about the future” (Hill et al., 1988), included items related to a dark future, negative expectations, and a vague and uncertain outlook.

Reliability. Overall, the BHS has been shown to be a reliable measure of hopelessness reflecting a negative expectation for positive future outcomes. Beck et al. (1974) examined the reliability of the BHS in a population of 294 hospitalized patients who had attempted suicide. The coefficient alpha for internal consistency of the scale, calculated using the Kuder-Richardson formula, was .93. Correlations between individual scale items and the total scale score were within an acceptable range, from .39 to .76. Further evidence for the reliability of the BHS was obtained by Hill et al. (1988) in their examination of hopelessness as a measure of suicidal intent in the depressed elderly. An examination of the internal consistency of the BHS indicated a coefficient alpha of .84 and a Spearman-Brown split-half reliability of .82.
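For readers unfamiliar with the two reliability statistics named above, the following sketch shows how they are typically computed for dichotomous (0/1) items. It is an illustration under stated assumptions and not a reconstruction of the analyses in the cited studies: it uses the Kuder-Richardson Formula 20 with the population variance of total scores, and the Spearman-Brown correction for a test split into two halves. The response matrix is invented toy data.

```python
from statistics import pvariance


def kr20(item_matrix):
    """Kuder-Richardson Formula 20 for a respondents-by-items matrix of 0/1 scores."""
    n_people = len(item_matrix)
    n_items = len(item_matrix[0])
    totals = [sum(row) for row in item_matrix]
    total_variance = pvariance(totals)  # population variance of total scores
    if total_variance == 0:
        raise ValueError("total scores have zero variance")
    pq_sum = 0.0
    for j in range(n_items):
        p = sum(row[j] for row in item_matrix) / n_people  # proportion scoring 1
        pq_sum += p * (1.0 - p)
    return (n_items / (n_items - 1)) * (1.0 - pq_sum / total_variance)


def spearman_brown(half_test_correlation):
    """Step a half-test correlation up to an estimate for the full-length test."""
    return 2.0 * half_test_correlation / (1.0 + half_test_correlation)


# Invented toy responses (4 respondents, 4 items), for illustration only.
toy_data = [
    [1, 1, 1, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 1, 0, 0],
]
print(round(kr20(toy_data), 2))
print(round(spearman_brown(0.5), 2))
```

Both statistics rise as items covary more strongly, which is why high values are read as evidence that the items measure a common construct.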

Interpretative Strategies and Treatment Planning

The total score on the BDI can range from 0, suggesting no depression, to a maximum score of 63, indicating a severe state of clinical depression. Although there are no specific cutoff scores designed to reflect clinical caseness, the following ranges, suggested by Beck et al. (1988), typically have been used to guide decision making in clinical and research settings: 0-9, absence of, or minimal, depression; 10-18, mild to moderate depression; 19-29, moderate to severe depression; 30-63, severe depression.

In addition to using the total BDI score as a general index of severity in assessing depressive symptoms, an examination of individual items endorsed with a rank score of 2 or 3 on the questionnaire may point the clinician to areas needing further investigation. For example, when patients endorse Item 9 (concerned with suicide) with a response of 2 or 3, it is imperative that the clinician carry out a thorough assessment of the risk of suicide. There also is evidence that the pessimism item on the BDI differentiates suicide completers from noncompleters (Beck et al., 1985); endorsement of this item therefore should alert the clinician to the possible danger of suicidal ideation or behavior, and hence to further investigation. Likewise, an affirmative response to the item related to concerns about health or somatic preoccupation might lead one to consider further medical investigation, and affirmative responses on the cognitive-affective items might lead to further psychological investigation.

The BDI can be used to guide treatment planning from the initial stages of therapy. High scores on items related to motivational deficits, such as social withdrawal and work inhibition, would suggest a treatment plan emphasizing behaviorally oriented strategies focused on helping the patient to increase his or her activities. In contrast, high scores on items related to cognitive deficits, such as pessimism, self-dislike, and self-blame/criticism, would suggest a treatment plan with greater emphasis on identifying and addressing hopelessness, negative thinking, and cognitive distortions.

The BDI is sensitive to changes in depressive symptoms, and therefore can be used to track variations in these symptoms on a session-by-session basis. A number of studies using the BDI as a pre- and posttreatment measure have demonstrated significant reductions in mean BDI scores as a consequence of various types of pharmacological and other somatic treatments. For instance, mean BDI scores were found to be reduced in depressed patients treated with tricyclic medications (Bellack & Rosenberg, 1966; Lipsedge & Rees, 1971), lithium carbonate (Mendels et al., 1972), and ECT (West, 1981). The BDI also has been found to be sensitive to psychologically oriented therapeutic interventions. The mean BDI score was lower following cognitive behavior therapy (CBT) in several studies (Blackburn et al., 1981; Kovacs et al., 1981; Rush et al., 1977), and comparable results have been found with interpersonal therapy.

The importance of the BHS lies in its clinical utility. It has been successful in identifying patients experiencing such intense hopelessness that they are at high risk for suicide. As mentioned earlier, the total score on the BHS can range from 0, suggesting no hopelessness, to a maximum score of 20, indicating the absence of all hope. Although there are no specific cutoff scores designed to reflect caseness with respect to hopelessness, a score of 9 or more has been associated with a significant risk of suicide (Beck et al., 1985, 1990). High-risk psychiatric outpatients with a score of 9 or more were 11 times more likely to commit suicide than low-risk patients with scores below 9 (Beck et al., 1990). When interpreting scores on the BHS, clinicians should be mindful that scores above 10 may signal immediate or long-term suicide potential. It must be emphasized that a comprehensive assessment of suicide risk should include other clinical indices, including a history of suicide attempts, family history of suicide, alcohol and drug abuse, and the presence of an affective disorder (Beck et al., 1990).
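The interpretive rules in this section lend themselves to a compact sketch. The code below is an illustration only, not a clinical tool. It encodes the BDI ranges attributed to Beck et al. (1988), flags items endorsed at 2 or 3, singles out Item 9, and applies the BHS cutoff of 9. It assumes 21 BDI items scored 0 to 3, which is consistent with the stated maximum of 63. All function and variable names are hypothetical.

```python
BDI_BANDS = [
    (0, 9, "absence of, or minimal, depression"),
    (10, 18, "mild to moderate depression"),
    (19, 29, "moderate to severe depression"),
    (30, 63, "severe depression"),
]


def bdi_band(total_score):
    """Map a BDI total (0-63) to the descriptive range given in the text."""
    for low, high, label in BDI_BANDS:
        if low <= total_score <= high:
            return label
    raise ValueError("BDI total must be between 0 and 63")


def flagged_bdi_items(item_scores, threshold=2):
    """Return the item numbers (1-based) endorsed at or above the threshold."""
    if len(item_scores) != 21 or any(s not in (0, 1, 2, 3) for s in item_scores):
        raise ValueError("expected 21 item scores, each 0-3")
    return [n for n, s in enumerate(item_scores, start=1) if s >= threshold]


def needs_suicide_risk_assessment(item_scores):
    """True when Item 9 (concerned with suicide) is endorsed at 2 or 3."""
    return 9 in flagged_bdi_items(item_scores)


def bhs_elevated(bhs_total, cutoff=9):
    """True when the BHS total is at or above the cutoff discussed in the text."""
    if not 0 <= bhs_total <= 20:
        raise ValueError("BHS total must be between 0 and 20")
    return bhs_total >= cutoff


# Illustrative totals only.
print(bdi_band(18))
print(bhs_elevated(14))
```

A combination such as a BDI total in the mild to moderate range with an elevated BHS total shows why the two scales are read together instead of relying on the BDI total alone.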

Case Report

Mr. A is a 43-year-old married man (second marriage) with three children (from his current marriage). He presented to the clinic with severe anxiety and sleep difficulty that he attributed to concerns about his job. He also reported increasing his alcohol consumption from being a “business drinker” (he was in sales/marketing) to drinking for stress relief (average of four drinks per day for 7 weeks). He denied feeling depressed and denied having suicidal ideation. Mr. A had concerns about how the clinician would respond to him, and on several occasions commented that he must seem like a real “baby” for being so “stressed out.” He had no family history of depression or alcohol abuse. He described his father as an “Iron John” type and his mother as “loving, but a worrier.”

The major precipitants for his recent symptoms involved both financial and work stresses. His company was going through a major restructuring, and it appeared that he would be under extreme pressure to produce or be fired. Two years ago, he moved into a new house with a large mortgage, a decision that had worried him. He was very concerned that his friends, who were “fun loving jocks,” would see through him and ridicule him. In fact, his best friend had commented that Mr. A seemed “off in space” at their last lunch.

Mr. A reported that his wife was understanding and supportive. She had been in the health-care field and encouraged him to get a psychological consultation. Mr. A felt considerable responsibility toward his family and was moved to tears in the interview when he thought about “letting them down.”

As part of our standard intake assessment, Mr. A completed the BDI and the BHS. His BDI score was 18, with notable items (scored 2 or 3) being sleep disturbance, guilt, failure, and decreased interest. This score was notable given Mr. A’s general comments that he wasn’t depressed (he endorsed BDI statement No. 1, sadness, as 0). Mr. A’s BHS score was 14, a score that was concerning, given his clinical presentation. Mr. A had considerable pessimism about his situation. In the second interview, he minimized his report, stating that work might “turn around.” The clinician took careful note of his hopelessness and, consistent with cognitive therapy, related it to his degree of helplessness and his self-criticism (worthlessness). The risk of suicidal behavior was considered. Mr. A denied any intent to attempt suicide, denied any previous attempts, and reported only fleeting thoughts about suicide and escape. He began a treatment regimen including antidepressant medication and cognitive behavior therapy.

Three weeks later, Mr. A, during his therapy session, acknowledged that he had, in fact, bought ammunition for his rifle just 1 week before his initial evaluation. He reported feeling positive about his therapy. Following this disclosure, the therapist arranged to dispose of the gun and ammunition. Mr. A maintained that he did not intend to harm himself, but acknowledged that his feelings of despondency were greater than he had expressed initially.

It was clear that his concerns about his job were going to be ongoing. The company was not doing well and the marketing efforts in the recession were having limited effects. Therapy focused on his perceived helplessness and his attributional style (significant self-blame and a tendency to take excessive responsibility for failure). Interestingly, his BDI score remained relatively stable at 18 to 20 for 11 weeks. Mr. A’s sleep improved, but other symptoms (guilt, sense of failure) were very persistent. By 16 weeks of therapy, his score was 11; and by 20 weeks, it was 9. His BHS score dropped from 14 to 7 by week 11 and was 3 at the end of 20 weeks.
In sum, this case illustrates how a psychological assessment utilizing the BDI and BHS may help to alert the clinician to issues as a function of their discrepancy with self-report in the clinical interview. The BDI and BHS are both sensitive to change over the course of therapy and may be used to determine the severity of depression and hopelessness, respectively. In addition, it may be useful to treat higher (or lower) than expected scores as issues to pursue in the interview and/or over time. The self-report scales are both prone to social desirability bias, and unfortunately it may be that significant clinical symptoms are not reported. On the other hand, as in the case of Mr. A, important symptoms or a state of mind like hopelessness may be detected when clinically the patient minimizes his or her distress. Self-report instruments are not infallible, but they do provide information that is clinically useful.

Summary and Conclusions

The Beck Depression Inventory (BDI) and the Beck Hopelessness Scale (BHS) are 21-item and 20-item self-report inventories designed to measure depression and hopelessness, respectively, in a variety of clinical and research settings. Both questionnaires are easily understood and administered, and require approximately 5-10 minutes to complete and score. The BDI has been the subject of extensive psychometric evaluation and has been demonstrated to have high content, concurrent, predictive, and construct validity, and also to be highly internally consistent. It is especially useful in treatment planning, with high and low scores suggesting different psychotherapeutic strategies. The BHS was designed to define and measure operationally the concept of hopelessness and its relationship to psychopathology. Although the BHS has not been studied as extensively as the BDI, the available literature indicates that it, too, has high validity and internal consistency. In particular, the BHS is useful in identifying patients at high risk for attempted or completed suicide, but it also has low specificity. The resulting high rate of false positives is acceptable in view of the importance of correctly identifying patients at high risk for suicide.
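The trade-off between sensitivity and specificity noted above can be made concrete with a short sketch. The counts of 17 eventual suicides, 16 of them identified by the cutoff, and a sample of 1,958 outpatients come from the account of Beck et al. (1990) given in this chapter. The number of false positives is not reported here, so the value used below is invented purely for illustration, as are the function names.

```python
def sensitivity(true_positives, false_negatives):
    """Proportion of actual cases that the cutoff identifies."""
    return true_positives / (true_positives + false_negatives)


def specificity(true_negatives, false_positives):
    """Proportion of non-cases that the cutoff correctly leaves unflagged."""
    return true_negatives / (true_negatives + false_positives)


def positive_predictive_value(true_positives, false_positives):
    """Proportion of flagged patients who are actual cases."""
    return true_positives / (true_positives + false_positives)


# Counts taken from the text.
total_patients = 1958
eventual_suicides = 17
identified_by_cutoff = 16

# Hypothetical value: the text does not report the number of false positives.
hypothetical_false_positives = 1000

missed = eventual_suicides - identified_by_cutoff
non_cases = total_patients - eventual_suicides
true_negatives = non_cases - hypothetical_false_positives

print(round(sensitivity(identified_by_cutoff, missed), 2))
print(round(specificity(true_negatives, hypothetical_false_positives), 2))
print(round(positive_predictive_value(identified_by_cutoff, hypothetical_false_positives), 3))
```

Because completed suicide is rare in the sample, even a sensitive cutoff flags many patients who never go on to suicide, which is the sense in which low specificity produces a high rate of false positives.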

References

Akiskal, H. S., Lemmi, H., Yetevanian, B., King, D., & Belluomini, J. (1982). The utility of the REM latency in psychiatric diagnosis: A study of 81 depressed outpatients. Psychiatry Research, 7, 101-110.

Ammerman, R. T. (1988). Hopelessness scale. In M. Hersen (Ed.), Dictionary of behavioural assessment techniques (pp. 251-252). University of Pittsburgh: Pergamon.

Antonuccio, D. O., Lewinsohn, P. M., & Steinmetz, J. L. (1982). Identification of therapist differences in a group treatment for depression. Journal of Consulting and Clinical Psychology, 50, 433-435.

Atkeson, B. M., Calhoun, K. S., Resnick, P. A., & Ellis, E. M. (1982). Victims of rape: Repeated assessment of depressive symptoms. Journal of Consulting and Clinical Psychology, 50, 96-102.

Bailey, J., & Coopen, A. (1976). A comparison between the Hamilton Rating Scale and the Beck Inventory in the measurement of depression. British Journal of Psychiatry, 128, 486-489.

Beck, A. T. (1963). Thinking and depression. 1: Idiosyncratic content and cognitive distortions. Archives of General Psychiatry, 9, 324-335.

Beck, A. T. (1967). Depression: Clinical, experimental, and theoretical aspects. New York: Harper & Row.

Beck, A. T., & Beck, R. W. (1972). Screening depressed patients in family practice: A rapid technique. Postgraduate Medicine, 52, 81-85.

Beck, A. T., Brown, G., Berchick, R. J., Stewart, B. L., & Steer, R. A. (1990). Relationship between hopelessness and ultimate suicide: A replication with psychiatric outpatients. American Journal of Psychiatry, 147, 190-195.

Beck, A. T., Kovacs, M., & Weissman, A. (1975). Hopelessness and suicidal behavior: An overview. Journal of the American Medical Association, 234, 1146-1149.

Beck, A. T., Rial, W. Y., & Rickels, K. (1974). Short form of depression inventory: Cross-validation. Psychological Reports, 34, 1184-1186.

Beck, A. T., Rush, A. J., Shaw, B. F., & Emery, G. (1979). Cognitive therapy of depression. New York: Guilford.

Beck, A. T., & Steer, R. A. (1984). Internal consistencies of the original and revised Beck depression inventories. Journal of Clinical Psychology, 40, 1365-1367.

Beck, A. T., Steer, R. A., Kovacs, M., & Garrison, B. (1985). Hopelessness and eventual suicide: A 10-year prospective study of patients hospitalized with suicidal ideation. American Journal of Psychiatry, 142(5), 556-563.

Beck, A. T., Steer, R. A., & Garbin, M. A. (1988). Psychometric properties of the Beck Depression Inventory: Twenty-five years of evaluation. Clinical Psychology Review, 8, 77-100.

Beck, A. T., Ward, C. H., Mendelson, M., Mock, J., & Erbaugh, J. (1961). An inventory for measuring depression. Archives of General Psychiatry, 4, 561-571.

Bellack, L., & Rosenberg, S. (1966). Effects of antidepressant drugs on psychodynamics. Psychosomatic Medicine, 7, 106-114.

Berndt, S. M., Berndt, D. J., & Byars, W. D. (1983). A multi-institutional study of depression in family practice. Journal of Family Practice, 16, 83-87.

Blackburn, I. M., Bishop, S., Glen, A. I. M., Whalley, I. J., & Christie, W. (1981). The efficacy of cognitive therapy in depression: A treatment trial using cognitive therapy and pharmacotherapy each alone and in combination. British Journal of Psychiatry, 139, 181-189.

Blatt, S. J., Quinlan, D. M., Chevron, E. S., McDonald, C., & Zuroff, D. (1982). Dependency and self-criticism: Psychological dimensions of depression. Journal of Consulting and Clinical Psychology, 50, 113-115.

Broadhurst, A. D. (1970). L-tryptophan vs. ECT (letter). Lancet, 1, 1392.

Burrows, G. D., Foenander, G., Davies, H., & Scoggins, B. A. (1976). Rating scales as predictors of response to tricyclic antidepressants. Australian and New Zealand Journal of Psychiatry, 10, 53-56.

Byerly, F. C., & Carlson, W. A. (1982). Comparison among inpatients, outpatients, and normals on three self-report depression inventories. Journal of Clinical Psychology, 38, 797-804.

Campbell, M. M., Burgess, P. M., & Finch, S. J. (1984). A factorial analysis of BDI scores. Journal of Clinical Psychology, 40, 992-996.

Carroll, B. J., Fielding, J. M., & Blashki, T. G. (1973). Depression rating scales: A critical review. Archives of General Psychiatry, 28, 361-366.

Christenfeld, R., Lubin, B., & Satin, M. (1978). Concurrent validity of the Depression Adjective Check List in a normal population. American Journal of Psychiatry, 135, 582-584.

Clark, D. C., Cavanaugh, S. V., & Gibbons, R. D. (1983). The core symptoms of depression in medical and psychiatric patients. Journal of Nervous and Mental Diseases, 171, 705-713.

Clarke, M., & Williams, A. J. (1979). Depression in women after perinatal death. Lancet, 1, 916-917.

Coleman, R. E., & Miller, A. G. (1975). The relationship between depression and marital maladjustment in a clinic population: A multitrait-multimethod study. Journal of Consulting and Clinical Psychology, 43, 647-651.

Coppen, A., Whybrow, P. C., Nuguera, R., Maggs, R., & Prange, A. J. (1972). The comparative antidepressant value of L-tryptophan and imipramine with and without attempted potentiation by liothyronine. Archives of General Psychiatry, 26, 474-478.

Costello, C. G. (1992). Research on symptoms versus research on syndromes: Arguments in favor of allocating more research time to the study of symptoms. British Journal of Psychiatry, 160, 304-308.

Davies, B., Burrows, G., & Poyton, C. (1975). A comparative study of four depression rating scales. Australian and New Zealand Journal of Psychiatry, 9, 21-24.

Delay, J., Pachot, P., Lemperiere, T., & Mirouze, R. (1963). La nosologie des etats depressifs: Rapports entre l’etiologie et la semiologie: 2 Resultats du Questionnaire de Beck/Classification of depressive states. [Agreement between etiology and symptomatology: 2 Results of Beck’s Questionnaire.] Encephale, 52, 497-505.

Ennis, J., Barnes, R. A., Kennedy, S., & Trachtenberg, D. D. (1989). Depression in self-harm patients. British Journal of Psychiatry, 154, 41-47.

Gallagher, D., Nies, G., & Thompson, L. W. (1982). Reliability of the Beck Depression Inventory with older adults. Journal of Consulting and Clinical Psychology, 50, 152-153.

Giambra, L. M. (1977). Independent dimension of depression: A factor analysis of three self-report depression measures. Journal of Clinical Psychology, 33, 928-935.

Green, W. J., & Statduhat, P. P. (1966). The effects of the ECT on the sleep dream cycle in a psychotic depression. Journal of Nervous Mental Disorders, 143, 123-134.

Hamilton, M. (1960). A rating scale for depression. Journal of Neurology, Neurosurgery, and Psychiatry, 23, 56-62.

Hammen, C. I. (1980). Depression in college students: Beyond the Beck Depression Inventory. Journal of Consulting and Clinical Psychology, 48, 126-128.

Hatzenbuchler, L. C., Parpal, M., & Mathews, L. (1983). Classifying college students as depressed or nondepressed using the Beck Depression Inventory: An empirical analysis. Journal of Consulting and Clinical Psychology, 51, 360-366.

Heimberg, L. (1961). Development and construct validation of an inventory for the measurement of future time perspective. Unpublished master’s thesis, Vanderbilt University, Nashville, TN.

Hesselbrock, M. M., Hesselbrock, V. M., Tenmen, H., Meyer, R. E., & Workman, K. L. (1983). Methodological considerations in the assessment of depression in alcoholics. Journal of Consulting and Clinical Psychology, 51, 399-405.

Hill, R. D., Gallagher, D., Thompson, L. W., & Ishida, T. (1988). Hopelessness as a measure of suicidal intent in the depressed elderly. Psychology and Aging, 3, 230-232.

Jacobson, N. S., Follette, W. C., & Revenstorf, D. (1984). Psychotherapy outcome research: Methods of reporting variability and evaluating clinical significance. Behaviour Therapy, 15, 336-352.

Kovacs, M., Rush, A. J., Beck, A. T., & Hollon, D. S. (1981). Depressed outpatients treated with cognitive therapy or pharmacotherapy: A one-year follow-up. Archives of General Psychiatry, 38, 33-39.

Lipsedge, M. S., & Rees, W. I. (1971). A double-blind comparison of doxepin and amitriptyline for the treatment of depression with anxiety. Psychopharmacologia, 19, 153-162.

Marsella, A. J., Sanborn, K. O., Kameoka, V., Shiguru, L., & Brennan, J. (1975). Cross-validation of self-report measures of depression among normal populations of Japanese, Chinese, and Caucasian ancestry. Journal of Clinical Psychology, 31, 281-287.

May, A. E., Urquart, A., & Tarran, J. (1969). Self-evaluation of depression in various diagnostic and therapeutic groups. Archives of General Psychiatry, 21, 191-194.

Melges, F., & Bowlby, J. (1969). Types of hopelessness in psychopathological process. Archives of General Psychiatry, 20, 690-699.

Mendels, J., Secunda, S. K., & Dyson, W. L. (1972). A controlled study of the antidepressant effects of lithium carbonate. Archives of General Psychiatry, 26, 154-157.

Metcalfe, M., & Goldman, E. (1965). Validation of an inventory for measuring depression. British Journal of Psychiatry, 111, 240-242.

McKinley, J. C., & Hathaway, S. R. (1943). Identification and measurement of psychoneuroses in medical practice; Minnesota Multiphasic Personality Inventory. Journal of the American Medical Association, 122, 161-167.

Minkoff, K., Bergman, E., Beck, A. T., & Beck, R. W. (1973). Hopelessness, depression, and attempted suicide. American Journal of Psychiatry, 130, 455-459.

Moran, P. W., & Lambert, M. J. (1983). A review of current assessment tools for monitoring changes in depression. In M. S. Lambert, E. R. Christensen, & S. S. DeJulio (Eds.), The assessment of psychotherapy outcome. New York: Wiley.

Newson-Smith, J. G. B., & Hirsch, S. R. (1979). Psychiatric symptoms in self-poisoning patients. Psychological Medicine, 9, 493-500.

Reynolds, W. M., & Gould, J. W. (1981). A psychometric investigation of the standard and short form of the Beck Depression Inventory. Journal of Consulting and Clinical Psychology, 49, 306-307.

Rounseville, B. J., Weissman, M. M., Rosenberger, P. H., Wilber, C. H., & Kleber, H. D. (1979). Detecting depressive disorders in drug abusers. Journal of Affective Disorders, 1, 255-267.

Rush, A. J., Beck, A. T., Kovacs, M., & Hollon, S. (1977). Comparative efficacy of cognitive therapy and pharmacotherapy in the treatment of depressed outpatients. Cognitive Therapy and Research,

1, 17—37.

Salkind, M. R. (1969). Beck Depression Inventory in general practice. Journal of the Royal College of General Practitioners, 18, 267—

Pfc Schaefer, A., Brown, J., Watson, C. G., Plemel, D., DeMotts, J., Howard, Balleweg,

B.

J., &

M. T., Petrik, N.,

Anderson,

D.

(1985).

Comparison of the validities of the Beck, Zung, and MMPI depression scales. Journal of Consulting and Clinical Psychology, 53,

415-418. Schmale, A. H. (1958). Relationship of separation and depression to disease: A report on a hospitalized medical population. Psychosomatic Medicine, 20, 259-277. Schnurr,

R., Hoaken,

P.C.S.,

& Jarrett,

F. J.

(1976). Comparison of depression inventories in a clinical population. Canadian Psychiatric Association Journal, 21, 473-476. Schwab, J. J., Bialow, M., Brown, J. M., & Holzer, C. E. (1967). Diagnosing depression in medical inpatients. Annals of Internal Medicine, 67, 695-707.

Scogin, F. R., & Merbaum, M. (1983). Humorous stimuli and depression: An examination of Beck’s premise. Journal of Clinical Psychology, 39, 165-169.
Scott, N. A., Hannum, T. E., & Christ, S. L. (1982). Assessment of depression among incarcerated females. Journal of Personality Assessment, 46, 372-379.
Seitz, F. C. (1970). Five psychological measures of neurotic depression: A correlation study. Journal of Clinical Psychology, 26, 504-505.
Steer, R. A., McElroy, M. G., & Beck, A. T. (1982). Structure of depression in alcoholic men: A partial replication. Psychological Reports, 50, 723-728.
Stotland, E. (1969). The psychology of hope. San Francisco, CA: Jossey-Bass.

Strober, M., Green, J., & Carlson, G. (1981). Utility of the Beck Depression Inventory with psychiatrically hospitalized adolescents. Journal of Consulting and Clinical Psychology, 49, 482-483.
Tanaka-Matsumi, J., & Kameoka, V. A. (1986). Reliabilities and concurrent validities of popular self-report measures of depression, anxiety, and social desirability. Journal of Consulting and Clinical Psychology, 54, 328-333.
Turner, J. A., & Romano, J. M. (1984). Self-report screening measures for depression in chronic pain patients. Journal of Clinical Psychology, 40, 909-913.
van Praag, H., & Plutchik, R. (1985). An empirical study on the “cathartic effect” of attempted suicide. Psychiatry Research, 16(2), 123-130.
van Praag, H., & Plutchik, R. (1987). Interconvertability of five self-report measures of depression. Psychiatry Research, 22(3), 243-256.
Vredenburg, K., Krames, L., & Flett, G. L. (1985). Reexamining the Beck Depression Inventory: The long and short of it. Psychological Reports, 36, 767-778.

Weissman, A., Beck, A. T., & Kovacs, M. (1979). Drug abuse, hopelessness, and suicidal behavior. International Journal of the Addictions, 14, 451-464.
West, E. D. (1981). Electric convulsion therapy in depression: A double-blind controlled trial. British Medical Journal, 282, 355-357.
Zuckerman, M., & Lubin, B. (1965). Manual for the Multiple Affect Adjective Checklist. San Diego: California Educational and Industrial Testing Service.
Zung, W.W.K. (1965). A self-rating depression scale. Archives of General Psychiatry, 12, 63-70.


Chapter 13

State-Trait Anxiety Inventory and State-Trait Anger Expression Inventory

Charles D. Spielberger
Sumner J. Sydeman
University of South Florida

Early studies of emotion endeavored to discover, from an analysis of the introspective reports of trained observers, the qualitative feeling states or “mental elements” that comprised different emotions (Titchener, 1897; Wundt, 1896). Unfortunately, this phenomenological approach generated findings that were obviously artificial and unrelated to other kinds of behavior, and consequently resulted in a discouraging degree of conceptual ambiguity and empirical inconsistency (Plutchik, 1962; Young, 1943). Moreover, subjective reports about emotional states came to be viewed with extreme suspicion because they were unverifiable and easily falsified (Duffy, 1941). Distrust of verbal reports was intensified further by psychoanalytic formulations that emphasized the distortions in mood and thought that may be produced by unconscious mental processes.

With the advent of behaviorism shortly after the turn of the century, together with psychology’s acceptance of the physicalistic assumptions of logical positivism, research on emotion shifted from the investigation of subjective feeling states to the evaluation of behavioral and physiological variables. The typical paradigm employed in research on emotion involved the manipulation of experimental conditions designed to influence a particular emotional state, and observation of the effects of these manipulations on behavioral and/or physiological responses that presumably reflected changes in the emotion. This emphasis on behavior and physiology was attributed by Arnold (1960) to the fact that early phenomenological conceptions of emotion did not fit readily with current scientific methods. The epistemology and methodology of stimulus-response (S-R) psychology and, especially, the prevailing bias against subjective experience as a desideratum for the science of psychology, required investigators to evaluate the impact of carefully defined and manipulated antecedent (stimulus) conditions on specific physiological and behavioral responses.

Beginning in the 1960s, there has been growing recognition and acceptance of the unique importance of the experiential component of emotions. Most authorities now regard emotions as complex psychobiological states or conditions, reactions in humans that are characterized by specific feeling qualities and widespread bodily changes, particularly in the autonomic nervous system. Clearly, emotional states cannot be defined by stimulus and response operations alone. Differences in personality and past experience also must be taken into account, because they dispose people to respond to similar stimulus objects and circumstances in radically different ways (Lazarus, Deese, & Osler, 1952). It is now generally accepted that an individual’s appraisal of a particular event or situation will greatly influence his or her reactions to that circumstance (Lazarus & Folkman, 1984; Lazarus & Opton, 1966).

In the present context, the term emotion is used much as it currently is used in common language: to refer to complex, qualitatively different, psychobiological states or conditions of the human organism that have both phenomenological and physiological properties. The quality and intensity of the feelings experienced in emotional states seem to be their most unique and distinctive features. Therefore, to achieve a comprehensive understanding of emotional phenomena, appropriate methods must be developed to distinguish between qualitatively different emotional states, as well as the intensity of such states as they change over time.

The nature of anxiety and anger as emotional states and the procedures employed in their measurement are reviewed briefly in this chapter. First, the measures of state and trait anxiety are discussed, and the development of the State-Trait Anxiety Inventory (STAI) is described in some detail. Second, we examine conceptual ambiguities in the constructs of anger, hostility, and aggression, briefly evaluate a number of instruments developed to assess anger and hostility, and describe the construction and validation of the State-Trait Anger Scale (STAS). Third, the expression and control of anger are considered, and the development of the Anger Expression (AX) Scale and the State-Trait Anger Expression Inventory (STAXI) are described. The chapter concludes with a discussion of the utilization of anxiety and anger measures in treatment planning and evaluation.

Nature and Measurement of Anxiety

The importance of fear (anxiety) and rage (anger) as scientific constructs is reflected in the writings of Darwin (1872/1965), who considered these emotions to be adaptive characteristics of both humans and animals that had evolved over countless generations through a process of natural selection. Noting that both fear and rage varied in intensity, Darwin observed that fear increased from mild apprehension or surprise to an extreme “agony of terror,” and that manifestations of fear included: trembling, dilation of the pupils, increased perspiration, changes in voice quality, erection of the hair, and peculiar facial expression.

For Freud (1924), fear and anxiety both referred to “something felt,” a specific unpleasant emotional state or condition that included experiential, physiological, and behavioral components. Fear, which Freud equated with objective anxiety, implied an emotional reaction that was proportional in intensity to a real danger in the external world. In contrast, Freud used the term neurotic anxiety to describe emotional reactions that were greater in intensity than would be expected on the basis of the objective danger because the source of the danger was the individual’s own unacceptable (repressed) sexual or aggressive impulses.

Freud regarded anxiety as the “fundamental phenomenon and the central problem of neurosis” (Freud, 1936, p. 85). He initially believed that anxiety resulted from the discharge of repressed, somatic sexual tensions (libido). When blocked from normal expression, libidinal energy accumulated and was discharged automatically as free-floating anxiety. This view was modified subsequently in favor of a more general conception of anxiety as a signal indicating the presence of a danger situation. The perceived presence of danger evokes an unpleasant emotional state that serves to warn the individual that some form of adjustment is necessary. In emphasizing the adaptive utility of anxiety as a motivator of behavior that helps individuals avoid or cope with danger, Freud’s danger signal theory is quite consistent with Darwin’s evolutionary perspective.

For nearly a century, clinical studies of anxiety have appeared in the psychiatric and psychoanalytic literature with increasing regularity, but prior to 1950 there was relatively little experimental research on human anxiety (Spielberger, 1966). The complexity of anxiety phenomena, the lack of appropriate measuring instruments, and ethical problems associated with inducing anxiety in laboratory settings all contributed to the paucity of research. However, since 1950, research on human anxiety has been facilitated on two fronts: Conceptual advances have clarified the nature of anxiety as a theoretical construct, and a number of scales have been created for measuring this construct.

Cattell and Scheier (1963) pioneered the application of multivariate techniques to define and measure anxiety. A variety of self-report and physiological measures of anxiety were included in their factor analytic studies, in which relatively independent state and trait anxiety factors consistently have emerged (Cattell, 1966). Physiological measures that fluctuated over time, such as respiration rate and blood pressure, had strong loadings on the state anxiety factor, but only slight loadings on trait anxiety. In contrast, several psychometric scales had strong loadings on the trait anxiety factor, but not on state anxiety. These scales were stable over time and did not covary over occasions of measurement.

Thus, based on Cattell’s research, there are two related, yet logically quite different, anxiety constructs. Perhaps most often, the construct of anxiety refers to an unpleasant emotional state or condition, but this construct also describes relatively stable individual differences in anxiety proneness as a personality trait.

The concept of anxiety as an emotional state (S-Anxiety) is comparable in many respects to the conceptions of fear and objective anxiety that were formulated originally by Darwin (1872/1965) and Freud (1936). Anxiety states can be most meaningfully and unambiguously operationally defined by some combination of introspective verbal reports and physiological-behavioral signs (Spielberger, 1972a). As an emotional state, S-Anxiety consists of unpleasant, consciously perceived feelings of tension, apprehension, nervousness, and worry, with associated activation or arousal of the autonomic nervous system.

Trait anxiety (T-Anxiety) has the characteristics of a class of constructs that Campbell (1963) called acquired behavioral dispositions and which Atkinson (1964) labeled motives. Measures of T-Anxiety assess individual differences in the tendency to perceive a wide range of situations as dangerous or threatening, and for those high in T-Anxiety to respond to these perceived threats with more frequent and intense elevations in S-Anxiety than persons low in T-Anxiety.

INSTRUMENTS FOR MEASURING ANXIETY

A variety of questionnaires, rating scales, and psychometric tests are employed currently to measure anxiety in research and clinical practice. The Hamilton (1959) Rating Scale is used widely for evaluating symptoms of anxiety observed in clinical interviews or psychotherapy sessions. The severity of each symptom is rated on a 5-point scale, from “none” to “very severe, grossly disabling.” The specific anxiety symptoms that are assessed by the Hamilton Scale include: anxious mood (worry, apprehension); tension (inability to relax, trembling, restlessness); and fears (of strangers, animals, traffic, crowds).

Projective techniques such as the Rorschach Inkblots and the Thematic Apperception Test also are used extensively in the clinical evaluation of anxiety, but self-report psychometric questionnaires are by far the most popular procedures for assessing anxiety. Among these, the Taylor (1953) Manifest Anxiety Scale (MAS) has been used extensively in experimental research. The MAS consists of 50 items selected by clinical psychologists from the 566 items of the Minnesota Multiphasic Personality Inventory (MMPI) on the basis of item content reflecting symptoms of anxiety that are characteristic of individuals with anxiety neuroses. In responding to the MAS, subjects indicate how they generally feel by reporting either true or false for each MAS item.

The Anxiety Scale Questionnaire (ASQ) was developed by Cattell and Scheier (1963) to assess anxiety in clinical situations. They assembled a large number of multiple-choice items presumed to be related to anxiety phenomena, and employed factor analytic procedures as the primary basis for item selection. Correlations between the ASQ and the MAS are typically .80 or higher, despite major differences in the authors’ conceptions of anxiety, method of test construction, and item format. Because these correlations approach the reliabilities of the individual scales, the MAS and the ASQ may be considered equivalent measures. The MAS and the ASQ were constructed before the importance of the state-trait distinction was established, but both instruments require subjects to report how often they experience anxiety symptoms, suggesting that these scales measure T-Anxiety.

In early studies, S-Anxiety was measured most often by assessing physiological changes associated with activation (arousal) of the autonomic nervous system. Although a number of different physiological measures have been used as indicators of S-Anxiety (Borkovec, Weerts, & Bernstein, 1977; Hodges, 1976; Lader, 1975; Levitt, 1980; Martin, 1973; McReynolds, 1968), the galvanic skin response and changes in heart rate appear to be the most popular. The utility of physiological measures of S-Anxiety has been evaluated critically by Hodges (1976). A number of questionnaires, rating scales, psychometric inventories, and physiological measures that have been used to assess anxiety are described by Levitt (1980). Many of these measures also have been reviewed and evaluated by McReynolds (1968) and Borkovec et al. (1977).

The Affect Adjective Check List (AACL) developed by Zuckerman (1960) and his associates (Zuckerman & Biase, 1962; Zuckerman & Lubin, 1965) was the first instrument designed to measure both S-Anxiety and T-Anxiety. Although evidence of the validity of the AACL-Today form as a measure of S-Anxiety is impressive, the format of this scale, which only requires subjects to check adjectives that describe them, makes it somewhat insensitive in assessing the intensity of anxiety as an emotional state. Moreover, relatively low correlations of the AACL General Form with the MAS and the ASQ raise questions about the concurrent validity of this component of the AACL as a measure of T-Anxiety.
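
The reasoning that two scales are equivalent when their intercorrelation approaches their reliabilities can be made concrete with the classical correction for attenuation, which divides the observed correlation by the square root of the product of the two reliabilities. The Python sketch below is illustrative only: the observed correlation of .80 is the figure quoted above for the ASQ and the MAS, but the reliability of .85 assigned to each scale is a hypothetical value chosen for the example, and the function name is not taken from any published scoring software.

```python
from math import sqrt


def disattenuated_correlation(r_xy, rel_x, rel_y):
    """Classical correction for attenuation: r_xy / sqrt(rel_x * rel_y)."""
    if not (0 < rel_x <= 1 and 0 < rel_y <= 1):
        raise ValueError("reliabilities must lie in the interval (0, 1]")
    return r_xy / sqrt(rel_x * rel_y)


observed_r = 0.80           # ASQ-MAS correlation quoted in the text
assumed_reliability = 0.85  # hypothetical reliability for both scales

estimate = disattenuated_correlation(observed_r, assumed_reliability, assumed_reliability)
print(round(estimate, 2))   # 0.94
```

Under these assumed reliabilities, the estimated correlation between error-free scores is close to 1, which is the sense in which two scales with differing item formats can be treated as measuring the same construct.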

THE STATE-TRAIT ANXIETY INVENTORY (STAI)

The STAI was developed by Spielberger, Gorsuch, and Lushene (1970) to provide reliable, relatively brief self-report scales to assess both state and trait anxiety in research and clinical practice. Freud’s (1936) danger signal theory and Cattell’s concepts of state and trait anxiety (Cattell, 1966; Cattell & Scheier, 1958, 1961, 1963), as refined and elaborated by Spielberger (1966, 1972a, 1972b, 1976, 1977, 1979a, 1983), provided the conceptual framework that guided the STAI test-construction process.

State anxiety (S-Anxiety) was defined by Spielberger et al. (1970) as a temporal cross section in the emotional stream of life of a person, consisting of subjective feelings of tension, apprehension, nervousness, and worry, and activation (arousal) of the autonomic nervous system. It was assumed further that S-Anxiety would vary in intensity and fluctuate over time as a function of perceived threat. Trait anxiety (T-Anxiety) was defined in terms of relatively stable individual differences in anxiety proneness (i.e., differences between people in the tendency to perceive stressful situations as dangerous or threatening, and in the disposition to respond to such situations with more frequent and intense elevations in S-Anxiety). It was assumed further that differences in T-Anxiety are reflected in the frequency that anxiety states have been experienced in the past, and in the probability that S-Anxiety reactions will be manifested in the future.

When test construction for the STAI began in 1964, the initial goal was to develop an inventory consisting of a single set of items that could be administered with different instructions to assess both state and trait anxiety. A large pool of items was selected and adapted from other anxiety measures, mostly from the existing trait measures. In addition, a number of new items were written using adjectives from the AACL that were considered appropriate for assessing S-Anxiety. For each of these new items, the essential psychological content was retained, but the format was modified so that the item could be given with different instructions to assess either S-Anxiety or T-Anxiety.

In selecting the items for the preliminary form of the STAI, the item pool was administered to a large sample of undergraduate university students, first with state and then with trait instructions. The state instructions required subjects to report the intensity of their feelings of anxiety, “right now, at this moment.” The trait instructions asked subjects to report how they generally feel by indicating the frequency of occurrence of their anxiety-related feelings or symptoms. The same 20 items were administered with both state and trait instructions.

When given with trait instructions, each STAI item that correlated significantly with the students’ scores on three well-known T-Anxiety scales was retained for further study. The three criterion measures were the Taylor (1953) MAS and Cattell and Scheier’s (1963) ASQ, the two most widely used anxiety measures at the time test construction was begun, and the Welsh (1956) “Factor A” Anxiety Scale, which was derived from a factor analysis of the 566 MMPI items. The internal consistency and stability of each STAI item were evaluated when given with either trait or state instructions. In addition, the construct validity of each S-Anxiety item given with state instructions was evaluated under high and low stress conditions. On the basis of extensive item-validity research with more than 2,000 students comprising 10 independent samples, a final set of 20 items was selected for Form A, the preliminary version of the STAI.

Although the STAI (Form A) was designed to be administered with different instructions to measure both S-Anxiety and T-Anxiety (Spielberger et al., 1970), research with the inventory revealed that altering the instructions could not overcome the strong psycholinguistic state or trait connotations of key words in some items. For example, “I feel upset” was a highly sensitive measure of S-Anxiety; scores on this item increased markedly under stressful conditions and were lower under relaxed conditions, compared with a neutral condition.

However, when given with trait instructions, correlations of this item with other T-Anxiety items were relatively low and unstable over time. Conversely, “I worry too much” was stable over time and correlated highly with other T-Anxiety items. However, scores on this item did not increase reliably in response to stressful circumstances, nor did scores on this item decrease under relaxed conditions, as was required for the construct validity of an S-Anxiety item.

Because of the difficulties encountered in measuring state and trait anxiety with the same items, we modified our test-construction strategy and selected separate sets of items for the STAI S-Anxiety and T-Anxiety Scales. The 20 items with the best concurrent validity (i.e., highest correlations with the MAS, ASQ, and Welsh Anxiety Scale) and most stability over time were selected for the STAI (Form X) T-Anxiety Scale. The 20 items with the highest internal consistency and best construct validity as measures of state anxiety were selected for the STAI (Form X) S-Anxiety Scale. Only five items met the validity criteria for both scales.


The 30 remaining items were sufficiently different in content to be regarded as unique measures of either state or trait anxiety. Representative STAI T-Anxiety items, reflecting either the presence or the absence of trait anxiety, are listed next:

Anxiety Present: I worry too much over something that really doesn’t matter; I get in a state of tension or turmoil as I think over recent concerns and interests.
Anxiety Absent: I am content; I am a steady person.

In responding to these items, subjects rate themselves on the following 4-point frequency scale: (a) almost never, (b) sometimes, (c) often, and (d) almost always.

The main goal in constructing the STAI (Form X) S-Anxiety Scale was to measure a continuum of increasing intensity on which low scores indicated feeling calm and serene, intermediate scores were associated with moderate levels of tension and worry, and high scores reflected intense fear, approaching terror and panic. In responding to the S-Anxiety items, subjects rate the intensity of their feelings of anxiety on the following 4-point scale: (a) not at all, (b) somewhat, (c) moderately so, and (d) very much so. Representative S-Anxiety present and absent items are listed next:

Anxiety Present: I am tense; I am worried.
Anxiety Absent: I feel calm; I feel secure.
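
The 4-point ratings described above lend themselves to a simple summed score. The following Python sketch is illustrative only and is not the published scoring key. It assumes that ratings are coded 1 to 4 and that anxiety-absent items are reverse-scored so that higher totals always indicate more anxiety, a common convention for scales that mix symptom-present and symptom-absent items. The four item labels are a hypothetical excerpt based on the representative items listed above.

```python
# Hypothetical four-item excerpt; each actual scale has 20 items.
# True marks an anxiety-absent item, which this sketch reverse-scores (assumption).
ITEM_IS_ANXIETY_ABSENT = {
    "tense": False,
    "worried": False,
    "calm": True,
    "secure": True,
}


def score_scale(ratings, absent_key=ITEM_IS_ANXIETY_ABSENT, low=1, high=4):
    """Sum 1-4 ratings, reversing anxiety-absent items so higher means more anxiety."""
    total = 0
    for item, is_absent in absent_key.items():
        rating = ratings[item]
        if not low <= rating <= high:
            raise ValueError(f"rating for {item!r} must be between {low} and {high}")
        # A reversed rating maps 1 -> 4, 2 -> 3, 3 -> 2, 4 -> 1.
        total += (low + high - rating) if is_absent else rating
    return total


relaxed = {"tense": 1, "worried": 1, "calm": 4, "secure": 4}
stressed = {"tense": 4, "worried": 3, "calm": 1, "secure": 2}

print(score_scale(relaxed))   # 4, the minimum possible for four items
print(score_scale(stressed))  # 4 + 3 + 4 + 3 = 14
```

With 20 items scored this way, totals would range from 20 to 80, which matches the idea of a continuum running from calm and serene at the low end to intense fear at the high end.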

Insights gained in a decade of research stimulated a major revision in the STAI (Form X) (Spielberger, 1983). The main goal in revising the scale was to develop purer measures of state and trait anxiety to provide a firmer basis for differentiating between patients suffering from anxiety and depressive disorders in clinical diagnosis. Careful scrutiny of the content of the STAI items with the best psychometric properties resulted in a clearer conception of the constructs of state and trait anxiety, which then guided the formulation of potential replacement items. Selection of replacement items was based on item analyses and factor analyses of responses to the original and replacement items; 30% of the original items were replaced.

In the construction and standardization of the STAI (Form Y), more than 5,000 additional subjects were tested. The item replacement procedures are described in detail in the revised test manual (Spielberger, 1983). Factor analyses of the STAI (Form Y) items (Spielberger, Vagg, Barker, Donham, & Westberry, 1980; Vagg, Spielberger, & O’Hearn, 1980) identified clear-cut trait and state anxiety factors, which generally were consistent with the results of previous factor studies of Form X (Barker, Barker, & Wadsworth, 1977; Gaudry & Poole, 1975; Gaudry, Spielberger, & Vagg, 1975; Kendall, Finch, Auerbach, Hooke, & Mikulka, 1976; Spielberger et al., 1980). Distinctive state and trait anxiety-absent and anxiety-present factors emerged in the four-factor solutions for Form Y, which were similar to those reported in previous factor studies of Form X. However, Form Y had better simple structure, and the factors were more differentiated and more stable than in Form X, reflecting a better balance between anxiety-present and anxiety-absent items (Spielberger et al., 1980).

RELIABILITY, STABILITY, AND INTERNAL CONSISTENCY OF THE STAI

Detailed reliability data for the STAI (Form Y) are reported in the test manual (Spielberger, 1983). The test-retest stability coefficients for the Form Y T-Anxiety Scale are reasonably high for college students, ranging from .73 to .86, but somewhat lower for high school students, ranging from .65 to .75; the median stability coefficients for a number of different samples of college and high school students were .77 and .70, respectively. In contrast, the stability coefficients for the S-Anxiety Scale were relatively low, with a median of only .33. However, this lack of stability was expected, because a valid measure of state anxiety should reflect the influence of unique situational factors that exist at the time of testing.

Because anxiety states are expected to vary in intensity as a function of perceived stress, measures of internal consistency such as alpha coefficients provide a more meaningful index of the reliability of state measures than test-retest correlations. Alpha coefficients for the STAI (Form Y) S-Anxiety Scale, computed by Formula KR-20 as modified by Cronbach (1951), are uniformly high. The S-Anxiety alphas were above .90 for large, independent samples of students, working adults, and military recruits, with a median coefficient of .93. The alpha coefficients for the STAI (Form Y) T-Anxiety Scale were also uniformly high for these groups, with a median coefficient of .90. In addition, the S-Anxiety and T-Anxiety alpha coefficients for younger, middle-aged, and older working adults remained high over the entire age range.

Because the distribution of scores on the STAI S-Anxiety Scale when given under neutral conditions is skewed positively, alpha reliability coefficients are generally slightly higher when this scale is given under conditions of psychological stress. For example, the alpha reliability was .92 for the S-Anxiety Scale when it was administered to college males immediately after a difficult intelligence test, and .94 when it was given immediately after a distressing film. For the same subjects, the alpha reliability was .89 when the scale was given following a brief period of relaxation training.

Further evidence of the high degree of internal consistency of the STAI scales is provided by the item-remainder correlations, which are .50 or higher for more than half of the items on both scales; all of the T-Anxiety items and 19 of the 20 S-Anxiety items had item-remainder correlations of .30 or higher for both genders in all of the normative samples.

In summary, the internal consistency of the STAI (Form Y) S-Anxiety and T-Anxiety Scales is quite high as measured by alpha coefficients and item-remainder correlations. Test-retest stability is also relatively high for the STAI T-Anxiety Scale, but low for the S-Anxiety Scale, as would be expected for a measure designed to assess transitory changes in anxiety as an emotional state in more or less stressful situations.
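
The two internal-consistency indices discussed in this section can be sketched in a few lines of Python. The example assumes the standard formula for coefficient alpha, k/(k - 1) multiplied by (1 minus the sum of the item variances divided by the variance of the total scores), and computes each item-remainder correlation as the Pearson correlation between an item and the sum of the remaining items. The response matrix is invented for illustration and does not come from any STAI sample; the function names are likewise hypothetical.

```python
from statistics import mean, pvariance


def cronbach_alpha(rows):
    """Coefficient alpha; rows holds one list of keyed item scores per respondent."""
    k = len(rows[0])
    item_variances = [pvariance([row[i] for row in rows]) for i in range(k)]
    total_variance = pvariance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def pearson(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def item_remainder_correlations(rows):
    """Correlate each item with the sum of all the other items."""
    k = len(rows[0])
    correlations = []
    for i in range(k):
        item = [row[i] for row in rows]
        remainder = [sum(row) - row[i] for row in rows]
        correlations.append(pearson(item, remainder))
    return correlations


# Invented ratings: five respondents by four items, each coded 1-4.
responses = [
    [1, 2, 1, 2],
    [2, 2, 3, 2],
    [3, 3, 3, 4],
    [4, 3, 4, 4],
    [2, 1, 2, 1],
]

print(round(cronbach_alpha(responses), 2))
print([round(r, 2) for r in item_remainder_correlations(responses)])
```

Removing the item from the total before correlating avoids inflating the coefficient with the item's own variance, which is why the text reports item-remainder rather than item-total correlations.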

CONTENT, CONCURRENT, AND CONSTRUCT VALIDITY OF THE STAI

Individual STAI items were required to meet stringent validity criteria at each stage of the test development process (Spielberger, 1983; Spielberger & Gorsuch, 1966; Spielberger et al., 1970). As previously noted, each item was selected initially on the basis of significant correlations with both the Taylor (1953) MAS and Cattell and Scheier’s (1963) ASQ, the two most widely used measures of trait anxiety at the time the STAI was being developed (Spielberger et al., 1970). But the MAS contains a number of items that reflect depression rather than anxiety (e.g., “I cry easily,” “I feel useless at times,” “At times I think I am no good at all”). In the revised STAI (Form Y), items with depressive content had weaker psychometric properties and therefore were eliminated (Spielberger, 1983). Several ASQ items are more closely related to anger than anxiety (e.g., “Often I get angry with people too quickly”); items with anger content were not included in the original STAI item pool.


The relatively high correlations of scores on the STAI T-Anxiety Scale with the ASQ and the MAS, ranging from .73 to .85, indicate a high degree of concurrent validity. Because the correlations among the three scales approach the scale reliabilities, the three inventories essentially can be considered equivalent measures of trait anxiety. However, a major advantage of the STAI T-Anxiety Scale is that it provides a measure of anxiety that is much less contaminated with depression and anger. A second advantage is that the STAI T-Anxiety Scale consists of only 20 items, compared with the 43-item ASQ and the 50-item MAS, and thus requires only about half as much time to administer.

Evidence of the construct validity of the T-Anxiety Scale is reflected in the mean scores of various neuropsychiatric patient (NP) groups compared with normal subjects. The STAI significantly discriminates between normal individuals and psychiatric patients, for whom anxiety is a major symptom (Spielberger, 1983). Except for character disorders, all NP groups have substantially higher T-Anxiety scores than normal subjects. General medical and surgical (GMS) patients with psychiatric complications also have higher T-Anxiety scores than GMS patients without such complications, indicating that the T-Anxiety Scale can identify nonpsychiatric patients with emotional problems. The lower T-Anxiety scores of patients with character disorders, for whom the absence of anxiety is an important defining condition, provide further evidence of the discriminant validity of the STAI.

To demonstrate construct validity, the scores for each S-Anxiety item had to increase significantly in stressful situations and decline in relaxing situations when compared with a neutral situation.
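
The construct-validity requirement for S-Anxiety items, higher scores under stress and lower scores under relaxation than in a neutral condition, can be expressed as a simple check. The Python sketch below is one hypothetical way to operationalize it and is not the procedure reported in the test manual. It assumes the same respondents rated the item under all three conditions, uses a paired t statistic, and treats an absolute t of 2.0 as a rough placeholder for a significance cutoff. All ratings are invented.

```python
from statistics import mean, stdev


def paired_t(baseline, condition):
    """Paired t statistic for condition minus baseline (same respondents in both)."""
    diffs = [c - b for b, c in zip(baseline, condition)]
    spread = stdev(diffs)
    if spread == 0:
        raise ValueError("differences have no variance; t is undefined")
    return mean(diffs) / (spread / len(diffs) ** 0.5)


def behaves_like_state_item(relaxed, neutral, stressed, critical_t=2.0):
    """True if scores rise from neutral to stress and fall from neutral to relaxation."""
    rises_under_stress = paired_t(neutral, stressed) > critical_t
    falls_when_relaxed = paired_t(neutral, relaxed) < -critical_t
    return rises_under_stress and falls_when_relaxed


# Invented 1-4 ratings from six respondents for a state-like item.
upset_neutral = [2, 2, 1, 2, 3, 2]
upset_stressed = [3, 4, 2, 3, 4, 4]
upset_relaxed = [1, 1, 1, 1, 2, 1]

# Invented ratings for a trait-like item that barely moves across conditions.
worry_neutral = [3, 2, 4, 3, 2, 3]
worry_stressed = [3, 3, 4, 3, 2, 3]
worry_relaxed = [3, 2, 4, 3, 2, 2]

print(behaves_like_state_item(upset_relaxed, upset_neutral, upset_stressed))  # True
print(behaves_like_state_item(worry_relaxed, worry_neutral, worry_stressed))  # False
```

An item that passes this kind of check is sensitive to situational stress, whereas an item that fails it while remaining stable over time is a better candidate for a trait scale.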
Evidence of the construct validity of the STAI S-Anxiety Scale can be noted in the finding that the S-Anxiety scores of college students are significantly higher under examination conditions and lower after relaxation training than when the students were tested in a regular class period (Spielberger, 1983). Further evidence of the construct validity of the S-Anxiety Scale may be observed in military recruits tested shortly after they began a highly stressful training program. The S-Anxiety scores of the recruits were much higher than those of high school and college students of about the same age who were tested under relatively nonstressful classroom conditions. The mean S-Anxiety scores for the recruits also were much higher than their own T-Anxiety scores, suggesting that the recruits were experiencing a high state of emotional turmoil when they were tested. In contrast, the mean S-Anxiety and T-Anxiety scores for high school and college students tested under relatively nonstressful conditions were approximately the same. More than 10,000 adolescents and adults were tested in constructing and validating the STAI. Norms for high school and college students; working adults; military personnel; prison inmates; and psychiatric, medical, and surgical patients were reported in the revised STAI (Form Y) Test Manual (Spielberger, 1983). The State-Trait Anxiety Inventory for Children (STAIC) measures anxiety in young children (Spielberger, 1973) and also may be used with adolescents. With extensive norms for fourth-, fifth-, and sixth-grade students, the STAIC

has been used in numerous studies of normal children as well as with children who have emotional or physical problems. Since first introduced a quarter century ago (Spielberger & Gorsuch, 1966), the STAI and the STAIC have been used in more than 6,000 studies. Adapted for cross-cultural research in 43 different languages and dialects (Spielberger, 1989), the STAI has been used extensively in psychological research in many areas, including: experimental investigations and clinical studies of stress-related psychiatric, psychosomatic, and medical disorders; investigations of general psychological processes, such as attention, memory, learning, and academic achievement; research on situation-specific anxiety phenomena, such as test anxiety, anxiety in sports, and speech anxiety; studies of depression, schizophrenia, sociopathy, and substance abuse; and as an outcome measure in research on the effectiveness of biofeedback, psychotherapy, and various forms of behavioral and cognitive treatment.

Anger, Hostility, and Aggression

The maladaptive effects of anger in psychopathology traditionally are emphasized as important contributors to the etiology of the psychoneuroses, depression, and schizophrenia. Much has been written about the negative impact of anger and hostility on physical health and psychological well-being, but the definitions of these constructs are ambiguous and sometimes contradictory. Moreover, the terms anger, hostility, and aggression often are used interchangeably in the research literature, and this conceptual confusion is reflected in a diversity of measurement operations of questionable validity (Biaggio, Supplee, & Curtis, 1981). Given the substantial overlap in prevailing conceptual definitions of anger, hostility, and aggression, and the variety of operational procedures used to assess these constructs, we have referred to them, collectively, as the AHA! Syndrome (Spielberger et al., 1985). Spielberger, Jacobs, Russell, and Crane (1983) proposed the following working definitions of these constructs:

The concept of anger usually refers to an emotional state that consists of feelings that vary in intensity, from mild irritation or annoyance to intense fury and rage. Although hostility usually involves angry feelings, this concept has the connotation of a complex set of attitudes that motivate aggressive behaviors directed toward destroying objects or injuring other people. . . . While anger and hostility refer to feelings and attitudes, the concept of aggression generally implies destructive or punitive behavior directed towards other persons or objects. (p. 16)

Anger is clearly at the core of the AHA! Syndrome, but different aspects of this emotion typically are emphasized in various definitions of hostility and aggression. Moreover, ambiguities and inconsistencies in the definitions of these constructs are reflected in the procedures that have been developed to assess them. The earliest efforts to assess anger and hostility were based on clinical interviews, behavioral observations, and projective techniques, such as the Rorschach Inkblots and the Thematic Apperception Test. The physiological and behavioral correlates of anger and hostility, and various manifestations of aggression, have also been investigated in numerous studies. In contrast, the phenomenological experience of anger (i.e., angry feelings) has been largely neglected in psychological research. Moreover, most psychometric measures of anger and hostility confound angry feelings with the mode and direction of the expression of anger.

MEASURES OF HOSTILITY AND ANGER

Beginning in the 1950s, a number of self-report psychometric scales were developed to measure hostility (Buss & Durkee, 1957; Caine, Foulds, & Hope, 1967; Cook & Medley, 1954; Schultz, 1954; Siegel, 1956). A rational-empirical strategy was employed in developing the Buss-Durkee (1957) Hostility Inventory (BDHI), which generally is regarded as the most carefully constructed psychometric measure of hostility. Conceptualizing hostility as a multidimensional concept, Buss (1961) constructed items to assess seven facets of this construct, each of which is defined by a BDHI subscale. The dimensions of the BDHI were investigated in two studies in which the responses to individual BDHI items were factored. Although seven dimensions of hostility are assessed by BDHI subscales, Bendig (1962) identified only two major underlying factors, which he labeled overt and covert hostility. Russell (1981) identified three meaningful BDHI factors, which he described as: (a) neuroticism, (b) general hostility, and (c) expression of anger.

The need to distinguish between anger and hostility was recognized in the early 1970s with the appearance in the psychological literature of three anger measures: the Reaction Inventory (RI), the Anger Inventory (AI), and the Anger Self-Report (ASR). The RI was developed by Evans and Stangeland (1971) to assess the degree to which anger was evoked in a number of specific situations (e.g., “People pushing into line”). Similar in conception and format to the RI, Novaco’s (1975) AI consists of 90 statements that describe anger-provoking incidents (“Being called a liar,” “Someone spits at you”). In responding to the RI and the AI, subjects rate the degree to which each situation or incident would anger or provoke them. The ASR was designed by Zelin, Adler, and Myerson (1972) to assess both “awareness of anger” and different modes of anger expression. In validating this scale, the ASR scores of psychiatric patients were found to correlate significantly with psychiatrists’ ratings of anger. Because the ASR and the RI each have been used in only one or two published studies over the past 30 years, the construct validity of these scales has yet to be established firmly. Although the AI has been used more often in research than the other anger measures, Biaggio et al. (1981) found no significant correlations of this scale with either self-ratings or observer ratings of anger and hostility. Moreover, over a brief two-week interval, Biaggio et al. reported that the test-retest stability of the AI was only .17.

A common problem with existing measures of anger and hostility is that, in varying degrees, these scales confound the experience and expression of anger with situational determinants of angry reactions. Furthermore, none of these measures explicitly takes the state-trait distinction into account.
The ASR Awareness subscale comes closest to examining the extent to which subjects experience angry feelings, but this instrument does not assess the intensity of these feelings at a particular time. A number of BDHI items specifically inquire about the frequency with which anger is experienced or expressed (e.g., “I sometimes show my anger”; “Almost every week, I see someone I dislike”; “I never get mad enough to throw things,” italics added). Although these items implicitly assess individual differences in a personality trait, most BDHI items evaluate hostile attitudes (e.g., resentment, negativism, suspicion), rather than angry feelings.

It seems apparent that the phenomena assessed by the RI, ASR, AI, and BDHI are heterogeneous and complex. In a series of studies, Biaggio (1980) and her colleagues (Biaggio & Maiuro, 1985; Biaggio et al., 1981) examined and compared the reliability, the concurrent and predictive validity, and the correlates of the BDHI and the three anger scales described earlier. On the basis of her findings, Biaggio (1980) concluded that evidence of the validity of these measures was both fragmentary and limited. A coherent theoretical framework that distinguishes between anger, hostility, and aggression as psychological concepts, and that takes the state-trait distinction into account, is essential for constructing and validating psychometric measures of anger and hostility.

THE STATE-TRAIT ANGER SCALE (STAS)

The concept of anger, as previously noted, refers to phenomena that are both more fundamental and less complex than hostility and aggression. The State-Trait Anger Scale (STAS), which is analogous in conception and similar in format to the STAI (Spielberger, 1983; Spielberger et al., 1970), was constructed to measure anger as an emotional state and individual differences in anger proneness as a personality trait.

Prior to constructing the STAS, working definitions of state and trait anger were formulated. State anger (S-Anger) was defined as a psychobiological state or condition consisting of subjective feelings of anger that vary in intensity, from mild irritation or annoyance to intense fury and rage, with concomitant activation or arousal of the autonomic nervous system. It was assumed further that S-Anger would fluctuate over time as a function of perceived affronts, injustice, or frustration. Trait anger (T-Anger) was defined in terms of individual differences in the frequency that S-Anger was experienced over time. Assuming that persons high in T-Anger perceive a wider range of situations as anger provoking (e.g., annoying, irritating, frustrating) than those low in T-Anger, high T-Anger individuals are likely to experience more frequent and intense elevations in S-Anger whenever annoying or frustrating conditions are encountered.

On the basis of these working definitions, a pool of items was assembled to assess the intensity of angry feelings (S-Anger) and individual differences in anger proneness (T-Anger). The following are examples of S-Anger items: “I am furious”; “I feel irritated”; “I feel like I’m about to explode.” Subjects report the intensity of their angry feelings by rating themselves on the following 4-point scale: “not at all,” “somewhat,” “moderately so,” “very much so.” Examples of T-Anger items are: “I have a fiery temper,” “I fly off the handle,” “It makes me furious when I am criticized in front of others.” In responding to the T-Anger items, subjects indicate how they generally feel by rating themselves on the following frequency scale: “almost never,” “sometimes,” “often,” “almost always.”
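The two rating formats described for the S-Anger and T-Anger items can be sketched as a small scoring routine. This is a minimal illustration that assumes each response is coded 1 to 4 in the order the anchors are listed and that a scale score is the simple sum of its item ratings; the coding, the summing rule, and every name in the code are assumptions made here, not details taken from the STAS materials.

```python
# Anchors as listed in the text; the 1..4 coding by position is an assumption.
S_ANGER_ANCHORS = ("not at all", "somewhat", "moderately so", "very much so")
T_ANGER_ANCHORS = ("almost never", "sometimes", "often", "almost always")


def score_scale(responses, anchors):
    """Sum ratings coded 1..4 by the position of each response in `anchors`."""
    codes = {label: position + 1 for position, label in enumerate(anchors)}
    try:
        return sum(codes[response.strip().lower()] for response in responses)
    except KeyError as exc:
        raise ValueError(f"unrecognized response: {exc.args[0]!r}") from None


# Hypothetical answers to three items from each scale.
s_responses = ["not at all", "somewhat", "very much so"]
t_responses = ["often", "almost never", "sometimes"]

print("S-Anger (intensity) score:", score_scale(s_responses, S_ANGER_ANCHORS))
print("T-Anger (frequency) score:", score_scale(t_responses, T_ANGER_ANCHORS))
```

Keeping the anchors separate for the two scales mirrors the distinction drawn in the text: S-Anger items are rated for intensity at a given moment, whereas T-Anger items are rated for how often the feeling generally occurs.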

RELIABILITY AND INTERNAL CONSISTENCY OF THE STAS

Fifteen S-Anger and 15 T-Anger items were selected for the preliminary form of the STAS. Alpha coefficients for the 15-item STAS S-Anger Scale were .93 for both males and females, indicating a high degree of internal consistency. The alpha coefficients for the STAS T-Anger Scale were .87 for both genders, providing equally strong evidence of the internal consistency of this scale. The item-remainder correlations for the individual S-Anger and T-Anger items also were uniformly high (median r = .68). Given the high internal consistency of the preliminary STAS scales, it was possible to reduce the length of these scales without weakening their psychometric properties.

In revising the STAS, it was considered desirable to develop internally consistent measures of anger that were relatively independent of anxiety. Therefore, in selecting the final set of items, the S-Anger and T-Anger items with the highest item-remainder correlations for each scale and the lowest correlations with measures of anxiety were identified. With only two exceptions, the item-remainder correlations for the 15 S-Anger items were .50 or higher. Two S-Anger items with the lowest item-remainder coefficients (“I am annoyed,” “I am resentful”) and three items with the highest correlations with the STAI S-Anxiety Scale (“I feel irritated,” “I feel frustrated,” “I feel aggravated”) were eliminated, reducing the number of S-Anger items from 15 to 10.

To reduce the number of T-Anger items from 15 to 10, item-remainder coefficients and correlations of each item with measures of anxiety were examined (Barker, 1979; Westberry, 1980). Two items with low item-remainder correlations (“People who think they are always right irritate me,” “I get annoyed when I am singled out for correction”), and three items for which the correlations with the STAI T-Anxiety Scale were relatively high (“I feel irritated,” “It makes my blood boil when I am pressured,” “I feel angry”) were eliminated.
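The internal-consistency statistics used in this item-selection process can be sketched in a few lines. The code below computes coefficient alpha and item-remainder correlations in their standard textbook forms, and adds the Spearman-Brown formula as one conventional way to project the reliability of a shortened scale. The ratings are invented, the function names are hypothetical, and nothing in the source indicates that Spearman-Brown projections were part of the STAS revision; that step is included here only as an assumption-laden illustration.

```python
from statistics import mean, pvariance


def cronbach_alpha(rows):
    """Coefficient alpha; `rows` holds one list of item ratings per respondent."""
    k = len(rows[0])
    item_variances = [pvariance([row[j] for row in rows]) for j in range(k)]
    total_variance = pvariance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def item_remainder_correlations(rows):
    """Correlate each item with the sum of all the other items."""
    k = len(rows[0])
    results = []
    for j in range(k):
        item = [row[j] for row in rows]
        remainder = [sum(row) - row[j] for row in rows]
        results.append(pearson_r(item, remainder))
    return results


def spearman_brown(reliability, length_ratio):
    """Projected reliability when scale length is multiplied by `length_ratio`."""
    return length_ratio * reliability / (1 + (length_ratio - 1) * reliability)


# Invented 1..4 ratings: five respondents by four items (illustration only).
ratings = [
    [1, 1, 2, 1],
    [2, 2, 2, 3],
    [3, 3, 4, 3],
    [4, 3, 4, 4],
    [2, 1, 1, 2],
]

print("alpha:", round(cronbach_alpha(ratings), 3))
print("item-remainder r:", [round(r, 3) for r in item_remainder_correlations(ratings)])
print("projected 10-item reliability:", round(spearman_brown(0.93, 10 / 15), 3))
```

Applied to the alpha of .93 reported for the 15-item S-Anger Scale, the Spearman-Brown projection for a 10-item version is about .90, which is consistent with the statement that the scales could be shortened with little loss. The projection assumes the dropped items are comparable to those retained, which is not how the items were actually chosen.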

It is interesting to note that two of the T-Anger items that were eliminated (i.e., “I feel irritated,” “I feel angry”) had content validity as measures of anger. However, the correlations of these items with T-Anxiety were almost as high as their item-remainder coefficients, suggesting that feelings of anger and irritation are frequently associated with symptoms of anxiety. Correlations between the 10- and 15-item forms of the S-Anger and T-Anger Scales, ranging from .95 to .99 for Navy recruits and college students, indicate that the 10-item scales provide essentially the same information as the longer forms (Spielberger, 1988). Because those items with the highest correlations with anxiety were eliminated, the correlations of the 10-item S-Anger and T-Anger Scales with anxiety were substantially lower than was the case for the 15-item anger scales.

Given the fact that the STAS S-Anger and T-Anger items were generated primarily on a rational basis, the internal consistency of these scales is impressive. In addition to providing evidence of the utility of the working definitions that guided the item-selection process, the high degree of internal consistency for both the STAS S-Anger and T-Anger Scales, as reflected in item-remainder correlations and alpha coefficients, indicates that most people are sensitive to their experience of angry feelings and highly consistent in reporting the intensity and the frequency of experiencing these feelings.

Jacobs, Latham, and Brown (1988) examined the stability of the STAS for a large group of undergraduate students. The test-retest reliability coefficients for the STAS T-Anger Scale over a 2-week interval were .70 and .77, respectively, for males and females. In contrast, the stability coefficients for the STAS S-Anger Scale of .27 for males and .21 for females were much lower, as would be expected for a measure of transitory anger.

Because factor analyses of the STAS S-Anger items indicated only a single underlying factor for both males and females, the S-Anger Scale appears to measure a unitary emotional state that varies in intensity. In contrast, the results of the factor analyses of the T-Anger items identified two correlated factors, which were labeled Angry Temperament (T-Anger/T) and Angry Reaction (T-Anger/R). The T-Anger/T items describe the individual differences in the disposition to express anger, without specifying any provoking circumstance (e.g., “I am a hotheaded person”). The T-Anger/R items describe angry reactions in situations that involve frustration and/or negative evaluations (e.g., “It makes me furious when I am criticized in front of others”).

That the two T-Anger scales assess different facets of anger is clearly reflected in the results of a study by Crane (1981). She found that the T-Anger scores of hypertensive patients were significantly higher than those of medical and surgical patients with normal blood pressure, and that this difference was due entirely to the substantially higher T-Anger/R scores of the hypertensives. No difference was found in the T-Anger/T scores of the hypertensive and control patients. Crane also reported that hypertensives had significantly higher T-Anxiety scores than control patients, and that their scores on the S-Anger and S-Anxiety scales after performing on a mildly frustrating task were higher than the corresponding scores for the controls.
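The stability coefficients cited for the T-Anger and S-Anger Scales are test-retest correlations: the same respondents are scored on two occasions and the two sets of scores are correlated. The sketch below shows that computation on invented data, constructed so that the hypothetical trait scores keep their rank order across occasions while the hypothetical state scores do not. None of the numbers come from Jacobs, Latham, and Brown (1988), and the function name is an assumption.

```python
from statistics import mean


def retest_correlation(time1, time2):
    """Pearson correlation between two administrations of the same scale."""
    if len(time1) != len(time2) or len(time1) < 2:
        raise ValueError("need two equal-length score lists with at least 2 cases")
    m1, m2 = mean(time1), mean(time2)
    cross = sum((a - m1) * (b - m2) for a, b in zip(time1, time2))
    ss1 = sum((a - m1) ** 2 for a in time1)
    ss2 = sum((b - m2) ** 2 for b in time2)
    return cross / (ss1 * ss2) ** 0.5


# Invented scores for five respondents tested two weeks apart (illustration only).
trait_t1 = [12, 18, 25, 30, 21]
trait_t2 = [13, 17, 27, 29, 20]
state_t1 = [10, 22, 11, 15, 30]
state_t2 = [18, 12, 25, 11, 14]

print("trait stability:", round(retest_correlation(trait_t1, trait_t2), 2))
print("state stability:", round(retest_correlation(state_t1, state_t2), 2))
```

A low coefficient for a state measure is not a defect under this framework, because a scale intended to track transitory feelings should change when circumstances change.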

CONCURRENT, DISCRIMINANT, PREDICTIVE, AND CONSTRUCT VALIDITY OF THE STAS

To evaluate concurrent validity, the STAS, the Buss-Durkee Hostility Inventory (BDHI; Buss & Durkee, 1957), and the Hostility (HO; Cook & Medley, 1954) and Overt Hostility (Hv; Schultz, 1954) Scales of the Minnesota Multiphasic Personality Inventory (MMPI) were administered to undergraduate college students and Navy recruits. Moderately high positive correlations of the STAS T-Anger Scale with the three hostility measures were found for males and females in both samples, providing evidence of a substantial relationship between T-Anger and hostility. Moderate positive correlations of the STAS T-Anger Scale also were found with the Neuroticism Scale of the Eysenck Personality Questionnaire (EPQ; Eysenck & Eysenck, 1975) and the T-Anxiety Scale of the State-Trait Personality Inventory (STPI; Spielberger, 1979b) for a large sample of college students. These findings are consistent with the clinical observation that neurotic individuals frequently experience angry feelings that they cannot readily express (Spielberger, 1988).

Small positive correlations between the STAS T-Anger Scale and the EPQ Psychoticism Scale suggested that individuals with high scores on the latter experience anger somewhat more frequently than individuals with low Psychoticism scores. Small negative correlations of T-Anger with the EPQ Lie Scale suggest that anger scores may be reduced slightly by test-taking attitudes that lead some people to inhibit reports of negative characteristics such as anger. However, these correlations also might be interpreted as indicating that individuals who experience anger more frequently make less use of repression and denial as defenses against emotional arousal. The finding of essentially zero correlations of the T-Anger Scale with the EPQ Extraversion and STPI Curiosity Scales indicates that T-Anger is unrelated to these personality dimensions.

Although STAS T-Anger scores correlated substantially with a number of hostility measures, the research literature indicates that there are important differences in the meaning of anger and hostility as personality constructs.
The nature of the relationship between anger and hostility was explored in factor analyses of the 10 T-Anger items, in which the BDHI Total and subscale scores and scores on the MMPI HO and Hv Scales were included. To evaluate the discriminant validity of the anger and hostility measures, the STPI T-Anxiety and T-Curiosity item and scale scores also were included in these analyses (Spielberger, 1980; Westberry, 1980).

The resulting three- and four-factor solutions were similar for both males and females. In the three-factor solutions, the very strong first factor clearly measured an anger/hostility dimension; the second and third factors were anxiety and curiosity. The STAS T-Anger and Buss-Durkee Total scores had the highest loadings on the anger/hostility factor. All 10 T-Anger items, the HO and Hv Scale scores, and all of the BDHI subscales except Guilt also had salient loadings on this factor. Interestingly, the BDHI Guilt, Suspicion, and Resentment subscales had higher loadings on the anxiety factor than on the anger/hostility factor.

In the four-factor solutions, separate anger and hostility factors emerged for both males and females; anxiety and curiosity factors similar to those obtained in the three-factor solutions also were found. The T-Anger Scale and all but one of the T-Anger items had their highest loadings on the anger factor. The hostility factor was defined by high loadings for scores on the Buss-Durkee Total and HO Scales, and by salient loadings for all of the Buss-Durkee subscales except Guilt. Several BDHI subscales also had salient secondary loadings on the anger factor. Interestingly, the HO Scale and the BDHI Suspicion and Resentment subscales had higher secondary loadings on the anxiety factor than on the anger factor. Thus, the results of the factor analyses indicate that measures of anger and hostility assess different but related constructs, and that measures of anger and hostility correlate substantially with anxiety.

In a series of studies at Colorado State University, Deffenbacher (1992) used the STAS T-Anger Scale to assess multiple aspects of anger. The researcher found that individuals with high T-Anger scores reported that they experienced greater intensity and frequency of day-to-day anger across a wide range of provocative situations than persons low in T-Anger. The high T-Anger individuals also reported anger-related physiological symptoms two to four times more often than low-anger subjects. When provoked, the high T-Anger individuals were characterized by stronger general tendencies to both express and suppress anger, and by less constructive and more dysfunctional coping, as manifested in physical and verbal antagonism.

In a study in which trait anger and self-concept were assessed, Stark and Deffenbacher (1986) found a moderately strong inverse relationship between these measures. The high T-Anger students did not like themselves as much as the low T-Anger subjects, nor did they feel as worthwhile or confident. Negative events such as failure also appeared to have a more devastating (catastrophizing) impact on high T-Anger individuals (Story & Deffenbacher, 1985), who reported that they experienced high levels of anxiety more frequently than students with low T-Anger scores.

As anger research has progressed, the critical importance of differentiating between the experience and expression of anger has become increasingly apparent (Spielberger et al., 1985). It is essential to distinguish, both conceptually and empirically, between the experience of anger as an emotional state (S-Anger) and individual differences in anger proneness as a personality trait (T-Anger), and to identify and measure the characteristic ways in which people express their anger. In the following section, theory and research on anger expression are reviewed briefly, and the development of scales to assess the expression and control of anger is described in some detail.

The Expression and Control of Anger

The conceptual and operational distinction between “anger-in” and “anger-out” as major modes of anger expression long has been recognized in psychophysiological research. The effects of these modes of anger expression on the cardiovascular system were a major focus almost 40 years ago in the classic studies of Funkenstein and his coworkers (Funkenstein, King, & Drolette, 1954). These researchers exposed healthy college students to anger-inducing laboratory conditions and measured their pulse rate and blood pressure. Students who became angry during the experiment and directed their anger toward the investigator or the laboratory situation were classified as anger-out; those who suppressed their anger and/or directed it at themselves were classified as anger-in. Typically, the increase in pulse rate for students classified as anger-in was three times greater than for the anger-out group.

Following the procedures used by Funkenstein et al. (1954), individuals generally are classified as anger-in in studies on anger expression if they suppress their anger or direct it inward, toward the ego or self (Averill, 1982; Tavris, 1982). Those who express their anger in aggressive behavior, directing it toward other persons or objects in the environment, are classified as anger-out. When held in or suppressed, anger may be subjectively experienced as an emotional state, S-Anger, which varies in intensity and fluctuates over time as a function of the provoking circumstances. Defining anger-in in this manner differs from the psychoanalytic conception of anger turned inward toward the ego or self (Alexander, 1939, 1948). In the psychoanalytic conception, the feelings of anger often result in guilt and depression (Alexander & French, 1948), whereas the thoughts and memories relating to the anger-provoking situation may be repressed and, thus, not directly experienced.

Anger directed outward generally involves both the experience of S-Anger and its manifestation in some form of aggressive behavior. Anger-out may be expressed in physical acts such as slamming doors, destroying objects, and assaulting other persons, or in verbal behavior in the form of criticism, threats, insults, or the extreme use of profanity. These physical and verbal manifestations of anger may be directed toward the source of provocation or frustration, or expressed indirectly toward persons or objects associated with or symbolic of the provoking agent.

Harburg and his associates have reported impressive relationships between anger expression, elevated blood pressure (BP), and hypertension, demonstrating that anger-in and anger-out have different effects on the cardiovascular system (Harburg, Blakelock, & Roeper, 1979; Harburg et al., 1973; Harburg & Hauenstein, 1980; Harburg, Schull, Erfurt, & Schork, 1970). These investigators classified individuals as “anger-in” or “anger-out” on the basis of their self-ratings of how they would express anger if treated unfairly by a supervisor, a landlord, or a police officer. Gentry (1972) and his colleagues (Gentry, Chesney, Gary, Hall, & Harburg, 1982; Gentry, Chesney, Hall, & Harburg, 1981) subsequently have corroborated and extended Harburg’s findings.

The procedure used by Harburg and Gentry to classify individuals as anger-in who did not report feeling angry in anger-provoking situations raises important conceptual issues. This procedure equates individuals who do not experience anger with those who experience and suppress their angry feelings. Different personality dynamics have been attributed by Rosenzweig (1976, 1978) to “impunitive” persons, who do not experience anger in anger-provoking situations, and “intrapunitive” persons, who turn anger in when provoked, often blaming themselves for the anger directed toward them by others.

THE ANGER EXPRESSION (AX) SCALE

Differentiating between the experience of angry feelings and how these feelings are expressed can be accomplished by measuring both the intensity of S-Anger as an emotional state and individual differences in the frequency that S-Anger is expressed in behavior (anger-out), suppressed (anger-in), or otherwise controlled. Because anger expression is defined implicitly by Funkenstein et al. (1954), Harburg et al. (1973), and Gentry et al. (1982) as a single dimension, varying from extreme suppression or inhibition of anger to the expression of anger in assaultive or destructive behavior, Spielberger et al. (1985) attempted to construct a unidimensional, bipolar scale to assess this dimension.

As a first step in constructing the Anger Expression (AX) Scale, working definitions of anger-in and anger-out were formulated on the basis of a review of the relevant research literature. Anger-in was defined in terms of how often an individual experiences, but holds in (suppresses), angry feelings, rather than on the basis of the more ambiguous psychoanalytic construct of anger turned against the ego. Anger-out was defined in terms of the frequency that an individual expresses angry feelings in verbally or physically aggressive behavior. In contrast to the procedure used by Funkenstein and Harburg (i.e., assigning subjects to dichotomous anger-in or anger-out categories), the AX Scale was designed to measure a continuum of individual differences in how often anger was held in or expressed. The rating-scale format for the AX Scale was the same as that used with the STAS T-Anger Scale (Spielberger, 1980), but the instructions differed markedly from those used to assess T-Anger. Rather than being asked to indicate how they generally feel, subjects were instructed to report “. . . how often you generally react or behave in the manner described when you feel angry or furious.” In responding, subjects rated themselves on the following 4-point frequency scale: (1) almost never, (2) sometimes, (3) often, and (4) almost always.


Consistent with our working definitions of anger-in and anger-out, the content of the items for the AX Scale ranged from strong inhibition or suppression of angry feelings (AX/In) to extreme expression of anger toward other persons or objects in the environment (AX/Out). Examples of AX Scale items are (“When angry or furious”):

AX/In: I keep things in; I boil inside, but I don’t show it.
AX/Out: I lose my temper; I strike out at whatever infuriates me.

In a study of the relationship between anger expression and blood pressure, Johnson (1984) administered a 33-item preliminary version of the AX Scale to 1,114 high school students; three items with poor psychometric properties and judged to be ambiguous were subsequently discarded. To verify that the AX Scale items were measuring a unitary psychological construct, the students’ responses to the individual items were evaluated in separate factor analyses for males and females. Although we originally intended to develop a unidimensional, bipolar measure of anger expression, the results of the factor analyses suggested that the AX items were tapping two independent dimensions. On the basis of the content of the items with high loadings, these factors were labeled Anger/In and Anger/Out. Most of the preliminary AX Scale items had strong loadings on one of these factors and negligible loadings on the other. Given the strength and clarity of the Anger/In and Anger/Out factors, the striking similarity (invariance) of these factors for males and females, and the large samples on which the factor analyses were based, the test-construction strategy for developing the AX Scale was modified to identify homogeneous subsets of items for measuring anger-in and anger-out. Of the 30 items on which the identification of the Anger/In and Anger/Out factors was originally based, 8 had relatively small loadings (below .35) on both factors. After eliminating these items, item-remainder correlations were computed for males and females for the remaining items; two items with relatively low item-remainders for the females were eliminated, reducing the total number of items to 20.
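The screening step described above, in which items with small loadings on both factors are set aside before subscales are formed, can be sketched as a simple filter. Only the .35 cutoff is taken from the text. The item labels and loading values are invented, comparing absolute loadings is an assumption, and reusing the same cutoff to label an item as belonging to one factor is a simplification of the "uniformly high" and "negligible" criteria that the authors applied by judgment.

```python
CUTOFF = 0.35  # loadings below this on both factors led to elimination (from the text)

# Invented (Anger/In, Anger/Out) loadings for four hypothetical items.
hypothetical_loadings = {
    "item_a": (0.66, -0.04),
    "item_b": (0.02, 0.59),
    "item_c": (0.21, 0.30),
    "item_d": (0.48, 0.41),
}


def screen_items(loadings, cutoff=CUTOFF):
    """Split items into those retained and those loading weakly on both factors."""
    retained, dropped = {}, []
    for item, (anger_in, anger_out) in loadings.items():
        if abs(anger_in) < cutoff and abs(anger_out) < cutoff:
            dropped.append(item)
        else:
            retained[item] = (anger_in, anger_out)
    return retained, dropped


def assign_subscale(anger_in, anger_out, cutoff=CUTOFF):
    """Label an item by the single factor it loads on, or flag it as mixed."""
    high_in = abs(anger_in) >= cutoff
    high_out = abs(anger_out) >= cutoff
    if high_in and not high_out:
        return "AX/In"
    if high_out and not high_in:
        return "AX/Out"
    return "mixed" if high_in else "neither"


retained, dropped = screen_items(hypothetical_loadings)
print("dropped:", dropped)
for item, (anger_in, anger_out) in retained.items():
    print(item, "->", assign_subscale(anger_in, anger_out))
```

Items flagged as mixed in this sketch correspond to the kind of item that loads on both factors and therefore would not be a clean marker for either subscale.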

The selection of subsets of AX Scale items for measuring anger-in and anger-out was based on further factor analyses and subscale item-remainder correlations (Spielberger et al., 1985). Eight items with uniformly high loadings for both genders on the Anger/In factor and negligible loadings on the Anger/Out factor were selected for the AX/In subscale. The median loadings of these items on the Anger/In and Anger/Out factors were .665 and -.045, respectively. Similarly, eight items with uniformly high loadings for both genders on the Anger/Out factor and negligible loadings on Anger/In were selected for the AX/Out subscale. The median loading of the AX/Out items was .59 on the Anger/Out factor, and -.01 on the Anger/In factor.

The internal consistency of the 8-item AX/In and AX/Out subscales was evaluated by computing alpha coefficients and item-remainder correlations. All but one of the item-remainder correlations for the AX/In and AX/Out subscales were .37 or greater. The alphas ranged from .73 to .84, and were somewhat higher for the AX/In subscale. Jacobs, Latham, and Brown (1988) examined the test-retest reliability of the AX Scale and found coefficients that ranged from .64 to .86. Johnson (1984) and Pollans (1983) found essentially zero correlations between the AX/In and AX/Out subscales for both males and females in large samples of high school and college students; similar findings also have been reported for other populations (Knight, Chisholm, Paulin, & Waal-Manning, 1988; Spielberger, 1988). Thus, the AX/In and AX/Out subscales are empirically independent, as well as factorially orthogonal. Clearly, these subscales assess two independent anger-expression dimensions.
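The two internal-consistency statistics mentioned above, coefficient alpha (Cronbach, 1951) and item-remainder correlations, can be sketched as follows. The response matrix is invented for illustration and does not reproduce any AX Scale data; the function names are hypothetical.

```python
from statistics import mean, variance


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    mx, my = mean(x), mean(y)
    cross = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return cross / (ss_x * ss_y) ** 0.5


def cronbach_alpha(rows):
    """Coefficient alpha; `rows` holds one list of item scores per respondent."""
    k = len(rows[0])
    item_variances = [variance([r[i] for r in rows]) for i in range(k)]
    total_variance = variance([sum(r) for r in rows])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


def item_remainder_correlations(rows):
    """Correlate each item with the sum of the remaining items in the scale."""
    k = len(rows[0])
    results = []
    for i in range(k):
        item = [r[i] for r in rows]
        remainder = [sum(r) - r[i] for r in rows]
        results.append(pearson(item, remainder))
    return results


# Invented responses: six respondents, four items rated on a 1-4 scale.
responses = [
    [1, 2, 1, 2],
    [2, 2, 3, 2],
    [3, 3, 3, 4],
    [4, 3, 4, 4],
    [2, 1, 2, 1],
    [3, 4, 3, 3],
]

alpha = cronbach_alpha(responses)
print(round(alpha, 2))
print([round(r, 2) for r in item_remainder_correlations(responses)])
```

An item with a low item-remainder correlation contributes little to the common variance of the scale, which is why such items were candidates for removal during scale construction.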


MEASUREMENT OF ANGER CONTROL

A number of items intended to measure the middle range of the anger-in/anger-out continuum were included in the original AX Scale item pool. Three of these items (“Control my temper”; “Keep my cool”; “Calm down faster”) were retained in the final set of 20 AX Scale items, because the item-remainder correlations for these items were strong; all three items had substantial loadings on both the Anger/In and Anger/Out factors. In research with the AX Scale, emerging evidence that these items coalesced to form the nucleus of an anger control factor (Pollans, 1983) stimulated further work on developing an AX Anger Control (AX/Con) subscale.

The first step in constructing the AX/Con subscale was to assemble a pool of items with appropriate content. Using the three anger control items from the 20-item AX Scale as a guide, a number of additional anger control items were written. Dictionary and thesaurus definitions of control and idioms pertaining specifically to the control of anger were consulted in writing these items. The new AX/Con items were administered along with the 20 original AX Scale items to a large sample of undergraduate university students. In separate factor analyses of the AX/Con items for males and females, a large anger control (Anger/Con) factor and several very small factors were found for both genders. The items with the strongest loadings on the Anger/Con factor for both males and females were added to the three original AX/Con items to form an 8-item AX/Con subscale.

To confirm the independence of the Anger/Con factor, and to evaluate its relation to the Anger/In and Anger/Out factors, the 24 AX Scale items, which included the 8-item AX/Con, AX/Out, and AX/In subscales, were administered to another large sample of university students (Spielberger, Krasner, & Solomon, 1988). In the factor analyses of the AX Scale items, an Anger/Con factor was the strongest to emerge for both males and females; all eight AX/Con items had salient loadings on this factor. Well-defined Anger/In and Anger/Out factors, on which all eight AX/In and AX/Out items had salient loadings on the appropriate factor, also were found. For both genders, the AX/Con subscale correlated negatively with AX/Out (r = -.59 and -.58 for males and females, respectively). Correlations of the AX/In subscale with the AX/Out subscale were essentially zero for both genders. The independence of the AX/In and AX/Out subscales, and moderately high negative correlations of the AX/Con and AX/Out subscales, have been demonstrated consistently (Pollans, 1983; Spielberger, 1988; Spielberger et al., 1985).

Evidence of the concurrent and discriminant validity of the AX subscales is reflected in the correlations of these scales with other anger and personality measures (Spielberger, 1988). Moderately high correlations of AX/Out scores with T-Anger and T-Anger/T scores, and smaller correlations of both AX/Out and AX/In scores with T-Anger/R scores suggest that individuals who have angry temperaments are more likely to express their anger outwardly than suppress it, whereas those individuals who frequently experience anger when they are frustrated or treated unfairly are equally likely to suppress or outwardly express their anger. Small, but highly significant correlations of the AX/In and AX/Out subscales with the STPI T-Anxiety Scale suggest that individuals who suppress or express anger more often are also likely to experience anxiety more frequently than individuals with low anger expression scores. Correlations of all three anger expression measures with the STPI T-Curiosity subscale were essentially zero, providing evidence of discriminant validity.

A major reason for constructing the AX Scale was to develop an instrument that would facilitate the investigation of how various components of anger contribute to the etiology of hypertension and coronary heart disease. As previously noted, Harburg et al. (1973, 1979) and Gentry et al. (1981, 1982) reported that individuals who tend to suppress anger have higher systolic and diastolic blood pressure, and Williams et al. (1980) found that patients with high scores on the MMPI HO Scale were more likely to develop coronary artery disease. Similarly, Dembroski, MacDougall, Williams, and Haney (1985) found that high ratings of potential for hostility and anger-in were associated positively with angiographically documented severity of coronary atherosclerosis.

Johnson (1984) administered the AX Scale to 1,114 high school students in an investigation of the relationship between anger expression and blood pressure (BP). Measures of systolic (SBP) and diastolic (DBP) blood pressure were obtained during the same class period in which these students responded to the psychological tests. The correlations of AX/In scores with SBP and DBP were positive, curvilinear, and highly significant for both genders. There was no relation between suppressed anger and BP over 60% to 80% of the range of AX/In scores, but students with very high AX/In scores had much higher BP. Because the correlations of AX/Out scores with the BP measures were quite small, the overall pattern of correlations indicates that higher blood pressure is associated with holding anger in. Johnson (1984) also examined the influence of a number of variables that have been found to be related to BP in previous research. Height, weight, dietary factors (salt intake), racial differences, and family history of hypertension and cardiovascular disorders correlated significantly with BP, but even after partialing out the influence of these variables, the AX/In scores still were associated positively and significantly with elevated SBP and DBP. Indeed, in separate multiple regression analyses for males and females, AX/In scores were found to be better predictors of blood pressure than any other measure (i.e., the AX/In scores were first to enter step-wise multiple discriminant equations for both genders).
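The idea of "partialing out" a covariate can be illustrated with a first-order partial correlation. The sketch below is a simplification under stated assumptions: it controls for a single covariate, whereas the study described above controlled for several variables, and all data values and variable names are invented for illustration.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    mx, my = mean(x), mean(y)
    cross = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return cross / sqrt(ss_x * ss_y)


def partial_correlation(x, y, z):
    """Correlation of x and y after removing the linear influence of z."""
    r_xy = pearson(x, y)
    r_xz = pearson(x, z)
    r_yz = pearson(y, z)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Invented values: suppressed-anger scores, systolic BP, and body weight.
ax_in = [10, 12, 14, 16, 20, 24, 28, 30]
sbp = [110, 112, 111, 113, 114, 118, 126, 130]
weight = [60, 72, 65, 70, 68, 75, 80, 78]

print(round(pearson(ax_in, sbp), 2))                      # zero-order correlation
print(round(partial_correlation(ax_in, sbp, weight), 2))  # controlling for weight
```

If the partial correlation remains clearly positive, the association between the two measures cannot be attributed to their shared relation with the covariate alone.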

THE STATE-TRAIT ANGER EXPRESSION INVENTORY

The STAS and the AX Scale recently were combined to form the State-Trait Anger Expression Inventory (STAXI), which provides relatively brief, objectively scored measures of the experience, expression, and control of anger (Spielberger, 1988). The STAXI consists of 44 items, which form five primary scales and two subscales. The components of anger that are assessed by each STAXI scale are described in Table 13.1.

TABLE 13.1
Definitions of the Components of Anger Assessed by the Subscales of the State-Trait Anger Expression Inventory¹

Scale: Anger Component Measured by Each STAXI Scale

S-Anger (10 items): An emotional state marked by subjective feelings that vary in intensity, from mild annoyance or irritation to intense fury and rage, accompanied by activation of the autonomic nervous system. The intensity of S-Anger varies as a function of perceived injustice, being attacked or treated unfairly by others, or frustration resulting from barriers to goal-directed behavior.

T-Anger (10 items): Individual differences in anger proneness, that is, the tendency to perceive a wide range of situations as annoying or frustrating, and to respond with elevations in S-Anger. High T-Anger individuals experience S-Anger more often and with greater intensity than persons low in T-Anger.

T-Anger/T (4 items): Individual differences in a general disposition to experience anger with little or no specific provocation.

T-Anger/R (4 items): Individual differences in the disposition to feel angry when criticized or treated unfairly.

AX/In (8 items): Individual differences in the frequency that angry feelings are experienced, but held in or suppressed.

AX/Out (8 items): Individual differences in the frequency that feelings of anger are expressed in aggressive behavior directed toward other people or objects in the environment.

AX/Con (8 items): Individual differences in the frequency that an individual attempts to control the outward expression of angry feelings.

AX/Ex (24 items): This measure provides a general index of the frequency that anger is experienced and expressed, irrespective of the direction of expression.

¹ Adapted from the Professional Manual for the State-Trait Anger Expression Inventory: Revised Research Edition (Spielberger, 1988, p. 1), with the permission of Psychological Assessment Resources, Inc. (PAR).

Fuqua et al. (1991) recently administered the STAXI to a large sample of college students and factor analyzed their responses to the 44 individual items. The results of this analysis led these investigators to conclude: “. . . that seven factors provided the best fit of the data to the instrument and its theoretical foundations” (1991, p. 442). Four of the factors extracted by Fuqua et al. corresponded almost exactly to four of the five primary STAXI scales; the items from the STAXI T-Anger Scale loaded on two separate factors that corresponded exactly to the T-Anger Temperament and Reaction subscales. The first six factors identified by Fuqua et al. (1991), in the order that they emerged, were: S-Anger, Anger/Con, Anger/In, Anger/Out, T-Anger/T, and T-Anger/R. Almost all of the items in the corresponding STAXI scales had salient loadings on the appropriate factor and negligible loadings on the other factors. Thus, six of the seven factors identified by Fuqua et al. (1991) corresponded with the components of anger measured by the STAXI scales. These findings provide strong confirmation from the factor structure of the STAXI that the subscales of the inventory measure meaningful, relatively independent components of the experience, expression, and control of anger.

The seventh factor identified by Fuqua et al. (1991) was defined by secondary, but salient loadings for 3 of the 10 STAXI S-Anger items (Feel like . . . breaking things, . . . banging on the table, . . . hitting someone). Although these items all had higher loadings on the original S-Anger factor, the findings of Fuqua et al. nevertheless suggest that there may be a second S-Anger factor. The content of the three items with salient loadings on this factor seems to reflect high levels of S-Anger that may provide strong instigation to the expression of anger in aggressive behavior.

van der Ploeg (1988) administered a Dutch adaptation of the 20-item State-Trait Anger Scale (STAS) to male military draftees in The Netherlands. In separate analyses of the 10 S-Anger and 10 T-Anger items that comprise the STAS, two T-Anger and two S-Anger factors were found. van der Ploeg’s two T-Anger factors were essentially the same as the STAS T-Anger Temperament and Reaction factors that have been reported consistently in studies of American subjects (Spielberger, 1988); his two S-Anger factors were quite similar to those reported by Fuqua et al. (1991). Thus, there appear to be two meaningful facets of state anger, but further research is required to clarify the nature of these S-Anger components.

GUIDELINES FOR INTERPRETING SCORES ON THE STAXI

The STAXI has proved useful for assessing the experience, expression, and control of anger in normal and abnormal individuals (Deffenbacher, 1992; Moses, 1992), and for evaluating the role of these anger components in a variety of disorders, including alcoholism, hypertension, coronary heart disease, and cancer (Spielberger, 1988).

Comparing STAXI test scores with appropriate scale norms is an important step in test interpretation. Norms for the STAXI scales are reported in the test manual for male and female high school and college students and working adults (Spielberger, 1988). In addition, there are norms for the following special interest groups: general medical and surgical patients, prison inmates, and military recruits.

The distributions of scores on the S-Anger and T-Anger/T Scales are positively skewed, which prevents these scales from effectively discriminating among respondents with low scores. However, low scores on the other STAXI scales may provide useful information that contributes to understanding the personality dynamics of an individual with such scores. Individuals who score below the 25th percentile on the T-Anger, AX/Out, and AX/In Scales generally experience, express, or suppress relatively little anger. However, low scores on these scales when AX/Con scores are very high may indicate excessive use of denial and repression to protect an individual from experiencing unacceptable angry feelings.

General guidelines for interpreting high scores for each of the STAXI scales are provided in Table 13.2. Percentile ranks reported in the STAXI manual corresponding to STAXI scale scores (Spielberger, 1988) indicate how a particular person compares with other individuals who are similar in age and gender. Scores between the 25th and 75th percentiles on individual STAXI scales fall in what may be considered the normal range.
Although individuals with scale scores that approach the 75th percentile are more prone to experience, outwardly express, or suppress anger than those with scores below the median, such differences generally are not sufficient to detect persons whose anger problems may predispose them to develop physical or psychological disorders (Spielberger, 1988). Individuals with anger scores above the 75th percentile are likely to experience and/or express angry feelings to a degree that may interfere with optimal functioning. The anger of these individuals may contribute to difficulties in interpersonal relationships or dispose them to develop psychological or physical disorders. High AX/In scores, especially when associated with low AX/Out scores and high levels of anxiety, have been found to be associated with elevated blood pressure (Johnson, 1984). Very high scores on both the AX/In and AX/Out Scales (above the 90th percentile) may place an individual at risk for coronary artery disease and heart attacks.

The STAS and the AX Scales have been used extensively in research on the relationship between anger and health (Brooks, Walfish, Stenmark, & Canger, 1981; Cavanaugh, Kanonchoff, & Bartels, 1987; Johnson & Broman, 1987; Johnson-Saylor, 1984; Schlosser, 1986; Vitaliano, 1984; Vitaliano et al., 1986). With the development of the improved STAXI measures to assess the experience and expression of anger, suppressed anger has been identified consistently as an important factor in elevated BP and hypertension (Crane, 1981; Deshields, 1986; Gorkin, Appel, Holroyd, Saab, & Stauder, 1986; Hartfield, 1985; Johnson, 1985; Johnson, Spielberger, Worden, & Jacobs, 1987; Kearns, 1985; Schneider, Egan, & Johnson, 1986; Spielberger et al., 1985, 1988; van der Ploeg, van Buuren, & van Brummelen, 1988). McMillan (1984) used the STAXI scales to assess the anger experienced by patients undergoing treatment for Hodgkin’s disease and lung cancer. The STAXI scales also have been used to examine relationships between hardiness, well-being, and coping with stress (Schlosser & Sheeley, 1985a, 1985b), and to investigate the role of anger in Type-A behavior (Booth-Kewley & Friedman, 1987; Croyle, Jemmott, & Carpenter, 1988; Goffaux, Wallston, Heim, & Shields, 1987; Herschberger, 1985; Janisse, Edguer & Dyck, 1986; Krasner, 1986; Spielberger et al., 1988). Kinder and his colleagues (Curtis, Kinder, Kalichman, & Spana, 1988; Kinder, Curtis, & Kalichman, 1986) used the STAXI scales in a series of studies of psychological factors that contribute to chronic pain, and Stoner (1988) investigated the effects of marijuana use on the experience and expression of anger. The STAXI scales also have been used in research on the effects of situational factors on the experience and expression of anger (Aragona, 1983; Bromet & Leonard, 1987; Buck, 1987; Pape, 1986).

TABLE 13.2
Guidelines for Interpreting High STAXI Scores¹

Scale: Characteristics of Persons With High Scores

S-Anger: Individuals with high scores are experiencing relatively intense angry feelings at the time the test was administered. If S-Anger is elevated relative to T-Anger, the individual’s angry feelings are likely to be determined situationally. Elevations in S-Anger are more likely to reflect chronic anger if T-Anger and AX/In scores are also high.

T-Anger: High T-Anger individuals frequently experience angry feelings, especially when they feel they are treated unfairly by others. Whether persons high in T-Anger suppress, express, or control their anger can be inferred from their scores on the AX/In, AX/Out, and AX/Con Scales.

T-Anger/T: Persons with high T-Anger/T scores are quick tempered and readily express their anger with little provocation. Such individuals are often impulsive and lacking in anger control. High T-Anger/T individuals who have high AX/Con scores may be strongly authoritarian and use anger to intimidate others.

T-Anger/R: Persons with high T-Anger/R scores are highly sensitive to criticism, perceived affronts, and negative evaluation by others. They frequently experience intense feelings of anger under such circumstances.

AX/In: Persons with high AX/In scores frequently experience intense angry feelings, but tend to suppress these feelings rather than express them in either physical or verbal behavior. Persons with high AX/In scores who also have high AX/Out scores may express their anger in some situations, whereas suppressing it in others.

AX/Out: Persons with high AX/Out scores frequently experience anger, which they express in aggressive behavior. Anger-out may be expressed in physical acts, such as assaulting other persons or slamming doors; or verbally, in the form of criticism, sarcasm, insults, threats, and extreme use of profanity.

AX/Con: Persons with high scores on the AX/Con Scale tend to invest a great deal of energy in monitoring and preventing the expression of anger. Although controlling anger is certainly desirable, the overcontrol of anger may result in passivity and withdrawal. Persons with high AX/Con and high T-Anger scores also may experience anxiety and depression.

¹ Adapted from Table 4 of the Professional Manual for the State-Trait Anger Expression Inventory: Revised Research Edition (Spielberger, 1988, p. 5), with the permission of Psychological Assessment Resources, Inc. (PAR).
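The percentile guidelines for interpreting STAXI scale scores given in this section (below the 25th percentile, the 25th to 75th percentile normal range, above the 75th, and above the 90th) can be sketched as a small classification function. The band labels, the function names, and the treatment of scores falling exactly on a cutoff are assumptions made for illustration; they are not taken from the test manual, and the sketch is no substitute for the norms tables.

```python
def percentile_band(percentile):
    """Map a percentile rank (0-100) to an illustrative interpretive band."""
    if not 0 <= percentile <= 100:
        raise ValueError("percentile rank must be between 0 and 100")
    if percentile < 25:
        return "low"
    if percentile <= 75:
        return "normal range"
    if percentile <= 90:
        return "high"
    return "very high"


def both_expression_scales_very_high(ax_in_percentile, ax_out_percentile):
    """Flag the pattern the text links to elevated coronary risk:
    AX/In and AX/Out both above the 90th percentile."""
    return ax_in_percentile > 90 and ax_out_percentile > 90


# Hypothetical percentile ranks for one respondent.
profile = {"T-Anger": 82, "AX/In": 95, "AX/Out": 93, "AX/Con": 18}

for scale, rank in profile.items():
    print(scale, percentile_band(rank))

print(both_expression_scales_very_high(profile["AX/In"], profile["AX/Out"]))  # True
```

Because the S-Anger and T-Anger/T distributions are positively skewed, a "low" band on those two scales carries little information, so a function like this would be most meaningful for the remaining scales.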

Assessment of Emotions in Treatment Planning

The DSM-III-R provides criteria for diagnosing anxiety disorders (American Psychiatric Association, 1987), but no such attention has been given to the classification of problems with anger (Deffenbacher, 1992). Nevertheless, the assessment of both anger and anxiety is essential in planning an effective treatment program, and in evaluating the relative efficacy of different forms of behavioral and pharmacological interventions. Because the management of anxiety and anger during treatment is among the chief concerns of most psychotherapists and counselors, the valid assessment of these emotions can facilitate the treatment process (Deffenbacher, Demm, & Brandon, 1986; Spielberger et al., 1985). Consequently, obtaining reliable and valid measures of state and trait anxiety, and carefully assessing the experience, expression, and control of anger, are essential in selecting an optimal form of treatment, monitoring the treatment process, and evaluating treatment outcome.

ASSESSING ANXIETY IN TREATMENT PLANNING AND EVALUATION

Symptoms of anxiety typically are found in almost all emotional disorders. From a psychoanalytic perspective, Freud (1936) regarded anxiety as the “fundamental phenomenon and the central problem of neurosis” (p. 85), as was noted previously. According to de la Torre (1979), dealing with transitory anxiety (S-Anxiety) is also a major priority in all forms of short-term psychotherapy, including crisis intervention and dynamic treatments that focus on specific problems of the patient or client, such as test anxiety. Diverse manifestations of anxiety in various physical and psychological disorders generally require different forms of treatment, as de la Torre (1979) noted:

The ubiquitousness of anxiety among psychiatric patients demands a careful assessment and diagnosis. The transitory anxiety in a well-compensated individual differs considerably from the intense anxiety that heralds psychotic decompensation. Both situations require different kinds of interventions and will have different prognostic outcomes. (p. 379)

The STAI has been used to assess state and trait anxiety in more than 6,000 investigations, including psychological and pharmacological treatment studies of psychiatric, psychosomatic, and medical patients (Spielberger, 1989). The assessment of anxiety as a personality trait (T-Anxiety) is especially important in evaluating treatment outcomes in phobias (Foa & Kozak, 1985), and in panic and generalized anxiety disorders (Barlow, 1985). Careful assessment of anxiety is also essential in applications of systematic desensitization to the treatment of phobic patients, and in clients with conditioned aversion reactions (Suinn & Deffenbacher, 1988).

The STAI also has been used extensively in test anxiety treatment studies. Test anxious individuals manifest high levels of S-Anxiety during examinations, which contributes to impaired test performance (Spielberger, Anton, & Bedell, 1976). It has been demonstrated that systematic desensitization, rational-emotive therapy, cognitive-behavioral interventions, and even relaxation training are all successful in reducing S-Anxiety in testing situations. However, cognitive treatment strategies appear to be more effective for reducing both test anxiety and T-Anxiety levels in test anxious students (Spielberger et al., 1976).

ASSESSING ANGER IN TREATMENT PLANNING AND EVALUATION

Deffenbacher (1992) reported research findings from a series of studies that have important implications for clinical assessment and treatment. In these studies, high T-Anger subjects experienced heightened S-Anger and physiological arousal in ongoing situations on a daily basis, which could be targeted for behavioral treatment such as relaxation training and coping skills programs (Deffenbacher et al., 1986; Deffenbacher & Stark, 1990; Hazaleus & Deffenbacher, 1986). By helping clients learn to lower anger by engaging in self-initiated relaxation exercises, successful treatment would free them to use more effective problem-solving and social skills that were previously disrupted by unpleasant and distracting physiological arousal associated with heightened states of anger.

Deffenbacher’s (1992) consistent finding that high T-Anger individuals experience anger across a wide range of ongoing daily situations has important implications for clinical treatment. His research suggested that emotional states of anger can be conceptualized as a complex cognitive-psychophysiological phenomenon embedded in a specific situational context. Effective treatment requires that all aspects of this phenomenon be assessed carefully, along with the behaviors triggered by or associated with anger. Deffenbacher recommended that a number of different measurement strategies be used in assessing anger, such as interviewing, role plays, and self-monitoring so that the range of real and potential sources of anger may be mapped. He further suggested that, in the later stages of therapy, it may be appropriate to use self-monitoring measures of S-Anger, along with role-play simulations to provide opportunities for assessment, rehearsal, and transfer of skills and insights.

The observed tendency for high T-Anger individuals to suppress anger and/or express it in less-controlled, socially desirable ways requires careful clinical assessment in treatment programs. As previously noted, Deffenbacher (1992) found that high T-Anger individuals reported strong tendencies toward verbal and physical antagonism and less constructive behavior, which suggested that these individuals are generally more abrupt, abrasive, and intimidating.
The verbal and nonverbal cues associated with such behavior may elicit anger in others, leading them to withdraw or counterattack—the latter response is likely to stimulate further anger and aggression in the high T-Anger individual. Effective treatment will require raising the high T-Anger person’s awareness of this vicious cycle, and then training him or her to control the tendency to counterattack. Assessment of when, where, and why clients employ different anger expression strategies not only will contribute to clarifying the nature of anger and its expression, but also will help identify adaptive strategies that can be used effectively in angering situations. High T-Anger individuals seem to interpret many situations as insulting and frustrating (Beck, 1976) and maladaptive anger is related to serious personality problems, including difficulties in interpersonal relationships and many health-related disorders (Hazaleus & Deffenbacher, 1985; Hogg & Deffenbacher, 1986; Story & Deffenbacher, 1985; Zwemer & Deffenbacher, 1984).

Therefore, effective strategies for controlling anger are urgently needed in treatment planning (Deffenbacher, 1992).

Effective treatment of anger-related problems requires detailed knowledge concerning an individual’s experience of both state and trait anger and modes of anger expression (Sharkin, 1988). Careful assessment of the experience, expression, and control of anger is not only essential for understanding problems that are rooted in anger, but assessment is also a necessary first step in treatment planning. Because of the multidimensionality of anger, multifaceted interventions are likely to be required to produce beneficial treatment outcomes (Deffenbacher, 1991; Novaco, 1979).

According to Deffenbacher (1992), therapeutic strategies for dealing with anger and anxiety should include psychodynamic, self-explorative, behavioral, and cognitive interventions to help patients perceive the world as less threatening. If successful, such interventions will help patients feel less vulnerable, thereby reducing personal frustration and decreasing the intensity and frequency of angry reactions. Research evidence indicates that relaxation exercises, social skills training, and cognitive-behavioral interventions have proved effective in decreasing levels of anxiety and anger (Deffenbacher et al., 1986; Deffenbacher, Story, Stark, Hogg, & Brandon, 1987).

Research with the STAI and, more recently, with the STAXI and its subscales provides encouraging evidence of the utility of these inventories in treatment planning, and in the evaluation of treatment process and outcome. In a recent comprehensive evaluation and critique, Moses (1992) concluded that the STAXI is a “specific, sensitive, psychometric instrument,” and that:

If future applications of the STAXI are as experimentally rigorous as the development of this measure, there is great potential for its use to significantly further our understanding of important stress-based and stress-influenced syndromes and to help in identifying effective means by which such disorders may be reversed and prevented. (1992, p. 524)

Summary

Recent advances in the conceptualization of anxiety and anger have stimulated the development of improved instruments for the measurement of these emotions. Early theories of anxiety; the concepts of state and trait anxiety; and conceptual ambiguity and confusion in current theoretical interpretations of anger, hostility, and aggression were examined. A number of techniques and procedures that have been developed to assess anxiety were discussed, and the construction and validation of a psychometric inventory designed to assess state and trait anxiety was reviewed. The research literature on the expression of anger was examined, and the procedures employed in developing and validating a new psychometric instrument for measuring the experience, expression, and control of anger were described in detail. The chapter concludes with a discussion of issues concerning the utilization of measures of anxiety and anger in treatment planning, and in the evaluation of therapeutic interventions with individuals experiencing anger-related problems.

References

Alexander, F. G. (1939). Emotional factors in essential hypertension: Presentation of a tentative hypothesis. Psychosomatic Medicine, 1, 175-179.

Alexander, F. G. (1948). Emotional factors in hypertension. In F. Alexander & T. M. French (Eds.), Studies in psychosomatic medicine: An approach to the cause and treatment of vegetative disturbances (pp. 289-97). New York: Ronald.

Alexander, F. G., & French, T. M. (Eds.). (1948). Studies in psychosomatic medicine: An approach to the cause and treatment of vegetative disturbances. New York: Ronald.

American Psychiatric Association. (1987). Diagnostic and statistical manual (DSM-III-R) (3rd ed., rev.). Washington, DC: American Psychiatric Association.

Aragona, J. C. (1983). Physical child abuse: An interactional analysis (Doctoral dissertation, University of South Florida, Tampa, 1983). Dissertation Abstracts International, 44, 1225B.

Arnold, M. B. (1960). Emotion and personality: Volume I. Psychological aspects. New York: Columbia University Press.

Atkinson, J. (1964). An introduction to motivation. Princeton: Van Nostrand-Reinhold.

Averill, J. R. (1982). Anger and aggression: An essay on emotion. New York: Springer-Verlag.

Barker, B. M., Barker, H. R., & Wadsworth, A. P., Jr. (1977). Factor analysis of the items of the State-Trait Anxiety Inventory. Journal of Clinical Psychology, 33, 450-455.

Barker, L. R. (1979). Personality variables as determinants of performance problems of recruits in the U.S. armed forces. Unpublished master’s thesis, University of South Florida, Tampa.

Barlow, D. H. (1985). The dimensions of anxiety disorders. In A. H. Tuma & J. D. Maser (Eds.), Anxiety and the anxiety disorders (pp. 479-500). Hillsdale, NJ: Lawrence Erlbaum Associates.

Beck, A. (1976). Cognitive therapy and the emotional disorders. New York: International Universities Press.

Bendig, A. W. (1962). Factor analytic scales of covert and overt hostility. Journal of Consulting Psychology, 26, 200.

Biaggio, M. K. (1980). Assessment of anger arousal. Journal of Personality Assessment, 44, 289-298.

Biaggio, M. K., & Maiuro, R. D. (1985). Recent advances in anger assessment. In C. D. Spielberger & J. N. Butcher (Eds.), Advances in personality assessment (Vol. 5, pp. 71-111). Hillsdale, NJ: Lawrence Erlbaum Associates.

Biaggio, M. K., Supplee, K., & Curtis, N. (1981). Reliability and validity of four anger scales. Journal of Personality Assessment, 45, 639-648.

Booth-Kewley, S., & Friedman, H. S. (1987). Psychological predictors of heart disease: A quantitative review. Psychological Bulletin, 101, 343-362.

Borkovec, T. D., Weerts, T. C., & Bernstein, D. A. (1977). Assessment of anxiety. In A. R. Ciminero, K. S. Calhoun, & H. E. Adams (Eds.), Handbook of behavioral assessment (pp. 367-428). New York: Wiley.

Bromet, E., & Leonard, K. (1987). Psychological effects of lead exposure. Unpublished manuscript, University of Pittsburgh, Western Psychiatric Institute and Clinic.

Brooks, M. L., Walfish, S., Stenmark, D. E., & Canger, J. M. (1981). Personality variables in alcohol abuse in college students. Journal of Drug Education, 11, 185-189.

Buck, D. K. (1987). The impact of parental divorce on levels of anger and anxiety in young adults in comparison to young adults whose parents are not divorced. Unpublished master’s thesis, Ohio University, Athens.

Buss, A. H. (1961). The psychology of aggression. New York: Wiley.

Buss, A. H., & Durkee, A. (1957). An inventory for assessing different kinds of hostility. Journal of Consulting Psychology, 21, 343-349.

Caine, T. M., Foulds, G. A., & Hope, K. (1967). Manual of the hostility and direction of hostility questionnaire (HDHQ). London: University of London Press.

Campbell, D. T. (1963). Social attitudes and other acquired behavioral dispositions. In S. Koch (Ed.), Psychology: A study of a science (Vol. 6, pp. 94-172). New York: McGraw-Hill.

Cattell, R. B. (1966). Patterns of change: Measurement in relation to state-dimension, trait change, lability, and process concepts. In R. B. Cattell (Ed.), Handbook of multivariate experimental psychology (pp. 355-408). Chicago: Rand McNally.

Cattell, R. B., & Scheier, I. H. (1958). The nature of anxiety: A review of thirteen multivariate analyses comprising 814 variables. Psychological Reports, 4, 351.

Cattell, R. B., & Scheier, I. H. (1961). The meaning and measurement of neuroticism and anxiety (pp. 57, 182). New York: Ronald.

Cattell, R. B., & Scheier, I. H. (1963). Handbook for the IPAT Anxiety Scale (2nd ed.). Champaign, IL: Institute for Personality and Ability Testing.

Cavanaugh, D. J., Kanonchoff, A. D., & Bartels, R. L. (1987). Menstrual irregularities in athletic women may be predictable based on pretraining menses. Unpublished manuscript, Ohio State University, Department of Work Psychology, Columbus.

Cook, W. W., & Medley, D. M. (1954). Proposed hostility and pharisaic-virtue scales for the MMPI. The Journal of Applied Psychology, 38, 414-418.

Crane, R. S. (1981). The role of anger, hostility, and aggression in essential hypertension (Doctoral dissertation, University of South Florida, Tampa, FL, 1981). Dissertation Abstracts International, 42, 2982B.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16, 297-335.

Croyle, R. T., Jemmott, J. B., III, & Carpenter, B. D. (1988). Relations between four individual difference measures associated with cardiovascular dysfunction and anger coping style. Psychological Reports, 63, 779-786.

Curtis, G., Kinder, B., Kalichman, S., & Spana, R. (1988). Affective differences among sub-

13 groups of chronic pain patients. Anxiety Research: An International Journal, 1, 65-73. Darwin, C. (1965). The expression of emotions in man and animals. Chicago: University of Chicago Press. (Original work published in

1872). Deffenbacher, J. L. (1991, July). Cognitivebehavioral approaches to general anger reduction. Proceedings of the International Congress on Stress, Anxiety, and the Emotional

Disorders, Braga, Portugal. Deffenbacher, J. L. (1992). Trait anger: Theory, findings, and implications. In C. D. Spielberger & J. N. Butcher (Eds ), Advances in personality assessment (Vol. 9, pp. 177-201). Hillsdale, NJ: Lawrence ‘Erlbaum Associ-

ates. Deffenbacher, J. L., Demm, P. M., & Brandon, A. D. (1986). High general anger: Correlates and treatment. Behavioral Research and Therapy, 24, 480-489. Deffenbacher, J. L., & Stark, R. S. (1990). Relaxation and cognitive-relaxation treatments of general anger. Manuscript submitted for publication, Department of Psychology, Colorado State University, Fort Collins. Deffenbacher, J. L., Story, D. A., Stark, R. S., Hogg, J. A., & Brandon, A. D. (1987). Cognitive-relaxation and social skills interventions in the treatment of general anger. Journal of Counseling Psychology, 34, 171-176. de la Torre, J. (1979). Anxiety states and shortterm psychotherapy. In W. E. Fann, I. Karacan, A. D. Polorny, & R. L. Williams (Eds.), Phenomenology and treatment of anxiety (pp. 377-388). Jamaica, NY: Spectrum. Dembroski, T. M., MacDougall, J. M., Williams, R. B., & Haney, T. L. (1985). Components of Type A, hostility, and anger-in: Relationship to angiographic findings. Psychosomatic Medicine, 47, 219-233. Deshields, T. L. (1986). Anger and assertiveness in essential hypertension. Dissertations Abstracts International, 46, 3212B. (University Microfilms No. 85-24, 330.) Duffy, F. (1941). An explanation of “emotional” phenomena without the use of the concept “emotion.” Journal of General Psychology,

25, 283-293. Evans, D. R., & Stangeland, M. (1971). Development of the reaction inventory to measure anger. Psychological Reports, 29, 412-414. Eysenck, H. J., & Eysenck, S. B. G. (1975).

STAI AND STAXI

Manual of the Eysenck Personality Questionnaire. London: Hodder and Stroughton. Foa, E. B., & Kozak,

M. J. (1985).

Treatment

of anxiety disorders: Implications for psychopathology. In A. H. Tuma & J. D. Maser (Eds.), Anxiety and the anxiety disorders (pp. 421-452). Hillsdale, NJ: Lawrence Erlbaum Associates. Freud, S. (1924). Collected papers (Vol. 1). London: Hogarth. Freud, S. (1936). The problem of anxiety. New York: W. W. Norton. Funkenstein, D. H., King, S. H., & Drolette, M.E. (1954). The direction of anger during a laboratory stress-inducing situation. Psychosomatic Medicine, Fuqua,

D.

16, 404-413.

R., Leonard,

E., Masters,

M.

A.,

Smith, R. J., Campbell, J. L., & Fischer, P. C. (1991). A structural analysis of the State-Trait Anger Expression Inventory (STAXI). Educational and _ Psychological Measurement, 51, 439-446. Gaudry, E., & Poole, C. (1975). A further validation of the state-trait distinction in anxiety research. Australian Journal of Psychology, Di NI9=1253 Gaudry, E., Spielberger, C. D., & Vagg, P. R. (1975). Validation of the state-trait distinction in anxiety research. Multivariate Behavior Research, 10, 331-341.

Gentry, W. D. (1972). Biracial aggression: 1. Effect of verbal attack and sex of victim. The Journal of Social Psychology, 88, 75-82. Gentry, W. D., Chesney, A. P., Gary, H. G., Hall, R. P., & Harburg, E. (1982). Habitual anger-coping styles: I. Effect on mean blood pressure and risk for essential hypertension. Psychosomatic Medicine, 44, 195-202. Gentry, W. D., Chesney, A. P., Hall, R. P., & Harburg, E. (1981). Effect of habitual angercoping pattern on blood pressure in black/ white, high/low stress area respondents. Psychosomatic Medicine,

43, 88.

Goffaux, J., Wallston, B. S., Heim, C. R., & Shields, S. L. (1987, March). Type A behaviors, hostility, anger and exercise adherence.

Paper presented at the 8th Annual Session of the Society of Behavioral Medicine, Washington, DC. Gorkin, L., Appel, M., Holroyd, K. A., Saab, P. G., & Stauder, L. (1986). Anger management style and family history status as risk factors for essential hypertension. Un-

317

318

SPIELBERGER AND SYDEMAN published manuscript, Ohio University, Athens. Hamilton, M. (1959). The assessment of anxiety states by rating. British Journal of Medical Psychology, 32, 50. Harburg,

E., Blakelock,

E. H., & Roeper, P. J.

(1979). Resentful and reflective coping with arbitrary authority and blood pressure: Detroit. Psychosomatic Medicine,

3, 189-202.

Harburg, E., Erfurt, J. C., Hauenstein, L. S., Chape, C., Schull, W. J., & Schork, M. A. (1973). Socio-ecological stress, suppressed hostility,

skin

color,

and

black-white

male

blood pressure: Detroit. Psychosomatic Medicine, 35, 276-296. Harburg, E., & Hauenstein, L. (1980). Parity and blood pressure among four race-stress groups of females in Detroit. American Journal of Epidemiology, 111, 356-366. Harburg,

E.,

Schull,

W.

J., Erfurt,

J. C.,

&

Schork, M. A. (1970). A family set method for estimating heredity and stress-I. Journal of Chronic Disease, 23, 69-81.

Hartfield, M. T. (1985). Appraisals of anger situations and subsequent coping response in hypertensive and normotensive adults: A comparison (Doctoral

dissertation,

University

of

California, 1985). Dissertation Abstracts International, 46, 4452B. Hazaleus, S. L., & Deffenbacher, J. L. (1985). Irrational beliefs and anger arousal. Journal of College Student Personnel, 26, 47—52.

Hazaleus, S. L., & Deffenbacher, J. L. (1986). Relaxation and cognitive treatments of anger. Journal of Consulting and Clinical Psychology, 54, 222-226. Herschberger, P. (1985). Type A behavior in non-intensive and intensive care nurses. Unpublished master’s thesis, University of South Florida, Tampa. Hodges, W. F. (1976). The psychophysiology of anxiety. In M. Zuckerman & C. D. Spielberger

(Eds.),

Emotions

and

anxiety:

New

concepts, methods, and applications (pp. 175-194). Hillsdale, NJ: Lawrence Erlbaum Associates. Hogg, J. A., & Deffenbacher, J. L.(1986). Irrational beliefs, depression and anger in college students. Journal of College Student Personnel, 27, 349-353. Jacobs, G. A., Latham, L. E., & Brown, M. S. (1988). Test—retest reliability of the StateTrait Personality Inventory and the Anger Ex-

pression

Scale.

Anxiety Research,

1, 263-

265. Janisse, M. P., Edguer, N., & Dyck, D. G. (1986). Type A behavior, anger expression, and reactions to anger imagery. Motivation and Emotion, 10, 371-385. Johnson, E. H. (1984). Anger and anxiety as determinants of elevated blood pressure in adolescents. Unpublished doctoral dissertation, University of South Florida, Tampa. Johnson, E. H., & Broman, C. L. (1987). The relationship of anger expression to health problems among Black Americans in a national survey. Journal of Behavioral Medicine, 10,

103-169. Johnson, E. H., Spielberger, C., Worden, T., & Jacobs, G. (1987). Emotional and familial determinants of elevated blood pressure in black and white adolescent males. Journal of Psychosomatic Research, 31, 287-300. Johnson-Saylor, M._ T. (1984). Relationships among anger expression, hostility, hardiness, social support, and health risk. Unpublished doctoral dissertation, University of Michigan, Ann Arbor. Kearns, W. D. (1985). A laboratory study of the relationship of mode of anger expression to blood pressure. Unpublished master’s thesis, University of South Florida, Tampa. Kendall,

P. C., Finch, A. J., Jr., Auerbach,

S.

M., Hooke, J. F., & Mikulka, P. J. (1976). The State-Trait Anxiety Inventory: A systematic evaluation. Journal of Consulting and Clinical Psychology, 44, 406—412. Kinder, B., Curtis, G., & Kalichman, S. (1986). Anxiety and anger as predictors of MMPI elevations in chronic pain patients. Journal of Personality Assessment, 50, 651-661.

Knight, R. G., & Chisholm, B. J., Paulin, J. M., & Waal-Manning, H. J. (1988). The Spielberger Anger Expression Scale: Some psychometric data. Journal of Clinical Psychology,

27, 279-281. Krasner, S.S. (1986). Anger, anger control, and the coronary prone behavior pattern. Unpublished master’s thesis, University of South Florida, Tampa. Lader, M. (1975). Psychophysiological parameters and methods. In L. Levi (Ed.), Emotions: Their parameters and measurement (pp. 341— 367). New York: Raven. Lazarus, R. S., Deese, J., & Osler, S. F. (1952). The effects of psychological stress upon per-

13 formance.

Psychological Bulletin, 49, 293-

She Lazarus, R. S., & Folkman, S. (1984). Stress, appraisal, and coping. New York: Springer. Lazarus, R. S., & Opton, E. M., Jr. (1966). The study of psychological stress. In C. D. Spielberger (Ed.), Anxiety and behavior (pp. 225262). New York: Academic Press. Levitt, E. E. (1980). The psychology of anxiety (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum Associates. Martin, I. (1973). Somatic reactivity: Methodology. In H. J. Eysenck (Ed.), Handbook of abnormal psychology (2nd ed., pp. 417—456). San Diego, CA: Knapp. McMillian, S.C. (1984). A comparison of levels of anxiety and anger experienced by 2 groups of cancer patients during therapy for Hodgkin’s disease and small cell lung cancer. Unpublished master’s thesis, University of South Florida, Tampa. McReynolds, P. (1968). The assessment of anxiety: A survey of available techniques. In P. McReynolds (Ed.), Advances in psychological assessment (Vol. 1, pp. 244—264). Palo Alto, CA: Science and Behavior Books. Moses, J. A. (1992). State-Trait Anger Expression Inventory, research edition. In D. J. Keyser & R. C. Sweetland (Eds.), Test critiques (Vol. 9, pp. 510-525). Austin, TX: PRO-ED. Novaco,

R. W.

(1975).

Anger control: The de-

velopment and evaluation of an experimental treatment. Lexington, MA: Lexington Books/ D> @y Heath. Novaco, R. W. (1979). The cognitive regulation of anger and stress. In P. C, Kendall & S. D. Hollon (Eds.), Cognitive behavioral interventions, theory, research, and procedures (pp. 241-285). New York: Academic Press. Pape, N. E. (1986). Emotional reactions and anger coping strategies of anger suppressors and expression (Doctoral dissertation, University of South Florida, Tampa, 1986). Dissertation Abstracts International, 47, 2627B.

Plutchik, R. (1962). The emotions. New York: Random House. Pollans, C. H. (1983). The psychometric properties and factor structure of the Anger Expression (AX) Scale. Unpublished master’s thesis, University of South Florida, Tampa. Rosenzweig, S. (1976). Aggressive behavior and the Rosenzweig picture frustration study. Journal of Clinical Psychology, 32, 885-891.

STAI AND STAXI

Rosenzweig,S. (1978). The Rosenzweig PictureFrustration (P-F) Study basic manual and adult form supplement. St. Louise, MO: Rana. Russell, S. F. (1981). The factor structure of the Buss-Durkee hostility inventory. Unpublished master’s thesis, University of South Florida, Tampa. Schlosser, M. B. (1986, August). Anger, crying, and health among females. Paper presented at the 94th annual convention of the American Psychological Association, Washington, DC. Schlosser, M. B., & Sheeley, L. A. (1985a, August). The hardy personality: Females coping with stress. Paper presented at the 93rd annual convention of the American Psychological Association, Los Angeles, CA. Schlosser, M. B., & Sheeley, L. A. (1985b, August).Subjective well-being and the stress process. Paper presented at the 93rd annual convention of the American Psychological Association, Los Angeles, CA.

Schneider, R. H., Egan, B., & Johnson, E. H. (1986). Anger and anxiety in borderline hypertension.

Psychosomatic

Medicine,

48,

242-—

248. Schultz, S. D. (1954). A differentiation of several forms of hostility by scales empirically constructed from significant items on the MMPI. Dissertation Abstracts, 17, 717-720. Sharkin, B. S. (1988). Treatment of client anger in counseling. Journal of Counseling and Development, 66, 361-365.

Siegel, S. (1956). The relationship of hostility to authoritarianism. Journal of Abnormal and Social Psychology, 52, 368-373. Spielberger, C. D. (1966). Theory and research on anxiety. In C. D. Spielberger (Ed.), Anxiety and behavior (pp. 3-20). New York: Academic Press. Spielberger, C. D. (1972a). Anxiety as an emotional state. In C, D. Spielberger (Ed.), Anxiety: Current trends in theory and research (Vol. 1, pp. 24-49). New York: Academic Press. Spielberger, C. D. (1972b). Current trends in theory and research on anxiety. In C. D. Spielberger (Ed.), Anxiety: Current trends in theory and research (Vol. 1, pp. 3-19). New York: Academic Press. Spielberger, C. D. (1973). Manual for the StateTrait Anxiety Inventory for Children. Palo Alto, CA: Consulting Psychologists Press. Spielberger, C. D. (1976). Stress and anxiety

319

320

SPIELBERGER AND SYDEMAN and cardiovascular disease. Journal of the South Carolina Medical Association X (Suppl. 15), 72, 15-22. Spielberger, C. D. (1977). Anxiety: Theory and research. In B. B. Wolman (Ed.), Internation-

al encyclopedia of neurology, psychiatry, psychoanalysis, and psychology (pp. 81-84). New York: Human Sciences Press. Spielberger, C. D. (1979a). Understanding stress and anxiety. London: Harper & Row. Spielberger, C. D. (1979b). Preliminary manual for the State-Trait Personality Inventory (STPI). Unpublished manuscript, University of South Florida, Tampa. Spielberger, C. D. (1980). Preliminary manual for the State-Trait Anger Scale (STAS). Tampa, FL: University of South Florida, Human Resources Institute. Spielberger, C. D. (1983). Manual for the StateTrait Anxiety Inventory: STAI (Form Y). Palo Alto, CA: Consulting Psychologists Press. Spielberger, C. D. (1988). Manual for the StateTrait

Anger

Expression

Inventory

(STAX1).

Odessa, FL: Psychological Assessment Resources. Spielberger, C. D. (1989). State-Trait Anxiety Inventory: A comprehensive bibliography (2nd ed.). Palo Alto, CA: Consulting Psychologists Press. Spielberger, C. D., Anton, W. D., & Bedell, J. (1976). The nature and treatment of test anxiety. In M. Zuckerman & C. D. Spielberger (Eds.), Emotions and anxiety: New concepts, methods and applications (pp. 317— 345). New York: Lawrence Erlbaum Associates. Spielberger, C. D., & Gorsuch, R. L. (1966). The development of the State-Trait Anxiety Inventory. In C. D. Spielberger & R. L. Gorsuch, Mediating processes in verbal conditioning. Final report to the National Institutes of

Health, U.S. Public Health Service on Grants MH-7229, MH-7446, and HD-947. Spielberger, C. D., Gorsuch, R. L., & Lushene, R. D. (1970). STAI: Manual for the StateTrait Anxiety Inventory. Palo Alto, CA: Consulting Psychologists Press. Spielberger, C. D., Jacobs, G., Russell, S., & Crane, R. (1983). Assessment of anger: The State-Trait Anger Scale. In J. N. Butcher & C. D. Spielberger (Eds.), Advances in personality assessment (Vol. 2, pp. 159-187). Hillsdale, NJ: Lawrence Erlbaum Associates.

Spielberger, C. D., Johnson, E. H., Russell, S. F., Crane, R. J., Jacobs, G. A., & Worden, T. J. (1985). The experience and expression of anger: Construction and validation of an anger expression scale. In M. A. Chesney & R. H. Rosenman (Eds.), Anger and hostility in cardiovascular and behavioral disorders (pp. 5-30). New York: Hemisphere. Spielberger, C. D., Krasner, S. S., & Solomon, E. P. (1988). The experience, expression and control of anger. In M. P. Janisse (Ed.), Health psychology: Individual differences and stress (pp. 89-108). New York: Springer-Verlag. Spielberger, C. D., Vagg, P. R., Barker, L. R., Donham, G. W., & Westberry, L. G. (1980). The factor structure of the State-Trait Anxiety Inventory. In I. G. Sarason & C. D. Spielberger (Eds.), Stress and anxiety (Vol. 7, pp. 95-109). Washington, DC: Hemisphere. Stark, R. S., & Deffenbacher, J. L. (1986, April). General anger and self-concept. Paper presented at Rocky Mountain Psychological Association, Denver, Colorado.

‘Stoner, S. B. (1988). Undergraduate marijuana use and anger. Journal of Psychology, 122, 343-347. Story, D., & Deffenbacher,

J. L. (1985, April).

General anger and personality. Paper presented at Rocky Mountain Psychological Association, Tucson, Arizona. Suinn, R. M., & Deffenbacher, J. C. (1988). Anxiety management training. The Counseling Psychologist, 16, 31-49. Tavris, C. (1982). Anger, the misunderstood emotion. New York: Simon & Schuster. Taylor, J. A. (1953). A personality scale of manifest anxiety. Journal of Abnormal Social Psychology, 48, 285. Titchener, E. B. (1897). An outline of psychology. New York: Macmillan. Vagg, P. R., Spielberger, C. D., & O’Hearn, T. P., Jr. (1980). Is the State-Trait Anxiety Inventory multidimensional? Personality and Individual Differences, 1, 202-214. van der Ploeg, H. M. (1988). The factor structure of the State-Trait Anger Scale. Psychological Reports, 63, 978. van der Ploeg, H. M., van Buuren, E. T., & van Brummelen, P. (1988). The role of anger in hypertension. Psychotherapy and Psychosomatics, 43, 186-193. Vitaliano, P. P. (1984). Identification and intervention with students at high risk for distress

13 in medical school. Unpublished doctoral dissertation, University of Washington, Seattle. Vitaliano, P. P., Maiuro, R. D., Russo, J., Mitchell, E. S., Carr, J. E., & van Citters, R. L. (1986). A biopsychosocial model to explain personal sources of medical student distress. Proceedings of the 26th Annual Conference on Research in Medical Education,

26, 228-234. Welsh, G. S. (1956). Factor dimensions A and

R. InG. S. Welsh & W. G. Dahlstrom (Eds.), Basic Readings on the MMPI in psychology and medicine (pp. 264—281). Minneapolis: University of Minnesota Press. Westberry, L. G. (1980). Concurrent validation of the Trait-Anger Scale and its correlation with other personality measures. Unpublished master’s thesis, University of South Florida, Tampa. Williams, R. B., Haney, T. L., Lee, K. L., Kong,

Y., Blumenthal, J., & Whalen, R. E. (1980). Type A behavior, hostility, and coronary atherosclerosis. Psychosomatic Medicine, 42,

539-549.

STAI AND STAXI

Wundt, W. (1896). Outlines of psychology. New York: Dustav E. Stechert. Young, P. T. (1943). Emotion in man and animal. New York: Wiley. Zelin, M. L., Adler, G., & Myerson, P. G. (1972). Anger self-report: An objective questionnaire for the measurement of aggression. Journal of Consulting and Clinical Psychology, 39, 340. Zuckerman, M. (1960). Development of an Affect Adjective Check List for the measurement of anxiety. Journal of Consulting Psychology,

26, 291: Zuckerman, M., & Biase, D. V. (1962). Replication and further data on the Affect Adjective Check List measures of anxiety. Journal of Consulting Psychology, 26, 291. Zuckerman, M., & Lubin, B. (1965). Manual for the Multiple Affect Adjective Checklist. San Diego, CA: Educational and Industrial Testing Service. Zwemer,

W. A., & Deffenbacher,

J. L. (1984).

Irrational beliefs, anger and anxiety. Journal of Counseling Psychology, 31, 391-393.

321

Chapter 14

Marital Satisfaction Inventory

Douglas K. Snyder
Susan E. Costin
Texas A&M University

Although measures of marital and family functioning abound, most fail to satisfy standards of reliability, validity, and clinical utility (Cromwell, Olson, & Fournier, 1976; Schumm, 1990; Snyder, 1982). The basic psychometric characteristics of popular clinical measures used in marital and family therapy are often unspecified. Therapists use pre- and postscores on measures to evaluate change without an empirical basis for attributing score differences to actual changes in respondents’ relationships versus the temporal instability of their assessment instrument. Too frequently measures are interpreted from a theoretical framework without any scientific foundation linking performance on these measures to independent observations of clients’ marital or family functioning.

By contrast, research measures meeting psychometric criteria often fail to address clinical concerns of either the client or therapist. Some marital and family assessment techniques require extraordinary resources in time, equipment, or specialized training beyond those readily available to most practitioners. Other measures popular in research, because of evidence for their psychometric adequacy, fail to generate the kind of information that facilitates clinical interventions in tailoring treatment to specific needs of the couple or family.

The Marital Satisfaction Inventory (MSI; Snyder, 1979a, 1981, in press) was developed to address both psychometric and clinical concerns in evaluating distressed couples’ relationships. The MSI is a multidimensional, self-report measure of marital interaction, with over 15 years of empirical and clinical study supporting its reliability, validity, and utility in assessing and treating distressed relationships. The inventory permits comparisons of husbands’ and wives’ independent evaluations of their marriage across 11 profile scales, in addition to comparison of these evaluations to profiles typical of couples in marital therapy or from the general population. The MSI facilitates the delineation and prioritizing of spouses’ concerns in the development of initial therapeutic goals. It can be used throughout therapy and at termination in the evaluation of changes in couples’ relationships or planning additional interventions.

This chapter begins with an overview of the MSI, emphasizing scale structure and composition, administration and scoring, and empirical foundations of reliability and validity. Following this introduction, attention shifts to clinical use of the MSI in assessment and treatment of distressed relationships. A basic interpretive strategy for configural analysis integrating both spouses’ profiles is presented, along with recommendations for incorporating assessment data into different phases of the marital therapy. Clinical use of the computer-based interpretive narrative for the MSI is also emphasized. Third, this chapter addresses use of the MSI as a measure for evaluating treatment outcome. Recent findings from a National Institute of Mental Health (NIMH)-sponsored marital therapy outcome study, in which the MSI was a central assessment technique, highlight pre- to posttreatment profile changes typical of couples in treatment and the extent to which the MSI facilitates prediction of couples’ short- and long-term responses to therapy. The chapter concludes with the presentation of a clinical case highlighting the manner in which the MSI can be incorporated into the initial design and subsequent evaluation of treatment interventions.

Overview of the Marital Satisfaction Inventory

SCALE DEVELOPMENT, STRUCTURE, AND COMPOSITION

The MSI is a 280-item true-false inventory including one validity scale, one global satisfaction scale, and nine additional scales assessing specific areas of marital interaction. The MSI is administered to individual spouses separately, and requires approximately 30 minutes to complete. Individuals’ responses are scored along the 11 profile scales and are plotted on a standard profile sheet employing gender-specific norms. Profile scale content, abbreviations, and sample items are as follows.1

Conventionalization (CNV). This validity scale assesses individuals’ tendencies to distort the appraisals of their marriages in a socially desirable direction. Items reflect denial of minor, commonly occurring marital difficulties, and efforts to describe the relationship in an unrealistically positive manner. (“There is never a moment that I do not feel ‘head over heels’ in love with my mate”; “My mate completely understands and sympathizes with my every mood.”)

Global Distress (GDS). This global measure assesses respondents’ overall dissatisfaction with their marriages. Items reflect general marital discontent, chronic disharmony, desire for marital therapy, and consideration of separation or divorce. (“My marriage has been disappointing in several ways”; “At times I have very much wanted to leave my spouse.”)

Affective Communication (AFC). This scale focuses on the process of verbal and nonverbal communication and is the best single index of the affective quality of couples’ relationships. Items reflect spouses’ dissatisfaction with the amount of affection and understanding expressed by their partners. (“My spouse doesn’t take me seriously enough sometimes”; “Sometimes I wonder just how much my spouse really does love me.”)

Problem-Solving Communication (PSC). This second communication scale assesses couples’ general ineffectiveness in resolving differences. Items measure overt conflict, rather than the underlying feelings of detachment or alienation. (“Minor disagreements with my spouse often end up in big arguments”; “My spouse and I seem able to go for days sometimes without settling our differences.”)

Time Together (TTO). Items on this scale reflect a lack of common interests and dissatisfaction with the quality and quantity of leisure time together. (“My spouse and I don’t have much in common to talk about”; “About the only time I’m with my spouse is at meals and bedtime.”)

Disagreement About Finances (FIN). This scale assesses marital discord regarding the management of family finances. (“My spouse buys too many things without consulting with me first”; “It is often hard for my spouse and me to discuss our finances without getting upset with each other.”)

Sexual Dissatisfaction (SEX). Items on this scale reflect dissatisfaction with both the frequency and quality of intercourse and other sexual activity. (“My spouse sometimes shows too little enthusiasm for sex”; “My spouse has too little regard for my sexual satisfaction.”)

Role Orientation (ROR). This scale reflects the adoption of a traditional versus nontraditional orientation toward marital and parental gender roles. Items are scored in the nontraditional direction. (“There should be more daycare centers and nursery schools so that more mothers of young children could work”; “A wife should not have to give up her job when it interferes with her husband’s career.”)

Family History of Distress (FAM). Items reflect individuals’ unhappy childhoods and disharmony in the marriages of respondents’ parents and extended families. (“I was very anxious as a young person to get away from my family”; “My parents didn’t communicate with each other as well as they should have.”)

Dissatisfaction with Children (DSC). This scale assesses parental dissatisfaction or disappointment with children. Items reflect parent-child relationships, rather than relationships between the spouses. (“Having children has not brought all of the satisfaction I had hoped it would”; “My children rarely seem to care how I feel about things.”)

Conflict over Childrearing (CCR). Items assess the extent of conflict between spouses regarding childrearing practices and parental responsibilities. (“My spouse and I seem to argue more frequently since having children”; “My spouse doesn’t assume his (her) fair share of taking care of the children.”)

1 Sample items listed are all scored in the true direction. A complete listing of item composition for each scale, scoring direction, and response characteristics is provided in the test manual (Snyder, 1981, in press). Sample items are copyrighted © 1979 by Western Psychological Services. Reprinted by permission of the publisher, Western Psychological Services, 12031 Wilshire Boulevard, Los Angeles, California 90025.

Except for Conventionalization and Role Orientation, each scale is scored so that high scores reflect high levels of distress. Items from the first nine scales are presented in random order; items from the two child-related scales (DSC and CCR) appear last, so that couples without children complete only the first 239 items.

In addition to these standard profile scales, Snyder and Regts (1982) developed two broad-band factor scales of marital distress to supplement the 11 MSI profile scales. These two additional scales, labeled Disaffection (DAF) and Disharmony (DHR), were derived from factor analysis of the 127 items constituting the Global Distress (GDS) and affective triad (AFC, PSC, and TTO) profile scales. The relative brevity of these two factor scales, made up of a total of 44 items, facilitates their use on those occasions when situational constraints preclude administration of the entire inventory. Factor scale content, abbreviations, and sample items are as follows.

Disaffection (DAF). Scores on this factor scale reflect minimal affection or understanding from one’s spouse, an absence of common interests or shared leisure activities, general dissatisfaction with the marital relationship, and an inclination toward separation or divorce. (“The future of our marriage is too uncertain to make any serious plans”; “I’m not sure my spouse has ever really loved me.”)

Disharmony (DHR). This factor scale reflects a general inability to resolve differences, characterized by misinterpretation of each other’s views, a propensity for disagreements to be perceived as personal criticism, and the escalation of minor differences into major conflicts. (“My spouse and I need to improve the way we settle our differences”; “Our arguments frequently end up with one of us feeling hurt or crying.”)

ADMINISTRATION AND SCORING

Items comprising the MSI can be presented to individuals by means of an administration booklet or interactively on a microcomputer using software distributed by the test publisher.2 When using an administration booklet, individuals’ responses can be recorded on standard answer sheets and hand scored or, alternatively, can be recorded on special answer sheets to be mailed to the publisher for computerized scoring and interpretation. When hand scored, husbands’ and wives’ scores on the MSI are transferred to a profile sheet on which raw scores are transformed into linear T scores, standardized separately by gender. Hand scoring and transfer of scale scores to the profile form requires approximately 10 minutes per individual. Test reports generated by Western Psychological Services through a mail-in service are accompanied by a special ChromaGraph,3 for which different interpretive scale ranges are delineated by separate color codes, reflecting different degrees of distress in each problem area, and spouses’ scores on the MSI are presented in contrasting red and blue profiles. Microcomputer software available from the publisher permits individuals to view and respond to test items interactively, with subsequent computerized scoring and interpretation in clinicians’ offices. For clinicians with limited microcomputer facilities, this same software permits transfer of previously recorded responses (e.g., from a standard answer form) for computerized scoring and generation of an automated report.
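The raw-to-T transformation mentioned above can be illustrated with a brief sketch. It assumes only the standard definition of a linear T score (mean of 50 and standard deviation of 10 in the normative group). The function names, the dictionary layout, and especially the normative means and standard deviations below are hypothetical placeholders chosen for illustration; they are not values from the MSI manual and should not be used for actual scoring.

```python
def linear_t(raw, norm_mean, norm_sd):
    """Convert a raw scale score to a linear T score (mean 50, SD 10)."""
    if norm_sd <= 0:
        raise ValueError("norm_sd must be positive")
    return 50.0 + 10.0 * (raw - norm_mean) / norm_sd


# HYPOTHETICAL gender-specific norms as (mean, SD) pairs per scale.
# These numbers are invented for illustration only; the real values
# must be taken from the test manual.
HYPOTHETICAL_NORMS = {
    "husband": {"GDS": (8.0, 6.0)},
    "wife": {"GDS": (9.0, 7.0)},
}


def score_profile(raw_scores, gender, norms=HYPOTHETICAL_NORMS):
    """Return rounded T scores for each scale, using norms for one gender."""
    return {
        scale: round(linear_t(raw, *norms[gender][scale]))
        for scale, raw in raw_scores.items()
    }


if __name__ == "__main__":
    # A raw score one SD above the (hypothetical) wives' mean maps to T = 60.
    print(score_profile({"GDS": 16}, "wife"))
```

Because the transformation is linear, a raw score equal to the normative mean always yields T = 50, and scores one standard deviation below or above that mean yield T = 40 and T = 60, whatever the scale's raw-score metric. Keeping husbands' and wives' norms in separate tables mirrors the separate standardization by gender described in the text.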

TYPES OF NORMS AVAILABLE

Respondents’ raw scores on MSI scales are converted to linear T scores based on separate norms for husbands and wives. The standardization sample of 322 husbands and 328 wives (including 211 couples) reflects broad representation across sociodemographic indices, including age, ethnicity, education, and occupation (Snyder, 1981). The MSI profile form permits easy comparison of individuals’ scores to those “typical” of husbands or wives sampled at large from a nonclinic population. The average T score is 50, and approximately two thirds of individuals from a nondistressed sample could be expected to have scores in the range of 40 to 60T.

² The Marital Satisfaction Inventory (MSI) (including materials for administration and either hand or computer scoring and interpretation) is published and distributed by Western Psychological Services, 12031 Wilshire Boulevard, Los Angeles, California 90025.

³ ChromaGraph is a registered trademark of Western Psychological Services.


However, it often is useful to compare an individual’s scores on the MSI to the average profile for specific criterion groups. The revised MSI manual (Snyder, in press) includes group mean profiles for samples of individuals entering marital therapy, individuals completing marital therapy, couples seeking treatment at a sexual dysfunctions specialty clinic, physically battered women seeking refuge at a spouse-abuse shelter, couples in which one or both spouses are in individual treatment for nonmarital emotional and behavioral disorders, and parents of psychiatrically hospitalized children or adolescents. In drawing comparisons between an individual profile and the mean profile for any criterion group, it is essential to emphasize the high degree of profile variability across individuals comprising any specific group. Consequently, similarities between an individual’s MSI profile and the mean MSI profile for some criterion group may suggest that the respondent shares important characteristics with that group in terms of his or her marital relationship, but these should be explored carefully in the interview. Similarly, discrepancies between an individual’s MSI scores and the mean profile for some criterion group do not rule out the possibility that the respondent shares important common features of that group.
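The linear T-score conversion described in this section can be illustrated with a minimal sketch. The norm values below are hypothetical placeholders, not the published MSI norms, and the function names are introduced only for illustration. The final lines assume an approximately normal score distribution, which a linear transformation does not guarantee, to show where the "two thirds within 40 to 60T" figure comes from.

```python
from statistics import NormalDist

# Hypothetical gender-specific norms for a single scale.
# These are illustrative values only, NOT the published MSI norms.
HYPOTHETICAL_NORMS = {
    "husband": {"mean": 12.0, "sd": 6.0},
    "wife": {"mean": 14.0, "sd": 8.0},
}


def linear_t_score(raw, norm_mean, norm_sd):
    """Convert a raw score to a linear T score (mean 50, SD 10)."""
    if norm_sd <= 0:
        raise ValueError("norm_sd must be positive")
    return 50.0 + 10.0 * (raw - norm_mean) / norm_sd


def t_score_for(raw, gender):
    """Look up the (hypothetical) norms for a gender and convert a raw score."""
    norms = HYPOTHETICAL_NORMS[gender]
    return linear_t_score(raw, norms["mean"], norms["sd"])


# Under a normality assumption, the share of scores between 40T and 60T
# is the area within one standard deviation of the mean.
t_distribution = NormalDist(mu=50.0, sigma=10.0)
share_40_to_60 = t_distribution.cdf(60.0) - t_distribution.cdf(40.0)

if __name__ == "__main__":
    print(t_score_for(18, "husband"))
    print(round(share_40_to_60, 3))
```

Because norms are kept separate by gender, the same raw score can map to different T scores for a husband and a wife.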

RELIABILITY OF SCALES

The MSI scales possess high levels of both internal consistency and temporal stability. Cronbach alpha coefficients of internal consistency range from .80 to .97 (M = .88). Test-retest reliability analyses over a 6-week interval for a sample of 74 respondents (37 couples) yield stability coefficients ranging from .84 to .94 (M = .89). Standard errors of measurement based on test-retest correlations range from 2.45 to 4.0 T-score points. These relatively high reliability indices have important implications for test interpretation. First, changes in clients’ profiles across time most likely reflect genuine changes in the respondents’ experience of their marital relationship, rather than chance variations in test scores. Second, both variability in scores across scales for a given spouse and discrepancies between husband and wife on the same scale can be interpreted more safely as reflecting reliable, meaningful differences.
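The standard errors of measurement reported above follow from the conventional formula SEM = SD × √(1 − r), with SD = 10 for T scores. The short sketch below, with a function name chosen only for illustration, shows how the reported stability coefficients of .84 and .94 correspond to the reported range of 2.45 to 4.0 T-score points.

```python
import math


def standard_error_of_measurement(reliability, sd=10.0):
    """Classical SEM: sd * sqrt(1 - reliability). T scores have sd = 10."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


# Highest and lowest test-retest coefficients reported for the MSI scales.
sem_at_highest_reliability = standard_error_of_measurement(0.94)
sem_at_lowest_reliability = standard_error_of_measurement(0.84)

if __name__ == "__main__":
    print(round(sem_at_highest_reliability, 2))
    print(round(sem_at_lowest_reliability, 2))
```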

VALIDITY OF SCALES

Evidence for the validity of the MSI scales derives from three sources: (a) studies of group discriminant validity, (b) correlational studies of scales’ convergent validity, and (c) actuarial studies identifying the interpretive meaning of scores on each scale across distinct scale ranges.

Discriminant Validity. Several studies of group discriminant validity have addressed the MSI’s ability to distinguish between groups that, on the basis of theory or clinical experience, would be expected to differ on the MSI scales in some specific manner. Snyder (1979b) contrasted the mean MSI profile for 30 couples in marital therapy with the mean profile for a matched sample of 30 control couples from the general population. Analyses indicated that couples in therapy differed significantly on each of the 11 MSI profile scales. As a group, couples in marital therapy uniformly exhibited low Conventionalization (CNV) scores, high Global Distress (GDS) scores, and high scores on the affective triad (Affective Communication [AFC], Problem-Solving Communication [PSC], and Time Together [TTO]). A high degree of within-group variability existed for marital therapy couples on the remaining MSI scales, so that mean scores in these areas tended to be lower, although they were still elevated above what normally would be expected.

Similarly, Berg and Snyder (1981) administered the MSI to 45 couples having primary complaints of dissatisfaction with their sexual relationship and seen in brief, directive conjoint therapy at a sexual dysfunctions specialty clinic. Compared with a matched sample of 45 couples in marital therapy, the sexual dysfunctions group showed significantly higher scores on the Sexual Dissatisfaction (SEX) scale, with 75% of spouses in this sample having scores on this scale exceeding 60T. However, the sex therapy sample also exhibited considerable profile variability, with some couples exhibiting extensive generalized marital distress. Snyder (1981) suggested two criteria by which clinicians would be prudent to defer brief directive sex therapy in favor of more broadly focused marital therapy: (a) when the absolute level of global marital distress exceeds moderate proportions (GDS > 65T); or (b) when the relative elevation of SEX to other clinical scales reveals the couple to have primary distress around nonsexual aspects of their relationship, such as communication and intimacy.

Two additional studies have emphasized group discrimination in addressing scale validity. In an unpublished study, Snyder, Fruchtman, and Scheer (1980) collected MSI data from 66 women who had sought protection from their physically abusive partners at a residential wife-abuse shelter. In general, physically abused women produced highly elevated MSI profiles with scores consistently 5 to 10 T points higher than women beginning marital therapy along measures of Global Distress (GDS), Disagreement About Finances (FIN), Conflict over Childrearing (CCR), and the affective triad (Affective Communication [AFC], Problem-Solving Communication [PSC], and Time Together [TTO]). Within the wife-abuse sample, little variability was observed among individual profiles. A small number of women in the shelter sample generating lower MSI profiles also reported during the interview more satisfactory relationships, less pervasive abuse, and a stronger inclination to return to their partners upon leaving the shelter.

Finally, significant group mean profile differences on the MSI have been demonstrated for parents scoring either high (≥ 60T) or low (≤ 45T) on the Family Relations (FAM) scale of the Personality Inventory for Children (PIC; Wirt, Lachar, Klinedinst, & Seat, 1984). The PIC Family Relations Scale serves as a general screening measure of instability and conflict in children’s and adolescents’ home environments. As expected, parents scoring high on this scale showed significantly higher elevations across a broad range of MSI profile scales reflecting relationship distress, including Global Distress (GDS), Affective and Problem-Solving Communication (AFC and PSC), Time Together (TTO), Disagreement About Finances (FIN), Sexual Dissatisfaction (SEX), and both child-related scales (Dissatisfaction with Children [DSC] and Conflict over Childrearing [CCR]) (Snyder, Gdowski, & Lowman, 1980).

Convergent Validity. Correlational studies of the MSI scales’ convergent validity have established the relatedness of these scales to a broad range of affective and behavioral components of marital interaction. As an overall measure of relationship accord, the Global Distress (GDS) scale has been found to correlate highly with both the Locke-Wallace (1959) Marital Adjustment Test (Snyder, 1979b) and with Spanier’s (1976) Dyadic Adjustment Scale (Snyder & Wills, 1989). Smith, Snyder, Trull, and Monsma (1988) showed that high scores on the Time Together (TTO) scale, reflecting distress with shared interests and leisure activities, correlated with a variety of behavioral measures, indicating an absence of leisure interaction with the spouse alone and higher rates of discretionary time allocated either to individual pursuits or to interaction with others excluding the spouse.

Snyder and Berg (1983a) administered a 15-item symptom checklist to 45 couples seeking sex therapy and found that scores on the Sexual Dissatisfaction (SEX) scale of the MSI were related most strongly for husbands to complaints regarding their wives’ lack of response to sexual requests, their wives’ difficulty in reaching orgasm, and the absence of erectile or ejaculatory problems for the husbands. Wives’ scores on the Sexual Dissatisfaction (SEX) scale related most strongly to complaints of too infrequent intercourse, their husbands’ lack of response to sexual requests, and their husbands’ arousal or orgasmic difficulties.

Although much of the marital literature indicates a lack of correspondence between self-report and observational measures of couples’ communication, spouses’ scores on the two MSI communication scales (Affective Communication [AFC] and Problem-Solving Communication [PSC]) have been found to correlate significantly with a variety of observational measures of verbal agreement/disagreement, attributional statements, and nonverbal indicators of positive/negative affect as either speaker or listener (Snyder, Trull, & Wills, 1987).

Snyder, Klein, Gdowski, Faulstich, and LaCombe (1988) administered the MSI and the PIC to three samples of nonclinic couples, maritally distressed couples seeking treatment, and parents of psychiatrically hospitalized children or adolescents. They found that high scores on either of the two MSI child-related scales (Dissatisfaction with Children [DSC] and Conflict over Childrearing [CCR]) were related to a broad range of emotional and behavioral difficulties described by the parents regarding one or more of their children on the PIC.

Snyder and Regts (1990) administered both the MSI and the Minnesota Multiphasic Personality Inventory (MMPI; Hathaway & McKinley, 1967) to three samples of nonclinic couples, maritally distressed couples seeking marital therapy, and couples in which one or both spouses were receiving psychiatric treatment for individual difficulties. They demonstrated that scores on the Conventionalization (CNV) scale of the MSI, reflecting a tendency toward denial of common difficulties, correlated significantly with respondents’ scores on the Lie (L) scale of the MMPI. In addition, the MMPI Psychopathic Deviate (Pd) scale, shown across numerous studies to reflect a history of relationship difficulties and limited capacity for empathy, correlated with a broad range of scales on the MSI reflecting marital distress.

In a cross-generational study of gender role attitudes, respondents’ views toward marital and parental roles as described on the Role Orientation (ROR) scale of the MSI have been shown to correlate significantly with their parents’ scores on this scale, particularly for men (Snyder, Velasquez, & Clark, 1991). For women, scores on the ROR scale correlated significantly with more general gender role attitudes as measured by the MMPI, the Personal Attributes Questionnaire (Spence, Helmreich, & Stapp, 1975), and Bem’s (1981) Sex Role Inventory (Snyder, Clark, & Velasquez, 1991).

Snyder and Smith (1986) cluster analyzed MSI profiles to derive an empirically based classification system of marital relationships. Responses from 89 clinic and 89 nonclinic couples were used to identify five distinct profile types for both husbands and wives. The distinctiveness of couples classified according to the MSI was subsequently confirmed by examining convergence between MSI profile classification and independent descriptions of couples provided by clinicians.

Actuarial Validity. Although conceptually overlapping with studies of both discriminant and convergent validity, actuarial studies of the MSI differ in the manner in which group differences or correlational findings are analyzed. The actuarial assessment of couples’ relationships implies that, for a given set of test scores from one or both spouses, the clinician draws on an extensive, previously established database in evaluating behaviors, feelings, and cognitions likely to be manifested by each partner in the relationship (Snyder, Lachar, Freiman, & Hoover, 1991). The assessment relies on probability estimates derived from previously identified relationships between test scores and nontest criteria relevant to marital interaction, rather than on clinical intuition or the content of individual test items to which spouses have responded.

Snyder et al. (1991) described four steps in the derivation of actuarially based interpretive systems for psychological measures. The first of these involves identifying statistically significant and reliable associations between predictors and respective criteria. The second requires construction of contingent-frequency tables delineating the likelihood that some external criterion (e.g., some spousal affect, cognition, or behavior) will be observed given some range of scores on the predictor measure. A related third step requires identifying that score range at which some external criterion becomes significantly more or less likely than what would be expected by chance alone. The final step involves integration of these probability analyses, both within and across scales, to derive interpretive guidelines or narratives for various scale score ranges or across scale configurations.

In addition to data accrued from studies of discriminant and convergent validity described previously, three additional studies were conducted to assist in the actuarial interpretation of profile scales on the MSI. In an initial clinical validation study of the MSI, Snyder, Wills, and Keiser (1981) examined the relationship of individual MSI scales to clinicians’ ratings of 50 couples entering marital therapy. Following an extensive conjoint interview, each husband and wife was rated separately on 61 clinical criteria assessing: (a) general presentation of self and the marriage; (b) specific areas of interaction between spouses (e.g., communication, leisure time, finances, sexual relationship); (c) family history and role dispositions; (d) psychiatric and physical distress; (e) spousal interactions regarding children; and (f) clinician-rated prognosis for response to marital therapy. For each item, the clinician rated the presence or absence of that criterion and, if present, whether the criterion was evident to a moderate or an extensive degree. Clinicians’ ratings of spouses in each of these domains were subsequently correlated with husbands’ and wives’ scores on the MSI. Results provided broad support for the validity of individual scales and the ability of the MSI to distinguish among levels and sources of relationship distress among couples entering marital therapy.

Scheer and Snyder (1984) extended these results in a replication study using 50 nonclinic couples sampled from the general population. Following completion of the MSI and an extensive conjoint interview, each husband and wife was rated separately on a 76-item criterion checklist, including items used in the Snyder et al. (1981) clinic study plus 16 new criteria reflecting only mild distress or unusually gratifying aspects of the marital relationship. Similar to results from the earlier clinical study, correlational findings with this nonclinic sample lent strong empirical support to the interpretive intent of MSI scales. In addition, these findings offered empirical support for use of the MSI with nonclinic couples and with couples presenting other than primary complaints of marital distress.

Finally, Snyder and Lachar (1986) conducted a national validation study based on a sample of 323 couples engaged in marital therapy with 161 therapists reflecting a broad range of professional backgrounds from all geographic regions of the continental United States. All spouses completed the MSI and an extensive checklist on which they rated the extent of relationship difficulties in specific areas, predicted the future course of their marriages and likely successes in resolving marital difficulties, and rated both themselves and their partners on 43 descriptors of intrapersonal and interpersonal functioning. In addition, each therapist completed a checklist describing areas of relationship dysfunction, spouses’ individual emotional difficulties interfering with the marriages, and probable courses of therapy and future of the couples’ relationships. Overall, 766 descriptors of the marital relationship and individual spouses were found to correlate with MSI scales across both mixed-gender and same-gender samples. In addition, 90 correlates specific to either husbands or wives were identified (Snyder, Freiman, & Lachar, 1989). As in the Snyder et al. (1981) and Scheer and Snyder (1984) studies, scales’ significant correlations with spouses’ and clinicians’ ratings generally conformed to scales’ interpretive intent and with nomological nets delineated in previous validational efforts.

In each of these three studies of actuarial validity, contingent-frequency tables were constructed for each of the significant scale-to-criterion correlations obtained. In all, more than 1,200 such tables were constructed. These tables delineated the probability that a given marital relationship characteristic would be observed as a function of spouses’ scores across each scale of the MSI. Findings across these tables were integrated to identify low, moderate, and high ranges for each of the MSI scales. (See Snyder et al., 1991, for annotated examples of actuarial analyses.)

In developing comprehensive interpretive guidelines for the MSI, actuarial findings for all external correlates across respective scales were reviewed to derive interpretive paragraphs for individuals scoring in the low, moderate, or high ranges of each scale. Additional interpretive narratives were constructed to incorporate moderating effects on scale interpretation from both intraspousal and interspousal profile variation. These efforts culminated in a library of more than 300 interpretive paragraphs and associated decision rules analyzing individual scale elevations and configural patterns both within and across spouses. These empirically derived interpretive narratives comprise the computer-based interpretive system for the MSI (Snyder & Lachar, 1986).

The actuarial approach and empirical findings, on which interpretation of the MSI is based, distinguish this instrument from virtually every other measure of marital or family functioning reported in the literature. The remainder of this chapter addresses means by which clinicians and researchers can utilize these empirically based interpretive narratives in the assessment and treatment of distressed marital relationships.
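The contingent-frequency tables described in the actuarial studies above can be sketched in a few lines. This is an illustrative reconstruction under stated assumptions, not the authors' procedure: the cut points (50T and 65T), the function name, and the sample records are all hypothetical. The sketch tabulates, for each score range, the proportion of respondents for whom an external criterion was rated present, and reports the overall base rate for comparison.

```python
def contingent_frequency_table(records, cut_points=(50.0, 65.0)):
    """Proportion of respondents showing a criterion within each score range.

    records: iterable of (t_score, criterion_present) pairs.
    cut_points: hypothetical boundaries separating low, moderate, and high.
    """
    low_cut, high_cut = cut_points
    counts = {"low": [0, 0], "moderate": [0, 0], "high": [0, 0]}  # [present, total]
    for t_score, criterion_present in records:
        if t_score < low_cut:
            band = "low"
        elif t_score < high_cut:
            band = "moderate"
        else:
            band = "high"
        counts[band][1] += 1
        if criterion_present:
            counts[band][0] += 1
    return {
        band: (present / total if total else None)
        for band, (present, total) in counts.items()
    }


# Hypothetical data: (scale T score, clinician rated criterion as present).
sample_records = [
    (42, False), (45, False), (48, True),
    (55, False), (58, True), (62, True), (63, False),
    (66, False), (68, True), (70, True), (74, True),
]

table = contingent_frequency_table(sample_records)
base_rate = sum(1 for _, present in sample_records if present) / len(sample_records)

if __name__ == "__main__":
    print(table)
    print(round(base_rate, 2))
```

Comparing each range's proportion with the base rate corresponds loosely to the third step above, although a real analysis would also test whether the difference exceeds chance.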

A BASIC INTERPRETIVE STRATEGY

Interpretation of the MSI profile proceeds systematically across scales to consider issues related to profile validity and global marital affect, spousal communication, specific areas of interaction, concerns regarding children (if these items are completed), role orientation, and family history of distress. Interpretation of individual profiles incorporates both a scale-by-scale and configural approach in the following steps.

Step 1. Scores on Conventionalization (CNV) and Global Distress (GDS) are examined in a configural analysis of overall defensiveness regarding the marriage and global marital distress. Atypical patterns on these scales (e.g., high scores on both CNV and GDS) suggest further inquiry into the individual’s test-taking approach.

Step 2. The individual’s descriptions of affective and problem-solving communication (AFC and PSC) are examined in a configural analysis. High scores on PSC relative to lower scores on AFC suggest the usefulness of emphasizing specific problem-resolution skills. Conversely, relatively higher scores on AFC (particularly if accompanied by high scores on Sexual Dissatisfaction [SEX] and Time Together [TTO]) may indicate the need to address broader issues of intimacy, commitment, and affection experienced in the marriage prior to interventions aimed at resolution of specific conflicts.

Step 3. The specificity or generalization of marital distress is evaluated based on the respondent’s descriptions of conflict in specific areas of the marriage (Time Together [TTO], Disagreement About Finances [FIN], Sexual Dissatisfaction [SEX], and Conflict over Childrearing [CCR]). Scores in these areas are compared with the reported level of overall marital distress (GDS). These comparisons facilitate hypotheses regarding the extent to which specific sources of relationship conflict contribute to more pervasive dissatisfaction with the marriage versus the couple’s ability to identify and contain relatively focused conflicts within an overall positive marital relationship.

Step 4. Spousal contention regarding leisure time together, disagreement about finances, or sexual dissatisfaction is described based on the individual’s scores in these areas (TTO, FIN, and SEX). Differences in levels of conflict across specific domains facilitate prioritizing areas for initial treatment interventions. Areas of relative satisfaction may be emphasized as means of building on relationship strengths and increasing resilience to distress in other domains.

Step 5. If the individual has completed items on the two child-related scales (DSC and CCR), spousal conflict over childrearing (CCR) is described. The respondent’s views toward the children and distress in parental relationships with the children (Dissatisfaction with Children [DSC]) are compared with marital conflict over childrearing (CCR) in a configural analysis.

Step 6. The respondent’s espoused role orientation toward marital and parental roles (ROR) is described, along with the reported history of family distress (FAM). Scores in both domains reflect dispositions that the individual brings into the relationship that may contribute to marital conflict. The individual’s espoused role orientation (ROR) should be examined in the context of actual role behaviors identified in therapy. Styles of managing affect, conflict, and intimacy experienced in the family of origin should be examined for their relationship to similar issues experienced in the respondent’s own marriage.

The interpretive strategy for MSI profiles obtained from both spouses is similar to that employed for individuals, but emphasizes interspousal configural analyses comparing respondents’ scores along each of the MSI scales, in addition to examining intraspousal patterns. Steps for interpreting couples’ MSI profiles interactively are as follows.

Step 1. Both partners’ response styles (CNV) and overall descriptions of the marriage (GDS) are examined in configural analyses emphasizing first intraspousal patterns and, second, comparisons between spouses. Implications for profile validity are discussed. Differences in spouses’ overall distress are noted for their potential impact on conjoint marital therapy.

Step 2. Spouses’ descriptions of affective and problem-solving communication (AFC and PSC) are compared in interspousal analyses. Differences in spouses’ concerns with emotional intimacy versus efficacy in conflict resolution should be explored in terms of potential stylistic differences in approaches to dealing with relationship distress.

Step 3. Partners’ descriptions of conflict in specific areas of the marriage (Time Together [TTO], Disagreement About Finances [FIN], Sexual Dissatisfaction [SEX], and Conflict over Childrearing [CCR]) are examined to determine the degree of congruence in spousal perceptions. The degree of similarity or divergence in partners’ views of their relationship is examined for its potential impact on establishment of a collaborative set and spousal agreement on initial therapeutic goals.

Step 4. Respondents’ reports of marital contention regarding leisure time together, disagreement about finances, or sexual dissatisfaction are examined based on interspousal comparisons in these areas (TTO, FIN, and SEX).

Step 5. If both individuals have completed items on the two child-related scales, interspousal analyses examine both Conflict over Childrearing (CCR) and Dissatisfaction with Children (DSC). If only one respondent has completed items on these scales, his or her views toward the children are compared with marital conflict over childrearing in intraspousal configural analyses.

Step 6. Interspousal analyses compare respondents’ espoused role orientations toward marital and parental roles (ROR) and their respective histories of family distress (FAM). Discrepancies between espoused versus enacted roles should be explored with each spouse, and spousal differences in role orientations should be examined for their contribution to marital distress. Similarly, styles of managing conflict and intimacy in each spouse’s family of origin and differences in this regard should be examined for their relationship to conflicts in these areas in the couple’s own marriage.
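As a rough illustration of the interspousal comparisons in the steps above, the sketch below lists the scales on which two spouses' T scores diverge most. Everything specific here is an assumption made for illustration: the 10-point threshold, the function name, and the example profiles are hypothetical, and the published interpretive system relies on actuarially derived decision rules rather than a single cutoff. Scales completed by only one spouse (for example, the child-related scales) are skipped.

```python
# The 11 MSI profile scales named in this chapter.
SCALES = ["CNV", "GDS", "AFC", "PSC", "TTO", "FIN", "SEX", "ROR", "FAM", "DSC", "CCR"]


def interspousal_discrepancies(husband, wife, threshold=10.0):
    """Return (scale, husband_t, wife_t, difference) rows, largest gap first.

    Only scales present in both profiles are compared. The threshold is a
    hypothetical screening value, not a published MSI decision rule.
    """
    rows = []
    for scale in SCALES:
        if scale in husband and scale in wife:
            difference = husband[scale] - wife[scale]
            if abs(difference) >= threshold:
                rows.append((scale, husband[scale], wife[scale], difference))
    return sorted(rows, key=lambda row: abs(row[3]), reverse=True)


# Hypothetical T-score profiles; the husband left the child-related items blank.
husband_profile = {"CNV": 45, "GDS": 58, "AFC": 55, "PSC": 62, "TTO": 52,
                   "FIN": 50, "SEX": 68, "ROR": 48, "FAM": 51}
wife_profile = {"CNV": 40, "GDS": 70, "AFC": 69, "PSC": 64, "TTO": 60,
                "FIN": 49, "SEX": 55, "ROR": 60, "FAM": 50, "DSC": 52, "CCR": 58}

flagged = interspousal_discrepancies(husband_profile, wife_profile)

if __name__ == "__main__":
    for scale, husband_t, wife_t, difference in flagged:
        print(f"{scale}: husband {husband_t}T, wife {wife_t}T, difference {difference:+}")
```

A listing of this kind only points the clinician toward scales worth discussing; the meaning of any discrepancy still depends on the configural context described in the steps.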


ADDITIONAL ANALYSIS AT THE ITEM LEVEL

In contrast to an instrument such as the Minnesota Multiphasic Personality Inventory-2 (MMPI-2), where scales have been derived through empirical criterion keying, the rational-deductive method of MSI scale construction (Burisch, 1984; Hase & Goldberg, 1967) typically makes an examination of separate content scales or critical items less necessary to determine the basis on which an individual obtains a high score on a given scale. Nevertheless, there are several instances in which analysis of spouses’ responses to the MSI at the item level can be helpful. First, the heterogeneity of some scales may render individual item analysis useful. For example, it may be important to determine that a couple’s elevated scores on Sexual Dissatisfaction (SEX) result not from distress with the frequency of intercourse, but rather from dissatisfaction with the nature or variety of sexual exchanges. Second, an analysis of individual items may help to pinpoint specific situations leading to distress in an area. For example, a review of spouses’ responses to items comprising the Conflict over Childrearing (CCR) scale may highlight disagreements concerning how much time either parent spends with the children. Third, analysis of responses at the item level can sometimes serve a didactic or treatment function. For example, discussion of individual items on the Affective (AFC) and Problem-Solving Communication (PSC) scales can lead to important interventions emphasizing active-listening or conflict-resolution skills.

COMPUTER-BASED TEST INTERPRETATION (CBTI) OF THE MSI

Users of the MSI can obtain computer-generated narrative reports for individuals or couples by submitting special optically scanned answer sheets directly to the publisher, or by administering the inventory or recording spouses’ responses on their own personal computer using microcomputer software distributed by the test publisher. In a recent methodological review, Snyder, Widiger, and Hoover (1990) noted that:

Computerized interpretive narratives, when developed on a broad actuarial foundation of empirical findings relating test indices to relevant external criteria, offer distinct advantages including: (a) economy of processing and more effective utilization of professional resources; (b) accuracy and consistency of scoring and implementation of interpretive decision rules; (c) virtually unlimited capacity for storage, indexing, and retrieval of relevant information from the clinical and research literature regarding test-behavior relationships; (d) ability to subject test indicators to complex, configural analyses; and (e) potential for automated collection and analysis of extensive normative data bases. (p. 470)

The MSI is one of a small number of instruments for which an automated interpretive system has been developed almost exclusively on the basis of actuarially based external validation studies. Similarly, the computerized narrative system for the MSI is one of the few CBTIs to have been subjected to careful empirical scrutiny, including rigorous controls for various response sets influencing consumers’ ratings of CBTI accuracy.

Hoover and Snyder (1991) conducted a national validation study of the computerized report for the MSI in which clinicians indicated for each separate interpretive section of the computer-generated report whether that section: (a) was concise, (b) confirmed the therapist’s own clinical impressions of the couple, (c) omitted important information, (d) was useful for diagnosis and/or treatment, (e) was accurate, and (f) included important new information.


Overall, clinicians’ ratings strongly supported the accuracy, clinical utility, and clarity of the MSI computerized report. The majority of clinicians (82.6%) judged the report to describe accurately the couple and their problems. In terms of clinical utility, the same percentage of clinicians (82.6%) rated the report as helpful for planning treatment, and 63.0% responded that it pointed out things about the couple that were not noticed previously. In addition, almost all clinicians (97.9%) felt that the overall computerized MSI report was well organized and clear in its presentation.

Use of the MSI for Treatment Planning

GENERAL CONSIDERATIONS

In planning meaningful interventions with couples, it is crucial that therapists assess each spouse’s subjective experience and appraisal of the marital relationship. Although numerous studies have delineated reliable differences in overt behaviors distinguishing distressed from nondistressed couples (Weiss & Heyman, 1990), recent research points to the importance of the affect and subjective meaning attached to these behaviors by each spouse (Baucom & Epstein, 1990; Greenberg & Johnson, 1988; Snyder & Wills, 1991).

The utility of self-report measures of marital interaction in planning treatment has been noted by Jacobson and Margolin (1979). In summarizing general advantages of self-report measures, they emphasized that such inventories: (a) present a low-cost, low-effort method of gathering information; (b) permit sensitive information to be collected early; (c) allow couples to communicate information that spouses are eager to transmit; and (d) can be used as objective outcome criteria at termination and follow-up (Jacobson & Margolin, 1979). They added that appropriate self-report measures may establish an important basis for therapeutic rapport in attending to each partner’s unique view of the relationship. Weiss and Margolin (1977) cited the importance of marital therapists’ adopting multiple-area assessment both before and after treatment, in which data are gathered systematically “on untreated behaviors in areas germane to but not included in the marital intervention” (p. 596).

As a multidimensional measure that delineates specific sources of marital distress, the MSI facilitates formulation of therapeutic interventions tailored to the specific concerns of the couple. The amount of information generated by the MSI profile is substantial and increases considerably when two profiles are assessed interactively. Snyder (1983) noted that in assessing a given individual profile, one potentially has 55 pairwise comparisons between scales to consider, and 231 comparisons if evaluating two profiles conjointly. Additional three-way and higher order comparisons among scales increase both the potential depth and complexity of profile analysis. A structured approach to MSI profile interpretation facilitates the integration of findings across scales, particularly when incorporating self-reports from two spouses interactively.
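The comparison counts cited from Snyder (1983) follow directly from the number of ways to choose two scales from a set. The brief check below assumes the 11 profile scales mentioned earlier in the chapter; the variable names are illustrative.

```python
from math import comb

N_PROFILE_SCALES = 11  # profile scales per respondent

# Pairwise comparisons within one spouse's profile.
within_one_profile = comb(N_PROFILE_SCALES, 2)

# Pairwise comparisons among all 22 scores when two profiles are read together.
across_two_profiles = comb(2 * N_PROFILE_SCALES, 2)

# The conjoint total splits into comparisons within each spouse's profile
# plus every husband-scale by wife-scale pairing.
between_spouses = N_PROFILE_SCALES * N_PROFILE_SCALES

# Three-way comparisons within a single profile, for a sense of how quickly
# higher order comparisons multiply.
three_way_within_one_profile = comb(N_PROFILE_SCALES, 3)

if __name__ == "__main__":
    print(within_one_profile, across_two_profiles, three_way_within_one_profile)
```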

PREDICTING TREATMENT OUTCOME: RESEARCH FINDINGS AND CLINICAL APPLICATIONS

Initial evidence for the MSI’s utility in predicting treatment response was reported by Snyder and Berg (1983b) in a study of 26 couples entering brief directive sex therapy. Couples’ pretreatment MSI scores were correlated with posttreatment ratings of dissatisfaction with the frequency of intercourse and individuals’ lack of affection for their partner. In multiple regression analysis, pretreatment scores on Affective Communication (AFC), Time Together (TTO), Disagreement About Finances (FIN), and Sexual Dissatisfaction (SEX) accounted for 29% of the variance in couples’ posttreatment ratings of residual sexual distress (multiple R = .54). Similarly, pretreatment scores on four MSI scales (Problem-Solving Communication [PSC], Sexual Dissatisfaction [SEX], Family History of Distress [FAM], and Dissatisfaction with Children [DSC]) accounted for 30% of the variance in couples’ posttreatment ratings of residual lack of affection for their marital partner (multiple R = .55). These results supported Roffe and Britt’s (1981) suggestion that couples characterized by flexibility in sharing intimacy, expressing affection, and empathizing with their spouse are most likely to profit from a symptom-focused and behaviorally oriented treatment program for sexual dysfunctions.

Additional evidence regarding the MSI’s utility in predicting couples’ long-term response to marital therapy was obtained in a comparative treatment study of 59 couples. Snyder, Wills, and Grady-Fletcher (1991) obtained data regarding marital status and marital accord for 55 of these 59 couples 4 years after they completed marital therapy. Of these 55 couples, 11 couples (20%) had divorced, and an additional 8 couples (15%) who still were married reported significant relationship distress. Couples’ scores on the MSI at intake were correlated with their status at termination (distressed vs. nondistressed) and their status 4 years after completing treatment (distressed/divorced vs. nondistressed) to evaluate the utility of initial MSI assessment data in predicting short- and long-term treatment response.* Results are presented in Table 14.1. Overall, pretreatment MSI scores on six profile scales and one factor scale predicted initial response at termination at p < .01; for five of these measures (CNV, GDS, AFC, TTO, and DAF), predictiveness was replicated across both husbands and wives at p < .05. The best intake predictors of initial treatment response were the Global Distress (GDS) scale (r = .54) and Disaffection (DAF) factor scale (r = .56).

Although pretreatment MSI scores on five scales predicted couples’ long-term response to therapy at 4-year follow-up, only one (Problem-Solving Communication [PSC]) was predictive for both husbands and wives at p < .05 (pooled r = .29). Overall, these results are consistent with previous research indicating that higher levels of negative marital affect at intake, various indicators of spouses’ disengagement, and concrete steps taken toward separation or divorce predict poorer treatment outcome (Beach & Broderick, 1983; Bennun, 1985; Crane, Newfield, & Armstrong, 1984; Crowe, 1978; Hahlweg, Schindler, Revenstorf, & Brengelmann, 1984; Jacobson, Follette, & Pagel, 1986). However, findings in Table 14.1 also suggest that, although initial therapy response relates particularly strongly to pretreatment affect, long-term response is predicted better by pretreatment conflict-resolution skills.

In addition, couples’ scores on the MSI at termination were correlated with their status at 4-year follow-up (distressed/divorced vs. nondistressed) to evaluate the utility of termination MSI assessment data in predicting maintenance or deterioration in couples’ marital satisfaction following treatment (see Table 14.2). Termination scores on five profile scales and both factor scales correlated with long-term outcome at p < .01. However, only scores on Global Distress (GDS) and the two factor scales (Disaffection [DAF] and Disharmony [DHR]) replicated as significant predictors (p < .05) across both husbands and wives (pooled rs ranging from .28 to .35).

*Additional findings regarding non-MSI predictors of couples’ treatment response are reported in Snyder, Mangrum, and Wills (1993).

Given its overall high levels of predictiveness of both short- and long-term treatment

TABLE 14.1
Correlations of Intake MSI Measures With Therapy Outcome

Termination

Predictor

Husbands^a

Wives^a

-.40* .47* .28 oa" .29 .23

=000 dente oon

Four-Year Outcome

Combined

Husbands^b

Wives^b

Combined

Intake MSI CNV GDS AFC PSC TTO FIN SEX ROR FAM DSC CCR DAF DHR Note.

.46*

-.38" .54* ache .26* (oor sel

.28 31

.30* .20 29”

.26 2% -.30

ye)

aie

.48* oo4

(65;

.18 706" .20

.19 .24* .25

.24* 21

24

Asterisks denote correlations significant at p < .01; all others significant at p < .05.
^a n = 59.
^b n = 55.

TABLE 14.2
Correlations of Termination MSI Measures With Four-Year Therapy Outcome

Four-Year Outcome

Predictor

Termination

CNV GDS AFC PSC TTO FIN SEX ROR FAM DSC CCR DAF DHR

Husbands^a

Wives^a

Combined

.33*

.33*

35" .28* Lael .20 30"

MSI

24 29 .25 .29 27 23 25

20" .38* 24

ae 28"

Note. Asterisks denote correlations significant at p < .01; all others significant at p < .05.
^a n = 55.

response and its routine scoring as one of the profile scales, additional actuarial analyses focused on the Global Distress (GDS) scale for its prediction of couples’ eventual divorce following treatment. Results of these analyses are presented in Table 14.3. Overall, 11 of 55 couples (20%) treated in this study had divorced within 4 years following marital therapy. However, for couples having intake GDS scores ≥ 70T, the percentage of couples eventually


TABLE 14.3
Predicting Divorce Four Years Posttherapy From Couples’ Scores on Global Distress (GDS) at Intake and Termination

                       Couple’s Average GDS Score at Intake^a
Four-Year Outcome      ≤ 59T     60-69T    ≥ 70T
Happily married         11        21         4
Distressed married       2         4         2
Divorced                 1         5         5

                       Couple’s Average GDS Score at Termination^b
Four-Year Outcome      ≤ 49T     50-59T    60-69T    ≥ 70T
Happily married         16        17         2         1
Distressed married       1         3         3         1
Divorced                 4         2         2         3

Note. Base rate for divorce = 11/55 = .20.
^a p(Divorce | Intake GDS ≤ 69T) = 6/44 = .14; p(Divorce | Intake GDS ≥ 70T) = 5/11 = .45.
^b p(Divorce | Termination GDS ≤ 59T) = 6/43 = .14; p(Divorce | Termination GDS ≥ 60T) = 5/12 = .42.

divorcing jumped to 45% in contrast to only 14% for couples with intake GDS scores ≤ 69T. Similarly, for couples having termination GDS scores ≥ 60T, the percentage of couples eventually divorcing reached 42% in contrast to only 14% for couples with termination GDS scores ≤ 59T. Although pretreatment scores on GDS ≥ 70T are unlikely to constitute a viable exclusionary criterion for accepting couples into treatment, they provide valuable information to the therapist and both spouses regarding the degree to which the marriage is at increased risk for eventual dissolution. Similarly, posttreatment GDS scores ≥ 60T indicate residual risk for subsequent deterioration and divorce, and may suggest either continued therapy to improve the relationship, interventions to facilitate a functional dissolution of the marriage, or additional assessment and potential reinitiation of treatment 3-6 months following suspension of therapy.
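The conditional divorce rates quoted above can be recomputed directly from the termination counts in Table 14.3. The following is a minimal sketch; the dictionary layout and the function name `divorce_rate` are illustrative choices, not part of the MSI materials, and the same function would apply to the intake counts.

```python
# Couples by 4-year outcome within each termination GDS band (Table 14.3).
# Tuple order: (happily married, distressed married, divorced).
TERMINATION_COUNTS = {
    "<=49T": (16, 1, 4),
    "50-59T": (17, 3, 2),
    "60-69T": (2, 3, 2),
    ">=70T": (1, 1, 3),
}


def divorce_rate(counts, bands):
    """Return (divorced, total, proportion divorced) pooled over the given bands."""
    divorced = sum(counts[band][2] for band in bands)
    total = sum(sum(counts[band]) for band in bands)
    return divorced, total, divorced / total


base = divorce_rate(TERMINATION_COUNTS, list(TERMINATION_COUNTS))
low = divorce_rate(TERMINATION_COUNTS, ["<=49T", "50-59T"])   # GDS <= 59T
high = divorce_rate(TERMINATION_COUNTS, ["60-69T", ">=70T"])  # GDS >= 60T

for label, (divorced, total, rate) in [("base rate", base),
                                       ("GDS <= 59T", low),
                                       ("GDS >= 60T", high)]:
    print(f"{label}: {divorced}/{total} = {rate:.2f}")
```

With only 12 couples in the higher band, the 42% figure rests on five divorces, so it is best read as an indication of elevated risk rather than a precise estimate.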

INTEGRATING THE MSI WITH ADDITIONAL ASSESSMENT DATA

The MSI was developed specifically to assess the quality of marital relationships. It was not intended to supplant the clinician. Rather, MSI profiles always should be viewed in the context of additional data acquired during interview, through direct observation, and in conjunction with additional responses to interpretative hypotheses provided by the couple.

In addition, marital therapy often must consider important aspects of individual or family functioning not addressed by the MSI. For example, the research literature provides increasing evidence that various forms of individual psychopathology may contribute to, coincide with, exacerbate, or result from marital conflict (Gotlib & McCabe, 1990); depressive disorders have received particular attention for their interaction with marital distress (Beach, Sandeen, & O’Leary, 1990; Heim & Snyder, 1991; Jacobson, Dobson, Fruzzetti, Schmaling, & Salusky, 1991). Similarly, considerable evidence has accrued linking parents’ marital distress to increased vulnerability in their children and adolescents to various emotional and behavioral disorders (Emery, 1988). Although the MSI includes two measures of child-related concerns (Dissatisfaction with Children [DSC] and Conflict over Childrearing [CCR]), elevated scores on either of these scales may indicate the need for more detailed assessment of psychological difficulties in one or more of the couple’s children.

When working with distressed couples where either individual or more general family functioning issues appear salient, integration of the MSI with such instruments as the MMPI-2 or PIC may be essential to effective treatment planning. Although a wide range of assessment techniques exists for assessing emotional and behavioral disorders in children and adults, several factors argue for special consideration of the MMPI-2 and PIC. First, both inventories are among the most widely and carefully researched instruments for assessing dimensions of personality and psychopathology in their respective targeted age groups. Second, the MMPI-2 and PIC are among only a handful of measures for which empirically derived interpretive guidelines have been established on the basis of extensive actuarial analyses similar to those described for the MSI. Finally, clinical research already has delineated patterns of test scores relating the MSI to both the MMPI-2 and PIC across both normative and clinic samples, providing an empirical basis for comparison.

PROVIDING COUPLES WITH FEEDBACK

The authors of this chapter strongly endorse the philosophy that clinical assessment must be a collaborative process between client(s) and clinician. A cogent rationale for structured assessment must be presented, results of the initial assessment must be provided in a comprehensible manner sensitive to the concerns of the respondent, the client should be invited to respond to initial interpretation of findings with elaborations or challenges, and the clinician and client must work together to explore the implications of assessment findings for treatment. The relatively atheoretical structure of the MSI and the obvious relevance of its scales’ content to concerns typical of couples facilitate the provision of feedback and collaborative assessment process just noted. Various models for timing the initial administration of the MSI, providing results to couples in treatment, and incorporating this inventory in evaluation of outcome have been presented elsewhere (Snyder, 1981, 1983, 1990; Snyder, Lachar, & Wills, 1988; Wills & Snyder, 1982).

One approach is to have both spouses complete the MSI independently, either immediately before or after the initial interview, to score both answer sheets during the week, and to provide the couple with a description and interpretation of results conjointly during the second session. Although affording advantages of brief assessment and immediate feedback, this approach sometimes limits the amount of interview material incorporated into planning of therapeutic interventions and may raise pragmatic concerns regarding time constraints in test administration.

An alternative approach extends the initial assessment and interpretive phase across four sessions. During the initial session, the couple is interviewed conjointly to solicit general background information and problem identification. On the second visit, one individual (usually the more distressed of the two) is interviewed individually while his or her spouse completes the MSI. On the third visit, these roles are reversed. During the fourth session, extensive feedback is provided to the couple conjointly, incorporating both interview material and test results to formulate an individualized treatment plan collaboratively.

Depending on the length and progress of therapy, the MSI may be administered again at 3-6 months following the initial evaluation. This is frequently a helpful procedure for documenting changes that have occurred and establishing new directions for therapeutic endeavor. Readministration of the MSI at termination facilitates a review and integration of gains that the couple has acquired. Residual areas of marital distress may be discussed in terms of constructive alternatives the couple may adopt on their own or as potential areas for further exploration during additional treatment at a subsequent time.

The authors’ general practice is to provide both spouses with their own copies of the computer-based narrative for the MSI. In initially presenting results, the MSI profile provides the basis for interpretation, because results graphically depict both intra- and interspousal differences in sources and degrees of relationship distress. When available, the MSI ChromaGraph generated during optical scanning of answer sheets by the test publisher is incorporated into feedback, because low, moderate, and high scores on each scale are readily discerned using corresponding green, amber, and orange colors to depict interpretive scale ranges. Each spouse is encouraged to review the narrative report on his or her own during the week, and to pursue reactions, questions, or additional concerns during the following session.

LIMITATIONS IN THE USE OF THE MSI

Veridicality of Self-Report. A strength of the MSI rests in its assessment of spouses’ subjective experiences of their marital relationship. However, similar to other self-report measures, the MSI also is susceptible to both deliberate and subconscious distortions influencing respondents’ appraisals of their marriage. An actuarial approach to profile interpretation and the presence of a social-desirability measure on the MSI (Conventionalization [CNV]) reduce, but do not eliminate, concerns regarding this potential response bias. As noted earlier, it is critical that both clinicians and researchers integrate scores on the MSI with other assessment findings and consider the context in which evaluation data have been obtained.

For some couples, assessment of their marriage based on results of the MSI may simply confirm relationship concerns already apparent to both spouses. Hence, the MSI may offer less unique contributions to treatment planning for couples demonstrating at intake a clear understanding of conflicts contributing both to their own and their partner’s dissatisfaction with the marriage. Other couples may reject this opportunity to engage in a collaborative evaluation of their relationship, or may use information provided by the MSI in an antagonistic, destructive manner. For such couples, the clinician’s sensitivity and skills play a critical role in framing relationship difficulties and individual concerns to maintain the therapeutic alliance and elicit a collaborative response. However, for many individuals, particularly those less able to articulate their own or their spouse’s concerns, scores on this inventory serve to highlight and place in perspective various sources of relationship distress contributing to more generalized marital discord (Snyder et al., 1988).

Moderator Variables. Analyses of variance and covariance have been conducted on the MSI to determine the effects of sociodemographic variables such as race, education, and stage of the family life cycle. Significant effects have been noted within the standardization sample for these moderators across a range of MSI scales (Snyder, 1981). However, although statistically significant, the magnitude of these effects tends to be quite small relative to score differences observed between various clinic and nonclinic samples. Moreover, group differences on the MSI as a function of these sociodemographic variables closely parallel similar differences reported in the literature, particularly for stages of the family life cycle. These similarities between findings for the MSI and sociodemographic effects reported in the marriage and family literature, in combination with scale-to-criterion correlations obtained for both clinic and nonclinic samples having considerable diversity across potential moderator variables, suggest that relatively small group differences within the standardization sample on the MSI scales most likely reflect valid group differences along relevant external criteria.

Use of the MSI for Treatment Outcome Assessment

EVALUATION OF THE MSI AGAINST CRITERIA FOR OUTCOME MEASURES

The MSI compares favorably to previously identified criteria for evaluating outcome assessment instruments (Ciarlo, Brown, Edwards, Kiresuk, & Newman, 1986).

• Relevance: The MSI was developed specifically to assess both the source and extent of marital distress across domains of relationship concerns previously identified in both the clinical and empirical literature. Its rational-deductive approach to scale development results in test items and scale interpretations that have direct relevance to couples experiencing relationship distress.

• Procedures: Use of the MSI requires no special equipment, facilities, or specialized training. Scoring templates permit traditional hand scoring, and interpretive guidelines for distinct scale score ranges are provided in the test manual. Computer-assisted administration, scoring, and interpretation also are available for users with their own microcomputer equipment; alternatively, users can submit answer sheets to the test publisher for computerized scoring and interpretation.

• Referents: The MSI profile form facilitates objective comparison of individuals’ scores to husbands or wives sampled from the general population. Individuals’ scores also can be compared to specific criterion groups, including couples beginning or completing marital therapy, couples treated for specific sexual dysfunctions, physically battered women seeking refuge at a spouse-abuse shelter, couples in which one or both spouses are in individual treatment for nonmarital emotional and behavioral disorders, and parents of psychiatrically hospitalized children or adolescents.

• Multiple Perspectives: The MSI elicits the subjective experiences of both spouses; areas of both convergence and divergence facilitate tailoring interventions specific to the needs of the couple. From an actuarial basis, each spouse’s scores are linked empirically to their own and their partner’s views of the relationship in addition to independent appraisals by marital therapists.

• Treatment Linkage: In contrast to global measures of marital distress, the MSI identifies distinct areas of relationship concern and, consequently, suggests meaningful interventions most likely to result in favorable treatment outcome.

• Psychometric Features: The MSI scales possess high levels of internal consistency and temporal stability. The actuarial approach and empirical findings on which interpretation of the MSI is based distinguish this instrument from virtually every other measure of marital or family functioning reported in the literature. Potential respondent bias is assessed directly by a measure of social desirability.

• Cost: As a self-report measure, the MSI permits a wide range of information specific to relationship distress to be obtained at minimal cost. Hand scoring can be conducted by clerical staff. Computer-based interpretation further reduces the professional time required for clinical use. The cost of administering, scoring, and interpreting the MSI for both spouses using either micro-diskettes or mail-in services typically amounts to less than 20% of the fee for one therapy session.

• Consumer Acceptance: The MSI samples from domains directly relevant to couples’ typical presenting complaints; consequently, the instrument has high face validity for couples entering marital therapy, in contrast to measures of personality or relationship measures linked to particular theoretical or conceptual models. Similarly, dimensions assessed by the MSI have high content validity for professionals outside mental health, for whom marital therapy outcome evaluation may have direct relevance (e.g., physicians, clergy, attorneys, judges, or departments of social service).

• Ease of Interpretive Feedback: Interpretive feedback regarding MSI test results is facilitated by the following: (a) face validity of scale content; (b) graphic display of profile scores permitting comparison of spouses’ results to each other and to couples from the general population; (c) group mean profiles for specific clinical populations presented in the test manual facilitating couples’ comparisons to these groups; (d) color-coded ranges of distress on the MSI ChromaGraph identifying high, moderate, and low scores on each dimension; (e) concise interpretive guidelines for scores on each scale presented in the test manual; and (f) computerized interpretation for the MSI using either micro-diskettes or mail-in services providing both intra- and interspousal evaluations on each scale.

• Usefulness Across Clinical Functions: As a diagnostic and therapeutic procedure, the MSI is used in the initial phases of therapy in discussing couples’ presenting complaints and in formulating therapeutic goals; it can be used throughout therapy and at termination in the evaluation of change and redirection of treatment interventions. The MSI also can be used didactically: for example, reviewing item content from the two communication scales to facilitate discussion of essential components of active listening and conflict resolution. The MSI also serves as a screening measure in clinical populations for which marital concerns may not constitute the primary reason for seeking treatment (e.g., parents of psychiatrically referred children).

• Theoretical Compatibility: Because the MSI is relationship specific and focuses on actual elements related to a couple’s interaction, it can be incorporated readily by clinicians across a broad spectrum of theoretical orientations. Its relatively atheoretical structure neither requires nor precludes higher order conceptualizations for incorporation in diverse treatment approaches.

RESEARCH FINDINGS AND CLINICAL APPLICATIONS

Evaluating the effectiveness of treatment interventions with a distressed couple should involve a process continuing throughout the course of therapy. As noted earlier, the MSI can be readministered at multiple points during treatment to evaluate and consolidate gains that the couple has made and to identify residual areas of distress for further work. This idiographic approach to outcome evaluation emphasizes within-spouse change across time. In general, three approaches can be incorporated to evaluate the meaningfulness of change across time: (a) attainment of statistically reliable change, (b) movement from one scale score range to a different range denoting less distress, and (c) approximation of the individual’s profile to the mean profiles for couples terminating treatment or from the general population. Each of these approaches is discussed in turn.

Attainment of Statistically Reliable Change. Jacobson and Truax (1991) reviewed developments over the last decade for evaluating meaningful change in psychotherapy. They described a Reliable Change (RC) index, computed as 1.96 times a measure’s standard error of difference. When the absolute degree of an individual’s change exceeds the RC index, one can infer a meaningful difference that would occur by chance less than 5% of the time. Because the standard error of difference is a function of a measure’s reliability, the higher the measure’s reliability, the less change is required to be statistically reliable. The appropriate reliability index to be considered in such an application is the test-retest coefficient, because the difference in scores is one involving temporal change.
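The computation just described can be sketched as follows. The sketch uses the Jacobson and Truax (1991) expression for the standard error of difference, S_diff = sqrt(2 × SE²) with SE = SD × sqrt(1 − r), and assumes that MSI scores are T-scores with a standard deviation of 10. Function names are illustrative only; the reliabilities of .92 and .94 are the GDS test-retest values for individuals and couples listed in Table 14.4.

```python
from math import sqrt

T_SCORE_SD = 10.0  # assumption: scale scores are T-scores (mean 50, SD 10)


def standard_error_of_difference(retest_r: float, sd: float = T_SCORE_SD) -> float:
    """S_diff = sqrt(2 * SE^2), where SE = sd * sqrt(1 - r) (Jacobson & Truax, 1991)."""
    se_measurement = sd * sqrt(1.0 - retest_r)
    return sqrt(2.0 * se_measurement ** 2)


def minimal_reliable_change(retest_r: float, sd: float = T_SCORE_SD,
                            z: float = 1.96) -> float:
    """Smallest absolute score difference treated as reliable at p < .05."""
    return z * standard_error_of_difference(retest_r, sd)


def is_reliable_change(score_t1: float, score_t2: float, retest_r: float) -> bool:
    """True if the Time-1 to Time-2 difference meets the reliable-change criterion."""
    return abs(score_t2 - score_t1) >= minimal_reliable_change(retest_r)


gds_individual = minimal_reliable_change(0.92)  # individual spouse's GDS score
gds_couple = minimal_reliable_change(0.94)      # couple's averaged GDS score
print(round(gds_individual, 2), round(gds_couple, 2))  # 7.84 6.79
```

Because the threshold shrinks as reliability rises, the couple-level criterion is smaller than the individual-level criterion for every scale on which averaged scores are more reliable.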


Table 14.4 presents the minimum values of absolute differences required for each MSI scale to infer reliable change on that measure. Because couples’ average scores on each scale are more reliable than individuals’ scores on the same scale, different critical values are listed depending on whether the clinician or researcher is evaluating individuals’ or couples’ data. For example, on the Global Distress (GDS) scale, a husband or wife is considered to exhibit reliable change if his or her score at Time 2 differs from the Time-1 score by 7.84 or more points. By comparison, in analyzing spouses’ average score on GDS, a couple is designated as showing reliable change if its averaged score at Time 2 differs from the Time-1 score by 6.79 or more points.

Change in Scale Score Range. In addition to evaluating whether a respondent’s scores on the MSI reflect reliable change, one can examine whether a reliable change denotes transition from a distressed to nondistressed (or less distressed) status. Two methods exist for addressing this question; both assume that the degree of change being considered is reliable (using the RC index described previously).

The first method involves determining whether scale scores on a given dimension at Times 1 and 2 fall within the same or different ranges, using guidelines developed on the basis of actuarial analyses of scale correlates. Scale ranges for each MSI scale (low, moderate, or high) are noted in the test manual as well as by color-coded zones on the ChromaGraph accompanying computer-based interpretive reports generated by the test publisher. For example, the cutoff point distinguishing moderate from low scores is 55T for Global Distress (GDS) and scales comprising the affective triad (AFC, PSC, and TTO), 50T for Role Orientation (ROR) and measures of specific relationship distress (FIN, SEX, DSC, and CCR), and 45T for Conventionalization (CNV) and Family History of Distress (FAM).

An alternative approach to identifying cutoffs for distinguishing distressed from nondistressed individuals uses the respective means and standard deviations of these two populations to derive the score above which individuals are statistically more likely to be members of a distressed group and below which they are more likely to be members of a nondistressed group (Jacobson & Truax, 1991). Cutoff scores derived in this manner for each MSI scale also are presented in Table 14.4.

TABLE 14.4
Critical Values for Evaluating Reliable Change and Transition From Distressed to Nondistressed Status

         Test-Retest Reliability     Minimal Reliable Change     Cutoff for Distressed
Scale    Individuals    Couples      Individuals    Couples      versus Nondistressed
CNV         .89           .92            9.19         7.84            43.06
GDS         .92           .94            7.84         6.79            58.05
AFC         .84           .87           11.09         9.99            55.70
PSC         .91           .94            8.32         6.79            [illegible]
TTO         .86           .89           10.37         9.19            54.60
FIN         .87           .91            9.99         8.32            53.61
SEX         .86           .90           10.37         8.77            54.08
ROR         .89           .92            9.19         7.84            53.76
FAM         .94           .95            6.79         6.20            51.26
DSC         .90           .95            8.77         6.20            51.61
CCR         .87           .91            9.99         8.32            52.62
DAF         .88           .94            9.60         6.79            54.31
DHR         .81           .85           12.08        10.74            57.05

Note. Computational procedures for derivation of the reliable change (RC) indices and cutoff points for distinguishing distressed from nondistressed scores can be found in Jacobson and Truax (1991).

Approximation of Profile to Nondistressed or Posttreatment Populations. In addition to comparisons between profiles across time, it can be useful to compare respondents’ profiles to group mean profiles for couples beginning and completing marital therapy in a nomothetic approach to evaluating outcome. Snyder and Wills (1989) treated 59 couples in a controlled outcome study comparing behavioral versus insight-oriented marital therapies. Group mean profiles for these couples at intake and at termination are presented in Fig. 14.1. The mean pretreatment profile is similar to those obtained previously for couples in marital therapy (Snyder, 1981) and is characterized by a low score on Conventionalization (40T), moderate to high scores (ranging from 60-65T) on Global Distress and scales comprising the affective triad (Affective Communication, Problem-Solving Communication, Time Together), and moderate scores (55-60T) on remaining scales of relationship distress (Disagreement About Finances [FIN], Sexual Dissatisfaction [SEX], Dissatisfaction with Children [DSC], and Conflict over Childrearing [CCR]).

[Figure: WPS Test Report, Marital Satisfaction Inventory (MSI) ChromaGraph, plotting T-scores on Conventionalization (CNV), Global Distress (GDS), Affective Communication (AFC), Problem-Solving Communication (PSC), Time Together (TTO), Disagreement About Finances (FIN), Sexual Dissatisfaction (SEX), Role Orientation (ROR), Family History of Distress (FAM), Dissatisfaction With Children (DSC), and Conflict Over Childrearing (CCR).]

FIG. 14.1. Group mean intake and termination MSI Profiles for 59 couples treated in conjoint marital therapy. The MSI ChromaGraph is copyrighted © 1988 by Western Psychological Services. Reprinted by permission of the publisher, Western Psychological Services, 12031 Wilshire Boulevard, Los Angeles, California 90025.


By termination, these couples’ mean profile showed greatest reduction on Global Distress and scales comprising the affective triad, with scores on these measures moving to the nondistressed range (50-55T). Couples’ mean termination profile also exhibited a small increase on Conventionalization (5 T-score points) and moderate reductions on scales assessing specific domains of relationship distress (FIN, SEX, DSC, and CCR). This group termination profile included responses from couples who subsequently divorced within a 4-year period. Overall, couples’ MSI profiles in response to successful marital therapy could be expected to approach, but most likely not reach, scores of 50T across most scales.

Again, it should be emphasized that a high degree of within-group variability exists for marital therapy couples across all scales, particularly those assessing specific sources of marital conflict (FIN, SEX, DSC, and CCR), Role Orientation (ROR), and Family History of Distress (FAM). Consequently, an integrated approach to evaluating outcome is recommended, incorporating both idiographic and nomothetic perspectives (i.e., analysis of changes in an individual’s MSI profile across time and comparison at termination to group mean profiles for couples completing treatment).
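The first two idiographic criteria discussed in this section (reliable change, and transition across a distressed versus nondistressed cutoff) can be combined in a single decision rule. The sketch below is illustrative only: the category labels and function name are hypothetical, the treatment of a score exactly at the cutoff as distressed is an assumption, and the rule assumes a scale on which higher scores indicate greater distress (true of GDS, but not of Conventionalization). The two critical values are those listed for individuals’ GDS scores in Table 14.4.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScaleCriteria:
    reliable_change: float  # minimal reliable change, in T-score points
    cutoff: float           # distressed vs. nondistressed cutoff, in T-score units


# Critical values for individuals' Global Distress (GDS) scores (Table 14.4).
GDS_INDIVIDUAL = ScaleCriteria(reliable_change=7.84, cutoff=58.05)


def classify_change(score_t1: float, score_t2: float, crit: ScaleCriteria) -> str:
    """Classify Time-1 to Time-2 change on a scale where higher means more distress.

    The labels returned here are hypothetical and are not taken from the MSI manual.
    """
    change = score_t2 - score_t1
    if abs(change) < crit.reliable_change:
        return "no reliable change"
    if change > 0:
        return "reliable deterioration"
    # Assumption: a score exactly at the cutoff is counted as distressed.
    if score_t1 >= crit.cutoff and score_t2 < crit.cutoff:
        return "reliable improvement into nondistressed range"
    return "reliable improvement"


print(classify_change(66, 55, GDS_INDIVIDUAL))
print(classify_change(66, 62, GDS_INDIVIDUAL))
```

A rule of this kind addresses only within-person change; comparison with group mean termination profiles remains a separate, nomothetic step.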

ADDITIONAL CONSIDERATIONS

Although generally one anticipates a reduction in marital distress reflected in the MSI profile as treatment continues, there are occasions when an increase in subjective distress on the MSI may indicate important therapeutic gains. This interpretation appears most warranted when the respondent initially has completed the MSI adopting a defensive response set characterized by high scores on Conventionalization (CNV) and atypically low scores on scales assessing specific areas of relationship distress. Such profiles sometimes occur among couples entering marital therapy where the more distressed spouse has initiated treatment, but his or her partner resists acknowledging relationship difficulties. In the Snyder et al. (1988) study of parents of psychiatrically hospitalized children and adolescents, couples sometimes produced defensive MSI profiles at the time of their children’s initial hospitalization. Subsequently, during collateral therapy, these couples’ ability to examine multiple sources of family distress more openly and confront significant difficulties in their own marriage reflected important therapeutic progress.

Case Study: Mr. and Mrs. K.

BACKGROUND DATA

Mr. and Mrs. K., aged 43 and 37, are a White, middle-class couple who entered into marital therapy after Mrs. K. was referred for psychotherapy by her physician to help control recurrent depression. When they began marital therapy, Mr. and Mrs. K. had been married for 3 1/2 years, having cohabitated for 4 years prior to their marriage. The present marriage was the third for both Mr. and Mrs. K. They had no children together, but Mr. K. had four children from his two previous marriages; Mrs. K. electively aborted her only pregnancy during her second marriage.

Although Mrs. K. initially was referred for individual psychotherapy, she requested marital therapy, asserting that much of the depression she was experiencing had its roots in relationship problems with her husband. Although Mr. K. denied serious problems in the marriage, he agreed to participate in the therapy at her request. Mrs. K. reportedly had been considering divorce, but had not contacted an attorney at the time they began marital therapy. Mrs. K. blamed their marital problems, as well as her depression, on Mr. K.’s apparent preference for spending leisure time on the golf course with friends rather than with her, his “irresponsible attitude” toward finances, and a tendency to “spend money frivolously.” On the other hand, Mr. K. attributed what he described as “minor disagreements” to his wife’s moodiness, her overconcern with money, and a general lack of common interests. Whereas Mr. K. tended to be quite idealistic in his assessment of the marriage, denying angry or otherwise negative feelings toward his wife, Mrs. K. painted a somewhat hopeless picture of their relationship, expressing little confidence in her marriage’s ability to endure or in her husband’s ability to make desired changes. Despite their disagreements about finances and leisure time together, both Mr. and Mrs. K. described their partner as emotionally supportive and their sexual relationship as generally satisfactory.

INITIAL PROFILES ON THE MSI

When this chapter was written, Mr. and Mrs. K. had been in conjoint marital therapy for 5 months. During this time, both also had participated in individual sessions focusing on individual problems that contributed to distress in the marriage. During the initial evaluation, each spouse completed the MSI separately. Results from this instrument were utilized, along with initial clinical impressions, to guide goal setting with Mr. and Mrs. K. in the subsequent session. The couple’s initial profiles on the MSI are shown in Fig. 14.2.

The results of the MSI for Mr. and Mrs. K. highlighted differences in their overall evaluation of the marriage. Whereas Mrs. K. tended to be candid and realistic in her appraisal of the relationship, and openly described problems that resulted in overall distress for her, Mr. K. presented a more idealized picture of his marriage, minimizing or denying potential problems. Mr. K.’s general style of minimizing or avoiding conflicts proved to be a source of considerable difficulty for him, as well as frustration for Mrs. K. It became apparent early in treatment sessions that Mr. K.’s method for coping with unacknowledged problems was through periodically drinking to excess, at which times he would typically start an argument or accuse Mrs. K. of having an affair. Mr. K.’s defensiveness was accompanied by an underlying fragility and core fear of inadequacy; although his behavior protected him emotionally, his avoidance of relationship issues contributed to Mrs. K.’s feelings of emotional isolation and anger toward her husband.

The MSI also illustrated similarities in Mr. and Mrs. K.’s views of their affective and problem-solving communication. Both expressed satisfaction with the affective quality of the marriage and their ability as a couple to resolve interpersonal disagreements. The MSI results accurately reflected the tendency of each to listen attentively to the other and to engage in minimal overt hostile interactions. Indeed, both Mr. and Mrs. K.’s scores on Global Distress and the two communication scales (AFC and PSC) reflected considerably less distress than the typical couple entering marital therapy (see Fig. 14.1 for comparison data). This information was shared with the couple as a way to affirm their overall relationship strength and provide a more positive basis for confronting specific areas of marital discord.

[Figure: WPS Test Report ChromaGraph for the Marital Satisfaction Inventory (MSI), original in color. Intake profiles for husband and wife are plotted on a T-score axis running from 30T (low) through 50T (average) to 90T (high), with "Possible Problem" and "Problem" ranges marked, for 11 scales: 1. Conventionalization (CNV); 2. Global Distress (GDS); 3. Affective Communication (AFC); 4. Problem-Solving Communication (PSC); 5. Time Together (TTO); 6. Disagreement About Finances (FIN); 7. Sexual Dissatisfaction (SEX); 8. Role Orientation (ROR); 9. Family History of Distress (FAM); 10. Dissatisfaction With Children (DSC); 11. Conflict Over Childrearing (CCR).]

FIG. 14.2. Intake MSI profiles for Mr. and Mrs. K. The MSI ChromaGraph is copyrighted © 1988 by Western Psychological Services. Reprinted by permission of the publisher, Western Psychological Services, 12031 Wilshire Boulevard, Los Angeles, California 90025.

Although the MSI results suggested that this couple generally was adept at problem solving and supportive of each other emotionally, they also highlighted their inability up to this point to resolve disagreements about finances and leisure time together. Both Mr. and Mrs. K. reported dissatisfaction with the amount and quality of leisure time spent with their partner, with Mr. K.’s score in the moderately distressed range and Mrs. K.’s high score reflecting more general dissatisfaction with their marriage. Both partners also acknowledged disagreements about finances, although Mrs. K. reported considerably greater distress in this area. In fact, her distress over finances was higher than in any other area, including her concern over quality and quantity of time spent with her partner. When questioned further, it became apparent that distress over finances for this couple was related closely to issues of control and trust. Mr. K.’s child-support payments to his former wives frequently led to disagreements about spending money, and often left Mrs. K. feeling cheated as well as guilty and unhappy about being childless herself. Despite disagreements in these areas, both Mr. and Mrs. K. reported a satisfying sexual relationship, with few disagreements over frequency or variety of sexual interactions. Neither partner responded to items concerning childrearing (DSC and CCR scales). Both reported that they had little contact with Mr. K.’s children, who all lived some distance from him. Further inquiry revealed that the issue of children was a heated one for this couple. Mrs. K. admitted to wanting to have children of her own, but Mr. K. stated adamantly that he was unwilling to consider the possibility, although he had agreed to do so at the beginning of their marriage.

The MSI showed that both Mr. and Mrs. K. endorsed a nontraditional role orientation. Mr. K.’s high score on this scale was contradictory to initial clinical impressions, in which he presented as a rather conservative, traditional male. This discrepancy was judged to reflect Mr. K.’s tendency to respond in a socially desirable direction to questions about his marriage.

Finally, the MSI pointed to significant stresses in each partner’s family of origin, potentially contributing to marital distress. Follow-up of these comments revealed that Mr. K. was abandoned as an infant and adopted by a couple who later had a son of their own. His adoptive parents were described as poorly educated “hillbillies” who were quite reserved emotionally. His father had been critical and unsupportive during his formative years, whereas his mother played a passive subservient role. In addition, Mr. K. had lost both parents during the past 18 months and had experienced severe disruption in his relationship with his only brother. Mrs. K. described growing up as a “military brat,” the lone daughter of a strict disciplinarian father and a very traditional mother who espoused few ideals and aspired to few goals. Mrs. K.’s brother died in a tragic automobile accident when he was 21; subsequently, her parents divorced after 24 years of marriage and her mother married four times after that. Her relationship with both her parents had been somewhat strained for years.

COURSE OF THERAPY

The couple was seen in weekly sessions of conjoint marital therapy; each spouse also was seen separately on two to three occasions. During the first 5 months, the therapist attempted to intervene in the specific problem areas highlighted by the MSI. Much of the focus was on increasing the amount of time Mr. and Mrs. K. spent together. They made and fulfilled a number of specific contracts in this regard. Both reported positive benefits from the increase in activities together. In addition, both partners as well as Mrs. K.’s physician reported that Mrs. K.’s depression had diminished substantially.

In regard to their financial problems, this couple was encouraged to discuss their feelings openly, rather than allow resentment to build. Mrs. K. admitted to feeling cheated because of Mr. K.’s monthly child-support payments; Mr. K. expressed resentment about being expected to turn over his monthly check to his wife, and felt excluded from decisions about how and where their money was being spent. By mutual agreement, during the course of therapy, Mrs. K. gave greater responsibility for paying monthly bills to her husband. This agreement had some positive effect on Mr. K.’s perceiving recognition from his wife as a responsible manager of their finances and appeared to diminish Mrs. K.’s feelings of being overwhelmed in this regard.

A fair amount of time was spent with this couple in exploration of feelings around parenting and desires to have children. Mr. K. began to access unacknowledged feelings about childhood experiences as well as feelings of alienation from his own children. He also began to work through his loss over the deaths of his parents. More cognitively based interventions were used to challenge Mr. K.’s unrealistic standards and assumptions about his marriage and to bolster his coping strategies for dealing with stresses and changes in this relationship. Work with Mrs. K. involved encouraging her to pursue actualization of her own goals and to be more realistic in what she expected from her marriage. Several months after beginning marital therapy, Mrs. K. began to pursue an undergraduate college degree. She also attempted to initiate a more satisfying relationship with both of her parents, and to deal with guilt and sadness over her earlier abortion.

MIDTREATMENT PROFILES ON THE MSI

When this chapter was written, this couple was still in treatment. Midtreatment profiles for Mr. and Mrs. K. (Fig. 14.3) revealed substantial reductions in subjective distress in several areas. Mr. K.’s Conventionalization (CNV) scale indicated a more realistic appraisal of the relationship, whereas Mrs. K.’s score on this scale reflected a softening from her earlier, critical evaluation of her marriage. Mrs. K. still reported feeling highly distressed over finances, although substantially less so than at pretreatment. Interestingly, at midtreatment Mr. K. admitted experiencing greater distress in this area than he had at intake. This was attributed, in part, to his having become unemployed recently and feeling greater anxiety regarding the couple’s finances and, in part, to his diminished defensiveness and denial in this area of marital concerns. Finally, although not in the distressed range at pretreatment, the MSI profiles for both spouses demonstrated further improvement in both affective and problem-solving communication and, for Mrs. K., increasing satisfaction with the couple’s sexual relationship.

[Figure: WPS Test Report ChromaGraph for the Marital Satisfaction Inventory (MSI), original in color. Midtreatment profiles for husband and wife are plotted on a T-score axis running from 30T (low) through 50T (average) to 90T (high), with "Possible Problem" and "Problem" ranges marked, for 11 scales: 1. Conventionalization (CNV); 2. Global Distress (GDS); 3. Affective Communication (AFC); 4. Problem-Solving Communication (PSC); 5. Time Together (TTO); 6. Disagreement About Finances (FIN); 7. Sexual Dissatisfaction (SEX); 8. Role Orientation (ROR); 9. Family History of Distress (FAM); 10. Dissatisfaction With Children (DSC); 11. Conflict Over Childrearing (CCR).]

FIG. 14.3. Midtreatment MSI profiles for Mr. and Mrs. K. The MSI ChromaGraph is copyrighted © 1988 by Western Psychological Services. Reprinted by permission of the publisher, Western Psychological Services, 12031 Wilshire Boulevard, Los Angeles, California 90025.


Feedback based on this couple’s midtreatment MSI profiles was used to affirm significant gains both Mr. and Mrs. K. had made in confronting relationship difficulties. Sustaining and increasing gains in the quality of their leisure time together remained a focus of marital therapy. Additional interventions were undertaken in the area of finances to address strains secondary to Mr. K.’s temporary loss of employment and remaining concerns each had regarding shared fiscal responsibilities. Additional interpretations regarding the couple’s substantial reduction in overall marital distress were used to encourage each spouse to examine his or her own individual issues, including residual conflicts involving the family of origin and, for Mr. K., long-standing hurts and ambivalence regarding relationships with his children from his previous marriages.

Summary and Conclusions

In contrast to other measures of marital and family functioning, the clinical interpretation and application of the MSI rest firmly on extensive empirical findings for this instrument, a unique asset noted in several independent reviews of the MSI (Bascue, 1985; Boen, 1988; Dixon, 1985; Lanyon, 1984; Waring, 1985). As a clinical technique, the MSI more clearly directs the focus of therapeutic interventions. As a diagnostic and therapeutic procedure, the MSI is used in the initial phases of therapy in discussing couples’ presenting complaints and in formulating therapeutic goals. It can be used throughout therapy and at termination in the evaluation of change and redirection of treatment interventions. A distinct advantage of the MSI is the inclusion of both broad- and narrow-band scales for assessing global distress and general response characteristics in addition to more specific sources of marital discord. The relative ease of test administration and scoring makes the MSI a cost-effective means of generating objective assessment data across a broad range of issues relevant to both clinicians and couples entering treatment.

The relatively atheoretical framework of the MSI facilitates its incorporation into various therapeutic contexts adopting different theoretical orientations. With its emphasis primarily on behavioral and attitudinal components of the marital relationship, the MSI neither assumes nor precludes higher order inferences regarding either intrapsychic or systemic determinants of relationship distress. Additional assessment data from interview or other structured techniques are integrated easily with the MSI to suggest potential intrapersonal or systemic dynamics contributing to areas of marital concern. The computerized report for the MSI integrates a broad range of research findings for this instrument in both a psychometrically valid and clinically useful manner. Future investigations examining configural interpretation of MSI profiles, incorporation of content through delineation of critical items, external criterion studies of the computerized report, and prediction of differential response to competing therapeutic modalities should all contribute to the clinical and research utility of this instrument.

References

Bascue, L. O. (1985). Review of the Marital Satisfaction Inventory. Test Critiques, 3, 415-418.

Baucom, D. H., & Epstein, N. (1990). Cognitive behavioral marital therapy. New York: Brunner/Mazel.

Beach, S. R. H., & Broderick, J. E. (1983). Commitment: A variable in women’s response to marital therapy. American Journal of Family Therapy, 11, 16-24.

Beach, S. R. H., Sandeen, E. E., & O'Leary, K. D. (1990). Depression in marriage: A model for etiology and treatment. New York: Guilford.

Bem, S. L. (1981). Bem Sex Role Inventory: Professional manual. Palo Alto, CA: Consulting Psychologists Press.

Bennun, I. (1985). Prediction and responsiveness in behavioural marital therapy. Behavioural Psychotherapy, 13, 186-201.

Berg, P., & Snyder, D. K. (1981). Differential diagnosis of marital and sexual distress: A multidimensional approach. Journal of Sex and Marital Therapy, 7, 290-295.

Boen, D. L. (1988). A practitioner looks at assessment in marital counseling. Journal of Counseling and Development, 66, 484-486.

Burisch, M. (1984). Approaches to personality inventory construction: A comparison of merits. American Psychologist, 39, 214-227.

Ciarlo, J. A., Brown, T. R., Edwards, D. W., Kiresuk, T. J., & Newman, F. L. (1986). Assessing mental health treatment outcome measurement techniques. DHHS Pub. No. (ADM)86-1301. Washington, DC: Supt. of Docs., U.S. Govt. Print. Off.

Crane, D. R., Newfield, N., & Armstrong, D. (1984). Predicting divorce at marital therapy intake: Wives’ distress and the Marital Status Inventory. Journal of Marital and Family Therapy, 10, 305-312.

Cromwell, R. E., Olson, D. H., & Fournier, D. G. (1976). Tools and techniques for diagnosis and evaluation in marital and family therapy. Family Process, 15, 1-49.

Crowe, M. J. (1978). Conjoint marital therapy: A controlled outcome study. Psychological Medicine, 8, 623-636.

Dixon, D. N. (1985). Review of the Marital Satisfaction Inventory. In J. Mitchell (Ed.), The ninth mental measurements yearbook. (Vol. 1, pp. 894-895). Lincoln, NE: University of Nebraska Press.

Emery, R. E. (1988). Marriage, divorce, and children’s adjustment. Newbury Park, CA: Sage.

Gotlib, I. H., & McCabe, S. B. (1990). Marriage and psychopathology. In F. D. Fincham & T. N. Bradbury (Eds.), The psychology of marriage: Basic issues and applications (pp. 226-257). New York: Guilford.

Greenberg, L. S., & Johnson, S. M. (1988). Emotionally focused therapy for couples. New York: Guilford.

Hahlweg, K., Schindler, L., Revenstorf, D., & Brengelmann, J. C. (1984). The Munich marital therapy study. In K. Hahlweg & N. S. Jacobson (Eds.), Marital interaction: Analysis and modification (pp. 3-26). New York: Guilford.

Hase, H. D., & Goldberg, L. R. (1967). Comparative validity of different strategies of constructing personality inventory scales. Psychological Bulletin, 67, 231-248.

Hathaway, S. R., & McKinley, J. C. (1967). The Minnesota Multiphasic Personality Inventory manual. New York: Psychological Corporation.

Heim, S. C., & Snyder, D. K. (1991). Predicting depression from marital distress and attributional processes. Journal of Marital and Family Therapy, 17, 67-72.

Hoover, D. K., & Snyder, D. K. (1991). Validity of the computerized interpretive report for the Marital Satisfaction Inventory: A customer satisfaction study. Psychological Assessment: A Journal of Consulting and Clinical Psychology, 3, 213-217.

Jacobson, N. S., Dobson, K., Fruzzetti, A. E., Schmaling, K. B., & Salusky, S. (1991). Marital therapy as a treatment for depression. Journal of Consulting and Clinical Psychology, 59, 547-557.

Jacobson, N. S., Follette, W. C., & Pagel, M. (1986). Predicting who will benefit from behavioral marital therapy. Journal of Consulting and Clinical Psychology, 54, 518-522.

Jacobson, N. S., & Margolin, G. (1979). Marital therapy: Strategies based on social learning and behavior exchange principles. New York: Brunner/Mazel.

Jacobson, N. S., & Truax, P. (1991). Clinical significance: A statistical approach to defining meaningful change in psychotherapy research. Journal of Consulting and Clinical Psychology, 59, 12-19.

Lanyon, R. I. (1984). Personality assessment. Annual Review of Psychology, 35, 667-701.

Locke, H. J., & Wallace, K. M. (1959). Short marital adjustment prediction tests: Their reliability and validity. Marriage and Family Living, 21, 251-255.


Roffe, M. W., & Britt, B. C. (1981). A typology of marital interaction for sexually dysfunctional couples. Journal of Sex and Marital Therapy, 7, 207-222.

Scheer, N. S., & Snyder, D. K. (1984). Empirical validation of the Marital Satisfaction Inventory in a nonclinical sample. Journal of Consulting and Clinical Psychology, 52, 88-96.

Schumm, W. R. (1990). Evolution of the family field: Measurement principles and techniques. In J. Touliatos, B. F. Perlmutter, & M. A. Straus (Eds.), Family measurement techniques (pp. 23-36). Newbury Park, CA: Sage.

Smith, G. T., Snyder, D. K., Trull, T. J., & Monsma, B. (1988). Predicting relationship satisfaction from couples’ use of leisure time. American Journal of Family Therapy, 16, 3-13.

Snyder, D. K. (1979a). Marital Satisfaction Inventory. Los Angeles, CA: Western Psychological Services.

Snyder, D. K. (1979b). Multidimensional assessment of marital satisfaction. Journal of Marriage and the Family, 41, 813-823.

Snyder, D. K. (1981). Manual for the Marital Satisfaction Inventory. Los Angeles, CA: Western Psychological Services.

Snyder, D. K. (1982). Advances in marital assessment: Behavioral, communications, and psychometric approaches. In C. D. Spielberger & J. N. Butcher (Eds.), Advances in personality assessment (Vol. 1, pp. 169-201). Hillsdale, NJ: Lawrence Erlbaum Associates.

Snyder, D. K. (1983). Clinical and research applications of the Marital Satisfaction Inventory. In E. E. Filsinger (Ed.), Marriage and family assessment: A sourcebook for family therapy (pp. 169-189). Beverly Hills, CA: Sage.

Snyder, D. K. (1990). The Marital Satisfaction Inventory: An actuarial approach to assessing relationships. In F. W. Kaslow (Ed.), Voices in family psychology (Vol. 2, pp. 261-271). Newbury Park, CA: Sage.

Snyder, D. K. (in press). Revised manual for the Marital Satisfaction Inventory. Los Angeles: Western Psychological Services.

Snyder, D. K., & Berg, P. (1983a). Determinants of sexual dissatisfaction in sexually distressed couples. Archives of Sexual Behavior, 12, 237-246.

Snyder, D. K., & Berg, P. (1983b). Predicting couples’ response to brief directive sex therapy. Journal of Sex and Marital Therapy, 9, 114-120.

Snyder, D. K., Clark, B. L., & Velasquez, J. M. (1991, March). Convergent and discriminant validity of the MMPI-2 gender-role scales. Paper presented at the meeting of the Society for Personality Assessment, New Orleans, LA.

Snyder, D. K., Freiman, K. E., & Lachar, D. (1989, August). Convergent validity of the Marital Satisfaction Inventory: A national validation study. Paper presented at the meeting of the American Psychological Association, New Orleans, LA.

Snyder, D. K., Fruchtman, L., & Scheer, N. (1980). Relationships of physically abused women: An objective appraisal. Unpublished manuscript, Wayne State University, Detroit.

Snyder, D. K., Gdowski, C. L., & Lowman, J. C. (1980, October). New developments in the actuarial assessment of marital and family interaction. Symposium presented at the meeting of the National Council on Family Relations, Portland, OR.

Snyder, D. K., Klein, M. A., Gdowski, C. L., Faulstich, C., & LaCombe, J. (1988). Generalized dysfunction in clinic and nonclinic families: A comparative analysis. Journal of Abnormal Child Psychology, 16, 97-109.

Snyder, D. K., & Lachar, D. (1986). A computerized interpretation system for the Marital Satisfaction Inventory. Los Angeles, CA: Western Psychological Services.

Snyder, D. K., Lachar, D., Freiman, K. E., & Hoover, D. W. (1991). Toward the actuarial assessment of couples’ relationships. In J. P. Vincent (Ed.), Advances in family intervention, assessment, and theory (Vol. 5, pp. 89-122). London: Kingsley.

Snyder, D. K., Lachar, D., & Wills, R. M. (1988). Computer-based interpretation of the Marital Satisfaction Inventory: Use in treatment planning. Journal of Marital and Family Therapy, 14, 397-409.

Snyder, D. K., Mangrum, L. F., & Wills, R. M. (1993). Predicting couples’ response to marital therapy. Journal of Consulting and Clinical Psychology, 61, 61-69.

Snyder, D. K., & Regts, J. M. (1982). Factor scales for assessing marital disharmony and disaffection. Journal of Consulting and Clinical Psychology, 50, 736-743.

Snyder, D. K., & Regts, J. M. (1990). Personality correlates of marital dissatisfaction: A comparison of psychiatric, maritally distressed, and nonclinic samples. Journal of Sex and Marital Therapy, 16, 34-43.

Snyder, D. K., & Smith, G. T. (1986). Classification of marital relationships: An empirical approach. Journal of Marriage and the Family, 48, 137-146.

Snyder, D. K., Trull, T. J., & Wills, R. M. (1987). Convergent validity of observational and self-report measures of marital interaction. Journal of Sex and Marital Therapy, 13, 224-236.

Snyder, D. K., Velasquez, J. M., & Clark, B. L. (1991, August). Cross-generational transmission of marital and nonspecific role attitudes. Paper presented at the meeting of the American Psychological Association, San Francisco.

Snyder, D. K., Widiger, T. A., & Hoover, D. W. (1990). Methodological considerations in validating test interpretations: Controlling for response bias. Psychological Assessment: A Journal of Consulting and Clinical Psychology, 2, 470-477.

Snyder, D. K., & Wills, R. M. (1989). Behavioral versus insight-oriented marital therapy: Effects on individual and interspousal functioning. Journal of Consulting and Clinical Psychology, 57, 39-46.

Snyder, D. K., & Wills, R. M. (1991). Facilitating change in marital therapy and research. Journal of Family Psychology, 4, 426-435.

Snyder, D. K., Wills, R. M., & Grady-Fletcher, A. (1991). Long-term effectiveness of behavioral versus insight-oriented marital therapy: A four-year follow-up study. Journal of Consulting and Clinical Psychology, 59, 138-141.

Snyder, D. K., Wills, R. M., & Keiser, T. W. (1981). Empirical validation of the Marital Satisfaction Inventory: An actuarial approach. Journal of Consulting and Clinical Psychology, 49, 262-268.

Spanier, G. B. (1976). Measuring dyadic adjustment: New scales for assessing the quality of marriage and similar dyads. Journal of Marriage and the Family, 38, 15-28.

Spence, J. T., Helmreich, R., & Stapp, J. (1975). Rating of self and peers on sex-role attributes and their relation to self-esteem and conceptions of masculinity and femininity. Journal of Personality and Social Psychology, 32, 29-39.

Waring, E. M. (1985). Review of the Marital Satisfaction Inventory. In J. Mitchell (Ed.), The ninth mental measurements yearbook. (Vol. 1, pp. 895-896). Lincoln, NE: University of Nebraska Press.

Weiss, R. L., & Margolin, G. (1977). Assessment of marital conflict and accord. In A. R. Ciminero, K. S. Calhoun, & H. E. Adams (Eds.), Handbook of behavioral assessment (pp. 555-602). New York: Wiley.

Weiss, R. L., & Heyman, R. E. (1990). Observation of marital interaction. In F. D. Fincham & T. N. Bradbury (Eds.), The psychology of marriage: Basic issues and applications (pp. 87-117). New York: Guilford.

Wills, R. M., & Snyder, D. K. (1982). Clinical use of the Marital Satisfaction Inventory: Two case studies. American Journal of Family Therapy, 10, 17-26.

Wirt, R. D., Lachar, D., Klinedinst, J. K., & Seat, P. D. (1984). Multidimensional description of child personality: A manual for the Personality Inventory for Children. 1984 Revision by D. Lachar. Los Angeles, CA: Western Psychological Services.


Chapter 15

Katz Adjustment Scales

James R. Clopton
Texas Tech University

Roger L. Greene
Pacific Graduate School of Psychology

The Katz Adjustment Scales (Katz & Lyerly, 1963) were developed 30 years ago to help assess patients before their admission to a psychiatric hospital and after their return to the community. Although the scales also have been used in a variety of ways since their development, the most common use of the scales has been as a research instrument for assessing patients following their discharge from a psychiatric hospital. This chapter summarizes the development of the Katz Adjustment Scales, reviews the reliability and validity data for the scales, and samples from the extensive research using the scales. Finally, an evaluation of the Katz Adjustment Scales is provided.

Overview of the Katz Adjustment Scales SUMMARY OF DEVELOPMENT In designing the scales, Katz and Lyerly (1963) sought to assess not only the presence or absence of symptoms of psychopathology, but also whether patients and their family members were satisfied with the patients’ daily activities and social functioning. The use of ratings by family members or others familiar with the patient was considered by Katz and

Lyerly to be similar to the traditional reliance on such individuals for background information when a highly disturbed patient is admitted to a psychiatric hospital. The Katz Adjustment Scales consist of two sets of five scales, one set to be completed by a family member or other person familiar with the patient (the R scales) and the other set to be completed by the patient (the S scales). The RI Scale: Relative’s Ratings of Patient's Symptoms and Social Behavior.

The 127

items of the-R1 scale were designed to assess social behavior and symptoms of psychopathology. Examples of items assessing positive social behaviors are “Shows good judgment,” “Is independent,” and “Remembers important things.” Other items assess social behavior that

352

A.

15

KATZ ADJUSTMENT SCALES

would disturb others or be indicative of psychopathology. For example, the following items are included: “Lies,” “Says the same thing over and over again,” and “Threatens to injure certain people.” When these 127 items are completed at the time of hospital admission, the items are rated as either “present” or “absent,” but when they are completed after discharge from the hospital, they are rated on a 4-point scale (“almost never,” “sometimes,” “often,” or “almost always”). No rationale was given for this difference in response format for admission ratings and follow-up ratings.

Katz and Lyerly (1963) were aware of the problems in using information about patients obtained from their family members. For example, they noted that family members may be reluctant to make negative judgments about the patient, such as describing the patient as unfriendly. To limit such difficulties, an attempt was made to select items for the R1 scale that describe specific behaviors and that avoid asking the family member to judge the patient. According to Katz and Lyerly (1963), phrases such as “looks like,” “acts as if,” and “says” were used with some items to emphasize that the relative is to describe the patient’s behavior. Examples of items with such phrases are “Looks worn out,” “Says that people are talking about him,” and “Acts as if he sees people or things that aren’t there.” The 127 items of the R1 scale require a sixth-grade reading comprehension and take about 25 to 45 minutes to complete when the 4-point scale is used (Zimmermann, Vestre, & Hunter, 1974).

The R2 Scale: Level of Performance of Socially Expected Activities; The R3 Scale: Level of Expectations. The R2 and R3 scales use the same 16 items that describe self-care responsibilities and activities in the home and community, which were adapted from a list prepared by Freeman and Simmons (1958). The R2 scale asks the relative to rate each item on a 3-point scale, depending on whether the patient is “not doing” the activity described, “is doing it some,” or “is doing it regularly.” For example, relatives are asked to rate the following items: “Helps with household chores,” “Dresses and takes care of himself,” and “Goes to parties and other social activities.” The R3 scale asks whether the relative expects the patient to be doing the activity. The R3 scale was not intended originally as a separate measure, but was included so that the relative’s level of satisfaction with the patient’s performance could be assessed by examining the difference between the patient’s activities (the R2 scale) and the relative’s expectations about the patient’s activities (the R3 scale). This indirect method of assessment was intended to make it easier for a relative to indicate dissatisfaction with the level of activity and responsibility shown by the patient.
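The discrepancy between expectations (R3) and performance (R2) can be illustrated with a brief sketch. It assumes, purely for illustration, that both scales are coded 1 to 3 on corresponding items and that the difference is taken as expectation minus performance, so that larger totals suggest greater dissatisfaction. Neither the coding nor the direction of subtraction is specified in the text above, and the function name is hypothetical.

```python
from typing import Sequence


def satisfaction_discrepancy(r2_performance: Sequence[int],
                             r3_expectation: Sequence[int]) -> int:
    """Sum item-by-item differences between expectation (R3) and performance (R2).

    Assumption: both scales use the same 1-3 coding on corresponding items,
    and the difference is taken as expectation minus performance.
    """
    if len(r2_performance) != len(r3_expectation):
        raise ValueError("R2 and R3 must contain the same number of items")
    return sum(expected - performed
               for performed, expected in zip(r2_performance, r3_expectation))


# Hypothetical ratings on three corresponding items (the real scales have 16).
example_total = satisfaction_discrepancy([1, 2, 3], [3, 2, 3])
```

Summing the differences item by item, rather than subtracting two scale totals computed separately, keeps each comparison tied to the same activity.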

Katz and Lyerly (1963) reported that the most accurate method for using the R2 and R3 scales to assess the relative’s satisfaction with the patient’s activities was to sum the differences on corresponding items of the two forms.

The R4 Scale: Level of Free-Time Activities; The R5 Scale: Level of Satisfaction with Free-Time Activities. The R4 and R5 scales function in much the same way as the R2 and R3 scales, and there is some overlap in the two sets of items. However, for the R4 and R5 scales, the focus is more on hobbies and leisure activities. Examples of items are “Works on some hobby,” “Takes part in community or church work,” and “Visits friends.” The 23 items for these two scales were modeled after the Activities and Attitudes Scale (Cavan, Burgess, Havighurst, & Goldhamer, 1949). The R4 scale asks the relative to rate each item on a 3-point scale, depending on whether the patient is doing the activity “frequently,” “sometimes,” or “practically never.” The R5 scale is rated on a 2-point scale, depending on whether the relative is satisfied or dissatisfied with the patient’s participation in that activity. Dissatisfaction includes wanting the patient to do either more or less of the activity. The contrast between the indirect method of assessing satisfaction with the R3 scale and the direct


assessment for the R5 scale reflects the authors’ belief that for the activities assessed by the R4 scale, it would be “less realistic” to ask about the relative’s expectations. The activities described in the R2 and R4 scales make them primarily applicable to schizophrenics and other chronically ill individuals. The S1 Scale: Symptom Discomfort. The S1 scale’s 55 items, which assess the patient’s discomfort with symptoms and problems, were adapted from the Johns Hopkins Symptom Checklist (Parloff, Kelman, & Frank, 1954). The patient rates each item on a 4-point scale,

depending on whether the patient has had the symptom during the past few weeks and how often the patient has been bothered by the symptom (“not had the complaint,” “a little,” “quite a bit,” or “almost all the time”).

The S2 Scale: Level of Performance of Socially Expected Activities; The S3 Scale: Level of Expectations; The S4 Scale: Level of Free-Time Activities; The S5 Scale: Level of Satisfaction with Free-Time Activities. The S2, S3, S4, and S5 scales are identical in content and scoring to the corresponding scales completed by relatives, except that the wording was adapted for self-ratings. Instructions for administering the Katz Adjustment Scales are provided in an appendix to the original monograph (Katz & Lyerly, 1963), and additional instructions have been provided by Michaux, Katz, Kurland, and Gansereit (1969).

In addition to describing the development of the scales, Katz and Lyerly (1963) reported the results of two studies: an initial validity study (described later in the section on validity) and the use of cluster analysis to develop a more refined scoring method for the R1 scale. The patients in this second study were 100 newly admitted state hospital patients (48 men, 52 women), mostly patients with schizophrenia, who were referred for medication to relieve their “anxiety, agitation, and restlessness” (p. 523). Over half of the informants who completed the R1 scale were spouses of the patients; the other informants were siblings, parents, other relatives, and friends. The response format was changed from a 4-point scale to “yes”

or “no” to shorten the time that it took the relatives to complete the scale. Cluster analysis was applied to the data obtained from the relatives of the patients, with the goal being clusters that were internally consistent and relatively independent of other clusters. The 11 cluster subscales identified were: Belligerence, Verbal Expansiveness, Negativism, Helplessness, Suspiciousness, Anxiety, Withdrawal and Retardation, Nervousness,

Confusion, Bizarreness, and Hyperactivity. In addition, those R1 scale items that were correlated highly with at least five clusters became a subscale assessing general psychopathology. The internal consistency of the cluster subscales was assessed by analyzing relatives’ ratings of the R1 scale for 242 newly admitted patients who had symptoms of schizophrenia. The internal consistency coefficients for 11 of the cluster subscales ranged from .41 to .81. (A consistency coefficient for the Confusion cluster scale was not reported.) A factor analysis of the cluster subscales was performed using relatives’ ratings of the R1 scale for 404 patients from nine hospitals and treatment centers, including the 242 patients whose data had been used to determine the internal consistency of the subscales. (Again, the Confusion cluster scale was omitted.) Three factors were identified that accounted for 57% of the total variance of the subscales. Inspection of the factor loadings for the cluster subscales led Katz and Lyerly (1963) to interpret the three factors as social obstreperousness,

acute psychoticism, and withdrawn depression. The authors noted that these three factors failed to account for much of the variance in scores for three cluster subscales (suspiciousness, nervousness, and hyperactivity). Thus, the three factors identified by Katz and Lyerly


(1963) accounted for only about half of the variance in the R1 scale scores of patients and generally were unrelated to three of the cluster scales. The R1 scale is by far the most commonly used of the Katz Adjustment Scales. Therefore, the items for the R1 cluster subscales are presented in Table 15.1. The other R1 scale items, as well as the items for the other scales, can be found in Katz and Lyerly (1963).

DIVERSE APPLICATIONS OF THE KATZ ADJUSTMENT SCALES

The greatest strength of the Katz Adjustment Scales, besides their durability in being used in research for 30 years, is the diversity of their research applications. The Katz scales have been used in cross-cultural research on schizophrenia, and also have been used to assess these diverse groups: patients in neuropsychological rehabilitation (Baird et al., 1987; Fordyce, Roueche, & Prigatano, 1983; Klonoff, Costa, & Snow, 1986; Newton & Johnson, 1985; Posthuma & Wild, 1988; Stambrook, Moore, & Peters, 1990); elderly participants in a day-care center (Johnson & Maguire, 1989); patients in time-limited group psychotherapy (Bernard & Klein, 1979); drivers in fatal automobile accidents (Schmidt, Perlin, Townes, Fisher, & Shaffer, 1972); and participants in a cardiac rehabilitation program (Stern & Cleary, 1981).

Cross-Cultural Research with Psychiatric Patients with Schizophrenia.

Katz, Sanborn, and Gudeman (1970) assessed differences in the symptoms of psychopathology shown by Caucasian patients and patients of Japanese ancestry admitted to Hawaii State Hospital. Most of the patients in each ethnic group had a diagnosis of schizophrenia. Several striking differences were found for these two ethnic groups during the first week in the hospital when the patients were interviewed and rated with the Mental Status Schedule (Spitzer, Fleiss, Endicott, & Cohen, 1967) and the Inpatient Multidimensional Psychiatric Rating Scale (Lorr, Klett, McNair, & Lasky, 1963). Caucasian patients were “more anxious, depressed, hostile, and excited than the Japanese” patients (p. 25), and Japanese patients were more apathetic and disoriented than the Caucasian patients. In contrast to the striking differences on the two measures completed by professionals, there were few differences in the R1 scale ratings provided by relatives of the two groups of patients. The difference in the way professionals and relatives rated patients from the two ethnic groups was puzzling to Katz et al. (1970), but they noted that the difference could be caused, at least partly, by clinicians’ preconceptions about the symptoms typically found for patients from the two ethnic groups. Chu, Sallach, and Klein (1986) compared hospital patients with schizophrenia from urban

and rural areas on the R and S scales. There were several differences indicating that patients from rural areas had better social adjustment than patients from urban areas. Patients from rural areas expected more participation in social activities of themselves than did patients from urban areas. Similarly, relatives of patients from rural areas expected the patients to participate more in social activities than did the relatives of patients from urban areas. These findings suggest that the higher expectations held by patients from rural areas and their relatives may have contributed to increased social participation and better adjustment.

Neuropsychological Assessment and Rehabilitation. Stambrook et al. (1990) administered the R scales to the spouses of patients with closed head injury, and found that the severity of the head injury was related to ratings of greater social maladjustment and lowered expectations of social behavior. Patients with severe closed head injuries were rated by their spouses as being as maladjusted as psychiatric patients.

TABLE 15.1
Items for the 12 Cluster Subscales of the R1 Scale From the Katz Adjustment Scales

1. BELLIGERENCE
    28. Got angry and broke things
    45. Got into fights with people
    50. Cursed at people
    113. Threatened to tell people off

2. VERBAL EXPANSIVENESS
    99. Spoke very loudly
    100. Shouted or yelled for no reason
    105. Kept changing from one subject to another for no reason
    106. Talked too much
    118. Bragged about how good he was

3. NEGATIVISM
    36. Acted as if he did not care about other people’s feelings
    37. Thought only of himself
    46. Was not cooperative
    47. Did the opposite of what he was asked
    48. Stubborn
    51. Deliberately upset routine
    56. Critical of other people
    59. Lied
    60. Got into trouble with law

4. HELPLESSNESS
    3. Cried easily
    74. Acted helpless
    92. Acted as if he could not concentrate on one thing
    93. Acted as if he could not make decisions

5. SUSPICIOUSNESS
    40. Thought people were talking about him
    43. Acted as if he were suspicious of people
    107. Said people were talking about him
    108. Said that people were trying to make him do or think things he did not want to

6. ANXIETY
    18. Had strange fears
    19. Afraid something terrible was going to happen
    23. Got suddenly frightened for no reason
    111. Talked about people or things he was afraid of
    122. Said that something terrible was going to happen
    125. Talked about suicide

7. WITHDRAWAL AND RETARDATION
    8. Just sat
    17. Needed to do things very slowly to do them right
    70. Quiet
    76. Moved about very slowly
    80. Very slow to react
    84. Would stay in one position for long period of time

8. GENERAL PSYCHOPATHOLOGY
    Acted as if he had no interest in things
    Felt that people did not care about him
    30. Acted as if he had no control over his emotions


Chapter 18

Minnesota Multiphasic Personality Inventory-Adolescent

Robert P. Archer
Eastern Virginia Medical School

The purpose of this chapter is to introduce the Minnesota Multiphasic Personality Inventory-Adolescent (MMPI-A), a revision of the original MMPI specifically designed for use with teenagers. Similar to the development of the MMPI-2, the MMPI-A attempted to build on the most useful and productive aspects of the original test instrument. Thus, for example, the original MMPI basic clinical scales have been retained in the MMPI-A. However, the MMPI-A also represents an attempt to improve several aspects of the original test instrument in relation to adolescent assessment. These changes include a 16% reduction in the total length of the item pool, revision of 70 items to simplify or improve wording, collection of new national norms representing diverse geographic and ethnic groups, and development of several new scales specifically related to adolescent development and psychopathology.

Summary of Development

The Restandardization Committee responsible for the development of the MMPI-2 (Butcher, Dahlstrom, Graham, Tellegen, & Kaemmer, 1989) also was involved in initial development of the MMPI-A. Specifically, the Restandardization Committee developed an experimental form of the MMPI (Form TX), which contained 704 items, and supervised the initial collection of normative data with Form TX in several geographic settings. On July 1, 1989, the MMPI Adolescent Project Committee was created by the University of Minnesota Press and consisted of James N. Butcher, Auke Tellegen, Beverly Kaemmer, and Robert P. Archer. This committee made the final determination to proceed with the development and publication of an adolescent form of the MMPI and provided recommendations concerning normative criteria, item and scale selection, and profile construction to be incorporated in the adolescent form. The Adolescent Project Committee, wishing to maintain continuity between the original MMPI and the MMPI-A, sought to preserve the standard or basic MMPI scales. However, scale F was modified substantially to improve the psychometric performance of this scale with adolescents, and scales Mf and Si were shortened to reduce the total item pool


length of the instrument. The MMPI basic clinical scales were developed by Hathaway and McKinley using a criterion keying method. Items were selected for scale membership based on the occurrence of item response frequencies that differentiated between a criterion group manifesting a specific diagnosis or characteristic, and a comparison group (the Minnesota adult normal sample) felt not to manifest the trait or characteristic under study. Indeed, the original MMPI is cited widely (e.g., Anastasi, 1982) as an outstanding example of this method of test construction. In addition to the traditional scales, the MMPI-A contains four new validity scales presented within the basic scale profile, 15 content scales and 6 supplementary scales, and 28 Harris-Lingoes and 3 Si subscales. Table 18.1 provides an overview of the scale structure of the MMPI-A, with scales organized into three broad headings corresponding to the three MMPI-A profile sheets. The new validity scales in the basic scale profile include the F1 and F2 subscales, each containing a 33-item subset of the 66-item MMPI-A F scale. These items were selected

TABLE 18.1
Overview of the MMPI-A Scales and Subscales

Basic Profile Scales (17)

  Validity Scales (7)
    VRIN        Variable Response Inconsistency
    TRIN        True Response Inconsistency
    F1, F2, F   Frequency
    L           Lie
    K           Defensiveness

  Clinical Scales (10)
    1/Hs        Hypochondriasis
    2/D         Depression
    3/Hy        Hysteria
    4/Pd        Psychopathic Deviate
    5/Mf        Masculinity-Femininity
    6/Pa        Paranoia
    7/Pt        Psychasthenia
    8/Sc        Schizophrenia
    9/Ma        Mania
    0/Si        Social Introversion

Content and Supplementary Scales (21)

  Content Scales (15)
    A-anx       Anxiety
    A-obs       Obsessiveness
    A-dep       Depression
    A-hea       Health Concerns
    A-aln       Alienation
    A-biz       Bizarre Mentation
    A-ang       Anger
    A-cyn       Cynicism
    A-con       Conduct Problems
    A-lse       Low Self-Esteem
    A-las       Low Aspirations
    A-sod       Social Discomfort
    A-fam       Family Problems
    A-sch       School Problems
    A-trt       Negative Treatment Indicators

  Supplementary Scales (6)
    MAC-R       MacAndrew Alcoholism-Revised
    ACK         Alcohol/Drug Problem Acknowledgment
    PRO         Alcohol/Drug Problem Potential
    IMM         Immaturity
    A           Anxiety
    R           Repression

Harris-Lingoes and Si Subscales (31)

  Harris-Lingoes Subscales (28)
    D1          Subjective Depression
    D2          Psychomotor Retardation
    D3          Physical Malfunctioning
    D4          Mental Dullness
    D5          Brooding
    Hy1         Denial of Social Anxiety
    Hy2         Need for Affection
    Hy3         Lassitude-Malaise
    Hy4         Somatic Complaints
    Hy5         Inhibition of Aggression
    Pd1         Familial Discord
    Pd2         Authority Problems
    Pd3         Social Imperturbability
    Pd4         Social Alienation
    Pd5         Self-Alienation
    Pa1         Persecutory Ideas
    Pa2         Poignancy
    Pa3         Naivete
    Sc1         Social Alienation
    Sc2         Emotional Alienation
    Sc3         Lack of Ego Mastery, Cognitive
    Sc4         Lack of Ego Mastery, Conative
    Sc5         Lack of Ego Mastery, Defective Inhibition
    Sc6         Bizarre Sensory Experiences
    Ma1         Amorality
    Ma2         Psychomotor Acceleration
    Ma3         Imperturbability
    Ma4         Ego Inflation

  Si Subscales (3)
    Si1         Shyness/Self-Consciousness
    Si2         Social Avoidance
    Si3         Alienation--Self and Others

Note. From MMPI-A: Assessing Adolescent Psychopathology (pp. 59-60) by R. P. Archer, 1992a, Hillsdale, NJ: Lawrence Erlbaum Associates. Copyright (1992) by Lawrence Erlbaum Associates. Reprinted by permission.

based on a criterion that the item was endorsed in the deviant direction by no more than 20% of males and females in the MMPI-A normative sample. The MMPI-A validity scales also include the Variable Response Inconsistency (VRIN) scale and the True Response Inconsistency (TRIN) scale, consistency measures developed using a methodology very similar to that employed in the development of the MMPI-2 counterparts of these measures. The order of appearance of validity scales, from left to right, is as follows: VRIN, TRIN, F1, F2, F, L,


and K. The 15 MMPI-A content scales heavily overlap with both the MMPI-2 content scales (Butcher et al., 1989) and the Wiggins Scales (Wiggins, 1966, 1969) created for use with the original MMPI. The MMPI-A content scales were developed based on a combination of rational and statistical criteria as described in the MMPI-A manual (Butcher et al., 1992) and by Williams, Butcher, Ben-Porath, and Graham (1992) in a book specifically focused on the MMPI-A content scales. The six supplementary scales for the MMPI-A include the continuation, in modified form, of three scales used with the original form of the MMPI. These scales are slightly shortened versions of Welsh’s (1956) Anxiety (A) and Repression (R) Scales, and a revision of MacAndrew’s (1965) Alcoholism Scale, the MacAndrew Alcoholism Scale-Revised (MAC-R). In addition, there are three new supplementary scales developed for the MMPI-A: the Immaturity (IMM) Scale, the Alcohol/Drug Problem Acknowledgment (ACK) Scale, and the Alcohol/Drug Problem Potential (PRO) Scale. The Harris-Lingoes content scales (Harris & Lingoes, 1955) developed for the original MMPI were carried over to the MMPI-A, with a few item deletions resulting from modifications of

the item pool within the basic scales. The Si subscales are identical to the MMPI-2 Si subscales, and are presented on the same MMPI-A profile sheet with the Harris-Lingoes subscales. In addition to the 58 items deleted from the original standard scales of the MMPI (88% of these items occurring in relation to F, Mf, or Si), 69 items were modified from their appearance in the original test form. Archer and Gordon (1992) examined the equivalency of the revised form of these items in a sample of 266 13- through 17-year-old adolescents. The findings from this study indicated that the items rewritten for the MMPI-A resulted in similar response frequencies to the original versions of these items. The final version of the MMPI-A is a 478-item, true-false objective measure of psychopathology. Scoring for the instrument is accomplished through hand-scoring templates or computer programs available through organizations licensed to score the MMPI-A by the University of Minnesota Press. The scoring of the MMPI-A continues the MMPI tradition of using a simple summation of items endorsed in the critical direction for a particular scale, without the use of differential weighting formulas for items. However, it should be noted that the scoring formula for the TRIN scale, described in the test manual (Butcher et al., 1992), is more complex than that of other scales, because the endorsement of certain item pairs may result in a subtraction from the total raw score value, and because TRIN scale T-score values must be ≥ 50.
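The unit-weighted summation described above can be illustrated with a minimal sketch. The item numbers and keying below are hypothetical and do not reproduce any actual MMPI-A scoring key; the function name is likewise an illustration only. Omitted items are assumed to contribute nothing to the raw score, and the more complex TRIN formula is not addressed.

```python
from typing import Iterable, Mapping


def raw_scale_score(responses: Mapping[int, bool],
                    keyed_true: Iterable[int],
                    keyed_false: Iterable[int]) -> int:
    """Count items answered in the keyed (critical) direction.

    Every keyed response adds exactly one point (no differential weighting).
    Items missing from `responses` are treated as unanswered and add nothing.
    """
    score = 0
    for item in keyed_true:
        if responses.get(item) is True:
            score += 1
    for item in keyed_false:
        if responses.get(item) is False:
            score += 1
    return score


# Hypothetical five-item scale: items 1, 2, and 5 keyed True; items 3 and 4 keyed False.
example_responses = {1: True, 2: False, 3: True, 4: False}  # item 5 left unanswered
example_score = raw_scale_score(example_responses,
                                keyed_true=[1, 2, 5],
                                keyed_false=[3, 4])
```

In this example only items 1 and 4 are answered in the keyed direction, so the raw score is 2.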

MMPI-A Norms

The MMPI-A normative data were collected in eight states, seven of which also provided normative data for the MMPI-2. Adolescent normative subjects generally were solicited by mail from the rosters of junior and senior high schools in preselected areas, and subjects were tested in group sessions usually conducted within school settings. Adolescents in all sites except New York were paid for their participation in the MMPI-A normative data collection, with subjects receiving between $10 and $15 at the time of their completion of testing materials. New York subjects participated without reimbursement as part of school activities. In total, approximately 2,500 adolescents were evaluated in data collection procedures in California, Minnesota, New York, North Carolina, Ohio, Pennsylvania, Virginia, and Washington State. A variety of exclusion criteria were applied to these data to create the final normative set. Subjects who did not complete all data collection measures, left more than 35 items


unanswered on MMPI Form TX, or produced a raw score value of > 25 on the F scale (using

the original item pool for this scale) were excluded. Subjects below 14 years of age, or above 18 years old, also were excluded from the normative sample. Using these criteria, the final MMPI-A norms were based on 805 male and 815 female respondents. The ethnic backgrounds of these subjects reflected a reasonably balanced sample, with approximately 76% of the data collected from White respondents, roughly 12% from Black adolescents, and the remaining 12% from adolescents representing several ethnic groups, including Hispanic and Native American groups. The MMPI-A normative sample ethnic distribution appears reasonably consistent with U.S. census figures, and several data collection sites were selected to increase sampling from diverse ethnic backgrounds (Butcher et al., 1992). Data presented in the MMPI-A manual (Butcher et al., 1992) summarize parental educational levels as reported by adolescents in the normative sample. These data show that the parents of adolescents used in the MMPI-A normative sample overrepresented higher educational levels in comparison with the 1980 U.S. census, and clearly represent a well-educated group (Archer, 1992a). Approximately 50% of fathers and 40% of mothers of adolescents who participated in the MMPI-A normative sample had obtained an educational level equal to or greater than a baccalaureate degree. In comparison, the 1980 census indicates that only 20% of males and 13% of females reported comparable educational levels. This degree of overrepresentation of better educated individuals in the MMPI-A sample is very similar to the educational bias found for the MMPI-2 adult normative sample (Archer, in press) and probably will be subject to some of the same debates focused on this issue in relation to the MMPI-2. Archer (1992a) speculated that this type of educational and occupational bias is related to the use of unselected, volunteer subjects in normative data collection, a procedure

that tends to sample from better educated components of the society. Additional descriptive data concerning the MMPI-A normative sample, including adolescents’ grade levels, parental occupational levels, and adolescents’ living situations, are reported in the MMPI-A test manual (Butcher et al., 1992).
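The exclusion criteria described above for the normative sample can be expressed as a simple screening rule. The following sketch restates only the criteria given in the text; the record structure and field names are hypothetical conveniences and are not drawn from the MMPI-A manual.

```python
from dataclasses import dataclass


@dataclass
class NormativeRecord:
    """Hypothetical record for one adolescent tested with Form TX."""
    age: int                      # age in years
    completed_all_measures: bool  # all data collection measures completed
    items_unanswered: int         # omitted items on MMPI Form TX
    f_raw_original: int           # F raw score using the original item pool


def meets_normative_criteria(record: NormativeRecord) -> bool:
    """Return True if the record survives the exclusion criteria in the text."""
    if not record.completed_all_measures:
        return False
    if record.items_unanswered > 35:
        return False
    if record.f_raw_original > 25:
        return False
    if record.age < 14 or record.age > 18:
        return False
    return True


# Hypothetical examples: a retained 15-year-old and an excluded 13-year-old.
retained = meets_normative_criteria(NormativeRecord(15, True, 10, 8))
excluded = meets_normative_criteria(NormativeRecord(13, True, 0, 5))
```

Note that the thresholds are exclusive: a record with exactly 35 unanswered items or an F raw score of exactly 25 is retained under this reading of the criteria.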

The MMPI-A norms are based on adolescents between the ages of 14 and 18, inclusive. The mean age for male adolescents in the MMPI-A normative sample is 15.54 years (SD = 1.17 years) and the mean age for females is 15.60 years (SD = 1.16 years). The 18-year-old adolescent group overlaps with the 18-year-old subsample of the MMPI-2 norms, reflecting that an 18-year-old respondent potentially could be evaluated with either the MMPI-A or the MMPI-2. In this regard, the MMPI-A manual recommends the following criterion for determining the form most appropriate to evaluate the 18-year-old: “A suggested guideline would be to use the MMPI-A for those 18-year-olds who are in high school and the MMPI-2 for those who are in college, working, or otherwise living an independent adult lifestyle. The MMPI-A, not the MMPI-2, should always be used for those 17 and younger, regardless of whether they are in school” (Butcher et al., 1992, p. 23).

At the lower end of the MMPI-A normative sample, normative subjects were eliminated below the age of 14. Preliminary data analyses were interpreted by MMPI-A Adolescent Project Committee members as indicating that 12- and 13-year-old subjects tended to produce substantially different normative values than those in the 14- through 18-year-old grouping, and there were concerns regarding the usefulness of MMPI-A data produced by adolescents under age 14. The MMPI-A manual notes that the instrument can be used cautiously with 12- and 13-year-old respondents with an awareness of the higher rate of administration difficulties found in this population. Archer (1992a) provided a set of MMPI-A adolescent norms for 13-year-old boys and girls. These norms are based on linear T-score conversions, and use the same exclusion criterion employed for the 14- through 18-year-old MMPI norms developed for the test instrument. In general, preliminary studies (Archer,


1992a) appear to indicate that MMPI-A norms based on this 13-year-old sample tend to produce lower T-score values on most clinical scales in comparison with T scores from the 14- through 18-year-old MMPI-A norms applied to identical raw score values. The 13-year-old norm set was created to promote research with this age group and to provide the clinician with the potential to evaluate a 12- or 13-year-old adolescent who meets all administration criteria on this specialized norm set in conjunction with the standard MMPI-A norms. Such a comparison would allow the clinician to refine interpretive comments made from the use of the standard MMPI-A norms, based on elevation levels found for the 13-year-old norm set. However, the profile interpretation should be based primarily on the standard MMPI-A norms. The MMPI-A should not be employed with adolescents below the age of 12, and the 12- and 13-year-old age group will contain many adolescents unable to successfully read and comprehend the MMPI-A item pool. For many years, a sixth-grade reading level generally was accepted as the basic requirement for MMPI administration. The MMPI-2 manual (Butcher et al., 1989) indicates a reading level of eighth grade is required for successful MMPI-2 administration. Archer (1992a) noted that over 80% of the MMPI-A item pool can be read accurately and comprehended by adolescents reading at the seventh-grade reading level. This text also reviewed a variety of methods to evaluate reading comprehension on the MMPI-A, including the use of total test administration time, VRIN scale values, and the random MMPI-A profile configuration expected for the basic scales and the content and supplementary scales. In addition, Krakauer, Archer, and Gordon (1993) reported the development of two experimental subscales for the MMPI-A, the Items-Difficult (Id) and Items-Easy (Ie) subscales, designed to

detect adolescents experiencing substantial reading deficits as reflected in their pattern of item endorsements. The MMPI-A, similar to the MMPI-2, employs both linear T-score and uniform T-score transformation procedures within its collection of scales. This may be contrasted with the original adult norms and the adolescent norms developed by Marks and Briggs (1972) for the original form of the MMPI, which were based exclusively on linear transformations to convert raw scores to T-score values. The MMPI-A retained the use of linear T-score conversions for all validity scales and for MMPI-A basic scales 5 and 0. Additionally, linear T-score transformations were employed for all six MMPI-A supplementary scales, and for all scales appearing on the Harris-Lingoes and Si subscales profile sheet. In contrast, eight of the clinical scales on the MMPI-A basic scale profile (1, 2, 3, 4, 6, 7, 8, and 9) and all of the 15

content scales employ uniform T-score transformations. The rationale and methods involved in uniform T-score transformation are discussed extensively in the MMPI-A manual (Butcher et al., 1992). In general, uniform T-score transformations produce T-score values that represent the average linear T score found for the scales employed in the composite distributions for the basic scales and the content scales analyzed separately by gender. The T-score values obtained by uniform transformations are similar to those obtained from linear T-score conversions for a given scale. The purpose of the uniform T-score procedure is to produce T-score values with equivalent percentile value meanings across scales for a given T score. However, this procedure also maintains the underlying positive skew in the distribution of scores from these measures; thus, uniform T scores do not convert to percentile values that would be expected if scores were distributed normally (i.e., a uniform T-score value of 50 does not convert to the 50th percentile on the MMPI-A, but rather to the 55th percentile). Most of the differences found between the Marks and Briggs (1972) norms for the original form of the MMPI and the MMPI-A norms are not attributable to the issue of uniform versus linear T-score transformation procedures. Rather, these T-score differences result from the substantial differences in the raw score means and standard deviations produced by these two


normative groups on most basic scales. The overall effect of these differences, as is discussed

later, is that MMPI-A T-score values for a given raw score tend to be lower than those produced by the Marks and Briggs (1972) traditional norms. Appendix G of the MMPI-A manual and Appendix E of Archer (1992a) provide T-score conversion tables to permit estimates of the Marks and Briggs normative values that would be produced for a given MMPI-A basic scale raw score value. This allows the clinician to evaluate the extent to which the adolescent’s item responses would have produced a similar profile on the original MMPI, in relationship to the profile obtained on the MMPI-A. This issue is relevant to the degree to which the research literature developed for the original MMPI may be generalized for use in the interpretation of MMPI-A findings for a specific adolescent.
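The dependence of linear T scores on the normative mean and standard deviation can be illustrated with a short sketch. A linear T score conventionally rescales a raw score to a mean of 50 and a standard deviation of 10 within the normative group. The normative means and standard deviations used below are invented for illustration and are not values from the MMPI-A or the Marks and Briggs (1972) norms; the uniform T-score procedure is not implemented here.

```python
def linear_t_score(raw: float, norm_mean: float, norm_sd: float) -> float:
    """Convert a raw score to a linear T score (mean 50, SD 10 in the norm group)."""
    if norm_sd <= 0:
        raise ValueError("normative standard deviation must be positive")
    return 50.0 + 10.0 * (raw - norm_mean) / norm_sd


# Hypothetical norm sets for the same scale: the second has a higher raw score
# mean and a larger standard deviation than the first.
raw_score = 20
t_first_norms = linear_t_score(raw_score, norm_mean=15.0, norm_sd=5.0)
t_second_norms = linear_t_score(raw_score, norm_mean=17.0, norm_sd=6.0)
```

With these invented values, the same raw score of 20 corresponds to T = 60 under the first norm set and T = 55 under the second, which shows how a normative group with higher raw score means can yield lower T scores for identical responses.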

Basic Validity and Reliability Information

The MMPI-A manual (Butcher et al., 1992) provides information concerning test-retest reliability, internal consistency, and the factor structure of MMPI-A scales. The test-retest correlations for the MMPI-A basic scales range from r = .49 for F1 to r = .84 for Si and are very similar to the test-retest correlations found for MMPI-2 basic scales. The typical standard error of measurement of basic scales is estimated to be two to three raw score points (Butcher et al., 1992). The internal consistency of MMPI-A scales, as represented in coefficient alpha values, ranges from low to moderate values found for scales such as Mf (r = .43) and Pa (r = .57), to high (r = .80) values found for basic scales Pt and Sc, many of the content scales, and the IMM scale. These latter scales were constructed using methods designed to produce high internal consistency values. The factor analytic findings for the MMPI-A are reasonably consistent with prior factor analytic findings reported in adolescent populations for the original MMPI (Archer, 1984; Archer & Klinefelter, 1991).

Validity data from normal and clinical samples of adolescents also are presented in the MMPI-A manual. In addition to MMPI Form TX, adolescents in the MMPI-A normative data collection were administered a 16-item biographical information form and a 74-item life stress events questionnaire. These forms served as external correlate sources to evaluate the concurrent validity of the MMPI-A in the normative sample. The MMPI-A manual also reports validity findings for a clinical sample of 420 boys and 293 girls between the ages of 14 and 18 receiving psychological services in Minnesota. Archer (1992a) also provided MMPI-A scale correlate data from a sample of 128 adolescent inpatients collected in Virginia.
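The relationship between reliability and the standard error of measurement mentioned in this section can be sketched with the standard classical test theory formula, SEM = SD x sqrt(1 - r). This is offered only as an illustration of the concept; the manual's own computation is not shown in the text, and the standard deviation used below is a hypothetical value.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """Classical test theory SEM: SD * sqrt(1 - reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


# Hypothetical raw score SD of 5 points with a reliability of .75,
# a value inside the .49 to .84 test-retest range reported above.
sem = standard_error_of_measurement(sd=5.0, reliability=0.75)
print(sem)
```

With these assumed inputs the result falls within the two to three raw score points cited from the manual, though that agreement depends entirely on the invented standard deviation.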

Basic Interpretive Strategy

Several guides have been provided for the interpretation of the MMPI-A, including extensive recommendations in the test manual (Butcher et al., 1992) and by Butcher and Williams (1992a) and Williams et al. (1992). Table 18.2 provides a brief overview of the interpretive approach offered by Archer (1992a). The first two steps presented in this model emphasize the importance of considering the setting where the MMPI-A is administered, and the history and background information available for the adolescent who is under evaluation. As reviewed in Archer (1987), the original form of the MMPI has been used for research and clinical purposes with adolescents in a variety of settings, including public and private schools, medical groups, alcohol and drug treatment

TABLE 18.2
Steps in MMPI-A Profile Interpretation

1. Setting in which the MMPI-A is administered
   a. Clinical/psychological/psychiatric
   b. School/academic evaluation
   c. Medical
   d. Neuropsychological
   e. Forensic
2. History and background of patient
   a. Cooperativeness/motivation for treatment or evaluation
   b. Cognitive ability
   c. History of psychological adjustment
   d. History of stress factors
   e. History of interpersonal relationships
   f. Family history and characteristics
3. Validity
   a. Omissions
   b. Consistency
   c. Accuracy
4. Code type (provides main features of interpretation)
   a. Degree of match with prototype
      (1) Degree of elevation
      (2) Degree of definition
      (3) Caldwell A-B-C-D paradigm for multiple highpoints
   b. Low-point scales
   c. Note elevation of scales 2 (D) and 7 (Pt)
5. Supplementary scales (supplement and confirm interpretation)
   a. Factor 1 and Factor 2 issues
      (1) Welsh A and R
   b. Substance abuse scales
      (1) MAC-R and PRO
      (2) ACK
   c. Psychological maturation
      (1) IMM scale
6. Content scales
   a. Supplement, refine, and confirm basic scale data
   b. Interpersonal functioning (A-fam, A-cyn, A-aln), treatment recommendations (A-trt), and academic difficulties (A-sch and A-las)
   c. Consider effects of overreporting/underreporting
7. Review of Harris-Lingoes subscales and critical item content
   a. Items endorsed can assist in understanding reasons for elevation of basic scales

Note. From MMPI-A: Assessing Adolescent Psychopathology (p. 277) by R. P. Archer, 1992a, Hillsdale, NJ: Lawrence Erlbaum Associates. Copyright (1992) by Lawrence Erlbaum Associates. Reprinted by permission.

settings, correctional and juvenile delinquency programs, and outpatient and inpatient psychiatric settings. It is always important to combine MMPI-A test data with information from other psychological tests and from demographic, psychosocial history, and psychiatric history information collected in individual and family interviews to increase the accuracy and utility of inferences derived from the MMPI-A.

The third step noted in Table 18.2 involves the evaluation of the technical validity of the MMPI-A profile. This process begins with the review of the number of items omitted in the response process, with a recommendation that profiles be viewed as invalid if more than 30 item omissions have occurred in the response record. Validity assessment continues with an evaluation of the degree to which the adolescent responded in a consistent manner (e.g., VRIN scale scores) and in an accurate manner (F, L, and K configural pattern) using the validity assessment model proposed by Greene (1989, 1991). In this model, a distinction is made between response consistency, defined as the extent to which the respondent endorses items in a reliable pattern, and response accuracy, defined as the degree to which the respondent has overreported or underreported symptomatology. Response consistency may be viewed as a necessary, but not sufficient, condition for technical validity. The tendency to overreport or underreport symptomatology, in turn, may be seen as relatively independent from the respondent’s actual level of symptomatology (Greene, 1989, 1991).

The fourth step in MMPI-A interpretation involves the review of the basic scale clinical profile. This review should examine the degree to which one or more basic scales manifest clinical-range elevations (which on the MMPI-A are T values ≥ 60) and the relative magnitude of these elevations. In general, the greater the magnitude of an adolescent’s T score on a particular basic scale, the more likely it is that the respondent is accurately described by the correlates typically associated with elevations on that scale. In addition, the degree of correspondence between the profile configuration and the existing two-point code type literature should be examined. In this regard, the degree of definition manifested by an adolescent’s two-point code also should be evaluated, with two-point code type definition defined by the degree of T-score elevation difference between the second and third highest elevations within the clinical profile. The greater the degree of definition for the two-point code, the more likely it is that the descriptive statements associated with that code type are accurate for a particular adolescent.
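The omission check and the two-point code type concepts described above can be sketched as follows. This is a minimal illustration, not a scoring program: the profile values are invented, the function and variable names are hypothetical, and the clinical-range cutoff is read here as a T score of 60 or higher, which is an assumption about the threshold stated in the text.

```python
MAX_OMISSIONS = 30   # more than 30 omitted items: profile treated as invalid
CLINICAL_T = 60      # assumed reading of the clinical-range cutoff


def too_many_omissions(n_omitted):
    """True when the number of omitted items exceeds the recommended limit."""
    return n_omitted > MAX_OMISSIONS


def two_point_code(t_scores):
    """Return the two highest scales, whether both are elevated, and the
    definition (second highest T score minus third highest T score)."""
    ranked = sorted(t_scores.items(), key=lambda item: item[1], reverse=True)
    (first, _), (second, t2), (_, t3) = ranked[:3]
    return {
        "code": f"{first}-{second}",
        "both_elevated": t2 >= CLINICAL_T,
        "definition": t2 - t3,
    }


# Hypothetical basic scale T scores keyed by scale number.
profile = {"1": 52, "2": 71, "3": 55, "4": 68, "6": 58, "7": 61, "8": 57, "9": 49}
result = two_point_code(profile)
print(too_many_omissions(12), result)
```

In this invented profile the two highest scales are 2 and 4, and the 7-point gap between the second and third highest scores is the quantity the text calls definition.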
If an adolescent’s MMPI-A profile does not display clearly elevated and defined two-point code characteristics, the profile may be interpreted by an approach emphasizing individual scale descriptors (Archer, 1992a; Butcher et al., 1992). Basic individual scale descriptors have been established based on the empirical literature for the original instrument in adolescent samples and for the MMPI-A summarized in the test manual (Butcher et al., 1992). In the case of an MMPI-A

basic scale profile that displays clinical-range elevations on more than two clinical scales, the A-B-C-D paradigm developed by Caldwell (1976) also may be employed for profile interpretation purposes. This latter approach emphasizes the common descriptor characteristics generated from multiple two-point configurations. For example, a 2-4-7 code type would be broken into two-point codes and interpreted based on common descriptors found for the 2-4, 2-7, and 4-7 code types. The two-point code type correlate literature rests on the work of Marks, Seeman, and Haller (1974), and this literature has been summarized and extended by Archer (1992a) for the MMPI-A. A review of the MMPI-A supplementary and content scales is involved in steps 5 and 6.

Supplementary scales A and R provide an overall estimate of the adolescent’s degree of maladjustment and the use of repression as a primary defense mechanism, respectively. Extensive substance abuse screening information is available through the combined use of supplementary scales MAC-R, ACK, and PRO. In particular, the adolescent’s willingness to acknowledge substance abuse problems is reflected in ACK scale scores, whereas the adolescent’s similarity to teenagers with known substance abuse problems is assessed through responses to the MAC-R and PRO scales. The supplementary IMM scale also allows for an assessment of the adolescent’s maturational level as related to cognitive processes, ability to engage meaningfully in interpersonal relationships, and degree of egocentricity and frustration tolerance (Archer, Pancoast, & Gordon, 1991).

The 15 content scales provide valuable information in refining and augmenting the interpretation of the basic scales (Williams et al., 1992). For example, scores from A-anx may be helpful in refining interpretation of scale 7, scale A-biz may be useful in clarifying interpretation of scale 8, and scales A-con and A-fam may be useful in refining interpretations of MMPI basic scale 4. Further, scale A-trt may be useful, particularly in conjunction with scales L and K, in evaluating the adolescent’s readiness to engage in a therapy process. Content scales A-fam, A-cyn, and A-aln provide valuable information concerning the adolescent’s interpersonal functioning, whereas scales A-sch and A-las provide important information concerning possible problems in the academic environment. In evaluating the findings from the content scales, it is important to consider the effects of overreporting and underreporting on profile accuracy, because the content scales consist primarily of obvious items (Archer, 1992a).

Thus, content scales can be biased easily by the adolescent’s attempt to underreport or overreport symptomatology. In the seventh and final stage of profile analysis, the clinician may wish to further consider and evaluate the content of the adolescent’s MMPI-A responses. This may entail a selective review of the Harris-Lingoes and Si subscales, and also may include review of responses to critical items lists such as the Koss-Butcher (1973) and Lachar-Wrobel (1979) critical lists.

Archer (1992a) provided information on the 99 Lachar-Wrobel critical items retained in the MMPI-A item pool, and offered guidelines for content subscale and critical item interpretation. Archer and Jacobson (in press) have recently suggested, based on the high frequency of critical item endorsement by normal adolescents, that substantial caution be used in interpreting these responses for this age group.

Computer-Based Test Interpretation

Two computer-based test interpretation (CBTI) packages are currently available for the MMPI-A. These are the MMPI-A CBTI report developed by Archer (1992b) and distributed by Psychological Assessment Resources (PAR) as the MMPI-A Interpretive System, and an MMPI-A CBTI report developed by Butcher and Williams (1992b) and distributed through National Computer Systems (NCS) as The Minnesota Report: Adolescent Interpretive System. The PAR report is sold as an unlimited use software product, and the NCS product is offered as an interpretive report service. Both CBTI products are based on combinations of expert judgment and actuarial data. Butcher (1987) and Archer (1992b) provided guidelines for the evaluation and assessment of CBTI products, including the relative advantages and disadvantages associated with this approach. The use of a CBTI product in the interpretation of the MMPI-A (or any other assessment instrument) does not reduce clinicians’ responsibility for the accuracy of their interpretation of the individual patient’s profile.

Use of the MMPI-A in Treatment Planning

RELATIVE POPULARITY

Archer, Maruish, Imhof, and Piotrowski (1991) recently investigated the relative popularity of 67 assessment instruments among clinicians who routinely evaluate teenaged clients. The respondents in this study consisted of 165 clinicians selected from the membership directories of the American Psychological Association and the Society for Personality Assessment. Table 18.3 shows the results for the top 20 assessment instruments, with findings presented by total mention as well as weighted scores that were adjusted for frequency of use. These data indicate that the MMPI was the third most frequently mentioned assessment instrument

TABLE 18.3
Frequency of Use of Psychological Assessment Instruments With Adolescents

                                             Usage Rating Totals
Instrument                                a    b    c    d    e    f    TM    WS
WISC-R/WAIS-R                            17    6   20   16   26   77   145   583
Rorschach                                25   15   23   15   16   68   137   510
Bender-Gestalt                           41   17   24   11   20   49   121   423
TAT                                      40   18   20   20   22   42   122   416
Sentence Completion                      41   21   23   14   20   43   121   404
MMPI                                     36   23   23   27   21   32   126   394
Human Figure Drawing                     61   22   13   15   13   38   101   335
House-Tree-Person                        57   25   22   15   15   28   105   314
WRAT                                     59   17   35   10   16   25   103   306
Kinetic Family Drawing                   64   24   22   13   12   27    98   290
Beck Depression Inventory                59   36   46    7    7    7   103   212
MAPI                                     88   34   10    5    9   16    74   185
MacAndrew Alcoholism Scale               87   33   14    8    8   12    75   177
CBCL                                     90   30   18    5    4   15    72   172
Woodcock Johnson                         94   24   16   11    8    9    68   166
PPVT                                     78   33   41    6    1    3    84   152
Conners Behavior Rating                  94   26   24    9    5    4    68   141
Beery VMI                                92   33   18   11    4    4    70   138
Reynolds Adolescent Depression Scale    103   31   13    3    4    8    59   122
Children’s Depression Inventory         101   27   18    6    9    0    61   120

Note. From “Psychological Test Usage with Adolescent Clients: 1990 Survey Findings” by R. P. Archer, M. Maruish, E. A. Imhof, and C. Piotrowski, 1991, Professional Psychology: Research and Practice, 22, p. 249. Copyright (1991) by the American Psychological Association. Reprinted by permission of the publisher. a = never; b = infrequently; c = occasionally; d = about 50% of the time; e = frequently; f = almost always. TM = total mentions; WS = weighted score (sum of n x numerical weight of ratings: a = 0, b = 1, c = 2, d = 3, e = 4, f = 5).
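The weighted score defined in the table note can be reproduced with a short sketch. The rating counts below are those listed for the WISC-R/WAIS-R row; treating total mentions as the sum of all ratings other than "never" is an assumption, although it matches the totals printed for that row. The function names are illustrative only.

```python
# Numerical weights for the usage ratings, as given in the table note.
WEIGHTS = {"a": 0, "b": 1, "c": 2, "d": 3, "e": 4, "f": 5}


def total_mentions(counts):
    """Number of respondents giving any rating other than 'never' (a)."""
    return sum(n for rating, n in counts.items() if rating != "a")


def weighted_score(counts):
    """Sum of (number of respondents x numerical weight) over all ratings."""
    return sum(n * WEIGHTS[rating] for rating, n in counts.items())


# Rating counts for the WISC-R/WAIS-R row of Table 18.3.
wisc = {"a": 17, "b": 6, "c": 20, "d": 16, "e": 26, "f": 77}
tm = total_mentions(wisc)
ws = weighted_score(wisc)
print(tm, ws)
```

Because the weights rise with frequency of use, an instrument that is mentioned often but used only occasionally can rank lower on the weighted score than on total mentions, which is the pattern the text reports for the MMPI.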

with this age group and the sixth most frequently employed instrument when evaluated for frequency or intensity of use. Evaluated by either standard, the MMPI was the most widely used objective personality assessment measure with adolescents and the only objective measure included in the top 10 instruments found for survey results. Figure 18.1 provides findings concerning the relative frequency of inclusion of specific psychological instruments within standardized assessment batteries as reported by the survey respondents. This figure illustrates that the MMPI was the fifth most frequently included instrument in standard test batteries, routinely included in roughly half of such assessments. The Archer et al. (1991) survey also asked respondents who used the MMPI to indicate the primary advantages and disadvantages associated with the original instrument. Results

indicated the advantages associated with the MMPI were its usefulness in treatment planning issues, including the accuracy of interpretive statements generated from profile information, the comprehensive aspect of the measures of psychopathology assessed by the MMPI, and the extensive research literature available to assist in the interpretation process. The major disadvantages associated with the MMPI by survey respondents were the length of the item pool and administration time required, the outdated aspects of the adolescent norms, the reading requirements of the MMPI, and the inclusion of inappropriate or outdated items. The developers of the MMPI-A attempted to address most of these problem areas by reducing instrument length, providing contemporary norms, and revising many items to simplify wording and increase the appropriateness of item content. Nevertheless, the MMPI-A continues to manifest many of the same advantages and disadvantages as the original instrument.

FIG. 18.1. Most frequently used psychological instruments in standard batteries with adolescents. From “Psychological Test Usage with Adolescent Clients: 1990 Survey Findings” by R. P. Archer, M. Maruish, E. A. Imhof, and C. Piotrowski, 1991, Professional Psychology: Research and Practice, 22, p. 250. Copyright 1991 by the American Psychological Association. Reprinted by permission of the publisher.

Despite potential improvements in the MMPI-A, the revised instrument continues to require substantial patience on the part of the adolescent to deal with the lengthy item pool, and requires a level of literacy that renders administration problematic with many adolescents. The MMPI-A, like the original MMPI, is designed as a measure of psychopathology, rather than as an assessment instrument applicable to the evaluation of normal-range personality dimensions. Thus, the information drawn from the MMPI-A is restricted in describing adaptive functioning characteristics or nonpathological dimensions beyond masculinity-femininity (Mf), social introversion-extroversion (Si), and possibly some of the content domain of the Immaturity (IMM) scale. Additionally, as discussed in Archer (1992a), both the MMPI and the MMPI-A are best used as measures of the individual’s current level of functioning in relationship to standardized measures of psychopathology. Like its predecessor, the MMPI-A is limited in yielding data useful in long-range predictions of personality functioning due to the instability manifested in adolescents’ psychopathology and the consequent instability of test findings over extended periods (Hathaway & Monachesi, 1963).

RESEARCH FINDINGS AND CLINICAL APPLICATIONS

As noted in the MMPI-A manual (Butcher et al., 1992), the original form of the MMPI was used to examine a variety of diagnostic issues among adolescents, including behavioral problems, borderline personality disorder, depressed mood, eating disorders, homicidal behavior, aggression, incest and sexual abuse, sleeping problems, medical and neurological problems, schizophrenia, and suicide. The earliest research application of the MMPI with

adolescents centered on the usefulness of this instrument in identifying groups of delinquent adolescents (Capwell, 1945a, 1945b). In a research study begun in 1947, Hathaway and Monachesi (1963) examined the usefulness of the MMPI in predicting the onset of delinquent

behaviors in Minnesota samples involving approximately 15,000 adolescents. In their research findings, these authors reported modest relationships between adolescents’ original MMPI profiles and the later onset of delinquent behaviors. Hathaway and Monachesi found that elevations on scales 4, 8, or 9, singularly or in combination, were associated with higher rates of delinquent behavior, and they labeled these three scales excitatory scales. Hathaway and Monachesi also noted much instability in the elevation pattern in adolescents’ profiles when ninth-graders were reevaluated during their senior year in high school. However, they did observe that adolescents who produced marked elevations during the ninth-grade assessment were more likely to show relative stability on those scales when reevaluated 3 years later. The issue of the relationship between MMPI data and clinicians’ diagnostic judgments has been examined

in several studies within adolescent samples. Archer and Gordon (1988) investigated the relationship between scale 2 and scale 8 elevations and the occurrence of clinical diagnoses related to depression and schizophrenia in a sample of 134 adolescent inpatients. The authors found little evidence of a meaningful relationship between scale 2 elevations and clinicians’ use of depression-related diagnoses. However, Archer and Gordon reported that scale 8 elevations were an effective and sensitive indicator of schizophrenic diagnoses. A criterion of T-score values ≥ 75 on scale 8, used to identify schizophrenia in this study, resulted in an overall classification accuracy rate of .76. This level of performance

is comparable to findings reported for scale 8 in adult populations (Hathaway, 1956). Johnson, Archer, Sheaffer, and Miller (1992) investigated the relationships between characteristics of MMPI and Millon Adolescent Personality Inventory (MAPI) profiles and psychiatric diagnoses in a sample of 199 adolescent inpatients and outpatients. Results indicated low levels of congruence or agreement between MMPI-derived diagnoses and clinician judgments. This finding is consistent with those typically obtained by researchers in adult populations employing broad diagnostic groups (Hedlund, Won Cho, & Wood, 1977; Moreland, 1983; Pancoast, Archer, & Gordon, 1988). The results of these studies underscore cautions that have been provided concerning the use of the MMPI, or any other personality measure used in isolation, in attempts to provide definitive psychiatric diagnoses for patients. Graham (1990) noted that the poor correspondence traditionally found between MMPI results and psychiatric diagnosis may be a result of the high degree of intercorrelation between standard MMPI scales, as well as the unreliability of specific diagnostic groups employed by Hathaway and McKinley in the original MMPI. These findings also reflect the well-established problems in reliability that appear to be inherent in the psychiatric nosology embodied in the Diagnostic and Statistical Manual (DSM) series.

MMPI-A SCALES RELATED TO TREATMENT PLANNING

The MMPI-A includes a variety of scales relevant to a number of treatment planning issues. For example, research by Archer, White, and Orvin (1979) associated higher scores on validity scales L and K with longer treatment durations for hospitalized adolescents. Elevations on Welsh’s Repression (R) scale and the content scale Negative Treatment Indicators (A-trt) also appear to be relevant to evaluating the adolescent’s readiness and capacity to engage in the treatment process. Basic scale measures including scales 2 and 7, and the supplementary scale Anxiety (A), have direct relevance in estimating the degree of affective distress experienced by the adolescent. This issue also is illuminated by content scales Anxiety (A-anx), Obsessiveness (A-obs), and Depression (A-dep). Issues related to impulse and behavioral control, as noted in the discussion of excitatory scales, are related to elevations on basic scales 4, 8, and 9. This issue is relevant to findings from supplementary scale IMM and content scales, such as Conduct Disorder (A-con), Anger (A-ang), and Cynicism (A-cyn).

Potential problems can be identified in a number of specific life areas using the MMPI-A, including academic (A-sch, A-las) and family environments (A-fam). Also of note are the relative contributions of the MAC-R, ACK, and PRO scales to screening and evaluating substance abuse problems among teenagers. In addition, Marks et al. (1974) noted an association between several two-point code types and the occurrence of abuse or alcohol problems, including the 2-4/4-2 and 4-9/9-4 code types. Archer and Klinefelter (1992) demonstrated that, in a sample of 1,347 adolescents in clinical settings, certain MMPI code types, particularly those involving elevations on scale 4 or scale 9, are much more likely to be associated with elevations on the MAC scale. As noted by Archer (1989), the MMPI has proved to be a very useful tool in treatment planning for adolescents for over 40 years. It is likely that the MMPI-A will continue and expand this tradition, particularly as more information becomes available concerning the correlate patterns for new MMPI-A scales.

INTEGRATION OF MMPI-A RESULTS WITH OTHER EVALUATION DATA

Findings from the MMPI-A should be integrated routinely with results from other test instruments, clinical interview, family assessment, and psychosocial history data in deriving diagnostic and treatment planning recommendations. Gallucci (1990) reviewed the literature related to the combination of MMPI results with data from other instruments, including the Wechsler Intelligence Scales, the Rorschach, and the Millon inventory in adult populations. Archer and Krishnamurthy (1993) recently provided a review of five studies that examined the interrelationships between MMPI and Rorschach data in adolescent outpatient and inpatient clinical settings. These authors also provided new data on interrelationships between 50 Rorschach variables selected from the Comprehensive System (Exner, 1974, 1986) and the 13 MMPI basic scales in a sample of 116 male and 81 female adolescent inpatients and outpatients. These studies revealed relatively modest interrelationships between these two instruments. In cases where the Rorschach and MMPI led to contradictory clinical inferences, Archer and Krishnamurthy recommended that the clinician place particular emphasis on the use of additional sources of data, including individual and family interview data and psychosocial history findings, in reaching interpretive conclusions. Part of this recommendation was based on the classic study by Sines (1959). This investigator examined the usefulness of diagnostic interview data, biographical data, and psychometric data from the MMPI and the Rorschach in predicting patients’ personality characteristics. Sines found that judgments based on clinical interview findings in combination with biographical data were consistently more accurate in predicting therapists’ ratings of their patients than judgments based on either the MMPI or the Rorschach when combined with biographical data, but excluding clinical interview findings.

In addition to the clinical interview of the adolescent, the assessment of parental perceptions concerning the adolescent’s functioning is very important. Several instruments, including the Child Behavior Checklist-Revised (CBCL), provide a standardized format to collect this type of information. Archer (1987, 1992a) and Williams (1986) also stressed the importance of MMPI assessment of the parents of the adolescent being evaluated to generate a greater understanding of possible family dynamics and influences that may help to shape or distort parental perceptions of their child’s functioning. Archer (1987) noted the following:

The current literature supports the involvement of parents of psychiatrically disturbed children in psychiatric treatment efforts. Perhaps the clearest finding from this literature is that parents of psychiatrically disturbed children typically display substantial features of psychological distress and maladjustment. This conclusion is particularly marked for the parents of children in inpatient treatment settings. Therefore, the involvement of parents in treatment programs that are responsive to the psychological features of the parents, as well as the symptomatology of the adolescent patient, appears to have firm empirical grounding. Clearly, such treatment involvement does not require a causal assumption of a parental role in the etiology of the child’s disorder. These treatment efforts may be more parsimoniously based upon the recognition of the marked degree of psychological pain and disturbance commonly reported among parents of children experiencing deviant psychological development. (p. 178)

PROVISION OF MMPI-A FEEDBACK

Archer (1992a) noted that the provision of MMPI-A feedback to the adolescent is an important factor in increasing the adolescent’s motivation to cooperate with testing procedures. The issue of MMPI test feedback has been a central emphasis in two recent texts (Butcher, 1990; Lewak, Marks, & Nelson, 1990). Also, a computer software package has been developed by Marks and Lewak (1991) to assist the clinician in providing MMPI test feedback to adolescent clients. MMPI-A feedback with adolescents should begin with an explanation of the test instrument, including the ways MMPI-A data are used to generate hypotheses concerning patients’ personality characteristics. Adolescents should be encouraged to interact with the psychologist during the feedback session. The adolescents’ input into the feedback process allows the psychologist an opportunity to appraise the adolescents’ reactions to, and acceptance of, various features of test findings. It is usually much easier for adolescents to accept test feedback when findings are presented individually, instead of within the framework of a family therapy session. It is probable that many clinicians underestimate the extent of information that adolescents are capable of usefully assimilating, particularly if the psychologist is careful to avoid the use of technical jargon and uses language and concepts understandable to patients when presenting test findings.

AREAS OF LIMITATIONS OR POTENTIAL PROBLEMS IN MMPI-A USE

Several limitations or potential problems can be identified when using the MMPI-A for treatment planning purposes (Archer, 1992c). Issues similar to those regarding the generalizability of the literature from the MMPI to the MMPI-2 with adults undoubtedly will be raised concerning the applicability of adolescent research findings based on the original form of the MMPI to the MMPI-A. The two-point code type congruence rates between the MMPI and MMPI-A for adolescents in the normative sample were 67.8% for males and 55.8% for females, and 69.5% for males and 67.2% for females in a clinical sample (Butcher et al., 1992). Using a code type definition requirement of at least five T-score points, the congruence rates increased to 95.2% for males and 81.8% for females in the normative sample, and 95.4% for males and 94.4% for females in the clinical sample (Butcher et al., 1992). These data are very similar to the two-point code type congruence rates between the MMPI and MMPI-2 for normal and clinical samples of adults (Butcher et al., 1989).

In addition to issues concerning the generalizability of findings from the original form of the MMPI to the MMPI-A, there are 15 content scales, 3 supplementary scales, 3 Si subscales, and 4 new validity scales that do not have counterparts on the original form of the MMPI. These measures require ongoing validity studies to establish the correlate meanings for these measures in clinical populations. As more clinical correlate data are established, the interpretation of these scales should become less tentative and provisional. It has been noted that the MMPI-A requires a substantial amount of cognitive maturation and reading ability for successful administration, and the revision of the test instrument has not changed these administration requirements substantially. Adolescents still must have the capacity and motivation to complete a relatively long and demanding test instrument. As with the original form of the MMPI, short forms are not recommended as a way to reduce the requirements of the MMPI-A for the adolescent respondent. Butcher and Hostetler (1990) defined the term short form as “used to describe sets of scales that have been decreased in length from the standard MMPI form. An MMPI short form is a group of items that is thought to be a valid substitute for the full scale score even though it might contain only four or five items from the original scale” (p. 12). Archer (1992a) noted the potential problems

involved in the use of short forms with the MMPI-A. These problem areas center on the loss of important clinical information when short forms are substituted for the administration of the full MMPI-A item pool. Short form administrations are contrasted with abbreviated administrations, in which a clinician elects to administer the first 349 items in the MMPI-A.

This administration approach will result in item endorsements necessary to score the basic clinical scales. However, the abbreviated administration will not provide sufficient information to score the content scales, several of the supplementary scales, or the validity scales VRIN, TRIN, F, and F2. If an abbreviated format increases the motivation or cooperation of an adolescent, this option may be used with an understanding of what data the clinician can gather from such an approach and what scales and measurement areas cannot be addressed. A final area of potential limitation related to the MMPI-A is associated with the relatively low magnitude of MMPI-A basic scale elevations that are likely to occur with this revised instrument. As noted by Archer (1987), normal range mean profiles for adolescent populations often were found on the original form of the MMPI, leading to the recommendation by Ehrenworth and Archer (1985) that T-score values ≥ 65 be used as the demarcation point for

clinical range elevations when using adolescent norms. Archer, Pancoast, and Klinefelter (1989) found that the use of a clinical scale T-score value of 65 or greater (rather than 70) to define clinical levels of psychopathology resulted in increases in sensitivity in accurately identifying profiles produced by normal adolescents versus adolescents receiving treatment in outpatient and inpatient settings. The MMPI-A is likely to produce even lower mean T-score values for adolescent clinical samples than those found on the original form of the MMPI using the Marks and Briggs (1972) adolescent norms. The MMPI Adolescent Project Committee recognized that the revised test instrument

often produced lower T-score values for adolescents in comparison with the original test instrument. This observation led to the development of the “gray zone” or “shaded zone” on the MMPI-A profile sheets. Specifically, the use of a single “black line” value to delineate the demarcation point between normal and clinical range scores was abandoned in favor of the creation of a range of scores that serves as a transition area between normal- and clinical-range elevations. On the MMPI-A, this zone is placed in the range of T-score values ≥ 60 and ≤ 65 for all MMPI-A scales, regardless of whether linear or uniform T-score procedures


were used for that particular scale. A central question requiring further study relates to the sensitivity and specificity of the MMPI-A instrument in identifying psychopathology in adolescents. Substantial research data are needed to determine whether the MMPI-A may be subject to increased problems in the accurate detection of psychopathology (sensitivity), because of the reduction of T-score values (Archer, 1992c). These issues are related directly

to the questions of how often a normal adolescent will produce T-score values within normal ranges on the MMPI basic scales, and how frequently adolescents experiencing significant psychopathology will produce one or more scales equal to or greater than a T-score value of 60 on the MMPI-A basic scales. Until this issue is resolved by the accumulation of empirical research findings, it seems reasonable to be concerned that the lower T-score values found

for the MMPI-A (in contrast to the MMPI) may result in a reduction in test sensitivity in identifying adolescents with significant psychiatric symptomatology.
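As an illustration only, the shaded-zone rule described above can be sketched as a short Python function. The function name and the category labels are hypothetical. The sketch assumes, following the text, that the transition zone covers T-score values of 60 through 65 inclusive, and it treats values above 65 as clinical-range elevations; how a value of exactly 65 is handled in practice is an assumption here, not a rule taken from the test manual.

```python
def classify_t_score(t_score: float) -> str:
    """Place a single MMPI-A T-score relative to the shaded (gray) zone.

    Assumption: the zone runs from T = 60 through T = 65 inclusive,
    as stated in the text; the labels are illustrative only.
    """
    if t_score < 60:
        return "normal range"
    if t_score <= 65:
        return "shaded zone (transitional)"
    return "clinical range"


# Hypothetical T-score values for three basic scales (not real patient data).
example_profile = {"Pd": 72, "Ma": 63, "D": 48}
for scale, t in example_profile.items():
    print(scale, t, classify_t_score(t))
```

A rule of this kind makes explicit that the same T-score shift can carry different interpretive weight depending on whether it crosses into or out of the transition zone.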

Use of the MMPI-A in the Evaluation of Treatment Outcome

GENERAL ISSUES

I have discussed the use of the MMPI-A in terms of objectively evaluating and describing an adolescent’s level of functioning in relation to standardized measures of psychopathology. The MMPI-A also may be used in repeated administrations to assess changes in functioning across intervals of time. This use of the MMPI-A is particularly important, because many aspects of psychopathology manifested by adolescents during this developmental stage are subject to rapid changes over time. When the MMPI-A is administered at various points in the treatment process, it can provide the clinician with a sensitive index of therapeutic progress.

EVALUATION AGAINST CRITERIA FOR OUTCOME MEASURES

Although many aspects of the MMPI-A contain new features that will require much further research and investigation, it seems possible to offer some initial speculations concerning the probable ability of the MMPI-A to meet the criteria for outcome assessment measures as formulated by Ciarlo, Brown, Edwards, Kiresuk, and Newman (1986) and discussed earlier in this text. For example, it is likely that the MMPI-A will have substantially more relevance to the assessment of adolescent psychopathology than the original MMPI form because of the inclusion in the revised instrument of items and scales specifically targeted for this population. Thus, the MMPI-A retains the benefits of the original MMPI in the assessment of a wide range of psychopathology, and extends the applicability of the instrument to the adolescent age group in a manner consistent with the Ciarlo et al. Criterion 1 emphasis on the relevance of an instrument to the target group. In addition to Criterion 1, the MMPI-A appears to hold special utility in meeting the 6th criterion related to psychometric strength of an instrument, and the 10th criterion regarding the usefulness of the instrument in clinical services. Probably more is known about the psychometric properties of the original form of the MMPI than about any other widely used psychopathology-related assessment instrument. For


example, Butcher (1987) estimated that over 10,000 articles and books have documented the use of the MMPI, and Butcher and Owen (1978) estimated that 84% of all research conducted in the personality inventory domain has centered on the MMPI. Archer (1987) estimated that approximately 100 studies have been published related to the use of the MMPI with adolescents, and the MMPI-A manual provides extensive information concerning the reliability and validity of the revised instrument (Butcher et al., 1992). The MMPI and MMPI-A also are particularly strong in the area of the relevance of assessment findings to the provision of clinical services. The use of the MMPI-A as a treatment outcome assessment measure provides relevant and extensive clinical information that is useful to both the treatment team and the patient. Finally, in reference to the last criterion listed by Ciarlo et al. (1981) (i.e., Criterion 11 regarding “compatibility with clinical theories and practices”), the MMPI and MMPI-A also may have a particular strength in this area. Although the original MMPI and, to a lesser extent, the MMPI-A were developed in an atheoretical and empirical fashion, these instruments clearly are compatible with a wide range of theories of psychopathology from the behavioral to the psychoanalytic. This compatibility with a broad range of clinical orientations and theories is probably one of the most important factors in the widespread popularity of this instrument in assessment practices with both adults and adolescents. Balanced against these areas of strength for the MMPI-A, it might be argued that the revised instrument, like the original MMPI, is less effective in meeting other outcome measure criteria developed by Ciarlo et al. (1981), including their emphasis on instruments with a simple, teachable methodology (Criterion 2); use of outcome measures that might be employed with multiple respondents (Criterion 4); and criteria related to cost factors (Criterion 7), understandability by nonprofessional audiences (Criterion 8), and simplicity of feedback and interpretation processes (Criterion 9). In these areas, it should be acknowledged that the MMPI-A is a complicated, extensive test instrument that requires substantial time on the part of the adolescent to respond to the lengthy item pool, and that also requires extensive training and expertise on the part of the psychologist to ensure accurate interpretation practices.

RESEARCH FINDINGS

Systematic and controlled treatment outcome studies are not available yet for the MMPI-A. However, much treatment outcome research data are available concerning MMPI basic and special scales in adult populations. For example, Barron (1953) developed the Ego Strength scale by identifying items that separated the response patterns of 17 neurotic patients judged to have improved after 6 months of psychotherapy versus 16 neurotic patients judged unimproved over the same time interval. Because of the largely contradictory results generated by studies examining the usefulness of the Ego Strength scale, however, this measure was not retained by the MMPI Adolescent Project Committee for the MMPI-A. In contrast, a revised form of the MacAndrew Alcoholism scale (MAC-R) was retained in the MMPI-A. Individuals’ scores on the MAC appeared to remain relatively stable across time (Archer, 1987, 1992a). For example, MAC scale scores in alcoholics remained elevated following treatment in studies by Gallucci, Kay, and Thornby (1989) and others. In addition to the MAC scale, Welsh’s Anxiety (A) and Repression (R) scales were carried over from the original MMPI to the MMPI-A. Welsh (1956) created the Anxiety and Repression scales to measure the first two factors of the MMPI. The particular usefulness of the A and R scales in the assessment of treatment outcome may be related directly to their relationship to the factor structure of the MMPI. Welsh found that the first factor of the MMPI had high positive loadings on MMPI basic


scales 7 and 8, with a negative loading on scale K. This factor originally was labeled general maladjustment (Tyler, 1951) and subsequently was labeled by Welsh as Anxiety (Welsh, 1956). This first factor also has been identified in factor analyses of adolescents’ scale values on the MMPI (Archer, 1984; Archer & Klinefelter, 1991) and on the MMPI-A (Archer, Belevich, & Elkins, 1993; Butcher et al., 1992). Thus, the MMPI-A Welsh’s A scale served as a marker for first factor variance in the test instrument. In the MMPI-A normative sample, the A scale was intercorrelated highly with several other MMPI-A measures, including basic scales K (r = -.72), Pt (r = .89), and Sc (r = .76); and content scales A-anx (r = .83), A-obs (r = .82), and A-dep (r = .80). Thus, T-score values on all of these measures except scale K (which is correlated negatively with the first factor) tend to decrease when an adolescent reports lower levels of emotional distress and maladjustment at reevaluation as a result of successful treatment efforts.

Welsh’s second factor, although less clearly defined than the first, tends to be related to elevations on scale 3 and negatively related to elevations on scale 9. Welsh labeled this factor Repression, and this factor also has been identified in factor analytic studies of adolescents with the original form of the MMPI (Archer, 1984; Archer & Klinefelter, 1991) and the MMPI-A (Butcher et al., 1992). The Repression scale is intercorrelated most highly with scales L (r = .44), K (r = .45), and Ma (r = -.43) in the MMPI-A normative sample. All 33 items in the MMPI-A R scale are scored in the false direction, and involve the denial of symptomatology, particularly aggressive or hostile feelings, and the expression of disinterest in sensation-seeking activities. As a component of this dimension, scale K is intercorrelated highly and negatively with several MMPI-A scales, including content scales A-anx (r = -.59), A-obs (r = -.67), A-ang (r = -.62), and A-cyn (r = -.70); and supplementary scale A (r = -.72). This pattern implies that MMPI-A test-retest administrations often may show a pattern in which reduction of factor 1 symptomatology is associated with increased elevations on factor 2 related scales such as Repression and particularly the K scale. This pattern may be related to the observation that the K scale, in its use in adult populations, often has been seen as an indicator of psychological health, rather than a measure exclusively of defensiveness. An understanding of the interrelationships between factor 1 and factor 2 patterns in the MMPI-A will assist in interpreting individual change in test-retest MMPI-A administrations by providing a conceptual organization to the changes shown on the individual scale level.

CLINICAL APPLICATIONS

Butcher and Tellegen (1978) and Ullmann and Wiggins (1962) reported that 80% to 85% of the items on the original MMPI were worded in a manner that related to trait personality features or biographical information that should not change on retest. This estimate leaves approximately 15% to 20% of the original item pool to provide information on changes in psychological characteristics. However, if only 15% of the original 550 items were capable of showing state changes in functioning, there still would be a pool of approximately 83 items capable of reflecting changes in psychological functioning. Several studies have been conducted on the stability of high-point, two-point, and even three-point code types for the original MMPI, and this literature has been reviewed by Graham (1990). Among his conclusions, Graham noted that code types are likely to be more stable when the primary scales are more elevated and when there is a greater degree of elevation of the primary scales in relationship to other scales in the profile (i.e., when the code type is well-defined). Graham also noted that, although code types may change from


one administration to another, they are likely to remain within the same broad diagnostic grouping. Pancoast et al. (1988) examined the agreement or congruence rate between discharge diagnoses rendered by psychiatrists and the admission and discharge MMPI-derived diagnoses from four diagnostic classification systems developed for the MMPI. The four classification systems included a simple high-point code based on the most elevated clinical scale in the profile, and Henrichs’ (1964, 1966) revision of the Meehl-Dahlstrom (1960) rules, the Goldberg Equation (Goldberg, 1965), and the system developed by Lachar (1974).

This study indicated a modest hit rate of 24% to 34% between MMPI-derived diagnoses (across the various classification systems) and psychiatric diagnoses. Further, the stability of MMPI-based diagnoses from admission to discharge ranged from 48% to 51%, depending on the classification system employed. Thus, there appeared to be little difference in the accuracy or stability of profiles related to the complexity of the system used for diagnostic classification purposes. Of the several factors that may affect the evaluation of change on the MMPI-A, perhaps the most important issue relates to the concept of the standard error of measurement. As previously noted, the MMPI-A manual reports that the standard error of measurement for the MMPI-A basic scales is approximately two to three raw score points or four to six T-score points (Butcher et al., 1992). This standard error of measurement estimate indicates that if an

individual were to retake the MMPI-A within a very brief period of time with their emotional and psychopathology status remaining constant, one would expect their T-score values on the basic scales to fall within a range of plus or minus approximately five T-score points roughly 50% of the time. The standard error of measurement range on the MMPI-A places practical limits on the interpretation of small T-score differences in the evaluation of an individual’s degree of change obtained by comparing original with readministration scores from the MMPI-A. As noted in the MMPI-A manual, this standard error of measurement also has implications for code type interpretation. For example, a 2-4-7 code type, with all three scales having T-score values of 70, would be placed arbitrarily within a two-point code category (i.e., 2-4), but could be markedly different from a clearly defined 2-4 profile type with a substantial T-score difference between the second and third most elevated scales.
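The two cautions in this passage can be illustrated with a brief Python sketch. It is offered as a sketch under stated assumptions rather than as a scoring procedure: the function names are hypothetical, the default of five T-score points is simply the approximate midpoint of the four-to-six-point range cited above, and all T-score values are invented for illustration. The first function flags whether a retest shift exceeds the standard error of measurement; the second returns a two-point code together with its definition, taken here as the T-score difference between the second and third most elevated scales.

```python
def exceeds_measurement_error(t_initial: float, t_retest: float,
                              sem_t: float = 5.0) -> bool:
    """Return True when a retest shift is larger than the assumed SEM.

    sem_t = 5.0 is an assumption based on the approximate value in the text.
    """
    return abs(t_retest - t_initial) > sem_t


def two_point_code(t_scores: dict) -> tuple:
    """Return (code, definition) for a profile of clinical-scale T-scores.

    The code joins the two most elevated scales; definition is the T-score
    gap between the second and third most elevated scales. Ties are broken
    by the order in which scales are listed, which is arbitrary.
    """
    ranked = sorted(t_scores.items(), key=lambda item: item[1], reverse=True)
    (first, _), (second, t_second), (_, t_third) = ranked[:3]
    return f"{first}-{second}", t_second - t_third


# Hypothetical profiles (not real patient data).
tied_profile = {"1": 55, "2": 70, "3": 58, "4": 70, "7": 70, "8": 60, "9": 52}
defined_profile = {"2": 78, "4": 75, "7": 62, "8": 60}

print(two_point_code(tied_profile))      # 2-4 code with no definition
print(two_point_code(defined_profile))   # 2-4 code with a 13-point definition
print(exceeds_measurement_error(68, 64))  # 4-point shift, within the SEM
print(exceeds_measurement_error(72, 60))  # 12-point shift, beyond the SEM
```

Both hypothetical profiles receive the same 2-4 label, yet only the second is clearly defined, which is the distinction the manual's 2-4-7 example is meant to convey.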

USE WITH OTHER EVALUATION DATA

As previously noted under the discussion of the MMPI-A in treatment planning, findings from the MMPI-A should be integrated routinely with the results of other sources of information concerning the adolescent, particularly those that provide other perspectives on the adolescent’s functioning, including reports and ratings provided by teachers, parents, and treatment team members. These external sources of information provide valuable and unique data that supplement the types of information that the adolescent can provide in the MMPI-A self-report format.

PROVISION OF MMPI-A FEEDBACK REGARDING ASSESSMENT FINDINGS

As previously noted, the provision of MMPI feedback has become a central emphasis in discussions of this instrument (Butcher, 1990; Lewak et al., 1990). Unfortunately, much of this discussion relates to feedback connected with the use of the instrument for treatment planning, rather than as a treatment outcome assessment measure. Nevertheless, it is clear that the MMPI and the MMPI-A can provide valuable information when used in a feedback


process to document the adolescent’s change over time as a result of participation in the treatment process. Used within this format, the initial testing provides a baseline against which later MMPI administrations can be compared to evaluate the degree of change in personality and psychopathology patterns over the course of treatment. The review of such test findings provides both the adolescent and the therapist with an important opportunity to explore the degree of consensual agreement between therapist, patient, and test findings concerning the amount and nature of change that has been experienced. The process of readministering the MMPI-A to evaluate treatment progress usually is not resisted by the adolescent if the adolescent has a stake in such testing, in the sense of a clear understanding that he or she will receive feedback concerning test findings. As previously noted, an adolescent is capable of receiving and understanding a great deal of information concerning the MMPI-A. However, in addition to avoiding technical jargon, the therapist should avoid the use of feedback sessions as a means of “confronting” a reluctant or resistant adolescent concerning his or her lack of treatment progress. Although such confrontations might be indicated for a particular adolescent, the use of the MMPI-A to provide the grounds for such a confrontation may reduce the adolescent’s willingness to accurately report on this instrument in future evaluations.

LIMITATIONS/POTENTIAL PROBLEMS IN MMPI-A USE

As previously noted, the single greatest problem in the evaluation of change on the MMPI is related to the overemphasis of small T-score shifts that represent changes that are less than the standard error of measurement on the test (i.e., five T-score points). In addition, the MMPI interpreter often is left with the challenge of determining whether an adolescent’s improvement, as reflected in MMPI-A T-score reductions on clinical scales, represents actual positive changes in psychological functioning, or the adolescent’s use of a defensive response set in an attempt to minimize the report of psychopathology during the test readministration. One of the relatively unique aspects of the MMPI-A that substantially helps in this differentiation task is the presence of extensive validity scale information concerning the adolescent’s approach to the response process. Using the original form of the MMPI, Herkov, Archer, and Gordon (1991) examined the relative efficacy of the traditional validity scales and the Wiener-Harmon Subtle-Obvious subscales in identifying fake-bad and fake-good response sets among adolescents. This study involved 403 adolescents from a nonpatient adolescent group, a normal group instructed to fake bad, and a psychiatric inpatient group instructed to fake good. The results of this study indicated that elevations on scale L were a highly sensitive indicator of adolescents’ attempts to fake good, whereas elevations on MMPI scale F were quite sensitive in accurately identifying adolescents’ attempts to overreport symptomatology on the test instrument. In general, the use of the MMPI and MMPI-A validity scales should provide important assistance to the interpreter in determining the accuracy of the adolescents’ reports of change in symptomatology across MMPI-A administrations.

Clinical Case Example

Examples of MMPI-A interpretation principles can be found in the test manual (Butcher et al., 1992) as well as in Archer (1992a) and Butcher and Williams (1992a). The following clinical case example was selected from Archer (1992a) to illustrate the use of the MMPI-A for the purposes of personality description and treatment planning.

Deborah, a 17-year-old White female adolescent, was admitted to an acute inpatient unit in a psychiatric hospital. This patient had a history of antisocial behavior and legal violations that included loitering, petty larceny, vagrancy, possession of drugs, and possession of drugs with intent to distribute. Her psychiatric symptomatology at the time of hospitalization included anger, hostility, and depression. Upon admission, the treatment team DSM-III-R diagnoses for this patient included dysthymic disorder (300.40); conduct disorder, undifferentiated type (312.90); and psychoactive substance abuse (305.90). She had an extensive history of alcohol and substance abuse, including hallucinogens, marijuana, cocaine, and barbiturates. Immediately prior to hospitalization, Deborah required emergency hospitalization for an unintentional drug overdose from her use of a combination of Valium and cocaine.

This adolescent was an only child from an upper socioeconomic class background. Deborah’s father was an executive vice president for a multinational corporation, and his job responsibilities resulted in multiple relocations of the family to a variety of Western European countries. Approximately 1 year prior to the patient’s current psychiatric admission, she had been arrested by British authorities for the possession and sale of narcotics. Deborah’s parents reported a long history of difficulty controlling their daughter’s behavior, and indicated that Deborah had an extensive history of school truancy and episodes of running away from home. Her parents also indicated their suspicions regarding their daughter’s possible use of prostitution to support and maintain her drug use.

Deborah’s academic records indicated a history of underachievement, with grades in the average to below-average range. Results of administration of the Wechsler Adult Intelligence Scale-Revised (WAIS-R) produced a Verbal IQ score of 110, a Performance IQ score of 124, and a Full Scale IQ score of 116. The Child Behavior Checklist (CBCL), developed by Achenbach and Edelbrock (1983), was administered to Deborah’s mother with resulting elevations on the Delinquent and Hyperactive scales. Staff ratings on the Devereux Adolescent Behavior (DAB) rating scale, developed by Spivack, Haimes, and Spotts (1967), showed elevations on the Unethical and Defiant/Resistant behavior factors.

Deborah’s MMPI-A basic scale profile is shown in Fig. 18.2. This profile displays T-score values based on MMPI-A norms (Butcher et al., 1992) and on the norms developed by Marks and Briggs (1972) for the original form of the MMPI and found in Appendix G of the MMPI-A manual (Butcher et al., 1992). The third step of the interpretive model noted in Table 18.2 involves the evaluation of the technical validity of the MMPI-A profile. This step is undertaken by review of scales and raw score values appearing on the left side of the basic scale profile sheet. One might begin by noting that Deborah omitted only one item on the Cannot-Say scale, a value clearly within acceptable limits for profile interpretation. The response consistency measures VRIN (T = 43) and TRIN (T = 54) also produced values within acceptable limits for valid profile interpretation. One also might note that there is relatively little difference between the T-score elevations on scales F1 (T = 66) and F2 (T = 53), providing evidence that Deborah did not respond to the latter part of the test booklet in a random manner. The validity scale configuration produced by MMPI-A scales F, L, and K also is within acceptable limits and is consistent with a meaningful and useful interpretation of MMPI-A clinical scale findings.

Fig. 18.2. MMPI-A basic scale profile for clinical case example (Deborah). Profile sheet reproduced by permission. Copyright © 1992 Regents of the University of Minnesota. Reprinted by permission.

The fourth step shown in Table 18.2 involves a review of the basic scale clinical profile. Deborah’s basic scale profile is a well-defined 4-9 code type. The term definition, as applied to a two-point code type, refers to the degree of T-score difference between the second (scale 9) and third (scale 8) most elevated clinical scales. The 4-9 code type commonly is found among adolescents in clinical settings on both the original form of the MMPI and on the MMPI-A (Archer, 1987, 1992a). In the Marks et al. (1974) description of two-point code types, the 4-9 code was found for adolescents who were described as defiant, impulsive, disobedient, and school truant. Marks et al. also noted that these adolescents were likely to be runaways and often were described by their parents as difficult to control. The chief defense mechanism of the 4-9/9-4 adolescent was acting out, and therapists described these adolescents as resentful of authority, insecure, socially extroverted, and capable of initially arousing liking in others. Marks et al. (1974) referred to these adolescents as “disobedient beauties” and provided descriptors including “seductive, provocative, and handsome” (p. 221). The clinical correlate data for the 4-9/9-4 code type indicate individuals with this MMPI pattern are often in trouble with their environment, because of antisocial behaviors. In

the adult literature, individuals with this code type pattern often receive a diagnosis of antisocial personality disorder and are described as selfish, impulsive, and self-indulgent. As noted in the model for profile interpretation, it is often useful to review values for scales 2 and 7 to assess the overall degree of affective distress. Deborah’s scores on these measures are markedly low for an adolescent recently admitted to inpatient treatment, and are equivalent to those found for the MMPI-A normative population. This adolescent’s absence of emotional or affective distress is a negative prognostic indicator for Deborah and may reflect a lack of necessary motivation (i.e., emotional distress) to engage in the therapeutic change process.

Steps 5 and 6 in the profile interpretation process involve a review of supplementary and content scales. This adolescent’s T-score values on the content and supplementary scales of the MMPI-A are presented in Fig. 18.3. Consistent with the absence of affective distress reflected in the basic scale profile, Deborah’s score on Welsh’s A scale (T = 51) suggests little distress or discomfort at the time of her MMPI-A assessment. Further, her score on Welsh’s R scale (T = 46) reinforces the findings from her 4-9/9-4 code type in suggesting that acting out, rather than repression, is her primary defense mechanism. A review of Deborah’s supplementary scale scores also provides a number of interesting observations related to potential substance abuse problems. This adolescent’s raw score value of 30 on the MAC-R would result in a classification of this adolescent as a probable substance abuser, a finding that also is consistent with her elevated scores on the PRO scale (T = 84) and her psychosocial history findings. Additionally, research by Archer, Gordon, Anderson, and Giannetti (1989) indicated that adolescents with elevated MAC scores are much more likely to receive diagnoses related to conduct disorder. In contrast, Deborah’s scores are within normal limits on the ACK scale (T = 56), a measure of this adolescent’s willingness to acknowledge or discuss alcohol or drug use symptoms and problems. Thus, Deborah may have many more problems in the area of drugs and alcohol than she will admit in clinical interview. Finally, Deborah also shows a marginal elevation on the IMM scale, a measure of deficits and problems in the area of ego maturation, self-awareness, and the ability to form meaningful and nonexploitive relationships with others. Archer et al. (1991) found that female adolescents who produce elevations on the IMM scale have poor relationships with their parents and frequently have histories of school truancy.

Deborah’s content scale profile, consistent with her low score on Welsh’s A scale, produced normal-range values on measures of affective distress and internal symptoms. This is reflected in her normal-range values on scales A-anx, A-obs, A-dep, A-hea, and A-biz. Deborah is likely to have substantial difficulty in interpersonal functioning as reflected in her substantial elevation on the A-fam scale, which indicates the presence of marked family conflict and discord, as well as marginal elevations on A-ang and A-cyn (T ≥ 55 to T ≤ 60). Deborah also shows a marginal elevation on the A-con scale, indicative of problem behaviors involving unlawful actions or attitudes and behaviors that violate societal standards. Deborah’s score on the A-trt content scale probably reflects a negative attitude toward mental

[Rotated table; contents not legible in this copy.]

21 CBCL AND RELATED INSTRUMENTS

After the items were finalized for a particular instrument, the instrument was used to assess a large number of children who had been referred for professional help with behavioral/emotional problems. The referred children were drawn from diverse mental health and special educational settings to avoid the biases inherent in the case loads of individual settings. Within samples of children of each gender in particular age ranges, correlations were computed among the problem items scored on a particular instrument. To identify syndromes of problems that tend to co-occur, principal components analyses were performed on the correlations among problems found in each sample. (The term syndrome is used in the generic sense of problems that tend to occur together. It does not imply any assumptions about the etiology or diagnosis of disorders.) To identify syndromes that were statistically robust, varimax rotations were performed on different numbers of principal components for each analysis. Sets of items that remained together throughout multiple rotations were retained as a basis for syndromes characterizing children of a particular age and gender.

Historically, this research program began with data derived from child psychiatric case histories (Achenbach, 1966). Thereafter, it led to the development of rating forms to be filled out by each relevant type of informant, including parents, teachers, youths, observers, and interviewers. In the initial derivation of syndromes, each syndrome was based entirely on the items found to occur together in the analyses of children of each gender in a particular age range scored on a particular instrument (Achenbach, 1978; Achenbach & Edelbrock, 1983, 1986, 1987). Profiles were constructed for scoring the syndromes obtained for each gender in each age range on each instrument in relation to normative samples of the same gender and age.

Some of the syndromes had counterparts in virtually all analyses. For example, versions of a syndrome designated as Aggressive Behavior were found for both genders in all age ranges, as rated by each type of informant. Other syndromes were more variable, being found for only one gender, limited age ranges, or particular informants.

As the use of our instruments spread, clinicians and researchers increasingly sought to compare findings for children of both genders and different ages, rated by different informants. They also sought to compare findings for the same child rated at different ages and/or by different informants. The variations in syndromes by gender, age, and informant impeded such comparisons.
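The derivation procedure described above (item correlations, principal components, varimax rotation, then inspection of which items load together) can be sketched roughly as follows. This is only an illustration on simulated ratings, not the analysis actually run for these instruments: the data, the function names, and the 0.30 loading threshold are all assumptions made for the example.

```python
import numpy as np

def principal_component_loadings(item_scores, n_components):
    """Loadings of the leading principal components of the item correlation matrix."""
    corr = np.corrcoef(item_scores, rowvar=False)       # items x items correlations
    eigvals, eigvecs = np.linalg.eigh(corr)             # eigenvalues in ascending order
    keep = np.argsort(eigvals)[::-1][:n_components]     # largest components first
    return eigvecs[:, keep] * np.sqrt(eigvals[keep])

def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonal varimax rotation using the standard SVD-based iteration."""
    n_items, n_comp = loadings.shape
    rotation = np.eye(n_comp)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        target = rotated**3 - rotated @ np.diag((rotated**2).sum(axis=0)) / n_items
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt
        if s.sum() < criterion * (1 + tol):             # no meaningful improvement
            break
        criterion = s.sum()
    return loadings @ rotation

def high_loading_sets(rotated, threshold=0.30):
    """Item indices whose absolute loading reaches the (assumed) threshold."""
    return [frozenset(np.flatnonzero(np.abs(col) >= threshold).tolist())
            for col in rotated.T]

# Simulated ratings: items 0-3 share one latent dimension, items 4-6 another.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 2))
items = np.repeat(latent, [4, 3], axis=1) + rng.normal(scale=0.5, size=(500, 7))

unrotated = principal_component_loadings(items, n_components=2)
rotated = varimax(unrotated)
item_sets = high_loading_sets(rotated)
```

In practice the rotation would be repeated for several values of `n_components`, and only item sets that stay together across those rotations would be kept as candidate syndromes.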

Cross-Informant Syndromes Derived from the CBCL/4-18, TRF, and YSR

In 1991, we made major revisions in the profiles for scoring parents’ reports on the Child Behavior Checklist for ages 4-18 (CBCL/4-18), teachers’ reports on the Teacher’s Report Form for ages 5-18 (TRF), and self-reports on the Youth Self-Report for ages 11-18 (YSR). For all three instruments, the 1991 profiles are designed to score eight cross-informant syndrome constructs that have been derived from data on both genders in multiple age ranges, rated by different informants (Achenbach, 1991b, 1991c, 1991d). The cross-informant syndrome constructs are defined by sets of problems that were found to co-occur in a majority of samples of boys and girls of different ages, as rated by different types of informants. The problems are assessed by having the informants rate items such as cruel to animals and nightmares on 3-step scales, where 0 = not true (as far as you know), 1 = somewhat true or sometimes true, and 2 = very true or often true. The eight syndrome constructs that met our criteria for robustness across gender, age, and informant are listed in Table 21.2.

ACHENBACH

TABLE 21.2
Cross-Informant Syndrome Constructs Scored From the CBCL/4-18, TRF, and YSR

Internalizing          Neither Internalizing nor Externalizing    Externalizing
Withdrawn              Social Problems                            Delinquent Behavior
Somatic Complaints     Thought Problems                           Aggressive Behavior
Anxious/Depressed      Attention Problems

The Internalizing and Externalizing groupings of syndromes shown in Table 21.2 were formed from second-order factor analyses of scores on the eight syndromes (Achenbach, 1991a). Children can be scored on scales comprising all the Internalizing items and all the Externalizing items, as well as on the syndrome scales and a scale of total problem scores.
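As a rough illustration of how the 0-1-2 item ratings roll up into syndrome, Internalizing, Externalizing, and total problem scores, consider the sketch below. The item keys ("a" through "k") and their assignment to syndromes are hypothetical placeholders, not the published scoring keys. Two further assumptions are made for the example: unrated items count as 0, and an item shared by two syndromes is counted once in a grouping score.

```python
INTERNALIZING = ("Withdrawn", "Somatic Complaints", "Anxious/Depressed")
EXTERNALIZING = ("Delinquent Behavior", "Aggressive Behavior")

def score_profile(ratings, syndrome_items):
    """Sum 0-1-2 item ratings into syndrome, grouping, and total problem scores."""
    for item, value in ratings.items():
        if value not in (0, 1, 2):
            raise ValueError(f"item {item!r} must be rated 0, 1, or 2")

    def total_for(items):
        return sum(ratings.get(item, 0) for item in items)   # unrated items count as 0

    def grouping(names):
        unique_items = set().union(*(syndrome_items[name] for name in names))
        return total_for(unique_items)                       # shared items counted once

    return {
        "syndromes": {name: total_for(items) for name, items in syndrome_items.items()},
        "Internalizing": grouping(INTERNALIZING),
        "Externalizing": grouping(EXTERNALIZING),
        "Total problems": sum(ratings.values()),
    }

# Hypothetical item-to-syndrome key and one informant's ratings.
example_key = {
    "Withdrawn": ["a", "b"],
    "Somatic Complaints": ["c"],
    "Anxious/Depressed": ["b", "d"],
    "Social Problems": ["e"],
    "Thought Problems": ["f"],
    "Attention Problems": ["g"],
    "Delinquent Behavior": ["h"],
    "Aggressive Behavior": ["i", "j"],
}
example_ratings = {"a": 1, "b": 2, "c": 0, "d": 1, "e": 0, "f": 0,
                   "g": 2, "h": 1, "i": 2, "j": 2, "k": 1}
profile = score_profile(example_ratings, example_key)
```

Raw scores of this kind would then be converted to percentiles and T scores against the normative sample for the child’s gender, age range, and type of informant.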

PROFILES FOR SCORING THE CBCL/4-18, TRF, AND YSR

The 1991 profiles for the CBCL/4-18, TRF, and YSR include scales for scoring the eight cross-informant syndrome constructs. In addition, the profiles preserve important gender, age, and informant variations. This is done in three ways.

First, items that were strongly associated with a cross-informant syndrome in ratings by only one type of informant are included in the scale for scoring that syndrome from that type of informant. For example, an item concerning suicidal thoughts was strongly associated with the Anxious/Depressed syndrome only in self-ratings by youths on the YSR. Because this item was not strongly associated with the Anxious/Depressed syndrome in parents’ ratings on the CBCL/4-18 or teachers’ ratings on the TRF, it is not included in the cross-informant construct for the Anxious/Depressed syndrome, nor in the scales for scoring this syndrome on the CBCL/4-18 and TRF profiles. However, it is included in the scale for scoring the Anxious/Depressed syndrome on the YSR profile.

A second way in which gender, age, and informant variations are preserved is that syndromes that were robust only in particular analyses are included in the profile for the relevant group and type of informant. For example, in boys’ YSR ratings, we found a syndrome that was designated as Self-Destructive/Identity Problems. There was no counterpart to this syndrome in girls’ YSR ratings, nor in parent or teacher ratings. To preserve this syndrome, which is specific to boys’ self-ratings, it is scored on the YSR profile for boys. A second example is a syndrome designated as Sex Problems, which was found in parents’ ratings of 4- to 11-year-old boys and girls, but not in parents’ ratings of 12- to 18-year-olds, nor in teacher ratings or self-ratings. The Sex Problems syndrome is therefore scored from the CBCL for both genders at ages 4-11, but not from the CBCL at ages 12-18, nor from the TRF or YSR.

A third way in which gender, age, and informant variations are preserved is by providing norms for each syndrome scale that are based on children of a particular gender in a particular age range rated by a particular type of informant. The normative samples of each gender and type of informant were drawn from a nationally representative sample of children that excluded those who had received mental health services or special remedial school classes in the preceding 12 months. An individual child’s standing on each syndrome is displayed in terms of percentiles and T scores derived from a normative sample of nonreferred children of the same gender and age range, rated by the same type of informant.

Figure 21.1 displays a computer-scored version of the 1991 CBCL profile for 14-year-old Paul. Percentiles of the normative sample for the child’s gender and age are indicated on the left side of the profile, whereas T scores are indicated on the right side. The raw scale score and T score for each syndrome scale are printed below the item scores for the scale. The broken lines across the profile at T scores of 67 and 70 demarcate a borderline clinical range. Syndrome T scores below 67 are in the normal range, whereas syndrome T scores above 70 are in the clinical range. The borderline range is included in the profile to emphasize the quantitative nature of a child’s standing on each scale. Rather than indicating whether a child is sick or well, the scale scores indicate whether the number of problems reported for the child is low, intermediate, or high relative to the problems reported for normative samples of peers by the same type of informant. For the Internalizing, Externalizing, and total problem scores, which encompass broader ranges of problems than the individual syndrome scales, the borderline clinical range is defined by T scores from 60 to 63. Table 21.3 summarizes the clinical samples from which the scales were derived and the normative samples on which the percentiles and T scores are based.

FIG. 21.1. Computer-scored CBCL problem profile for 14-year-old Paul. From Integrative Guide for the 1991 CBCL/4-18, YSR, and TRF Profiles (p. 53) by T. M. Achenbach, 1991a, Burlington, Vermont: University of Vermont. Copyright 1991 by T. M. Achenbach. Reprinted by permission.

TABLE 21.3
Derivation Samples for CBCL/4-18, TRF, and YSR Syndromes and Norms

CBCL/4-18
  Syndrome derivation: N = 4,445. Sources: 52 mental health services.
  Normative sample*: N = 2,368. Sources: Parents in national home interview survey.

TRF/5-18
  Syndrome derivation: N = 2,815. Sources: 58 mental health and special education services.
  Normative sample*: N = 1,391. Sources: Teachers of children in national survey.

YSR/11-18
  Syndrome derivation: N = 1,272. Sources: 26 mental health services.
  Normative sample*: N = [illegible]. Sources: Youths in national home interview survey.

*Children were drawn from a sample selected to be representative of the 48 contiguous states with respect to socioeconomic status, ethnicity, region, and urban-suburban-rural residence. Children were excluded if they were handicapped, had received mental health services or special remedial school classes in the previous 12 months, or lacked an English-speaking parent or parent surrogate. Details of the samples, SES, ethnicity, region, and informants are presented by Achenbach (1991b, 1991c, 1991d) and McConaughy, Stanger, and Achenbach (1992).
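The cut points just described can be written as a small lookup. The sketch below is a minimal illustration, assuming integer T scores. The text states normal and clinical ranges only for the syndrome scales; for the broader scales it gives only the borderline range of 60 to 63, so the treatment of scores below 60 as normal and above 63 as clinical is inferred by analogy. The function and dictionary names are made up for the example.

```python
# Borderline clinical ranges (inclusive) taken from the text.
BORDERLINE_RANGE = {
    "syndrome": (67, 70),   # the eight syndrome scales
    "broad": (60, 63),      # Internalizing, Externalizing, total problems
}

def classify_t_score(t_score, scale_type="syndrome"):
    """Return 'normal', 'borderline', or 'clinical' for a problem-scale T score."""
    low, high = BORDERLINE_RANGE[scale_type]
    if t_score < low:
        return "normal"
    if t_score <= high:
        return "borderline"
    return "clinical"

# Hypothetical scores, not taken from any real profile.
examples = {
    ("syndrome", 58): classify_t_score(58),
    ("syndrome", 70): classify_t_score(70),
    ("syndrome", 73): classify_t_score(73),
    ("broad", 61): classify_t_score(61, "broad"),
    ("broad", 65): classify_t_score(65, "broad"),
}
```

Keeping the borderline band explicit, rather than using a single cutoff, mirrors the point made above that the scales describe a quantitative standing rather than a sick-or-well decision.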

COORDINATING PARENT, TEACHER, AND SELF-RATINGS

To facilitate comparisons among parent, teacher, and self-ratings for clinical or research purposes, the problem portions of the CBCL/4-18, TRF, and YSR profiles are laid out in a uniform format, as illustrated in Fig. 21.1. Thus, if ratings are obtained on one or more CBCLs, one or more TRFs, and the YSR, profiles from each of them can be directly compared to identify syndrome scales that show agreement or disagreement among the informants. This can be done using either hand-scored profiles or profiles scored from IBM-compatible or Apple II programs that are available from the author of this chapter. In addition, an IBM-compatible cross-informant program is available that offers the following features:

1. The user can enter data from any combination of five CBCLs, TRFs, and YSRs for the same child; the separate CBCL, TRF, and YSR programs are therefore not needed.
2. Profiles can be scored and printed from each informant.
3. The scores obtained from all informants for 89 items common to the three instruments can be displayed side by side.
4. T scores obtained from all informants for the eight syndrome scales, Internalizing, Externalizing, and total problems can be displayed side by side.
5. Q correlations are printed between the 89 item scores and between the eight syndrome scale scores from various combinations of informants who have rated the child. Q correlations are also printed from large reference samples of each combination of informants.
6. All the raw scores, T scores, and Q correlations can be stored and used as input for statistical analyses.

The 1993 IBM-compatible programs also compute intraclass correlations between a child’s profile and profile types identified via cluster analyses (Achenbach, 1993). Figure 21.2 illustrates the portion of a cross-informant printout that displays Paul’s T scores from ratings on CBCLs completed by both his parents, the YSR completed by Paul, and TRFs completed by two teachers. The printout provides side-by-side listings of the T scores for the eight syndrome scales common to all three instruments, Internalizing, Externalizing, and total problems. It prints a single cross beside each score that is in the borderline clinical range and two crosses beside each score that is in the clinical range.

To provide a quantitative index of agreement between pairs of informants, the program computes Q correlations between the item scores obtained from the informants and also computes Q correlations between the eight syndrome T scores obtained from pairs of informants. The Q correlations are obtained by using the Pearson product-moment formula to compute the association between a set of scores obtained from one informant and the corresponding set of scores obtained from a second informant. For example, the scores obtained from Paul’s YSR on the 89 items common to all three instruments are correlated with the scores obtained from his mother’s CBCL on the same 89 items. Similarly, the T scores obtained from Paul’s YSR on the eight syndrome scales are correlated with the T scores obtained from his mother’s CBCL on the eight syndrome scales.

As Fig. 21.2 shows for the syndrome scales, the Q correlations between informants are printed beneath the side-by-side listings of the syndrome scale scores. In addition, to provide a basis for evaluating the level of agreement for particular pairs of informants, the program prints out the 25th percentile, mean, and 75th percentile Q correlation obtained in large reference samples of similar pairs of informants. As shown in Fig. 21.2, Q correlations that are below the 25th percentile are considered to indicate below average agreement, those between the 25th and 75th percentiles are considered to be in the average range, and those above the 75th percentile are considered to be above average. Thus, for both clinical and research purposes, the cross-informant program enables users to identify cases for which agreement between particular pairs of informants is low, average, or high. It also enables users to identify individual informants whose reports are especially discrepant from reports by other informants about the same child. Unless their reports can be substantiated, informants whose reports are discrepant in important ways from those of all the other informants may be targeted for interventions to change their perceptions of the child’s behavior.

T Scores for 8 Syndrome Scales Common to CBCL, YSR and TRF

Scale                     Fa.CBCL   Mo.CBCL   YSR   TRF.1   TRF.2
1. Withdrawn              55        58        61    50      51
2. Somatic Complaints     53        53        65    72++    64
3. Anxious/Depressed      56        58        50    50      55
4. Social Problems        55        50        53    71++    64
5. Thought Problems       57        57        55    50      68+
6. Attention Problems     58        55        51    73++    63
7. Delinquent Behavior    69+       70+       59    70+     70+
8. Aggressive Behavior    70+       73++      57    71++    70+
Internalizing             55        58        59    61+     55
Externalizing             71++      73++      58    71++    71++
Total Problems            62+       65++      56    64++    61+

+Borderline Clinical Range   ++Clinical Range

Q Correlations Between 8 Syndrome Scales
[The Q correlation values for this subject and for the reference samples (25th percentile, mean, 75th percentile) are not legible in this copy.]

Conclusions
Fa.CBCL x Mo.CBCL: Agreement between Father and Mother is average.
Fa.CBCL x YSR: Agreement between Father and Youth is average.
Fa.CBCL x TRF.1: Agreement between Father and Teacher 1 is average.
Fa.CBCL x TRF.2: Agreement between Father and Teacher 2 is average.
Mo.CBCL x YSR: Agreement between Mother and Youth is average.
Mo.CBCL x TRF.1: Agreement between Mother and Teacher 1 is average.
Mo.CBCL x TRF.2: Agreement between Mother and Teacher 2 is average.
YSR x TRF.1: Agreement between Youth and Teacher 1 is average.
YSR x TRF.2: Agreement between Youth and Teacher 2 is below average.

FIG. 21.2. Cross-informant printout of scores obtained by 14-year-old Paul on scales common to the CBCL, YSR, and TRF. From Integrative Guide for the 1991 CBCL/4-18, YSR, and TRF Profiles (p. 91) by T. M. Achenbach, 1991a, Burlington, Vermont: University of Vermont. Copyright 1991 by T. M. Achenbach. Reprinted by permission.
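The Q correlation and the percentile-based judgment of agreement might be sketched as follows. This is not the cross-informant program’s code. The function names, the two sets of T scores, and the reference percentiles are hypothetical values chosen only to show the arithmetic.

```python
import math

def q_correlation(scores_a, scores_b):
    """Pearson product-moment correlation between two informants' score sets."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("both informants must supply the same set of scores")
    n = len(scores_a)
    mean_a = sum(scores_a) / n
    mean_b = sum(scores_b) / n
    cross = sum((a - mean_a) * (b - mean_b) for a, b in zip(scores_a, scores_b))
    ss_a = sum((a - mean_a) ** 2 for a in scores_a)
    ss_b = sum((b - mean_b) ** 2 for b in scores_b)
    if ss_a == 0 or ss_b == 0:
        raise ValueError("correlation is undefined when one set of scores is constant")
    return cross / math.sqrt(ss_a * ss_b)

def agreement_level(q, reference_p25, reference_p75):
    """Compare a Q correlation with reference-sample percentiles."""
    if q < reference_p25:
        return "below average"
    if q > reference_p75:
        return "above average"
    return "average"

# Hypothetical syndrome T scores from two informants rating the same child.
parent_t = [55, 53, 58, 50, 57, 55, 70, 73]
youth_t = [61, 65, 50, 53, 55, 51, 59, 57]
q = q_correlation(parent_t, youth_t)

# Hypothetical reference percentiles for this pair of informant types.
level = agreement_level(q, reference_p25=-0.11, reference_p75=0.60)
```

The same two functions apply unchanged to the 89 common item scores, since only the length of the score lists differs.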

Competencies Scored from the CBCL/4-18, TRF, and YSR

A key goal of our multiaxial empirically based approach has been to derive syndromal constructs as a basis for a taxonomy of child and adolescent disorders that can be assessed from multiple sources of data. However, children’s need for help may depend not only on the problems they manifest, but also on their competencies or lack thereof. For example, a child who has a high score on the Attention Problems syndrome, but who has good social skills, an IQ of 130, and works hard in school, may not need the kind of help needed by a child with the same score on the Attention Problems syndrome, but poor social skills, an IQ of 90, and no interest in school.

To combine the assessment of competencies with the assessment of behavioral/emotional problems, the CBCL/4-18 and YSR include items for assessing the amount and quality of involvement in sports, nonsports activities, organizations, jobs and chores, social relationships, and school performance. The TRF has items for assessing academic performance in terms of grade level, plus adaptive functioning in terms of how hard the child is working, how appropriately he or she is behaving, how much he or she is learning, and how happy he or she is. These items are scored on scales that compare a child’s standing with scores obtained by the same national samples as the problem scales are normed on.

As shown on the hand-scored version of Paul’s profile in Fig. 21.3, the competence scales scored from the CBCL/4-18 are designated as Activities, Social, and School. A total competence score is also computed by summing the three scale scores.

Like the problem portion of the profile, the competence portion displays percentiles on the left and T scores on the right that are based on a national normative sample. The broken lines printed across the profile demarcate the borderline clinical range. However, unlike the problem scales, scores below the bottom broken line (indicating a lack of competencies) are in the clinical range. Scores above the top broken line (indicating more competencies) are in the normal range. Analogous profiles are provided for scoring competencies from the YSR and adaptive functioning from the TRF.

FIG. 21.3. Hand-scored CBCL competence profile for 14-year-old Paul. From Integrative Guide for the 1991 CBCL/4-18, YSR, and TRF Profiles (p. 39) by T. M. Achenbach, 1991a, Burlington, Vermont: University of Vermont. Copyright 1991 by T. M. Achenbach. Reprinted by permission.

Other Empirically Based Instruments

The CBCL/4-18, TRF, and YSR are designed to obtain similar types of data in a similar format from parents, teachers, and youths at ages where two or all three types of informants are relevant. Our empirically based approach has also been used to develop analogous assessment instruments for 2- and 3-year-olds and young adults. In addition, the approach has been extended to observational assessment of children in group settings, such as classrooms, and individual clinical interviews.

THE CHILD BEHAVIOR CHECKLIST FOR AGES 2-3 (CBCL/2-3)

There has been relatively little research on behavioral/emotional problems occurring prior to age 4. Neither clinical assessment procedures nor nosologies such as the American Psychiatric Association’s (1987) Diagnostic and Statistical Manual (DSM) provide differentiated pictures of disorders among toddlers. To extend empirically based assessment to younger ages, we developed the CBCL/2-3. The CBCL/2-3 has 100 problem items that are scored on 3-step scales like those for the CBCL/4-18, TRF, and YSR problem items. Fifty-nine of the problem items have counterparts on the CBCL/4-18. Other than the kinds of developmental achievements measured on standardized tests, it is difficult to specify what characteristics should be considered to represent social competence among toddlers. Consequently, the CBCL/2-3 does not include competence items. However, it does have an open-ended item for describing the best things about the child, as do the CBCL/4-18, TRF, and YSR.

The 1992 revision of the profile for scoring the CBCL/2-3 (Achenbach, 1992) follows a format like that shown in Fig. 21.1 for the CBCL/4-18. It provides percentiles and normalized T scores based on a randomly selected general population sample, and it demarcates a borderline clinical range from T scores of 67 to 70 on the syndrome scales. The syndromes were derived from principal components/varimax analyses of 2- and 3-year-olds who had been referred for mental health services or who scored in the top 50% of nonreferred children, as summarized in Table 21.4. Six syndromes were identified. Those designated as Anxious/Depressed, Withdrawn, Somatic Problems, and Aggressive Behavior have approximate counterparts in the cross-informant syndromes identified for older ages. The remaining CBCL/2-3 syndromes, designated as Sleep Problems and Destructive Behavior, do not have clear counterparts among the syndromes identified for older ages. Second-order factor analyses yielded an Internalizing grouping consisting of the Anxious/Depressed and Withdrawn syndromes, and an Externalizing grouping consisting of the Aggressive Behavior and Destructive Behavior syndromes.

THE DIRECT OBSERVATION FORM (DOF)

The DOF was developed to apply our empirically based approach to the assessment of problem behavior observed in school classrooms and other group settings. The DOF has 96 problem items, plus an open-ended item for entering additional problems. The items are designed to capture problem behavior that can be observed in 10-minute observational samples. Seventy-two of the problem items have counterparts on the CBCL/4-18, whereas 85 have counterparts on the TRF. The DOF also records whether the child is on task or not at the end of each 1 minute of observation. The on-task scores are summed to provide a score of 0-10 for each 10-minute observational session.

To provide a record of what is seen in each observational session, the observer writes a narrative description of the child’s behavior and interactions over a 10-minute interval. The narrative is written in space provided on the DOF near the list of items to be rated. To take account of the context, the behavior of others toward the child, and characteristics of the child that might not be captured by precoded items, the observer describes the actual stream of behavior, including the events impinging on the child. At the end of each 10-minute observation, the observer scores the items on a 4-step scale ranging from 0 = not observed to 3 = definite occurrence with severe intensity or greater than 3 minutes duration.

To obtain a stable index of problems and on-task behavior, 10-minute samples of behavior should be obtained on three to six occasions, and the scores should be averaged over these occasions. To provide a baseline for the behavior of peers in the same context, it is recommended that the DOF be completed for one control child of the same gender observed just before the target child and a second control child of the same gender observed just after the target child. To facilitate comparisons of the target and control children across multiple occasions, the scoring program for the DOF prints out mean scores for the target and control children averaged over two to six occasions. The program displays these comparisons for on-task behavior, six empirically derived syndromes, Internalizing, Externalizing, and total problem scores.

The following syndromes were derived from principal components/varimax analyses of 212 clinically referred 5- to 14-year-old children: Withdrawn-Inattentive, Nervous-Obsessive, Depressed, Hyperactive, Attention Demanding, and Aggressive. Percentiles for the problem scores are based on 287 5- to 14-year-old children observed as controls for referred children in regular classrooms of 45 public and parochial schools of 23 school systems located in Vermont, Nebraska, and Oregon. Hand-scored profiles are available for on-task, Internalizing, Externalizing, and total problem scores. However, because the syndrome scales are too laborious to average by hand, they are scorable only from the DOF computer-scoring program. Mean scale scores for referred and nonreferred children are provided by Achenbach (1991b), whereas reliability and validity data have been reported by Achenbach and Edelbrock (1983), McConaughy, Achenbach, and Gent (1988), and Reed and Edelbrock (1983).

TABLE 21.4
Derivation Samples for CBCL/2-3, DOF, and SCIC Syndromes and Norms

CBCL/2-3
  Syndrome derivation: N = 546. Sources: Seven mental health and special education services, plus children having high problem scores in other samples.
  Normative sample: N = 368. Sources: Home interview surveys of parents in national sample and Worcester, MA area.

DOF
  Syndrome derivation: N = 212. Sources: Children referred for mental health or school psychological services from 45 public and parochial schools.
  Normative sample: N = 287. Sources: Classroom control children in 45 public and parochial schools.

SCIC*
  Syndrome derivation: N = 108. Sources: Children referred to outpatient psychiatry service or school psychologists.
  Normative sample: N = 108. Sources: Same as sample from which syndromes were derived.

*Revised profile derived from larger samples will be published in 1994.
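The on-task scoring and the averaging across occasions can be illustrated with a short sketch. It is not the DOF scoring program. The function names and observation data are invented, and pooling the two control children into one mean per occasion is an assumption made to keep the example small.

```python
from statistics import mean

def on_task_score(on_task_by_minute):
    """Sum the on-task checks made at the end of each minute of a 10-minute session."""
    if len(on_task_by_minute) != 10:
        raise ValueError("a session needs exactly 10 one-minute checks")
    return sum(1 for on_task in on_task_by_minute if on_task)

def mean_across_occasions(session_scores):
    """Average a score over occasions (the text mentions two to six occasions)."""
    if not 2 <= len(session_scores) <= 6:
        raise ValueError("expected scores from two to six occasions")
    return mean(session_scores)

# Hypothetical target child observed on three occasions.
target_sessions = [
    [True] * 5 + [False] * 5,
    [True] * 6 + [False] * 4,
    [True] * 7 + [False] * 3,
]
target_scores = [on_task_score(session) for session in target_sessions]

# Hypothetical on-task scores for control children, already averaged per occasion.
control_scores = [9, 10, 8]

target_mean = mean_across_occasions(target_scores)
control_mean = mean_across_occasions(control_scores)
on_task_gap = control_mean - target_mean
```

The same averaging step would apply to the problem scores, with each item first rated on the 0 to 3 scale at the end of the session.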

THE SEMISTRUCTURED CLINICAL INTERVIEW FOR CHILDREN (SCIC)

Interviews are probably the most widely used clinical assessment procedures. They typically form the centerpiece of clinical contacts in that they provide the clinician with direct impressions of clients and their responses to the clinician’s probing. What the clinician gleans from interviews is apt to be more influential than other types of assessment data, because the clinician may filter other data through the first-hand interview impressions. Despite their popularity and influence, clinical interviews of children have not been subjected to much research to determine what data can be best obtained at what ages and how such interviews can be integrated with other data.

Since the advent of DSM-III (American Psychiatric Association, 1980), a prominent approach to clinical interviews has been to ask children whether they have each of a large number of symptoms that are used as criteria for DSM disorders (Reich & Welner, 1989; Shaffer, Fisher, Piacentini, Schwab-Stone, & Wicks, 1989). Diagnoses are then based on whether the symptoms affirmed by the child meet the criteria for DSM disorders. Structured interviews have increased the rigor with which diagnoses of adults are made in epidemiological surveys (Helzer et al., 1990). Yet, there is considerable evidence that children do not reliably and validly report DSM symptoms when asked about them in structured interviews. For example, many of the symptoms that children affirm in an initial interview are denied in a repeat interview several days later (Edelbrock, Costello, Dulcan, Kalas, & Conover, 1985). This results in low test-retest reliability and a large decline in the number of diagnoses made from the first interview to the second interview with the same children. A lack of validity has been indicated by the generally poor agreement found between diagnoses made from structured clinical interviews with children and diagnoses made from interviews with their parents or from comprehensive clinical evaluations (Costello, Edelbrock, Dulcan, Kalas, & Klaric, 1984; Shaffer et al., 1988).

To apply empirically based assessment to interviews with children, we developed an interview geared to the cognitive levels and interactive styles of 6- to 11-year-old children. Instead of asking children whether they have particular symptoms, the Semistructured Clinical Interview for Children (SCIC; McConaughy & Achenbach, 1990) provides open-ended questions designed to elicit children’s reports and views on various important areas of their lives, including family, friends, school, activities, concerns, and fantasies. It also includes a kinetic family drawing, brief achievement tests, a screen for fine and gross motor abnormalities, and probe questions about problems attributed to the child by others, such as parents and teachers.

The SCIC focuses not only on what the child says in response to questions, but also on the child’s nonverbal behavior. While administering the SCIC, the interviewer notes on the protocol what the child says and does. Immediately after the interview, the interviewer scores the 117 observational items and 107 self-report items, as well as additional unlisted problems that were observed. Each item is scored on a 4-step scale like that of the DOF. Principal components/varimax analyses have yielded four syndrome scales based on the SCIC observational items, designated as Anxious, Withdrawn-Depressed, Inattentive-Hyperactive, and Resistant. The analyses also yielded four syndrome scales based on the self-report items, designated as Inept, Unpopular, Family Problems, and Aggressive. The sample from which the syndromes were derived is summarized in Table 21.4. The syndrome scales, Internalizing, Externalizing, and total problem scores are entered on the SCIC profile using either a computer-scoring program or hand-scoring forms. Supplementary questions and scoring items are available for use with adolescents through age 18. A revision of the SCIC profile based on larger referred samples, plus nonreferred normative samples, is planned for 1994. Table 21.5 summarizes the scale names for all the instruments discussed to this point.

TABLE 21.5 Scales Scored From the CBCL/2-3, CBCL/4-18, TRF, YSR, DOF, and SCIC

Instruments (columns): CBCL/2-3, CBCL/4-18, TRF, YSR, DOF, SCIC

Scale* (rows)

Adaptive/Competence
1. Academic performance
2. Activities
3. Adaptive
4. On-task
5. School
6. Social
7. Total competence

Problems
1. Aggressive Behavior
2. Anxious/Depressed
3. Attention Demanding
4. Attention Problems
5. Delinquent Behavior
6. Destructive Behavior
7. Family Problems
8. Inept
9. Nervous-Obsessive
10. Resistant
11. Self-Destructive/Identity Problems
12. Sex Problems
13. Sleep Problems
14. Social Problems
15. Somatic Complaints
16. Thought Problems
17. Withdrawn

Broad Band
1. Internalizing
2. Externalizing
3. Observational
4. Self-report
5. Total problems

[The + and - entries for each scale and instrument, including the cell qualifiers "boys" and "ages 4-11", are not legible in this copy.]

Note. + indicates that the scale is scored from the instrument; - indicates that the scale is not scored from the instrument. *Scale names differ somewhat among the instruments.


ACHENBACH

ASSESSMENT OF YOUNG ADULTS

It has long been assumed that many adult disorders have their origins in childhood, and that childhood problems may predict adult disorders. Yet, there is a large gap between the procedures and criteria for assessing childhood problems on the one hand, and the procedures and criteria for assessing adult disorders on the other hand. This gap becomes especially obvious when one conducts longitudinal and follow-up research on relations between child and adult problems.

Between the late teens and mid-20s, neither problem behavior nor competencies can be judged by the standards of either the preceding or subsequent years. Young people follow diverse developmental pathways from adolescence to adulthood. Many may manifest behavior that would be quite deviant in earlier or later years but that is not so deviant during the transition to adulthood. To extend empirically based assessment to this period, we developed the Young Adult Self-Report (YASR; Achenbach, 1990b), which has a format similar to that of the YSR. Many YASR items have counterparts on the YSR, but items have been modified or added to tap functioning related to marriage and similar relationships, work, higher education, and other situations relevant to the period from age 19 to the mid-20s.

For young adults who maintain sufficient contact with their parents, we developed the Young Adult Behavior Checklist (YABCL; Achenbach, 1990a). The format of the YABCL is generally similar to that of the CBCL. Many items have counterparts on the CBCL, with modifications and additions appropriate for young adulthood. As research on these instruments progresses, scoring profiles, as well as reliability and validity data, will be published.

Reliability and Validity

The manuals for the CBCL/2-3, CBCL/4-18, TRF, and YSR present extensive data on test-retest reliability, internal consistency, interinformant agreement (where relevant), and stability over various periods (Achenbach, 1991b, 1991c, 1991d, 1992). The manuals also present extensive data on validity, including content validity, construct validity, and criterion-related validity. Additional correlates of the eight cross-informant syndromes have been surveyed by Achenbach (1993). Data on the reliability and validity of the DOF, as well as scale scores for referred and nonreferred children, have been published by Achenbach (1991b), Achenbach and Edelbrock (1983), McConaughy et al. (1988), and Reed and Edelbrock (1983). The Guide for the Semistructured Clinical Interview for Children Aged 6-11 (McConaughy & Achenbach, 1990) presented reliability data that encompass test-retest, interinterviewer, and interrater facets by being derived from scores provided by different interviewers rating children on the basis of the test-retest interviews they conducted at intervals averaging 12 days. Validity data are also presented in terms of significant associations with CBCL/4-18 ratings by parents and TRF ratings by teachers. Additional supporting data will be presented in the SCIC manual to be published in 1994. Reliability and validity data are summarized in Table 21.6 for all six instruments.

In addition to our own data on reliability and validity, hundreds of studies published by others have demonstrated reliability and validity with respect to many criteria. A bibliography of over 1,000 published studies including topic listings, author listings, and bibliographic references is updated as of April 1 of each year (Brown & Achenbach, 1993).

TABLE 21.6 Reliability and Validity Data

Instrument    Reliability(a)          Validity

CBCL/4-18     .88                     1. All scales discriminate between referred and nonreferred at p < .01.
                                      2. Significant rs with corresponding scales of Conners (1973) and Quay-Peterson (1987) instruments.

TRF           .91                     1. All scales discriminate between referred and nonreferred at p < .01.
                                      2. Significant rs with corresponding scales of Conners Revised Teacher Rating Scale (Goyette, Conners, & Ulrich, 1978).

YSR           Ages 11-14 .67          1. All problem scales discriminate between referred and nonreferred at p < .01.
              Ages 15-18 .83

CBCL/2-3      .87                     1. All problem scales discriminate between referred and nonreferred at p < .01.
                                      2. Significant r with Richman (1977) instrument at p < .01.

DOF           Total problems .91(b)   1. All scales discriminate between referred and randomly selected controls at p < .01.
              On task .92(b)

SCIC          .71(c)                  1. Significant associations with CBCL and TRF scores.(d)

Note. Many other reliability and validity data are presented in the manual for each instrument and in hundreds of studies listed in the Bibliography of Published Studies Using the Child Behavior Checklist and Related Materials (Brown & Achenbach, 1993).
(a) Unless otherwise indicated, reliability is mean of rs between all scale scores obtained over 7- to 15-day intervals, as reported in the manuals for the respective instruments.
(b) Reliabilities for DOF are rs between two observers scoring behavior observed during the same intervals (McConaughy, Achenbach, & Gent, 1988).
(c) Reliability for SCIC is mean of rs between all scale scores obtained from two interviewers independently interviewing and rating children at a mean interval of 12 days (McConaughy & Achenbach, 1990).
(d) The SCIC manual to be published in 1994 will present more extensive validity data.
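The note to Table 21.6 describes reliability as the mean of rs between scale scores obtained on two occasions. As a rough illustration only, the following Python sketch computes a Pearson r for each scale and then averages the rs. The scale names and scores are hypothetical, and the plain arithmetic mean is an assumption; the manuals may average rs differently (for example, via Fisher's z transformation).

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 scores")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when scores do not vary")
    return sxy / sqrt(sxx * syy)


def mean_retest_r(time1, time2):
    """Average the test-retest rs over all scales (plain mean; an assumption)."""
    rs = [pearson_r(time1[scale], time2[scale]) for scale in time1]
    return sum(rs) / len(rs)


# Hypothetical scores for four children on two scales at two occasions.
first = {"Scale A": [1, 2, 3, 4], "Scale B": [1, 2, 3, 4]}
second = {"Scale A": [2, 4, 6, 8], "Scale B": [4, 3, 2, 1]}
print(mean_retest_r(first, second))
```
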

Basic Interpretive Strategy

In keeping with our empirically based approach, our instruments are designed to provide psychometrically sound, standardized descriptions of children’s functioning, as seen by a particular informant at a particular point in time. Rather than being interpreted to reveal hidden entities, the descriptions obtained from each informant are to be compared with data from other sources. In the assessment of an individual child, data from multiple sources can be used to form a mosaic picture of how the child is seen by different people in different contexts.

Our cross-informant computer program facilitates comparisons among father, mother, teacher, and self-reports obtained on the same child. The program also indicates whether agreement between particular pairs of informants is below average, average, or above average for those types of informants. Where agreement is especially poor, the reasons should be explored to determine whether the child’s functioning differs markedly from one context to another, or whether characteristics of the informants markedly affect their reports. Differences between the pictures obtained from different sources can be clinically useful, because they provide specific foci for asking informants about their perceptions of the child in those particular areas, the circumstances under which problem behaviors occur, and possible eliciting factors. For example, if a mother’s CBCL yields a much higher Aggressive Behavior score than does the father’s CBCL, the clinician can inquire about the circumstances in which the mother sees the aggressive behaviors she reports, whether the father is also present at the same time, whether one parent elicits different behavior than the other parent, whether the parents have different standards for judging the same behavior, and so on.

For purposes of research, such as treatment outcome evaluations, data from each source can be analyzed separately to determine whether they point to similar or different conclusions. Data from multiple sources can also be aggregated by converting raw scale scores from each source to z scores within the research sample, and then computing a mean or weighted combination of scores to use as a composite variable.

The scoring of our instruments is designed to facilitate analyses at the levels of items, narrow-band competence and syndrome scores, broad-band Internalizing and Externalizing scores, and more global total competence and problem scores. In addition, the 1993 versions of the IBM-compatible scoring programs enable users to determine the degree of similarity between a child’s CBCL/4-18, TRF, or YSR profile and profile types derived from cluster analyses (Achenbach, 1993). The manuals for our instruments provide numerous examples of practical applications to individual cases and research use of our instruments to gain knowledge that extends beyond the individual case (Achenbach, 1991b, 1991c, 1991d, 1992; McConaughy & Achenbach, 1990).
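The aggregation of multiple sources for research purposes, converting raw scale scores from each source to z scores within the sample and then taking a mean or weighted combination, can be sketched in Python as follows. This is a minimal illustration, not part of the scoring programs: the function names, the source labels, the sample data, and the use of the sample standard deviation are all assumptions made for the example.

```python
from statistics import mean, stdev


def z_scores(raw_scores):
    """Convert raw scale scores to z scores within the research sample."""
    m = mean(raw_scores)
    sd = stdev(raw_scores)  # sample SD; an assumption, not specified in the text
    if sd == 0:
        raise ValueError("scores do not vary, so z scores are undefined")
    return [(score - m) / sd for score in raw_scores]


def composite_scores(raw_by_source, weights=None):
    """Mean (or weighted mean) of per-source z scores for each child.

    raw_by_source maps a source label to a list of raw scores, one per child,
    with children in the same order in every list.
    """
    sources = list(raw_by_source)
    lengths = {len(raw_by_source[s]) for s in sources}
    if len(lengths) != 1:
        raise ValueError("every source must provide a score for every child")
    if weights is None:
        weights = {s: 1.0 for s in sources}  # equal weights give a plain mean
    total_weight = sum(weights[s] for s in sources)
    z = {s: z_scores(raw_by_source[s]) for s in sources}
    n_children = lengths.pop()
    return [
        sum(weights[s] * z[s][i] for s in sources) / total_weight
        for i in range(n_children)
    ]


# Hypothetical raw scores on one scale for three children from two sources.
raw = {"mother_cbcl": [10, 20, 30], "teacher_trf": [25, 15, 5]}
unweighted = composite_scores(raw)
weighted = composite_scores(raw, {"mother_cbcl": 2.0, "teacher_trf": 1.0})
```

Standardizing within each source before combining keeps a source with a wider raw-score range from dominating the composite; the choice of weights is left to the researcher.
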

Treatment Planning

All of our instruments can be used to provide baseline assessments of children’s functioning. The specific problems reported and the scores on the various scales can be used to determine whether intervention is warranted and to identify specific targets for intervention. The CBCL/2-3, CBCL/4-18, TRF, and YSR can all be obtained from the relevant informants as part of the clinical intake process. The SCIC can be used by clinicians for their initial interviews with children. For school psychologists and others who work in school settings, the DOF can be used to obtain observations in classrooms and other group contexts, such as recess. Because professional qualifications and elaborate training are not needed to obtain data on the DOF, teacher aides and other school personnel can obtain the observational data. Many states require classroom observations as part of the evaluation process for special education services. For those who do not work in school settings, it may be possible to employ their own observers or to arrange for school personnel to collect DOF data. We have found that many schools and teachers permit observations for this purpose, if permission is obtained from the parents of the target child and if the control children are not identified by name.

RESEARCH APPLICATIONS AND FINDINGS

Our instruments can be used in diverse research contexts to select subjects having particular problem patterns and levels of deviance. For example, several studies have used the CBCL to identify children who had high scores on particular scales such as the Aggressive Behavior or Delinquent Behavior scale (Kazdin, Esveldt-Dawson, French, & Unis, 1987). Children who were initially deviant in the target areas were then assigned to different treatment conditions. Following treatment, comparisons were made between changes in scores among children receiving the different conditions. Table 21.7 lists treatment-related topics for which published studies have reported use of our instruments.

Because research on the treatment of child psychopathology is still in an early stage of development, it has focused largely on determining which interventions work at all for fairly broad classes of problems, such as aggressive behavior, attention problems, and depression. This type of research is clearly needed to determine whether particular treatments are effective even for broad classes of problems and to weed out treatments that are ineffective. However, as research on child treatment advances, it should focus increasingly on more differentiated patterns of problems. Profiles of scale scores make it possible to identify permutations of problems and competencies that may not be adequately captured by broad diagnostic categories. If particular profile patterns are found to be shared by significant numbers of children, children grouped according to these patterns can be compared on variables relevant to treatment planning. For example, cluster analyses of profiles scored from the CBCL/4-18, TRF, and YSR have identified several patterns that characterize substantial proportions of clinically referred children (Achenbach, 1993; Achenbach & Edelbrock, 1983; Edelbrock & Achenbach, 1980). Children classified according to these patterns have, in turn, been found to differ significantly on variables relevant to treatment planning, such as teacher ratings, behaviors observed in school, and scores on ability and achievement tests (Achenbach, 1993; McConaughy et al., 1988).

TABLE 21.7 Examples of Treatment-Related Topics for Which Studies Have Been Published on the CBCL and Related Instruments

Abdominal pain                Gender problems                  Psychotherapy
Anxiety                       Headaches                        Schizophrenia
Asthma                        Lead toxicity                    School refusal
Attention deficit disorder    Learning problems                Seasonal affective disorder
Colitis                       Obesity                          Self-concept
Conduct disorder              Obsessive-compulsive behavior    Self-esteem
Delinquent behavior           Oppositional disorder            Separation
Diabetes                      Outcomes of problems             Sexual abuse
Divorce                       Pain                             Sleep disturbance
Drug studies                  Parent-child relationships       Stress
Eating problems               Parent management training       Suicide
Encopresis                    Parental perceptions             Teacher perceptions
Enuresis                      Parental psychopathology         Temperament
Epilepsy                      Peer interaction                 Tourette syndrome
Fire setting                  Posttraumatic stress disorder    Treatment

CLINICAL APPLICATIONS

In planning treatments for individual children, data from multiple sources should be compared to identify areas of agreement and disagreement. Disagreements between different sources should not be regarded as errors, but as clinically informative. For example, if profiles scored from CBCLs completed by a child’s mother and father show major disagreements, the parents can be interviewed to determine the possible reasons. If the clinician judges the parents to be sufficiently sophisticated, they can be shown the profiles scored from CBCLs. Interviews may reveal reasons such as the following for discrepancies between reports by a mother and father: one parent has too little contact with the child to be a good informant; only one parent sees the child in certain situations where the problems occur; one parent provokes problem behaviors by the child; the parents differ in their thresholds for noticing or reporting particular behaviors.

If a teacher’s report is quite discrepant from reports by other informants, a classroom observer can complete the DOF to determine whether what the teacher reports is also evident to the observer. The DOF may confirm that the child’s behavior is markedly different in the teacher’s class than elsewhere, or may indicate that what the observer sees differs from what the teacher reported. If a discrepant report by a particular informant appears to reflect idiosyncrasies of that informant’s view of the child, that informant’s view of the child may be chosen as an important focus of treatment. Similarly, if the behavior of a particular informant toward the child appears to provoke problems not seen in interactions with others, that informant’s behavior may be targeted for change. After the clinician has determined which problems are general across situations and which are specific to a particular situation, treatment can be planned to deal with the syndromes and/or competencies that indicate the greatest need for help.

If profiles scored from multiple informants agree in showing deviance on several syndromes, interventions should be designed to change broad aspects of the child’s functioning across multiple situations. On the other hand, if the data indicate different types of problems in different contexts, such as home and school, then different interventions may be needed for each context. The particular configuration of problems found for a child provides an important basis for planning the types of interventions, as well as for selecting the contexts for intervention. For example, two children might both have scores in the clinical range on the Aggressive Behavior scale. However, the appropriate treatment plan for a child who also scores in the clinical range on the Anxious/Depressed syndrome but in the normal range on the competence scales would differ from the plan for a child who is not deviant on the Anxious/Depressed syndrome but is in the clinical range on the competence scales.

USE WITH OTHER EVALUATION DATA

Table 21.1 listed examples of other data that typically may be obtained in the comprehensive evaluation of children. Such data include developmental histories, family background, school records, cognitive test results, physical assessment, and, for adolescents, tests of self-concept and personality. All of these types of data may be relevant to determining what interventions are feasible and desirable for particular clients. According to our multiaxial approach, the variations in children’s functioning that are manifest in different assessment procedures may argue for a variety of interventions to address different problems in different contexts.

FEEDBACK REGARDING ASSESSMENT FINDINGS

In the assessment of children, adults such as parents and school personnel are usually the consumers of assessment findings. The form in which the findings should be conveyed depends on the relationship between the clinician and consumer, as well as on the consumer’s level of sophistication about standardized psychological assessment. If a clinician is working directly with fairly sophisticated parents, the profiles scored from their CBCLs can be shared with them. Within the limits of confidentiality assurances given to the informants, profiles scored from TRFs, DOFs, and CBCLs may be shared with those who oversee special education to help them plan educational interventions. Because children and youths, as well as adults, should be assured of confidentiality, completed SCICs and YSRs and their profiles should be carefully protected from family and teachers. No scored profiles from any of our instruments should be provided to classroom teachers, nor should completed forms or scored profiles be placed in students’ school records where teachers, office workers, and even other students might see them.

LIMITATIONS/POTENTIAL PROBLEMS IN USE

Users of our instruments are admonished not to label individual children in terms of their scores on profile scales, nor to equate scale scores with diagnoses. Many labels, administrative categories, and diagnostic terms are applied to children without adequate empirical support. Our approach is to obtain empirical data on specific problem items from informants who interact with children in various contexts, to aggregate the data into scales, and to norm the scales as a basis for judging the degree of deviance indicated by a particular informant’s report relative to what is reported by similar informants for large normative samples. We have provided our scales with descriptive labels intended to summarize their content. However, it is not the labels, but the problems actually reported and the comparisons with normative samples, that provide the basis for judging a child’s functioning and need for treatment.

The empirically derived scales should be regarded as ways to describe problems reported for children. The scales are not assumed to represent disease entities or inherent attributes of the children. Furthermore, because all assessment procedures are subject to error, no single procedure or set of scores should be the sole basis for decisions about treatment. Instead, multiple sources of data should be compared to identify findings that are consistent across sources, those that are inconsistent but may validly reflect functioning in particular contexts, and those that may reflect errors of measurement or respondent characteristics that might themselves be in need of change.

Outcome Assessment

Just as all our instruments can be used to provide baseline assessments, all of them can also be used to assess outcomes after treatment or no treatment. The CBCL/2-3, CBCL/4-18, TRF, and YSR can be economically readministered to the same informants who originally completed them. The SCIC and DOF can be repeated by interviewers and observers after treatment.

EVALUATION AGAINST IDEAL CRITERIA FOR OUTCOME MEASURES

Eleven criteria for the assessment of mental health outcome measures have been proposed by Ciarlo, Brown, Edwards, Kiresuk, and Newman (1986). Table 21.8 lists the 11 criteria and evaluates our instruments according to each criterion.

RESEARCH APPLICATIONS

For rigorous tests of the effectiveness of particular treatments, appropriate research designs are needed. In group comparison designs, it is desirable to randomly assign similar groups to different treatment and control conditions, via completely randomized or randomized blocks designs. Where more than one type of treatment is plausible for a particular condition, it is desirable to assign similar groups to Treatment A, Treatment B, and a placebo control condition. Testing the relative superiority of one treatment to another, as well as the superiority of both treatments to a control condition, can be far more informative than merely finding that a particular treatment is followed by better outcomes than no treatment.

To avoid confounding the assessment of treatment effects with “hello-good-bye” effects or with differences between the durations of different treatments, it is important to select a uniform interval for the reassessment of all subjects. For example, all subjects should initially be assessed at a similar point of entry to the study, before the variations in treatment can have effects. They should then be assessed again at a uniform interval after the initial assessment. If a treatment is planned to last no more than 4 months, the first outcome assessment might be scheduled 6 months after the initial assessment, to avoid “good-bye” effects associated with termination of treatment. Follow-up assessments could be scheduled for 12, 18, and 24 months after the initial assessment to track longer term effects of treatment.

If treatment groups are selected for deviance, regression toward the mean may affect all kinds of subsequent assessment. That is, subjects who have very deviant scores at one point in time are likely to have less deviant scores when reassessed, because deviant scores are partly a function of random influences that will not all operate in the same direction on subsequent occasions. Even in nondeviant groups, interviews, questionnaires, and checklists tend to show fewer problems on a second administration shortly after the first (Achenbach, 1991b; Edelbrock et al., 1985; Evans, 1975; Miller, Hampe, Barrett, & Noble, 1972). This practice effect (Milich, Roberts, Loney, & Caputo, 1980) has been found even in diagnostic interviews designed to obtain lifetime diagnoses, where it is illogical to expect lifetime diagnoses to decline from one occasion to another (Robins, 1985). Although the exact reason for declines in reported problems is not known, its pervasiveness, plus regression of deviant scores toward the mean, argue strongly for applying exactly the same assessment sequences to groups receiving different treatment conditions. Even if all groups show declines in problems, comparisons among them can test whether the declines are significantly greater for some groups than for others.

For mental health interventions that lend themselves to single subject, ABAB, or multiple baseline designs, it is important to have enough replications over different subjects to ensure that treatment effects are robust across individuals. Although treatment of an individual may be deemed successful if the individual improves, research on treatment effectiveness should produce knowledge that is generalizable to many individuals. Even when single subject designs are optimal for testing efficacy, findings must be replicated on new individuals to ensure that they are generalizable.

All of the competence, adaptive functioning, syndrome, Internalizing, Externalizing, and total problem scales of our instruments can be used for assessing outcomes. For individuals, profiles obtained before and after treatment can be compared visually. For statistical analyses of groups randomly assigned to different treatment conditions, appropriate statistics include repeated measures analyses of variance (ANOVAs) and multivariate analyses of variance (MANOVAs) to compare pretreatment with posttreatment scores obtained by each group. Group profiles can also be constructed from the mean syndrome scores obtained by all members of one group for comparison with the mean syndrome scores obtained by all members of another group.

The competence scales of the CBCL/4-18 and YSR and the academic performance scale of the TRF may be less likely than problem scales to detect short-term changes in response to treatment. This is because they encompass characteristics that are apt to take longer to change, such as involvement in activities and organizations, number of close friends, and academic achievement. However, individual competence items, such as how well a child gets along with parents, siblings, and other children, can show changes over relatively brief periods.

TABLE 21.8 Evaluation Against Ideal Criteria(a) for Outcome Measures

Criteria, each followed by Comments

1. Relevant and appropriate to client groups
   Items and scales were developed on children referred for mental health and special education services for diverse problems seen in diverse settings, including both genders, multiple ages, different ethnic groups, and all SES levels.

2. Simplicity and uniformity of procedures
   The same standardized forms, scoring procedures, and profiles are usable in all relevant settings. Scoring can be done by clerical workers, although training is necessary to obtain observational data via the DOF and interview data via the SCIC.

3. Clear and objective referents
   Quantitatively scored descriptive items are summed to yield scale scores that are converted to standard scores based on normative samples.

4. Reflect the perspectives of relevant participants
   Instruments are designed to obtain analogous data from parents, teachers, self-reports, direct observation, and clinical interviews.

5. Data on how treatments produce effects
   Because the instruments are designed for many different applications, they do not focus on specific treatment processes. However, comparisons among the diverse scale scores and informants can reveal differential treatment effects.

6. Psychometric adequacy
   Table 21.6 summarizes reliability and validity data. Manuals present further details, plus data on distributions of item and scale scores in normative and clinical samples, longer term stabilities, standard deviations, standard errors, coefficient alpha, and correlations among scales. Instruments can be completed by informants blind to treatment conditions. Studies listed by Brown and Achenbach (1993) have demonstrated sensitivity to treatment-related change.

7. Inexpensive
   Forms cost 32 cents each. Can be hand scored or computer scored on site in a few minutes by clerical workers. IBM and Apple microcomputer programs provide unlimited runs.

8. Understandable and sensible
   Items use ordinary descriptive language understandable without special training. Scales have descriptive titles to summarize content. Scores are displayed on easily understood profiles.

9. Quick, easy feedback and norms
   Profiles and designation of normal, borderline, and clinical range provide quick, easy, norm-based feedback. Scores are displayed in relation to percentiles for normative samples in addition to raw scores and T scores. Manuals display means, standard deviations, and standard errors of scores for demographically matched referred and nonreferred samples.

10. Useful in clinical service
   Instruments are designed for routine clinical assessment of individual cases in diverse settings. Data can be used in evaluation reports, treatment planning, and outcome assessments. Forms and profiles provide documentation for case records.

11. Compatible with diverse theories
   The standardized descriptive data are compatible with virtually all theories and treatments.

(a) Criteria are from Ciarlo, Brown, Edwards, Kiresuk, and Newman (1986).
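The regression toward the mean discussed in this section can be illustrated with a small simulation. The Python sketch below is an illustration under stated assumptions, not an analysis of real data: each simulated child has a stable true problem level plus independent random influences at each assessment, both drawn from normal distributions, and children are selected for deviance on the first assessment only. No treatment is applied, yet the selected group scores closer to the sample mean when reassessed.

```python
import random
from statistics import mean


def simulate_selected_group(n_children=5000, cutoff=1.0, seed=0):
    """Return (first, second) mean scores for children selected as deviant.

    All distributions and the cutoff are hypothetical choices for illustration.
    """
    rng = random.Random(seed)
    true_level = [rng.gauss(0, 1) for _ in range(n_children)]
    # Each assessment = stable true level + random influences on that occasion.
    first = [t + rng.gauss(0, 1) for t in true_level]
    second = [t + rng.gauss(0, 1) for t in true_level]
    # Select children whose first score is at or above the deviance cutoff.
    selected = [i for i in range(n_children) if first[i] >= cutoff]
    return mean(first[i] for i in selected), mean(second[i] for i in selected)


first_mean, second_mean = simulate_selected_group()
print(f"selected group, first assessment:  {first_mean:.2f}")
print(f"selected group, second assessment: {second_mean:.2f}")
```

Because an untreated group selected for deviance declines in this way, a decline in a treated group is informative only when compared with the decline in a group that went through exactly the same assessment sequence.
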

PREDICTION OF OUTCOMES FROM CASE CHARACTERISTICS

Besides controlled studies of treatment effects, our instruments can be used to determine whether initial case characteristics predict differences in outcomes. This can be done for untreated as well as treated cases. As an example, we have carried out 3- and 6-year follow-ups of over 2,000 children who were initially assessed in a national home interview survey. Some of the children were identified as deviant on various empirically derived syndromes. Outcomes were tested for children who were deviant on each syndrome in conjunction with a variety of family variables and other factors, such as stressful experiences and receipt of mental health services (McConaughy, Stanger, & Achenbach, 1992; Stanger, McConaughy, & Achenbach, 1992; Stanger, Achenbach, & McConaughy, 1993). The outcome variables included scores on the CBCL/4-18, TRF, and YSR, our young adult parent and self-report forms (YABCL, YASR), and reports of suicidal behavior, referral for mental health services, contacts with the police, academic and behavioral problems in school, and parents’ judgment that the child needed additional professional help for behavioral/emotional problems. The results are too complex to present here, but can be found in the reports of the 3-year follow-up cited earlier, plus forthcoming reports of the 6-year follow-up.

In a parallel study, we are doing long-term outcome assessments of 2,000 children who were referred to our child psychiatry service. This study is designed to test the power of CBCL/4-18 scores and other case characteristics to predict outcomes according to follow-up CBCL/4-18, YABCL, TRF, YSR, and YASR scores, as well as other outcome data. Because so little is known about which children really need treatment, which ones have poor long-term outcomes even after treatment, and which ones would have good outcomes without treatment, studies of this sort can help determine where to focus efforts to improve treatment.


CLINICAL APPLICATIONS

For the clinical evaluation of treatment in individual cases, it is always useful to readminister the assessment instruments periodically to detect unexpected and hoped-for changes. Because it is designed to obtain 10-minute samples of behavior, the DOF can be used multiple times at any point during or after treatment. Although the SCIC also samples functioning within a relatively brief interval of time, it obtains the child’s reports of functioning over a longer period than spanned by the interview itself. The CBCL/4-18 and YSR both specify a 6-month rating period for behavioral/emotional problem items, whereas the CBCL/2-3 and TRF specify a 2-month rating period. These periods can be shortened if desired to permit outcome assessments over shorter intervals. However, because the instruments are designed to assess fairly stable aspects of functioning, sufficient time must be allowed between assessments and between the onset of interventions and the outcome assessment for changes in the child’s functioning to occur, stabilize, and become evident to the informants.

The standard 2-month rating period for the CBCL/2-3 is keyed to the rapid rate of change in young children, whereas the 2-month rating period for the TRF is designed to make multiple ratings possible within a single academic year. These 2-month intervals can be shortened to as little as 1 month for individual outcome evaluations, although the 2-month rating period is likely to provide a more stable index of functioning. The standard 6-month rating period for the CBCL/4-18 and YSR can be shortened to as little as 2 months for individual outcome evaluations, although longer periods are more desirable. Because they depend on cumulative accomplishments, the competence items may be less sensitive to short-term changes than the problem items. However, to gain a full picture of the impact of treatment on competencies as well as on problems, outcome assessments should include follow-ups over a period of 1 or more years whenever possible.

In the clinical evaluation of outcomes for individual cases, users can estimate the statistical significance of changes in scale scores on the basis of the standard error of measurement for the relevant scale. The manuals display the standard error of measurement for each scale for referred and nonreferred children in each gender/age group for which the scales are normed. For example, suppose that a 10-year-old boy referred for mental health services initially obtained a raw score of 21 on the Externalizing scale of the CBCL/4-18. At the outcome assessment, the boy’s score dropped to 11. As shown in the manual for the CBCL/4-18 (Achenbach, 1991b), the standard error of measurement is 3.4 for clinically referred 10-year-old boys on the Externalizing scale. The decline of 10 points from 21 to 11 is thus equal to 2.9 standard errors, which would be considered statistically significant at p
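The arithmetic in this example can be sketched in a few lines of Python. This is a minimal illustration rather than a procedure from the manuals: the function name and the 1.96 criterion are assumptions introduced here, and the change is simply divided by the standard error of measurement, as in the example.

```python
def change_in_sem_units(initial_score, outcome_score, sem):
    """Return the absolute score change expressed in standard errors of measurement."""
    if sem <= 0:
        raise ValueError("standard error of measurement must be positive")
    return abs(initial_score - outcome_score) / sem

# Values from the example: Externalizing raw score of 21 at intake, 11 at outcome,
# and a standard error of measurement of 3.4 for clinically referred 10-year-old boys.
ratio = change_in_sem_units(21, 11, 3.4)

# Hypothetical criterion (an assumption for illustration, not taken from the manual).
CRITERION = 1.96
print(f"change = {ratio:.1f} standard errors; exceeds criterion: {ratio > CRITERION}")
```

In practice the standard error would be looked up in the manual for the child's gender/age group and referral status before the comparison is made.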


[Figure 21.4. Hand-scored TRF profile.]

[Figure 21.5. Hand-scored CBCL profile.]

544

ACHENBACH

had occurred during the week. The therapist worked with Erica to highlight ways in which she may have contributed to the situations and how she might handle them differently. The therapist also worked on reinforcing Erica’s interest in science and encouraged Erica to apply her considerable cognitive ability to science-related school work. In addition, the therapist met periodically with Erica’s teacher to obtain feedback on Erica’s behavior and to instruct the teacher on reinforcing adaptive social and academic behaviors. The treatment continued for the 6 months remaining in that school year. During this period, Erica became more outgoing and developed a friendship with a classmate, but she also showed outbursts of temper in the classroom. In the following school year, the therapist met with Erica’s fifth-grade teacher to encourage continued reinforcement of specific social and academic behaviors. The therapist also met with Erica a few times during the year.

Figure 21.6 shows profiles scored from follow-up TRFs completed by two of Erica’s seventh-grade teachers who had not been involved in any intervention efforts. (The profiles are both drawn on one hand-scored TRF profile form.) Although social problems were still reported, the total problem scores and all scale scores, including academic performance and adaptive behavior (not shown in Fig. 21.6), were now in the normal range.

Figure 21.7 shows the profile scored from a CBCL completed by Erica’s mother at this time. The CBCL total problem score remained in the clinical range, but it and the previously deviant syndrome scores were now considerably lower.

The case of Erica briefly illustrates some of the ways in which our instruments can be used to identify specific targets for intervention. The choice of treatment for dealing with the identified problems was determined partly by the feasibility of a school-based intervention and by obstacles to a family-based intervention. The fact that problems of withdrawal and interpersonal relationships were similarly evident to the teacher, parent, and direct observer made these appropriate targets for treatment, no matter where the treatment was based. The follow-up TRFs indicated that Erica’s school behavior was well within the normal range by seventh grade. The follow-up CBCL indicated improvement, but still some problems in the clinical range, as seen by Erica’s mother. If direct work with the family had been feasible, perhaps the problems reported by Erica’s mother would have shown more improvement. However, comparisons of profiles from Erica’s teachers and mother highlighted the differences between the outcomes at school, where treatment was focused, and in the family, where treatment had not been feasible.

[Figure 21.6. Profiles scored from two follow-up TRFs for Erica. Solid line indicates science teacher’s ratings; broken line indicates home economics teacher’s ratings.]

[Figure 21.7. Profile scored from follow-up CBCL for Erica.]

Divergent/Conflicting Data from Multiple Sources

Our multiaxial empirically based approach is designed to take account of the fact that multiple sources of data are needed for the comprehensive assessment of children. This approach emphasizes that multiple sources do not necessarily converge on a single diagnostic construct or disease entity. Discrepancies between data from different sources do not mean that one source is right and another is wrong, or that the data are unreliable. Instead, different sources may reliably and validly yield different pictures of a child’s functioning in different contexts, as perceived by different informants. Our approach is designed to enable users to explicitly compare data from multiple sources, to identify the specific agreements and disagreements, to quantify the overall level of agreement, and to compare this with the level of agreement obtained in relevant reference samples.

For both clinical and research purposes, the obtained level of agreement may be an important variable. For example, children for whom there is high agreement among all informants may be less affected by contextual variations than children for whom there are large differences among informants’ reports. On the other hand, if data from one informant are highly discrepant from data obtained from all other sources, and if that informant’s reports are not verified, that informant’s perceptions may be targeted for change. Both consistencies and discrepancies among multiple sources should be regarded as potentially informative. In dealing with individual children, the clinician’s job is to synthesize from multiple sources an understanding of the case in all its complexities and to formulate treatment plans that take account of both the consistencies and discrepancies. Comprehensive assessment may often yield a mosaic of pieces from which a complex picture is constructed, rather than a single, seamless image.

For the researcher, the task is to use multiple kinds of data to construct generalizable knowledge that can be applied to many new cases, despite all the variations that make each bit of data and each case in some respects unique. The specific ways in which multisource data are used will depend on the research questions to be answered. For some purposes, data from each source should be analyzed separately to determine whether similar conclusions are supported by all sources. For other purposes, a taxonomic decision tree can be used to identify cases for which each of several possible combinations of agreement and disagreement occurs (Achenbach, 1993). For still other purposes, scores from multiple sources can be aggregated by converting to z scores within the samples from each source and averaging the z scores across sources.
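The last of these options, aggregation by z scores, can be sketched as follows. This is a minimal illustration under stated assumptions: the function names and the example scores are hypothetical, and the sample standard deviation is used for standardizing, a choice the text does not specify.

```python
from statistics import mean, stdev

def z_scores(values):
    """Standardize raw scores within one source's sample."""
    m, s = mean(values), stdev(values)
    if s == 0:
        raise ValueError("scores from a source must vary to be standardized")
    return [(v - m) / s for v in values]

def aggregate_across_sources(scores_by_source):
    """Average each child's z scores across sources.

    scores_by_source maps a source name to a list of raw scores,
    with the children listed in the same order for every source.
    """
    standardized = [z_scores(raw) for raw in scores_by_source.values()]
    return [mean(child) for child in zip(*standardized)]

# Hypothetical raw total problem scores for three children (illustrative only).
example = {
    "parent": [10, 20, 30],
    "teacher": [1, 2, 3],
}
composite = aggregate_across_sources(example)
```

Standardizing within each source first keeps a source with larger raw scores from dominating the average.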

Summary and Conclusions

Multiaxial empirically based assessment is an approach to the assessment of children’s problems and competencies that obtains empirical assessment data from multiple sources. The sources relevant to most children include parents, teachers, cognitive and achievement tests, physical examinations, and direct assessment of the child, such as observations, interviews, and self-reports. This chapter described instruments for obtaining standardized data on child and adolescent problems and competencies, including the CBCL/2-3, CBCL/4-18, TRF, YSR, DOF, and SCIC, plus the YABCL and YASR for young adults.

Syndromes of co-occurring problems have been derived empirically via principal components/varimax analyses of scores obtained by clinically referred children on each instrument. Large general population samples have been used to norm profiles of scales for scoring the syndromes, Internalizing, Externalizing, total problems, competencies, and adaptive behavior. Eight cross-informant syndromes are similarly scorable from parents’ CBCL/4-18 ratings, teachers’ TRF ratings, and adolescents’ YSR ratings. A cross-informant computer program enables users to enter data for the same subject from mother, father, teacher, and self-ratings, and prints out separate profiles scored from each informant’s data. The program also displays the item scores and scale scores from all informants side by side, computes Q correlations to provide a quantitative index of agreement between informants, and displays the corresponding Q correlations for large reference samples. These features enable the clinician to identify specific agreements and disagreements among informants and to judge how the overall level of agreement compares with that typically found for similar combinations of informants. This information, in turn, provides a basis for planning interventions in relation to different contexts and interaction partners.
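As a rough illustration of the agreement index mentioned above, a Q correlation can be computed as a Pearson correlation between two informants’ scores for one child across the same set of items. The sketch below is not the cross-informant program itself: the function name, the 0-1-2 item coding, and the example ratings are assumptions made for illustration.

```python
from math import sqrt

def q_correlation(scores_a, scores_b):
    """Pearson correlation between two informants' scores on the same items."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("both informants need scores on the same two or more items")
    n = len(scores_a)
    mean_a = sum(scores_a) / n
    mean_b = sum(scores_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(scores_a, scores_b))
    var_a = sum((a - mean_a) ** 2 for a in scores_a)
    var_b = sum((b - mean_b) ** 2 for b in scores_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined when one informant's scores do not vary")
    return cov / sqrt(var_a * var_b)

# Hypothetical item ratings of one child by two informants (illustrative only).
mother = [0, 1, 2, 1, 0, 2]
teacher = [0, 1, 1, 1, 0, 2]
agreement = q_correlation(mother, teacher)
```

The resulting value would then be compared with the Q correlations reported for the relevant reference sample of informant pairs.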


All the empirically based instruments can be used to obtain baseline data for treatment planning, reassessments to monitor changes over the course of treatment, and outcome data for evaluating the effects of treatment. Considerations relevant to clinical and research applications were presented. Because the empirically based procedures are designed to explicitly display variations in data from different sources, the inconsistencies as well as the consistencies between findings from different sources are useful for both clinical and research purposes.

References

Achenbach, T. M. (1966). The classification of children’s psychiatric symptoms: A factor-analytic study. Psychological Monographs, 80(No. 615).

Achenbach, T. M. (1978). The Child Behavior Profile: I. Boys aged 6-11. Journal of Consulting and Clinical Psychology, 46, 478-488.

Achenbach, T. M. (1990a). Young Adult Behavior Checklist. Burlington, VT: University of Vermont, Department of Psychiatry.

Achenbach, T. M. (1990b). Young Adult Self-Report. Burlington, VT: University of Vermont, Department of Psychiatry.

Achenbach, T. M. (1991a). Integrative guide for the 1991 CBCL/4-18, YSR, and TRF profiles. Burlington, VT: University of Vermont, Department of Psychiatry.

Achenbach, T. M. (1991b). Manual for the Child Behavior Checklist/4-18 and 1991 Profile. Burlington, VT: University of Vermont, Department of Psychiatry.

Achenbach, T. M. (1991c). Manual for the Teacher’s Report Form and 1991 Profile. Burlington, VT: University of Vermont, Department of Psychiatry.

Achenbach, T. M. (1991d). Manual for the Youth Self-Report and 1991 Profile. Burlington, VT: University of Vermont, Department of Psychiatry.

Achenbach, T. M. (1992). Manual for the Child Behavior Checklist/2-3 and 1992 Profile. Burlington, VT: University of Vermont, Department of Psychiatry.

Achenbach, T. M. (1993). Empirically based taxonomy: How to use syndromes and profile types derived from the CBCL/4-18, TRF, and YSR. Burlington, VT: University of Vermont, Department of Psychiatry.

Achenbach, T. M., & Edelbrock, C. (1983). Manual for the Child Behavior Checklist/4-16 and Revised Child Behavior Profile. Burlington, VT: University of Vermont, Department of Psychiatry.

Achenbach, T. M., & Edelbrock, C. (1986). Manual for the Teacher’s Report Form and Teacher Version of the Child Behavior Profile. Burlington, VT: University of Vermont, Department of Psychiatry.

Achenbach, T. M., & Edelbrock, C. (1987). Manual for the Youth Self-Report and Profile. Burlington, VT: University of Vermont, Department of Psychiatry.

Achenbach, T. M., McConaughy, S. H., & Howell, C. T. (1987). Child/adolescent behavioral and emotional problems: Implications of cross-informant correlations for situational specificity. Psychological Bulletin, 101, 213-232.

American Psychiatric Association. (1980). Diagnostic and statistical manual of mental disorders (3rd ed.). Washington, DC: Author.

American Psychiatric Association. (1987). Diagnostic and statistical manual of mental disorders (3rd ed. rev.). Washington, DC: Author.

Brown, J. S., & Achenbach, T. M. (1993). Bibliography of published studies using the Child Behavior Checklist and related materials: 1993 edition. Burlington, VT: University of Vermont, Department of Psychiatry.

Ciarlo, J. A., Brown, T. R., Edwards, D. W., Kiresuk, T. J., & Newman, F. L. (1986). Assessing mental health treatment outcome measurement techniques. DHHS Pub. No. (ADM)86-1301. Washington, DC: U.S. Government Printing Office.

Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). New York: Academic Press.

Costello, A. J., Edelbrock, C., Dulcan, M. K., Kalas, R., & Klaric, S. H. (1984). Report on the Diagnostic Interview Schedule for Children (DISC). Pittsburgh, PA: University of Pittsburgh, Department of Psychiatry.

Edelbrock, C., & Achenbach, T. M. (1980). A typology of Child Behavior Profile patterns: Distribution and correlates for disturbed children aged 6-16. Journal of Abnormal Child Psychology, 8, 441-470.

Edelbrock, C., Costello, A. J., Dulcan, M. K., Kalas, R., & Conover, N. C. (1985). Age differences in the reliability of the psychiatric interview of the child. Child Development, 56, 265-275.

Evans, W. R. (1975). The Behavior Problem Checklist. Data from an inner city population. Psychology in the Schools, 12, 301-303.

Goyette, C. H., Conners, C. K., & Ulrich, R. F. (1978). Normative data on revised Conners Parent and Teacher Rating Scales. Journal of Abnormal Child Psychology, 6, 221-236.

Helzer, J. E., Canino, G. J., Yeh, E. K., Bland, R. C., Lee, C. K., Hwu, H. G., & Newman, S. (1990). Alcoholism-North America and Asia: A comparison of population surveys with the Diagnostic Interview Schedule. Archives of General Psychiatry, 47, 313-319.

Kazdin, A. E., Esveldt-Dawson, K., French, N. H., & Unis, A. S. (1987). Problem-solving skills training and relationship therapy in the treatment of antisocial child behavior. Journal of Consulting and Clinical Psychology, 55, 76-85.

McConaughy, S. H., & Achenbach, T. M. (1990). Guide for the Semistructured Clinical Interview for Children Aged 6-11. Burlington, VT: University of Vermont, Department of Psychiatry.

McConaughy, S. H., Achenbach, T. M., & Gent, C. L. (1988). Multiaxial empirically based assessment: Parent, teacher, observational, cognitive, and personality correlates of Child Behavior Profiles for 6-11-year-old boys. Journal of Abnormal Child Psychology, 16, 485-509.

McConaughy, S. H., Stanger, C., & Achenbach, T. M. (1992). Three-year course of behavioral/emotional problems in a national sample of 4- to 16-year-olds: I. Agreement among informants. Journal of the American Academy of Child and Adolescent Psychiatry, 31, 932-940.

Milich, R., Roberts, M., Loney, J., & Caputo, J. (1980). Differentiating practice effects and statistical regression on the Conners Hyperkinesis Index. Journal of Abnormal Child Psychology, 8, 549-552.

Miller, L. C., Hampe, E., Barrett, C. L., & Noble, H. (1972). Test-retest reliability of parent ratings of children’s deviant behavior. Psychological Reports, 31, 249-250.

Quay, H. C., & Peterson, D. R. (1987). Manual for the Revised Behavior Problem Checklist. Coral Gables, FL: University of Miami, Department of Psychology.

Reed, M. L., & Edelbrock, C. (1983). Reliability and validity of the Direct Observation Form of the Child Behavior Checklist. Journal of Abnormal Child Psychology, 11, 521-530.

Reich, W., & Welner, Z. (1989). DICA-R-C. DSM-III-R version. Revised version of DICA for children ages 6-12. St. Louis, MO: Washington University, Department of Psychiatry.

Richman, N. (1977). Is a behaviour checklist for preschool children useful? In P. J. Graham (Ed.), Epidemiological approaches to child psychiatry (pp. 125-136). London: Academic Press.

Robins, L. N. (1985). Epidemiology: Reflections on testing the validity of psychiatric interviews. Archives of General Psychiatry, 42, 918-924.

Shaffer, D., Fisher, P., Piacentini, J., Schwab-Stone, M., & Wicks, J. (1989). Diagnostic Interview Schedule for Children-2. New York: Columbia University, Division of Child Psychiatry.

Shaffer, D., Schwab-Stone, M., Fisher, P., Davies, M., Piacentini, J., & Gioia, P. (1988). A revised version of the Diagnostic Interview Schedule for Children. New York: Columbia University, Division of Child Psychiatry.

Stanger, C., Achenbach, T. M., & McConaughy, S. H. (1993). Three-year course of behavioral/emotional problems in a national sample of 4- to 16-year-olds: III. Predictors of signs of disturbance. Journal of Consulting and Clinical Psychology, in press.

Stanger, C., McConaughy, S. H., & Achenbach, T. M. (1992). Three-year course of behavioral/emotional problems in a national sample of 4- to 16-year-olds: II. Predictors of syndromes. Journal of the American Academy of Child and Adolescent Psychiatry, 31, 941-950.

Chapter 22

Conners Rating Scales

C. Keith Conners
Duke University Medical Center

The parent and teacher scales, now collectively known as the Conners Rating Scales (CRS), originated at the Harriet Lane Clinic of the Johns Hopkins Hospital in the 1960s. The original intent of the scales was to provide a comprehensive checklist of behavior problems commonly noted by parents and teachers of school-aged children referred for outpatient psychiatric treatment. The scales were part of a comprehensive multidisciplinary evaluation. The parent scales often formed the basis of a detailed interview about the children’s problems with the parents.

Overview of the Conners Rating Scales: Summary of Development

MAIN FUNCTION AND PURPOSE OF THE ORIGINAL SCALES

The original version of the parent scale contained 78 items grouped under 24 “problem behavior” headings, such as Problems Sleeping, Problems Eating, Problems with Temper, Problems Keeping Friends, Problems in School, and so on. This author and colleagues later added a category of Additional Problems. These items included 15 of the cardinal characteristics for hyperkinetic, impulsive, and inattentive children. A separate 39-item teacher checklist provided behavioral and academic information from the school setting. The teacher form included 21 items related to Classroom Behavior, 8 items related to Group Participation, and 10 items related to Attitude Toward Authority.

At that time, I was carrying out controlled trials of medication, psychotherapy, and brief consultation for children with behavior and learning problems. Therefore, it seemed appropriate to investigate the use of the checklists as measures of drug treatment and psychotherapy outcomes. The first study on the teacher rating scales (Conners, 1969) found adequate test-retest reliability (0.72 to 0.91) over a 1-month period. There was good sensitivity to the drug treatment effects for the five scales emerging from a factor analysis of 103 drug-study patients.

The original purpose of measuring therapeutic drug effects has been the most common use of the scales. The National Institute of Mental Health (NIMH) adopted the scales as part of their Early Clinical Drug Evaluation Unit protocol (National Institute of Mental Health, 1973), furthering the perception that they were primarily measurement tools for psychopharmacologic treatment research. Because most pediatric psychopharmacology involved hyperactive children, the scales gradually assumed the role of hyperactivity scales, despite their inclusion of a wide range of child behavior problems. However, the scales have found a general place in the assessment literature as well as in treatment outcome studies for nondrug therapies. Numerous studies have examined correlates of the scales, as well as their variations among different cultures.

Although the teacher and parent scales proved quite useful for pre- and posttreatment measures in typical drug trials, it became apparent that more frequent monitoring of children at home and school was desirable in many studies, and that informants generally were reluctant to fill out long forms more frequently than weekly or monthly. Therefore, I decided to use the 10 most highly loaded items from the factor analyses of parent and teacher scales as a composite scale for more frequent monitoring. The abbreviated scale soon became known as the hyperactivity index, because it proved highly successful in distinguishing hyperactive children from anxious and nonclinic children. However, its original intent was to provide a general psychopathology index.

RATIONAL VERSUS THEORETICAL PERSPECTIVE

As noted, I grouped scale items under Problem headings, making it clear that groups of items under each heading were part of a related set. These headings undoubtedly influenced the respondents to see certain items as part of a factor. Some of the items probably would load differently in factor analyses if the rational category headings were absent. For example, under Temper, two of the items are “Throws himself around” and “Pouts and sulks.”¹ It is doubtful that either of these items would mean the same thing if not identified clearly as part of a temper problem group. Although factor analyses made it feasible to use factor scores instead of the rational headings, many clinical studies continued to use total symptom scores, or did not score the scales at all. Authors often found the rational problem groupings sufficiently useful for clinical purposes, whereas researchers working on diagnosis or classification relied more on the empirically derived and continuously distributed factor scores.

¹From Manual for Conners Rating Scales (p. 13) by C. K. Conners, 1989, Tonawanda, NY: Multi-Health Systems. Copyright (1989) by Multi-Health Systems. Reprinted by permission.

DESCRIPTION

The CRS are symptom checklists with a 4-point Likert format. Parents and teachers rate each item on a 4-point scale of how much a particular behavior has been a problem in the past month: not at all, just a little, pretty much, and very much. The CRS currently consist of a 93-item Parent Rating scale (PQ-93), a 39-item Teacher Rating scale (TQ-39), a 48-item Parent scale (PQ-48), and a 28-item Teacher scale (TQ-28). The latter two scales, from a later revision, are not perfect subsets of the earlier versions, although scale content and wording are very similar to earlier versions. Items are single attributes (e.g., destructive) or brief phrases (inattentive, easily distracted). Scoring is based on summing the unit weights of individual items according to the factor structure of the scales. All forms are quick-score forms, in which choices carry through to a second page for adding columns to obtain raw scores. The reverse side of the page provides a table for translating raw scores to T scores.

The factor analyses of the various scales retained items with loadings of 0.40 or greater. This procedure results in some built-in correlation of factor scales containing items that load on more than one factor. For example, several items load on both the PQ Conduct Disorder (CD) and Hyperactivity (HA) factors. The use of unit-weighted items in factor scoring has led to some confusion over the question of whether hyperactivity and conduct disorders are independent nosologic entities because of the typically high correlation between CD and HA factors. However, using factor-score coefficients instead of unit weights demonstrates that the overlap between CD and HA is in fact quite negligible, with a Pearson r = 0.20 (see Blouin, Conners, Seidel, & Blouin, 1989, for an empirical demonstration of this point). It was felt that the simplicity of scoring, as well as the tendency of coefficients to be unstable across samples, justified the use of unit weights.

Because the 10-item abbreviated scale contains the items with the highest loadings on the scale factors, it is not surprising that a factor analysis of the 10-item scale produces more than one factor. Milich, Loney, and Landau (1982) developed the “Iowa Conners Scale” out of concern that the so-called hyperactivity index contained both aggressive and hyperactive items. Although it certainly is reasonable to differentiate these two conditions, criticizing the abbreviated scale (e.g., Ullmann, Sleator, & Sprague, 1985) because it contains a mixed factor structure represents a misunderstanding of its purpose.
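To make unit-weight scoring concrete, the following Python sketch sums item ratings into factor raw scores and converts a raw score to a T score. Everything specific in it is an assumption for illustration only: the 0-3 coding of the four response choices, the item-to-factor key, and the normative mean and standard deviation are hypothetical, and the linear T-score formula stands in for the conversion table printed on the actual forms.

```python
# Assumed 0-3 coding of the four response choices (not specified in the text).
RATING_VALUES = {"not at all": 0, "just a little": 1, "pretty much": 2, "very much": 3}

# Hypothetical item key: item 3 loads on both factors, so it counts toward both.
FACTOR_ITEMS = {
    "conduct": [1, 2, 3],
    "hyperactivity": [3, 4, 5],
}

def raw_scores(responses):
    """Sum unit-weighted item ratings for each factor.

    responses maps an item number to one of the four response labels.
    """
    return {
        factor: sum(RATING_VALUES[responses[item]] for item in items)
        for factor, items in FACTOR_ITEMS.items()
    }

def t_score(raw, norm_mean, norm_sd):
    """Linear T score (mean 50, SD 10) relative to a normative group."""
    return 50 + 10 * (raw - norm_mean) / norm_sd

responses = {
    1: "just a little",
    2: "not at all",
    3: "very much",
    4: "pretty much",
    5: "pretty much",
}
scores = raw_scores(responses)
# Hypothetical norms for the hyperactivity factor: mean 4.0, SD 2.0.
hyperactivity_t = t_score(scores["hyperactivity"], 4.0, 2.0)
```

Because the shared item contributes to both sums, the two unit-weighted scores overlap by construction, which is the built-in correlation between factor scales discussed in this section.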

NORMATIVE GROUPS

The norms for the PQ-93 originally included 316 clinic patients and 365 age-, gender-, and SES-matched normal children. Parents attending PTA meetings in the city of Baltimore provided the latter data. No claim for the representativeness of the normal sample is possible. However, the original article on the parent scale (Conners, 1970) showed that SES and other demographic factors had little influence on the factor structure.

In the original development of the TQ-39, Conners (1969) factor analyzed responses from a clinical sample of 82 boys and 21 girls. Werry, Sprague, and Cohen (1975) replicated the factor analyses in a group of normal children, and subsequently developed norms for the TQ-39 for a New Zealand sample (Werry & Hawthorne, 1976). Trites, Blouin, and Laprade (1982) developed the most comprehensive norms, using a stratified random sample of 9583 Canadian school children.

A stratified random sample of parents interviewed in Pittsburgh provided the norms for the TQ-28 and PQ-48 (Goyette, Conners, & Ulrich, 1978). In that study, 518 mothers and 373 fathers completed the PQ-48. The items in the PQ-48 were the highest loaded items from earlier factor analyses, with a few items slightly reworded for readability. Norms for both the TQ-28 and PQ-48 are included for children from 3 to 17 years of age. Norms are available for various national groups, including Brazil, Hong Kong, Italy, New Zealand, China, Spain, and West Germany (see Conners, 1989, for references).

Types of Norms Available

MAJOR DEMOGRAPHIC CHARACTERISTICS

Data for the TQ-39 include both gender and age norms, in groupings from 3 to 5, 6 to 8, 9 to 11, and 12 to 14 years. The PQ-93 norms vary only slightly as a function of age and SES (Conners, 1970), so that factor score norms combine results for ages 6-12.

Trites et al. (1982) provided norms for 4- to 12-year-olds, separated by gender. Based on their analysis of the 39 items, the Conners Teacher Rating Scale-39 (CTRS-39) includes scales (item-group descriptors) of Hyperactivity (HA), Conduct Problem (CD), Emotional (E), Anxious-Passive (AP), Asocial (AS), and Daydreaming-Inattention (DA).

The TQ-28 includes scales for (a) Conduct Problem, (b) Hyperactivity, and (c) Inattentive-Passive. Normative data for the TQ-28 are based on a study of 383 children, aged 3-17, separated by gender. Initial results on the revision of the TQ-39, the TQ-28, were presented by Goyette et al. (1978). This short form is not a strict subset of the TQ-39. Instead, the TQ-28 was developed after careful consideration of accumulated evidence on the psychometric properties of the original versions, and represents a more abbreviated formulation of child behavior problems.

MAJOR SELECTION CRITERIA FOR INCLUSION

The Canadian norms from the Trites et al. (1982) studies comprise a total population of all primary schools in the city of Ottawa. The Pittsburgh norms came from a stratified random sample based on census information available at the time. Neither sample excluded children on the basis of known psychopathology or learning problems.

RESTRICTION IN USE OF THE INSTRUMENT

As stated in the manual (Conners, 1989), users of the Conners scales should understand the basic principles and limitations of psychological testing and interpretation. They should be familiar with the standards for educational and psychological testing jointly developed by the American Psychological Association, the American Educational Research Association, and the National Council on Measurement in Education. In addition, users should belong to some professional organization that endorses a set of standards for the ethical use of psychological or educational tests, or be licensed professionals in psychology, education, medicine, social work, or an allied field.

In my experience, the most common error of unqualified users of the Conners Scales is diagnosing a child as having a psychiatric disorder, such as Attention Deficit Hyperactivity Disorder (ADHD) or Conduct Disorder, on the basis of scale information alone. I strongly advise users to consider scale results along with other information about a child.


Basic Validity and Reliability Information

VALIDITY STUDIES

Concurrent, Convergent, and Criterion-Related Studies. Table 22.1 presents a number of studies correlating the 10-item Hyperactivity Index with various forms of criterion information. Of particular interest are the high correlations with a DSM-III-R rating scale for ADHD (r = 0.92), and the high correlations with the externalizing scales from Achenbach’s Child Behavior Checklist (CBCL). The fact that the 10-item index correlates with the latter scales almost as well as they correlate with themselves demonstrates the strength that this small set of items has for capturing externalizing psychopathology. It seems clear that the 10-item list is as efficient as the entire DSM-III-R criterion set for diagnosing ADHD. Table 22.2 presents validity studies for the Revised Parent and Teacher Scales (PQ-48 and TQ-28). Again, it is noteworthy that the Hyperactivity, Inattentive, and Conduct Problem scores from the scales all correlate with the DSM-III-R rating scale for ADHD. This suggests both high concurrent validity and some lack of discriminant validity for these scales. The Halperin et al. (1988) study showed some discriminant validity with different types of errors on the Continuous Performance Test (CPT). Previous investigations that did not classify the types of errors typically have shown little or no relationship with teacher ratings of inattentiveness or hyperactivity.

Discriminant Validity. The TQ-39 has shown high discriminant validity in differentiating hyperactive children and their normal peers (Homatidis & Konstantareas, 1981). Children diagnosed with Attention Deficit Disorder (ADD) without Hyperactivity according to DSM-III criteria score significantly lower on the Conduct Problem and Hyperactivity factors of the TQ-39 than boys diagnosed with Attention Deficit with Hyperactivity (King & Young, 1982). In a similar study of school-aged boys, the Conners scales effectively discriminated between children with ADD, specific learning disabilities, and matched normal controls, with the PQ-93 outperforming all other measures employed in the study (Kuehne, Kehle, & McMahon, 1987; see also Kazdin, Esveldt-Dawson, & Loar, 1983; Wynne & Brown, 1984).

Dalby (1985) reported that these scales successfully discriminated between ADD boys and developmental reading disordered boys, whereas others have shown the TQ-39 to be highly accurate as a means of identifying psychopathology in children referred for possible placement in classrooms for the emotionally disturbed (Mattison, Humphrey, Kales, & Wallace, 1986; Taylor & Sandberg, 1984). Milich and Fitzgerald (1985) showed that teachers were able to differentiate between different types of externalizing behaviors in the classroom using the TQ-39. Four of the five factors of the TQ-39 (excluding Anxious-Passive) discriminated between boys brought to juvenile court for truancy and a group of normal controls (Berg, Butler, Hullin, Smith, & Tyrer, 1978). The Hyperactivity Index also has been shown to discriminate behavior disordered children from normals (Margalit, 1983) and learning disabled peers (Wynne & Brown, 1984).

As part of the process of defining distinctive characteristics of various diagnostic groups, the discriminating power of the CTRS-28 has been investigated in many studies. I cite only a few here.

Stein and O’Donnell (1985) found that children with independently derived diagnoses of Attention Deficit Disorder (ADD) and/or Conduct Disorder (CD) were given higher ratings on the CTRS-28 Hyperactivity factor compared with control children and children with a diagnosable physical disorder. Newcorn et al. (1989) compared children who met criteria for the DSM-III diagnosis of ADD and/or the DSM-III-R diagnosis of ADHD. No group




TABLE 22.1
Validity Studies With the Hyperactivity Index

Authors | Sample | Findings
Lambert, Sandoval, and Sassone (1978) | Representative sample of S. F. school children (N = 5212) | Behavior & Temperament Survey correlated 0.89
Zentall and Barack (1979) | Mixed population, but mainly regular education classroom children | David Hyperkinetic Scale r = 0.
Sandoval (1981) | Normal school children | Behavior & Temperament Survey r = 0.89 (N = 672); School Behavior Survey r = 0.76 (N = 95)
Prinz, Connor, and Wilson (1981) | 68 first through third graders deemed most disruptive, and 136 normal controls | Hyperactivity rating on Daily Behavior Checklist r = 0.87; Aggression rating on DBC r = 0.65
Christie, de Witt, Kaltenbach, and Reed (1984) | 34 children referred for impulsivity/hyperactivity | Safer & Allan’s Classroom Teachers Behavior Checklist r = 0.81; Werry-Peters BRS (parent) r = 0.15; MFFT no significant correlations; direct observations of out-of-seat behavior r = 0.44
Horn, Conners, Wells, and Shaw (1986) | 20 inpatient ADHD/Conduct Disorders | Abikoff Classroom Observation Coding: Interference r = 0.83; Solicitation r = 0.60; Gross Motor r = 0.58; Minor Motor, NS; Off-task r = 0.51
Reynolds and Stark (1986) | 132 fourth through sixth graders in classrooms | No correlation with MFFT
Whalen, Henker, and Finck (1981) | Hyperactive children in summer treatment programs | Staff: positive correlation with negative incidents, improved handwriting and naming; Teacher: positive correlation with same as above except handwriting
Whalen, Henker, Collins, Finck, and Dotemoto (1979) | Hyperactive boys in quasinaturalistic classroom settings | Correlation with direct observation of discrete behavioral acts and verbalizations; disruptive, off-task inattention
Edelbrock and Rancurello (1985) | 104 “disturbed” boys | CBCL AGG r = 0.82; CBCL N-O r = 0.60; CBCL INA r = 0.58; CBCL EXT r = 0.87
Newcorn et al. (1989) | 85 predominantly Black and Hispanic children | DSM-III-R Rating Scale: r = 0.92


TABLE 22.2
Validity Studies of TQ-28 and PQ-48

Authors | Sample | Findings
Edelbrock & Rancurello (1985) | 104 “disturbed” boys | CBCL AGG with TQ-28 CD r = 0.90; CBCL N-O with TQ-28 HA r = 0.62; CBCL AGG with TQ-28 HA r = 0.83; CBCL INA with TQ-28 I-P r = 0.76; Total TQ-28 with Total CBCL r = 0.85; Total TQ-28 with CBCL EXT r = 0.89; with CBCL INT r = 0.34
Cohen (1988) | 135 consecutive referrals to neuro clinic | TQ-28: CD with RBPC CD r = 0.87; HA with RBPC AP r = 0.65; ANX with RBPC A-W r = 0.70; HA with RBPC CD r = 0.77
Newcorn et al. (1989) | 85 predominantly Black and Hispanic children | DSM-III-R Rating Scale and TQ-28 factors: HA r = 0.89; I-P r = 0.79; CP r = 0.79
Kazdin, Esveldt-Dawson, & Loar (1983) | 32 inpatients | High negative correlations of CP, HA, and I-P with teacher and rater-observed on-task rating and positive correlations with ratings of disruptive behavior
Halperin et al. (1988) | 72 nonreferred children from Grades 1-6 | I-P correlated with CPT Omissions and X-only Commissions; CP correlated with CPT A-not X Commissions; HYP correlated with CPT A-not X Commissions

differences were found on the Hyperactivity factor for these two versions of the DSM diagnosis. TQ-28 ratings correlated with directly observed classroom behavior, including on-task and disruptive behavior (Kazdin et al., 1983). Halperin et al. (1988) found that Conduct Problem and Hyperactivity factor scores correlated positively with commission errors (“A not X” errors, which are accompanied by relatively fast reaction times) on a sustained attention task. The Inattentive-Passive factor scores correlated positively with sustained attention task omissions and the “X-only” commission errors, which are accompanied by relatively slow reaction times. These findings suggest that the TQ-28 scales distinguish between problems of sustained attention (Inattentive-Passive) and impulsivity (Hyperactivity).

Regarding diagnostic group discriminability, the PQ-93 has differentiated between groups of school-aged boys with ADD, specific learning disabilities, and matched normal controls (Kuehne et al., 1987; see also Kazdin et al., 1983; Wynne & Brown, 1984). Koriath, Gualtieri, Van Bourgondien, Quade, and Werry (1985) found little evidence for diagnostic specificity in comparing PQ-93 scales with groups of pervasive and situational hyperkinesis, conduct disorder, emotional disorder (i.e., anxious, oppositional), and comorbid hyperkinesis with emotional or conduct disorder. Only the Anxious-Shy and Psychosomatic factors were elevated significantly above the overall mean in the emotional disordered and hyperkinetic-emotional disordered groups. No other scales showed discriminability. This was also the case for the TQ-39 and other measures traditionally associated with hyperactivity. Therefore, the lack of specificity or discriminability may be a function of diagnostic criteria, rather than inadequate properties of measures. Previous research has demonstrated the CRS-93’s sensitivity to differences between outpatient and inpatient populations (Therrien & Fischer, 1979), between patients and controls and between neurotics and hyperkinetics (Conners, 1970), and between hyperkinetics diagnosed by pediatricians and other hyperkinetics (Plomin & Foch, 1981). The PQ-48 discriminated between groups of hyperactive and conduct-disordered children more effectively than the TQ-39 (Sandberg, Wieselberg, & Shaffer, 1980). Paul and Cohen (1984) found an infrequent endorsement rate in a group of aphasics on the Conduct Disorder, Psychosomatic, and Anxiety factors (0%, 10%, and 20%, respectively). In addition, although 55% of the aphasics obtained Hyperactivity factor scores greater than zero, these scores were not correlated with performance IQ or measures of language production or reception. Furthermore, no differences in Hyperactivity factor ratings were observed with aphasics grouped according to high versus low IQ. Newcorn et al. (1989) found that children meeting criteria for either ADHD only or ADHD and ADDH obtained higher Conduct Problem ratings than controls. In another study using a different factor structure, Cohen (1988) found that emotionally handicapped children (i.e., children who received a diagnosis of conduct disorder, depression, and/or adjustment reaction) received higher Conduct Disorder, ADDH, and Anxiety ratings compared with children with diagnoses of LD or ADD. LD children received higher ratings on the ADDH factor than non-LD children.

Regarding associated features of the PQ-48 scales, Schaughency and Lahey (1985) reported a relationship between maternal depression and externalizing behavior problems on the PQ-48. Specifically, maternal depression was correlated positively with PQ-48 Conduct Problems and a derived Externalizing score. In addition, mothers’ marital satisfaction was correlated negatively with PQ-48 Conduct Problems, Psychosomatic, and Externalizing scores. Fathers’ marital satisfaction was correlated negatively with PQ-48 Inattentive-Hyperactive and Antisocial. Mothers with a history of child abuse or neglect rated their children as having more conduct problems than a control group using the PQ-48 (Rohrbeck & Twentyman, 1986).

Investigators have found that the Hyperactivity Index (HI) items may represent more than one behavioral dimension. For example, Margalit (1983) reported two factor-analytic subscales in a group of Israeli LD children: Restlessness and Emotional Lability. In a mixed group of regular/special education students, Furlong and Fortman (1984) described two derived factors among the 10 HI items: Attention/Hyperactivity and Emotional Overindulgence. Boys in the special education programs had higher scores on both factors than did other gender-specific special or regular education students.

Predictive Validity. Gittelman (1980) used the Hyperactivity and Inattention factors of the TQ-39 on 61 children in a treatment study with methylphenidate. These children all had specific learning disorders (SLD) and were screened to eliminate those with behavior problems. Children rated as less hyperactive by teachers showed more improvement in classroom reading, whereas children rated more hyperactive were most improved on arithmetic achievement and global ratings of math improvement. Ullman et al. (1981) found that the HI was predictive of later hyperactivity in children not otherwise obviously hyperactive during a pediatrician’s visit. Prinz and Loney (1986) used the TQ-39 and HI on 135 boys taking part in a 3-year follow-up study. The TQ-39 predicted aggressive behavior. The Anxiety factor was negatively predictive of response to medication (high anxiety predicted poor medication response). The HI has proved useful as an early screening device. Sigman, Cohen, Beckwith, and Topinka (1987) found that persistence, completion of work, and ability to wait for help when needing assistance at age 5 were predicted by HI ratings at age 2. Satin, Winsberg, Monetti, Sverd, and Foss (1985) examined the HI for its utility as a primary screen for ADHD. Ninety-two 6- to 9-year-old boys underwent a clinical assessment, and their parents completed the HI. Scores identified 90% of DSM-III diagnosed cases 1 year later. The total HI proved useful for primary screening, and a 5-item subset correctly classified 91% of hyperactives and 73% of the nonhyperactives. Both screens were superior to longer scales rated by parents and teachers at the time diagnoses were made.

Construct and Factor Analytic Studies. The TQ-39 has been studied most frequently for its factor structure. The original factor analysis by Conners (1969) on 103 children produced five factors: Aggressive Conduct, Inattentive/Daydreaming, Anxious-Fearful, Hyperactivity, and Sociability. Test-retest reliabilities for unit-weighted factor scores were 0.91, 0.72, 0.81, 0.84, and 0.79, respectively. Werry and Hawthorne (1976) studied a random sample of 418 Auckland (New Zealand) elementary school children who were rated by their teachers on the TQ-39. Norms were comparable to those of a New York sample on four scale factors (i.e., Conduct Problem, Hyperactivity, Inattentive-Passive, and Tension); Sociability, a fifth factor included in the original article, also appeared. It was concluded that, despite some differences, the original factor structure appeared sufficiently stable across studies not to change the scale’s present scoring system except to add the Sociability factor. Arnold, Barnebey, and Smeltzer (1981) used data from 10 teachers of 225 first graders. They found that the Hyperactivity and Inattention factors were merged in this sample. A factor labeled Shy-Inept appeared to be similar to the Sociability factor of the original analysis, whereas their Rebellious-Unsocialized factor was identical to the Conduct factor. They also reported a factor called Antisocial-Immature. The study by Trites et al. (1982) on a sample of 9,583 Canadian school children generally was considered the definitive factor analysis. It produced six factors: Hyperactivity, Conduct Problem, Emotional-Indulgent, Anxious-Passive, Asocial, and Daydreaming-Inattentive. Thorley (1983) factored data from 110 London children attending an outpatient clinic. His Defiance-Conduct factor correlated 0.96 with the original factor from the Conners (1969) study. Thorley’s Hyperactivity factor correlated 0.82 with the original, and the Social Isolation factor correlated 0.94 with the original Sociability factor. A factor labeled Antisocial Conduct Disturbance correlated 0.47 with the original Conduct Disorder factor.

Some evidence has indicated that the factor structure may differ in children with special education needs. A study of 138 children subsequently placed into special education classes revealed factors for Conduct Disorder, an ADD/Hyperactivity factor that merged the original Hyperactivity and Inattention factors, and Anxiety and Depression factors. A subsequent study on a larger sample (Cohen, DuRant, & Cook, 1988) administered the TQ-39 to 581 special education children and 45 regular education children (all aged 5-19 years, 60% White, 73% male) to examine the effects of age, gender, and race on factors derived from their earlier factor analysis. The findings of an age by group interaction effect on the conduct disorder and depression subscales in the emotionally disturbed children, and a gender by group interaction effect on the conduct disorder subscale for the emotionally disturbed and behavior disorder subgroups, were taken as supporting the contention that the Cohen and Hynd factor analysis provides the user with a more clinically meaningful description of a child’s overall behavior pattern.


The factor structure of the TQ-39 has been studied extensively in various foreign samples. Luk and Leung (1989) obtained teacher ratings of 495 male and 419 female primary school children (aged 6-12 years) from Hong Kong. Both interrater and test-retest reliability were satisfactory. On all scales, scores were on the high side when compared with results from Western countries. Factor analysis showed that the main difference from other studies was that the Conduct Problem factor and Hyperactivity factor were combined. Leung, Luk, and Lee (1989) obtained data on 1,746 Hong Kong special education students, primarily mentally retarded, brain-damaged, and physically disabled children. They concluded that the factor structure was similar to that found in other studies (Conners, 1969; Luk & Leung, 1989; Taylor & Sandberg, 1984; Thorley, 1983; Werry & Hawthorne, 1976), and that the scales were useful in decisions to refer for special services.

A number of studies have factor analyzed the HI. The study by Milich et al. (1982) showed that two major factors, Hyperactivity and Aggression, could be separated in the 10 items of the scale. Margalit (1983) factored the HI on 605 7- to 15-year-old Tel Aviv children in learning disability resource programs and found two factors that she labeled Restlessness (alpha = 0.87) and Emotional Lability (alpha = 0.86). These factors were similar to those reported by Furlong and Fortman (1984) on 108 third to fifth graders in special or regular education programs.

Since the development of the TQ-28, researchers have continued to explore the factor pattern of the TQ-28 with various samples of children. For example, Yao, Solanto, and Wender (1988) factor analyzed the TQ-28 on 282 immigrated Chinese-American children (first to sixth graders). Compared with the Goyette et al. (1978) normative study, similar Hyperactivity and Conduct Problem factors emerged, although the original Inattentive-Passive factor separated into two factors. In addition, a factor reflecting uncooperative behavior (Interpersonal Problem) emerged. In another study on 354 referred children, a factor analysis performed by Wilson and Kiessling (1988) yielded item groups reflecting the following behavior problems: Hyperactivity, Conduct Problems, Unsociability, Inattentive, Passive, and Learning Problems. The Hyperactivity and Conduct Problems factors showed a high concordance with those obtained by Goyette et al. (1978). The Conduct Problems factor showed a perfect correspondence with the IOWA Aggression factor; the IOWA Inattentive-Overactive factor was represented by items from Wilson and Kiessling’s Hyperactivity and Inattentive factors.

Miscellaneous Correlates of Scale Scores. A great many external correlates of the HI have been reported in the literature, probably because of the ease of using a 10-item scale, particularly in school settings. A selected list of some of the more interesting correlates is found in Table 22.3. Some correlates of the Hyperactivity factor of the TQ-39 are presented in Table 22.4.

MAJOR RELIABILITY DATA

Test-Retest Reliability. Conners (1969) reported 1-month test-retest reliabilities ranging from 0.72 to 0.91 for the TQ-39. Glow, Glow, and Rump (1982) reported 1-year reliabilities of 0.53 and 0.55 for the Conduct Problems and Hyperactivity factors, respectively. Roberts, Milich, Loney, and Caputo (1981) reported 1-week test-retest correlations on 120 children of 0.90 for Conduct, 0.87 for Hyperactivity, and 0.84 for Inattentiveness. Luk, Leung, Lee, and Lieh (1988) presented test-retest data on 218 Hong Kong children rated by 54 teacher

TABLE 22.3
External Correlates of the Hyperactivity Index

Author | Correlates of Higher Hyperactivity Index Scores
Ballard, Boileau, Sleator, Masser, & Sprague (1976) | Higher heart rate, systolic and diastolic blood pressure, O2 intake, volume of expired air, respiration rate and stepping rate compared with controls.
Oettinger, Majovski, & Gauch (1978) | Lower WISC Coding A than Coding B compared with controls.
Jacob, O’Leary, & Rosenblad (1978) | Hyperactives have higher scores in formal than in informal settings.
Ackerman, Elardo, & Dykman (1979) | High scorers born to younger parents; 25% lived with mother and stepfather versus 0% of LD or normal controls.
Steinkamp (1980) | High scorers tended to be more distractible only when engaged in complex tasks. They did not improve more than controls when placed in distraction-free environments.
Zentall & Shaw (1980) | High noise exacerbated hyperactive child’s problems, whereas low noise had a normalizing effect.
Simonds & Aston (1981) | Higher minor physical anomalies for full-term (r = 0.31) and for preterm LBW babies (r = 0.37). Rapoport et al. (1974) found similar correlation with the Hyperactivity factor (r = 0.28) and Conduct factor (r = 0.35) of the TQ-39.
Peter, Allan, & Horvath (1983) | Perceive teachers as demanding and nonaccepting, showing less nurturance, affective reward, affiliative companionship, and principled discipline.
Cohen et al. (1983) | Mothers of high scorers controlled behavior by expressing disapproval and making more impulse control suggestions in structured problem-solving than controls; they gave little direct physical help. Children commented on tasks and their own performance more than controls.
Zentall, Gohs, & Culatta (1983) | Impulse verbalizations during transition and tasks requiring response delay; dysfluencies, high rate of commenting during task performance, movement out of seat.
Weithorn, Kagen, & Marcus (1984) | HI scores accounted for significant degree of achievement score variance.
Weissbluth (1984) | Low scorers were more adaptable, mild, and positive in mood, and had longer sleep durations than high scorers.
Raymer & Poppen (1985) | Correlations with EMG: high scorers more tense.
Omizo, Cubberly, Semands, & Omizo (1986) | Biofeedback and relaxation improved memory tasks in high scorers; EMG and paired-associate learning were significant discriminators between high and low scorers.
August & Garfinkel (1990) | Poor performance on sequential memory and attentional tasks.


TABLE 22.4
External Correlates of the TQ-39 Hyperactivity Factor

Author | Correlates of High Scorers
Rapoport, Quinn, and Scibanu (1974) | Significant correlation with blood serotonin levels.
Copeland and Weissbrod (1978) | Negative correlation with sitting during viewing of a videotape fast activity model; total vocalizations positively related with the HA factor during baseline and a low active model; HA more affected by external activity cues.
Bala et al. (1981) | More saccades than normals, greatest at low target velocities and decreased at higher stimulus velocities; differences disappeared under stimulant medication treatment.
Radosh and Gittelman (1981) | More affected by low- and high-appeal distraction than normals.
Rosenbaum and Baker (1984) | In concept formation, task learned under schedule of contingent positive reinforcement, HA showed marked decrease in use of effective problem-solving strategies, with increase in negative self-evaluations and solution-irrelevant statements.
Rickard and de Rael (1987) | High correlation with MYTH scores (scale for Type A behaviors). Mothers’ own Type A ratings associated with rating their child’s Hyperactivity factor.
Hunter et al. (1985) | Total score was the best predictor of teacher-rated Type A behavior when controlling for age, race, and gender.

pairs. One-month reliabilities were 0.88, 0.80, 0.68, and 0.87 for Conduct, Inattentive, Anxiety, and Hyperactivity factors, respectively. Test-retest reliabilities for teacher ratings on the HI consistently have been very high. Zentall and Barack (1979) reported 0.89 over a 2-week period, and Epstein and Nieminen (1983) reported 0.86 over a 1-month period. Edelbrock and Rancurello (1985) reported a test-retest correlation of 0.96 for 55 inpatients. Milich, Roberts, Loney, and Caputo (1980) rated 120 boys over a 1-week interval and found a correlation of 0.91. Brito (1987) reported a test-retest reliability of 0.74 on 196 Brazilian school children with the same pre-post raters, but only 0.20 for different pre-post raters.

Internal Consistency Reliability. High internal reliability for the TQ-39 was reported by Edelbrock, Greenbaum, and Conover (1985), with an average reliability of 0.94 for the various scales. Trites et al. (1982) reported alpha reliability coefficients ranging from 0.61 on the Daydreaming factor to 0.95 on the Hyperactivity factor. High internal consistency also was reported by Glow and Glow (1979) and Reynolds and Stark (1986). Item-total correlations on the PQ-48 ranged from 0.13 for Item 44 (vomiting or nausea) to 0.65 for Item 6 (sucks or chews thumb, clothing, blanket) (Goyette et al., 1978). Sandberg et al. (1980) reported an alpha internal consistency reliability coefficient of 0.92 for the Hyperactivity Index, corrected for length.
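For readers who want to see how the two kinds of coefficient reported above are computed, the following is a minimal sketch rather than the procedure used in any of the cited studies. It assumes test-retest reliability is taken as the Pearson correlation between two administrations and internal consistency as Cronbach's alpha; the function names and all of the ratings are hypothetical.

```python
from statistics import mean, variance


def pearson_r(x, y):
    """Pearson correlation between two lists of scores for the same children."""
    mean_x, mean_y = mean(x), mean(y)
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / (ss_x * ss_y) ** 0.5


def cronbach_alpha(rows):
    """Cronbach's alpha; `rows` holds one list of item ratings per child."""
    k = len(rows[0])
    item_variances = [variance([row[j] for row in rows]) for j in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical scale totals from the same rater at two administrations.
first_rating = [4, 9, 15, 22, 7, 18]
second_rating = [5, 8, 14, 24, 9, 17]
print(f"test-retest r = {pearson_r(first_rating, second_rating):.2f}")

# Hypothetical item-level ratings (0-3) for five children on four items.
item_ratings = [
    [0, 1, 0, 1],
    [1, 1, 2, 1],
    [2, 2, 2, 3],
    [3, 2, 3, 3],
    [1, 0, 1, 1],
]
print(f"alpha = {cronbach_alpha(item_ratings):.2f}")
```

The same `pearson_r` function applied to ratings from two different raters would give an interrater coefficient, which is the comparison behind the contrast between same-rater and different-rater figures reported by Brito (1987).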


Interpretive Strategy

STANDARD INTERPRETIVE STRATEGY

Chapter 3 (“Interpretation”) of the user’s manual (Conners, 1989) provided a discussion of threats to validity that must be addressed to decide how much weight should be placed on the results obtained from the Conners Rating Scales (CRS). Consistent with the old adage “garbage in, garbage out,” invalid input from CRS rater(s) is likely to produce misleading information. Psychometric theory precludes the validity of the actual test scores from being higher than the validity of the ratings on which the scores are based. Therefore, determining whether the scale was used appropriately for the child should be the first step in the interpretation process. As discussed in Chapter 3 of the user’s manual, there are four levels for interpreting the pattern of responses to the CRS.

Interpreting Item Responses. Inspection of the overall pattern of item responses may clarify ambiguities and inconsistencies while the rater is still present. Items on the long scale versions (the PQ-93 and the TQ-39) are arranged in such a manner as to facilitate interpretation and discussion with the rater about major areas of clinical interest. Discussion following completion of the scale might even be incorporated into a semistructured interview. Of course, caution should be taken in interpreting responses from a single specific item. Consistency in reports of major areas of behavior is likely to be useful for diagnosis and for developing treatment programs.

Comparing Ratings from Multiple Sources. When both teacher and parent ratings are available, this provides an opportunity to evaluate the degree of pervasiveness of the referring problem. As discussed in Chapter 5 of the manual, analysis of teacher and parent data should be used to identify the school or home as a context in which behavior is perceived as problematic. Of course, the presence of any threats to validity should be entertained first. Low interrater reliability from different sources such as parent and teacher should be expected. Therefore, use of multisource data must include a clinical judgment about the relative quality of the two data sources and the reasons for any reported discrepancies.

Although this comparison is the most common one in the evaluation process, comparing ratings between caregivers may provide additional interpretive information. Such a comparison often is useful in revealing conflicting information from parents, and in laying the basis for subsequent treatment intervention strategies with them.

Another strategy of interpretation is to inspect the pattern of scale scores associated with a known diagnostic pattern. The most common interpretation strategy employed in personality tests is the two-point code, which entails categorizing clients into clinically meaningful patterns of behavior based on two (or more) elevated scales. Unfortunately, little research has been done with CRS to determine such patterns. Such a gap in the diagnostic/categorical process probably is due to the requirements of a large sample size and special handling of scale regression coefficients.

Nonetheless, some research involving cluster analytic procedures that focus on typological approaches to classification of behavior disorder types has yielded useful information. However, between-study comparisons are not possible at this point, due to a considerable variation in the methods used for such analyses and different sample features. As is generally understood, cluster analysis groups subjects based on characteristic similarity, whereas factor analysis groups items based on interitem correlations. Cluster analysis is often more appealing to clinicians who are used to thinking of certain types of patients.
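The distinction between the two techniques can be made concrete with a small sketch. It is illustrative only: the children, items, and ratings are hypothetical, and it shows just the starting point of each analysis (a child-by-child distance table for clustering, an item-by-item correlation table for factoring), not a full cluster or factor solution.

```python
from itertools import combinations
from math import dist, sqrt

# Hypothetical ratings (0-3): one row per child, one column per item.
ratings = {
    "child_a": [3, 3, 0, 0],
    "child_b": [3, 2, 0, 1],
    "child_c": [0, 0, 3, 3],
    "child_d": [0, 1, 3, 2],
}


def subject_distances(data):
    """Cluster-analysis view: Euclidean distance between each pair of children."""
    return {(a, b): dist(data[a], data[b]) for a, b in combinations(data, 2)}


def item_correlations(data):
    """Factor-analysis view: Pearson correlation between each pair of items."""
    columns = list(zip(*data.values()))
    n = len(columns[0])
    result = {}
    for i, j in combinations(range(len(columns)), 2):
        x, y = columns[i], columns[j]
        mx, my = sum(x) / n, sum(y) / n
        cross = sum((a - mx) * (b - my) for a, b in zip(x, y))
        result[(i, j)] = cross / sqrt(
            sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)
        )
    return result


for pair, d in subject_distances(ratings).items():
    print(pair, round(d, 2))
for pair, r in item_correlations(ratings).items():
    print(pair, round(r, 2))
```

In this toy data set, children with similar rating profiles lie close together (the raw material for types of patients), while items that rise and fall together across children correlate highly (the raw material for scale factors).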


Two studies provided typological information for behavior disorders generally, whereas another focused specifically on developing different clusters of hyperactive children. Using ratings from the standardization sample of the PQ-93, Conners and Wells (1986) found that five clusters emerged, one of which reflected a normal childhood behavior profile (i.e., no elevations on any CRS-93 scale). A second cluster of children showed elevations on the Antisocial, Learning Problem, and Conduct Disorder factors, possibly representing a delinquency cluster. A third group of children was defined by elevations on the Anxious-Shy, Psychosomatic, and Hyperactive-Immature factors, the combination of which Conners and Wells referred to as the typical Neurotic or Internalizing pattern. A fourth cluster included what was considered the Hyperkinetic group, due to elevations on the Restless-Disorganized factor, with moderate elevations on the Hyperactive-Immature and Conduct Disorder factors. Importantly, 70% of the children actually diagnosed Hyperkinetic by a child psychiatrist were distributed across the other clusters, suggesting considerable heterogeneity of the Hyperkinetic group. A fifth cluster was characterized by elevations on the Obsessive-Compulsive or Perfectionism factors.

Taylor et al. (1986) conducted a cluster analysis on Taylor and Sandberg’s (1984) TQ-39 factors in conjunction with laboratory measures, and child and parent interview information regarding hyperactivity, defiance, anxiety, and depression. Results included four clusters: (a) Classroom Conduct Problems, characterized by elevations of the TQ Hyperactivity and Defiance scales; (b) Hyperactive, distinguished by elevations on all lab measures of hyperactivity, endorsements on interviews, and an elevated TQ Hyperactivity score; (c) Anxious, characterized by an elevation on the CTRS Anxiety scale, but low on interview-endorsed depression; and (d) Depression, indicated by interview-endorsed depression, but no elevation on the CTRS Anxiety factor.

Finally, Klein and Young (1979) conducted a cluster analysis on a group of hyperactive-diagnosed outpatients. These results confirmed previous researchers’ findings of group heterogeneity, in that four types of hyperactive children were defined: Anxious, Conduct Problem, Inattentive, and Low Problem. Graphical presentation of the profiles showed that the anxious hyperactive child exhibited elevations on all TQ-39 factors except Conduct Problem. The Conduct Problem Hyperactive group also showed elevations on the Inattentive scale. The Inattentive group also showed elevations on the Hyperactivity scale. Low Problem Hyperactives were considerably lower than other groups on all TQ-39 factors. Clearly, such cluster analytic approaches deserve further study to aid in the development of typologies of problem behavior syndromes.

INTERPRETING INDIVIDUAL SCALE SCORES The most common method of interpreting the CRS is through the interpretation of individual scale scores. The individual scale scores are compared to norms for appropriate groups of children, that is, the transformed scale scores tell the user how the child compares to children not specifically identified as having a diagnosable behavior problem. The user should refer to Chapters 3 and 5 of the user’s manual for a presentation of factor structures of the various versions of CRS, use of T scores, and interpretive guidelines of these standardized scores.

A large body of research provides some basis for more extended interpretation of individual scales. However, to reiterate, interpretations should not be used in lieu of a comprehensive diagnostic procedure. Rather, interpretations provide hypotheses that the user-clinician may wish to pursue. Scale correlates simply are associated features and do not ensure that all features are present in every individual. Furthermore, although empirically derived scales such as CRS (and comparable instruments) may show close parallels to theoretically derived classification systems and associated features, findings from the two strategies are not interchangeable. Thus, an elevation on a particular CRS scale is not sufficient for making a DSM diagnosis.

EXPERT, ACTUARIAL AND COMBINATION SOURCES This author is unaware of any studies attempting to use the Conners scales in actuarial predictions. Early studies with the Parent Questionnaires convinced me that the “diagnostic efficiency” of these scales is only about 70%-80%, which is quite adequate for screening, but not for diagnosis. However, until recently, diagnostic practices were so varied that it was doubtful that meaningful prediction equations could be written. Almost all studies foundered on the question of comorbidity. With increased diagnostic precision, it now may be possible to address the issue of how much one can rely on the scales to aid in diagnostic precision. One example of the possibilities is from a study by Satin et al. (1985). They examined the HI for its utility as a primary screen for ADHD. Ninety-two 6- to 9-year-old boys underwent a clinical assessment and their parents completed the HI. Scores identified 90% of DSM-III diagnosed cases 1 year later. The total HI proved useful for primary screening, and a 5-item subset correctly classified 91% of the hyperactives and 73% of the nonhyperactives. Both screens were superior to longer scales rated by parents and teachers at the time diagnoses were made. Brown (1985) showed that teacher ratings could discriminate DSM-III diagnoses of ADD with and without hyperactivity.
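The distinction drawn above between screening and diagnosis can be made concrete with a small calculation. The sketch below treats the 91% and 73% figures reported for the 5-item subset as sensitivity and specificity and shows how the proportion of correct positive calls would depend on the base rate of the disorder in the group being screened. The base rates and the function name are illustrative assumptions, not values or methods taken from the studies cited.

```python
def predictive_values(sensitivity, specificity, base_rate):
    """Return (PPV, NPV) for a screen applied at an assumed base rate."""
    true_pos = sensitivity * base_rate
    false_neg = (1.0 - sensitivity) * base_rate
    true_neg = specificity * (1.0 - base_rate)
    false_pos = (1.0 - specificity) * (1.0 - base_rate)
    ppv = true_pos / (true_pos + false_pos)  # share of positive screens that are true cases
    npv = true_neg / (true_neg + false_neg)  # share of negative screens that are true non-cases
    return ppv, npv


# Figures reported in the text for the 5-item subset.
SENSITIVITY = 0.91  # hyperactives correctly classified
SPECIFICITY = 0.73  # nonhyperactives correctly classified

# Assumed base rates, chosen only for illustration.
ASSUMED_BASE_RATES = (0.05, 0.20, 0.50)

if __name__ == "__main__":
    for rate in ASSUMED_BASE_RATES:
        ppv, npv = predictive_values(SENSITIVITY, SPECIFICITY, rate)
        print(f"base rate {rate:.2f}: PPV = {ppv:.2f}, NPV = {npv:.2f}")
```

Under these assumptions a negative screen is informative at any base rate, whereas a positive screen in a low base-rate group is wrong more often than it is right, which is consistent with using the scales to screen rather than to diagnose.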

ROLE OF INFORMATION FROM OTHER SOURCES Psychological tests such as the Wechsler scales, structured parent and child interviews, neuropsychological tests, and other standard clinical assessment tools may be useful in combination with the rating scales to give a comprehensive picture of the child’s status and needs for treatment. However, none of these tools is necessary or sufficient for adequate assessment and treatment planning. On the other hand, a careful developmental, medical, family, and social history are (in my opinion) absolutely essential elements of an adequate assessment.

REFERENCES TO RECOMMENDED INTERPRETIVE MATERIALS An interpretive guide to the Conners Rating Scales is currently in preparation. The user’s manual (Conners, 1989) contains illustrative case materials and suggestions for interpretation, although it does not provide extensive guidance in use of the scales.

Treatment Planning: General Issues ISSUES RELEVANT TO USE IN TREATMENT PLANNING Factors relevant to the use of the rating scales in treatment planning include definition of the target behavior(s), the number of observation periods required, the source of the ratings, and type of treatment. The CRS are generally most sensitive to externalizing behaviors, so that aggressive conduct problems, hyperactivity, defiance to authority, and classroom behavior problems are likely to be the most useful targets to be measured in treatment programs. Obviously, the longer scales and their factors will need to be employed if more than one factor or dimension is being targeted. On the other hand, if the target behavior is overall psychopathology, the 10-item abbreviated scale for parents or teachers has proved to be both sensitive and reliable in many different treatment studies. In programs or studies that only require a pre- and posttreatment measure, the longer parent and teacher forms always are recommended, with at least two pretreatment measures to offset the expected practice or regression-to-mean effect. Studies requiring frequent assessments (e.g., weekly or daily over several weeks or months) almost always should use the 10-item scale. Experience has shown that both parents and teachers will reliably fill out the shorter abbreviated scale over a long period, even with frequent administrations, but will begin to balk at repeated use of the longer scales. Frequently, one must decide which source of the ratings should be identified as the primary efficacy or outcome measure. If symptoms are high in school, but low at home, it makes little sense to rely on parent ratings. Sometimes one parent is more knowledgeable and reliable than the other, and some choice should be made regarding which parent to rely on.
In stimulant drug studies, investigators often report a failure of parent scales to detect improvement, even though the drug may have worn off by the time the child reaches home from school. It is important to adjust dosages on the basis of measures that are obtained during the time of peak action of the drugs. In some cases, when treatment effects are expected to be quite transitory, it is helpful to define a specific time of the day when parents should observe the child more carefully than usual, and fill out their rating scales immediately following the observation period. For example, I was able to detect effects of artificial colors in challenge cookies when parents were given a specific period to observe the effects (Conners, 1980).

EXTRANEOUS FACTORS IN USE FOR TREATMENT PLANNING The use of the rating scales for measuring response to drug therapy almost always is accomplished better during the school year rather than during the summer. Without teacher observations, the most important impact of drugs, whether in facilitating or impairing cognitive and classroom behavioral function, will be missed.

Another consideration in treatment planning is the stability of the symptom picture. Experience has shown, and studies demonstrate, that although there may be an initial drop in symptom scores between the first and second administration, repeated administration may reveal an upward drift in symptoms over time. Diamond and Deane (1990) administered the Hyperactivity Index of the Conners Teachers’ Rating Scale weekly for 7 consecutive weeks to teachers of 54 randomly selected first- through third-grade children. A regression analysis confirmed the clinical impression that scores increased with time. The rank order of scores of specific children was nevertheless reliable.
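The two findings just described (an upward trend over repeated administrations, with stable rank order among children) can be checked on one's own repeated ratings. What follows is a minimal sketch, assuming a least-squares slope on weekly group means and a Spearman rank correlation between the first and last administrations. The four children and their scores are made up for illustration, and the analysis is not claimed to be the one Diamond and Deane used.

```python
def ols_slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def spearman(xs, ys):
    """Spearman rank correlation (Pearson correlation of the ranks)."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical weekly Hyperactivity Index raw scores for four children, 7 weeks each.
HYPOTHETICAL_SCORES = {
    "child_A": [4, 3, 4, 5, 5, 6, 6],
    "child_B": [12, 10, 11, 12, 13, 13, 14],
    "child_C": [8, 7, 8, 8, 9, 10, 10],
    "child_D": [18, 16, 17, 18, 19, 19, 20],
}

weeks = list(range(1, 8))
weekly_means = [
    sum(scores[w] for scores in HYPOTHETICAL_SCORES.values()) / len(HYPOTHETICAL_SCORES)
    for w in range(7)
]
drift_per_week = ols_slope(weeks, weekly_means)  # positive value = upward drift

first_week = [scores[0] for scores in HYPOTHETICAL_SCORES.values()]
last_week = [scores[-1] for scores in HYPOTHETICAL_SCORES.values()]
rank_stability = spearman(first_week, last_week)  # near 1 = rank order preserved
```

In the made-up data every child drops from week 1 to week 2 and then climbs, so the slope is positive while the children keep the same ordering throughout.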

RESEARCH RELEVANT TO TREATMENT PLANNING

Drug Treatment and Rate Dependency. Baseline or placebo rate of behavior is a powerful predictor of drug response. This phenomenon, known as rate dependency, occurs across a wide range of species, behaviors, and drugs, but particularly for stimulant drugs such as amphetamines. The rate-dependency phenomenon first was reported for ADHD/LD children by Conners (1972) in a controlled study comparing dextro-amphetamine, methylphenidate, and placebo. Regression analyses showed that the best predictor of any drug-related changes in several different domains was always the initial rate or level of the behavior, even when controlled for regression-to-mean effects. Recent studies indicate that the degree of rate dependency is related to stimulant drug dose, that is, a 15-mg dose of MPH produces a steeper slope of the regression line relating placebo rate and on-drug rate of behavior than a 5-mg dose. Importantly, this finding holds for academic as well as social behavior, indicating that baseline or placebo behavior rates may be useful in predicting which dose is most effective for which domain of behavior. Given these effects, it seems reasonable in drug studies to use the most elevated scales as predictors of drug treatment outcome, as well as the primary target behaviors for treatment. High scores on Conduct Disorder, Hyperactivity, and Inattention scales, in particular, are good indicators of likely improvement with drug therapy. A number of studies with the 10-item Hyperactivity Index have shown it to be quite sensitive to dose and time-action effects of drugs. For example, Brown, Borden, Spunt, and Medenis (1985) found that decrease in scores closely followed the changes in the major metabolite of dextro-amphetamine.

Behavior Therapy and Treatment Planning. In contrast to drug studies, data from well-controlled behavioral treatment studies indicate that low scores on the Hyperactivity and Conduct scales predict treatment effects. For example, Sullivan and O’Leary (1990) compared the maintenance of treatment gains achieved in a classroom with reward and response-cost token programs in ten 6- to 9-year-old children with academic and/or behavioral problems. Superior maintenance of on-task behavior during fading of the response-cost program was predicted based on differential discriminability of treatment withdrawal. Both the reward and response-cost programs had large and equivalent treatment effects. During fading of the response-cost program, all children maintained their rates of on-task behavior. During fading of the reward program, half of the subjects did not maintain their rates of on-task behavior. The five nonmaintainers had significantly elevated Aggression and Hyperactivity scores on the Conners Teacher Rating Scale.

In general, data indicate that behavioral treatments are more likely to influence externalizing behaviors on the Conners scales than internalizing behaviors. For example, in a 6-month follow-up study of two interventions with 20 hyperactive boys, different patterns of improvement were observed for an intervention that focused on self-control and one that employed contingent social reinforcement. Of the two manipulations, self-control methods produced significantly stronger long-term benefits in terms of increased perception of personal control over academic outcomes. On the other hand, social reinforcement produced significantly stronger long-term benefits in terms of teacher ratings of hyperactivity or impulsivity (Bugental, Collins, Collins, & Chaney, 1978).

SCALES MOST USEFUL IN TREATMENT PLANNING There is little doubt that the Conduct and Hyperactivity factors are the most robust and replicable factors of the CRS. In the pre-adolescent age range, the Conduct factor appears to rather sensitively describe the DSM-III-R concept of oppositional defiant disorder, whereas the HI and Hyperactivity factor contain key elements of the DSM-III-R and DSM-III versions of attention deficit disorder. Therefore, these two scales are likely to prove most useful in the context of treatment planning and assessment with ODD and ADHD. All three scales have been shown to be sensitive to medication effects, including dosage. Relatively little literature exists on the use of the Sociability or Anxiety scales as measures of treatment outcome.

FUTURE RESEARCH NEEDS IN TREATMENT PLANNING

The CRS currently are being planned for major revisions, including new and more representative normal samples. One of the goals of this revision is to increase the number of items available for the internalizing scales. It also is clear that measurement of ADHD and related disorders in adolescence and young adulthood needs more research. Conners and Wells (1985) devised a self-report scale for adolescent ADHD that currently is being normed for use with older children and young adults. Not only does the symptom pattern of ADHD change with age (Weiss & Hechtman, 1986), but self-report becomes more veridical and consequently more useful as an assessment measure and indicator of change with treatment.

Use of the Conners Scales for Treatment Outcome Assessment Most of the issues in outcome assessment already have been touched on in the previous section on treatment planning. I now highlight certain aspects of outcome assessment that merit special attention.

GENERAL CONSIDERATIONS Direct observations of disordered behavior, such as time or interval sampling of behavior, often are considered a more objective way to define a treatment outcome than global rating scales. Such observations may be used to validate or anchor the ratings made by parents or teachers. However, validity studies with both direct observations (Oettinger, Majovski, & Gauch, 1978) and performance measures (Steinkamp, 1980) indicate that the structure of the setting and complexity of the task are important variables in determining whether teacher or parent ratings correlate with objective measures. Hyperactive children are rated higher on the Hyperactivity Index in informal settings, but not in formal settings. High scorers on the HI tend to be more distractible on performance measures only when the latter are sufficiently complex. Thus, in using the scales to measure treatment effects on symptomatic behavior, it is important that the appropriate frame of reference for the observer is established. For example, teachers should understand that it is the formal structure of the classroom (not the playground) that is the framework for making their observations. Highly restricted settings such as tutorials or small classes are not the same context as highly stimulating or crowded classes, and hyperactivity ratings are more valid where there are high-activity cues in the environment (Copeland & Weissbrod, 1978).

Parents may need to be reminded that tasks, chores, routines, or family experiences that require effort and structure are the framework for their observations, not watching TV or other highly passive or pleasurable activities. (For example, parents often will assert that “He isn’t really inattentive because he can watch TV for hours without moving.”) The best correlations of parent ratings with directly observed parent behavior come when parents are involved in structured problem-solving situations (Cohen, Sullivan, Minde, Novak, & Keens, 1983). Similarly, tasks requiring response delay or transitions are more likely to be sensitive to picking up impulsive verbalizations or disruptive behavior with the Hyperactivity Index (Zentall, Gohs, & Culatta, 1983). Because the scales are used in a variety of assessment, treatment, and research contexts, the specific instructions to the rater are quite limited and need to be augmented for specific measurement contexts.

REPEATED ASSESSMENTS Research demonstrates a decrease in factor scores or Hyperactivity Index scores between the first and second administration. Therefore, it is highly recommended that at least two baseline measures be collected (Milich et al., 1980). In a simple pre-post design without an appropriate control, this artifact could lead to a mistaken interpretation of improvement due to the treatment alone. Studies that use patients in own-control or crossover designs need to be aware of the opposite tendency for repeated measures of the scales to drift upward over long periods of time (Diamond & Deane, 1990). Obviously, appropriate controls need to be employed to guard against misinterpreting upward drift in scores as a function of negative effects of the active treatment.

RELIANCE ON PARTICULAR SCALES IN MEASURING OUTCOME As mentioned earlier, the largest body of work on treatment outcome is with the Hyperactivity Index or 10-item abbreviated scale. The Hyperactivity Index has proved to be especially valuable in quantitative drug studies, such as measurement of blood level relationships with behavior in highly structured settings (Brown, Ebert, Hunt, & Rapoport, 1981). Perhaps the second most often used scale is the Hyperactivity factor from the parent or teacher forms. These two measures of hyperactivity sometimes are confused, because of their names. It should be remembered that the index includes variance attributable to other factor dimensions, particularly aggressive conduct or oppositional behavior, and in that sense is a less pure measure of outcomes involving only hyperactivity per se. Clearly there has been confusion between the construct of hyperactivity and the physical behaviors of overactivity by some investigators in the past.

FUTURE RESEARCH ON TREATMENT OUTCOME ASSESSMENT Regardless of the current adequacy of its empirical base, all indications point to the important role that DSM-III (and the momentarily expected DSM-IV) have on treatment outcome research. There are already normative data on DSM checklists that are being used in treatment and assessment research. Because the DSM categories, and hence the criteria for diagnosis, are not based on psychometric data such as factor analysis, it is important for current rating scales to include item content and scoring algorithms that will allow translation into DSM categories. At the same time, it is important to maintain the dimensionality of item structure actually found in empirical research. Thus, I foresee a research trend for rating scales to incorporate DSM structures without sacrificing empirical validity.

CLINICAL APPLICATIONS IN OUTCOME ASSESSMENT In addition to points mentioned earlier in the treatment planning section, a few additional points are relevant. Typically, one will examine the T scores and focus on treatment changes of half a standard deviation or more (T-score changes of 5 or more scale points). However, it is important to remember that the factor scales represent a structure derived from large numbers of cases, and individuals may have only a few target symptoms relevant to themselves within a particular factor. Clinically, therefore, it is always useful in assessing change to have parents or teachers circle three to five items that they think are the most crucial problem areas. Then, regardless of changes in factor scores, it is possible to examine particular target symptoms or behaviors for evidence of treatment effect. Obviously, one must be mindful of the possibility of interpreting random fluctuations as real change, but this is precisely the reason for not relying on a single outcome measure. It is possible that a factor might show significant change, but a particular target symptom of interest to the parent or teacher does not change. Therefore, it is important to maintain an “ipsative” mindset, as well as a normative one, in evaluating change in the clinical setting with rating scales.

In general, endorsements of “pretty much” or “very much” represent clinically significant levels of behavior or symptomatology. But some raters use a constricted scale, so that a change from “not at all” to “just a little” represents a major shift in judgment of symptom intensity. Therefore, it is important in a clinical context to note the overall pattern of parent or teacher item endorsements and judge the impact of treatment accordingly. For the most part, I have interpreted clinically significant change as constituting a change from “very much” or “pretty much” to “not at all” or “just a little.” Based on this reasoning, average scale changes (where each item is scored 0, 1, 2, or 3) of 2 points are considered clinically meaningful.
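The two rules of thumb above (a T-score change of at least half a standard deviation, and an item moving from “pretty much” or “very much” down to “not at all” or “just a little”) can be written out as a small sketch. It assumes the usual T-score standard deviation of 10 and assumes the response labels map to the 0 to 3 item scores in the order shown. The function names and the example target items are hypothetical and are not part of any CRS scoring software.

```python
T_SD = 10.0  # T scores are scaled to a standard deviation of 10

# Assumed mapping of item scores to response labels.
ITEM_LABELS = {0: "not at all", 1: "just a little", 2: "pretty much", 3: "very much"}


def t_change_in_sd(pre_t, post_t):
    """Change in T score expressed in standard deviation units."""
    return (post_t - pre_t) / T_SD


def is_notable_t_change(pre_t, post_t, threshold_sd=0.5):
    """True when the T score moved by half an SD (5 points) or more."""
    return abs(t_change_in_sd(pre_t, post_t)) >= threshold_sd


def item_clinically_improved(pre_item, post_item):
    """True when an item moved from 2 or 3 down to 0 or 1."""
    return pre_item >= 2 and post_item <= 1


def improved_targets(pre_items, post_items, targets):
    """Return the rater-circled target items that show clinical improvement."""
    return [t for t in targets if item_clinically_improved(pre_items[t], post_items[t])]


# Hypothetical target symptoms circled by a parent, before and after treatment.
pre = {"restless": 3, "excitable": 2, "short attention span": 3}
post = {"restless": 1, "excitable": 2, "short attention span": 0}
changed = improved_targets(pre, post, list(pre))
```

Keeping the factor-level check and the target-item check as separate functions mirrors the point made in the text: a factor can change while a target symptom does not, and the reverse.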


USE WITH OTHER EVALUATION DATA In drug therapy outcome studies, an essential source of data to be used in conjunction with the scales is treatment-emergent side effects (TES). Although behavioral symptoms can be increased by drugs, and therefore constitute TES, specific lists of symptoms for this purpose have been devised (National Institute of Mental Health, 1973).

Other collateral information of importance is who, where, and when: Who made the ratings, in what setting, and when they were done in relation to the treatment. Obviously this advice is most pertinent to time-sensitive and context-sensitive treatments such as pharmacologic agents, but experience shows that changes in raters from pre- to posttreatment, as well as variations in time and place of data collection, affect the amount of noise in the rating data. A change in rater from one parent to the other, one classroom or teacher to another, or to a different time of day will add an unpredictable element to assessment of outcome, and most likely will diminish the sensitivity of ratings.

PROVISION OF FEEDBACK REGARDING FINDINGS

When using the computerized versions of the CRS, it is possible to set a cutoff (e.g., a T score of 65) that, when surpassed, will trigger the output of a narrative passage describing the content of particular factors. This narrative passage often is useful as a clinical vignette to give to parents: “This is what the computer said about your ratings of your child; do you agree?” This method often clarifies discrepancies between the ways the parents responded and what they meant to convey. In the hand-scored versions, it often is useful to interpret a factor based on one’s knowledge of its item content, saying something like (for a high Conduct Disorder score, for instance), “I see that you have indicated your child tends to be somewhat aggressive and bullying with others; is that how you see him?” Similar feedback is useful with respect to factor score changes with treatment. Particularly when interpreting a series of changes on the abbreviated scales in an individual drug trial, it is important to caution the parents that they may not have noticed drug effects that have worn off by the time the child gets home. Therefore, researchers tend to rely more on the teacher’s responses, unless the drug actually is given on weekends or other times when the parents can observe the effects.
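The cutoff logic described above amounts to a simple filter over factor T scores. The sketch below is not the CRS scoring software. It is a hypothetical helper that lists the factors whose T score surpasses a cutoff of 65, applied to the T scores reported for patient JG in the case study later in this chapter.

```python
def flag_elevations(t_scores, cutoff=65):
    """Return factor names whose T score surpasses the cutoff, highest first."""
    flagged = [(name, t) for name, t in t_scores.items() if t > cutoff]
    flagged.sort(key=lambda pair: -pair[1])
    return [name for name, _ in flagged]


# T scores reported in the case study (teacher TQ-39, mother's PQ-93).
TEACHER_TQ39 = {
    "Hyperactivity": 77,
    "Asocial": 71,
    "Daydreaming-Inattentive": 82,
    "Hyperactivity Index": 86,
}
MOTHER_PQ93 = {"Restless-Disorganized": 61}

if __name__ == "__main__":
    print("Teacher:", flag_elevations(TEACHER_TQ39))
    print("Mother:", flag_elevations(MOTHER_PQ93))
```

With these values the teacher ratings flag all four scales while the mother's ratings flag none, which is the kind of discrepancy that the feedback discussion with parents is meant to explore.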

LIMITATIONS/POTENTIAL PROBLEMS IN USE I noted earlier that the CRS clearly are more robust for externalizing than internalizing measurements. Inasmuch as some drugs can cause an increase in subjective symptoms of anxiety or somatic distress, it is clear that additional inquiry, usually with the patient rather than the parent or teacher, is necessary to pick up such adverse effects. With stimulant treatment in particular, a sign of overdosing is often the subjective sense of being “wired.” This type of effect will not be picked up in treatment studies using only the CRS. Similar cautions apply to dysphoria and anhedonia. Therefore, the CRS are not advisable for treatment outcome dependent measures for depression or antidepressant drug research. For adolescents, the self-report scale described earlier (Conners & Wells, 1985) offers a useful complement to the parent and teacher scales, and includes an adequate sample of affective symptoms.

Case Study The following case illustrates the use of the CRS in a typical situation in which a parent and teacher give conflicting information, and in which a definitive result on a therapeutic drug trial reveals some of the important dynamics in the way the parent and teacher use the rating scales.

JG is an 8-year-old female referred because of difficulty focusing attention in school. A Wechsler Intelligence Scale for Children (WISC-R) revealed bright-normal IQ, and standardized achievement tests indicated satisfactory achievement in all subjects except writing (standard score = 81) and calculation (standard score = 87). She was evaluated by a pediatrician, who gave her a diagnosis of ADHD and recommended medication, but the parents were reluctant to accept the diagnosis and the use of a stimulant drug. The parents’ ratings on the PQ-93 revealed only one score approaching clinical significance, the Restless-Disorganized factor, with a T score of 61 according to the mother and 59 according to the father. Figure 22.1 shows the initial maternal ratings on the PQ-93. In contrast, the teacher’s rating on the TQ-39 showed a T score of 77 on the Hyperactivity factor, 71 on the Asocial factor, 82 for Daydreaming-Inattentive, and 86 for Hyperactivity Index. This picture of a hyperactive, inattentive, somewhat socially isolated youngster was much more in keeping with the parents’ picture of the child as it emerged during direct interview and careful discussion of their original PQ-93. Figure 22.2 shows the profile on the TQ-39 according to the patient’s teacher.

FIG. 22.1. Parent Questionnaire (PQ-93) for patient JG. (Profile plot by FACTOR: Conduct, Anxious, Restless, Psychosomatic, Antisocial, Learning, Obsess-Comp, Hyper-Immature.)

FIG. 22.2. Teacher Questionnaire (TQ-39) for patient JG. (Profile plot by FACTOR: Hyper, Conduct, Excitable, Asocial, Anxious, Inattentive, Index.)

Developmental history was positive for a complicated pregnancy marked by severe nausea and vomiting, extremely long labor, and fetal distress. The baby was hyperalert and seemed to require very little sleep, and as she got older she became active and wild, destroying the nursery and having difficulty being restrained in cars or outings. Teachers complained of disorganized, disheveled work and difficulty following complex directions. A structured interview of DSM-III-R symptoms of ADHD and ODD taken from the parents revealed that JG met 12 of 14 symptoms required for ADHD, and 7 of 9 symptoms for ODD. Psychological testing revealed poor sustained attention on a Continuous Performance Test (CPT), and impaired selective attention to visual stimuli presented to the left visual field (right hemisphere). When the lack of elevated scores on the CRS parent reports was discussed in an interview with the parents, they admitted that JG was in fact quite a problem at home, but because they felt they could handle the problem themselves, they were reluctant to brand her. Observation of parental interactions with JG showed they both were extremely affectionate, indulgent, and somewhat manipulated by the girl. Further history revealed that the mother felt extremely guilty about deferring childbearing until her career was finished, and had reacted by being extremely indulgent with JG, her only child. It was considered that the parental lack of ability to set limits and label the child’s behavior compounded a long-standing attention deficit disorder with a variety of oppositional and manipulative behaviors. JG’s father considered that he may have been hyperactive, recalling that when he was a toddler his parents had to use a leash to keep him in tow when traveling on a bus or train. In addition, there was a family history of possible bipolar illness and schizophrenia on the mother’s side of the family.

A double blind and placebo-controlled trial of methylphenidate was recommended. Behavior was monitored by teacher and parents who filled out daily 10-item HI ratings. Figure 22.3 shows the plot of the results. Independent testing with several laboratory tests of attention confirmed that the patient improved significantly on the 15-mg dose. In addition to medication, the parents were advised to participate in a parent training group, and to enroll JG in a group for children with social skills deficits. A consultation was made to the school, and a behavioral program involving a daily report card and home-based reward system was instituted.
Follow-up indicated a marked improvement in all areas of functioning. Parents continued to have difficulty setting firm limits on their much-adored, but oppositional child until they successfully completed a 10-week course of behavioral parent training.

FIG. 22.3. Double blind drug trial with the Hyperactivity Index for patient JG. (Plot of T-SCORE by DAY OF TREATMENT, with separate PARENT and TEACHER series.)

This case illustrates a number of important aspects of the use of the CRS. First, the initial report by the parents on the PQ-93 contained correct information on the child’s problems, but their rated intensity was reduced by parental anxieties. That is, when examining the pattern of individual item responses, the parents endorsed items of Restlessness, Excitability/Impulsiveness, and Short Attention Span. Additional problems were noted with fidgeting, running around between mouthfuls at meals, and becoming easily frustrated. Yet, the severity of the ratings compared with the normative sample was too low to reach cutoff levels of statistical significance (e.g., a T score greater than 65). It was only when parents were questioned regarding the discrepancy from the teacher’s report, and in a structured ADHD interview, that it became apparent how much they were minimizing problems that actually were causing considerable distress at home.

Second, additional information from psychological testing confirmed that the child did not have a learning disability that might be causing distress only in the school environment. This made it easier to interpret the difficulties at school as primary behavior problems, not a frustration reaction to inappropriate educational expectations.

Finally, a carefully monitored double blind trial and use of the Hyperactivity Index was successful in convincing parents that medication was effective and worth using as one component of an overall treatment plan. Previously, parents had rejected the notion that somehow their child’s problem had an important biological component, and had accepted various suggestions of mental health workers that the parents were the sole cause of JG’s difficulties. The clear concordance of teacher and parent ratings of behavior changes helped convince them that the problem was appearing in more than one setting and was responsive to pharmacologic intervention. The scales were also helpful in alleviating mother’s anxieties that her daughter was showing early signs of the psychotic behavior that mother’s sister had experienced all of her life.

Summary and Conclusions In summary, if one is mindful of the limitations of checklists like the CRS, they are extremely useful in the entire process of defining the problem, eliciting further information from parents and teachers, shaping treatment planning, and measuring treatment outcome. The process is a dynamic one, requiring synthesis of several types of information and careful use of clinical judgment. A simplistic use of the scales for automatic diagnoses or actuarial decision making is to be discouraged. Although normative data are important, one needs to remember that, in filling out such instruments, parents and teachers approach their task with varying degrees of observational skill, openness, defensiveness, or candor. The context of assessment is important, and the relationship of the parent or teacher to the professional may vary considerably, and results may vary accordingly. Factor scores represent broad hypotheses regarding distinct dimensions of psychopathology. They never should become reified to the point where they substitute for a clinician’s overall judgments regarding treatment planning and treatment outcome.

Acknowledgments The author is indebted to Steve Shapiro, Ph.D., who helped compile most of the studies on the Conners Rating Scales and whose organization of the resulting literature was instrumental in writing the present chapter. Tim Butcher gave invaluable assistance in compiling the references and conducting literature searches.


PART IV

FUTURE DEVELOPMENTS


Chapter 23

Future Directions in the Use of Psychological Assessment for Treatment Planning and Outcome Assessment: Predictions and Recommendations

Kevin L. Moreland
Fordham University

Raymond D. Fowler
L. Michael Honaker
American Psychological Association

In an authoritative review, Korchin and Schuldberg (1981) presented an ambivalent view of the relationship of test-based psychological assessment to treatment:

A basic justification for assessment is that it provides information of value to the planning, execution, and evaluation of treatment. It seems self-evident that interventions are more rational, faster, and more effective if based on prior diagnosis of the problem. . . . However, it can be argued that not all knowledge is equally good or relevant and that clinical assessment may not provide the kind of information needed by therapists. Objective evidence is slim. (p. 1154)

A large body of opinion and some compelling case demonstrations support the utility of psychological assessment in treatment. Klopfer (1964) described psychotherapy without assessment as “the blind leading the blind” (p. 387). DeCourcy (1971) and Lambley (1974) provided cautionary case illustrations of the dangers of doing psychotherapy in the absence of assessment. Affleck and Strider (1971) adduced evidence of considerable customer satisfaction with psychological reports:

About two-thirds of the requested items of information were seen as either providing new and significant information or as providing information which confirmed information previously suspected, but which was not well established. . . . It was found that 52% of the reports altered management in some manner, 24% had a minimal effect or confirmed current thinking, 22% had no effect, and 2% were felt to have erroneous or detrimental effect. (pp. 177-179)

Three contemporaneous articles reached far more pessimistic conclusions about the treatment utility of assessment. Cole and Magnussen (1966) concluded that “traditional diagnostic procedures are only loosely related, if at all, to disposition and treatment” (p. 539). Breger (1968) concluded that his results raised “serious questions. . . about the clinical usefulness, logic, and validity of [assessment] practice. . . ” (p. 176). Hartlage, Freeman, Horine, and Walton (1968) found that psychological reports were “evidently of little value in contributing toward any treatment decisions for the patient” (p. 483). More recently, Hayes, Nelson, and Jarrett (1987), in their discussion of the treatment utility of assessment, rendered the Scotch verdict that “clinical assessment has not yet proven its value in fostering favorable treatment outcomes” (p. 963).


What has led us to this confused state of affairs? Commenting on the empirical studies that they reviewed, Hayes et al. (1987) said, “To date, the role of clinical assessment in treatment utility has been buried by conceptual confusion, poorly articulated methods, and inappropriate linkage to structural psychometric criteria. . . ” (p. 973). We believe these comments also apply, all too frequently, to the typical clinical practice discussed by Cole and Magnussen (1966), Breger (1968), and Hartlage et al. (1968). We fear that the optimal clinical practice portrayed in how-to writings (e.g., Blank, 1965; Cerney, 1978; DeCourcy, 1971; Lambley, 1974; Lovitt, 1987), and evidently closely approximated in Affleck and Strider’s (1971) study, too often fails to mirror actual typical practice (cf. Reynolds, 1979; Sweeney, Clarkin, & Fitzgibbon, 1987; Wade & Baker, 1977). However, like Hayes et al. (1987), we believe “[t]he trends in the field all seem positive in these areas” (p. 973). The trends are positive, in part, because of the ever increasing conceptual, theoretical, and methodological sophistication that characterizes any scientific field. They are also positive, in part, because of increasing pressure for accountability that has come about independent of academic desire for scientific progress. It is to that pressure for accountability that we now turn.

Trends in Health Care

The past 30 years have seen major economic and social changes in health care. From a fiscal standpoint, the annual cost of health care in the United States increased from $27 billion in 1960 to $458 billion in 1986, accounting for almost 11% of the gross national product (GNP), and far outpacing the rate of inflation (American Hospital Association, 1987; Gibson & Waldo, 1982). Health-care costs were accounting for 14% of the GNP by 1992 (Adler, 1992). Government and employers are the primary payers of health-related services, financing 80% of all expenditures (Goldsmith, 1984). The remaining 20% of health-care expenditures comes out of consumers’ pockets. As a result, consumers, businesses, and government have become increasingly reluctant to pay for unlimited health care, and have expressed an expectation that medical care providers should be professionally and economically accountable for their decisions. From a societal viewpoint, on the other hand, health care increasingly has been viewed as a basic right for all Americans (Callan & Yeager, 1991; Durenberger, 1989). Both of these trends have led the institutional payers to make vigorous efforts to contain these dramatic increases in their costs (Dorwat & Chartock, 1988).
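The scale of the increase quoted above can be made concrete as a compound annual growth rate. The following is a minimal sketch that uses only the two dollar figures given in the text ($27 billion in 1960 and $458 billion in 1986). The function name is illustrative, and because the text does not state the inflation rate, no comparison figure is computed.

```python
def compound_annual_growth_rate(start_value, end_value, years):
    """Average yearly growth rate that turns start_value into end_value."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Figures quoted in the text, in billions of dollars.
COST_1960 = 27.0
COST_1986 = 458.0

rate = compound_annual_growth_rate(COST_1960, COST_1986, 1986 - 1960)
print(f"Average annual growth, 1960-1986: {rate:.1%}")
```

Under these figures, nominal health-care spending grew by roughly 11.5% per year on average over the 26-year period.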

TRENDS IN MENTAL HEALTH CARE

Business-health coalitions have made the control of mental health costs one of their top agenda items for the coming decade (Kessler, 1986). Mental health costs account for 25% of the total health-care costs incurred by some companies (Sullivan, Flynn, & Lewin, 1987). Overall, mental health care is the fastest rising component of health-care costs, increasing at the rate of 27% a year in the late 1980s (Winslow, 1989).

MENTAL HEALTH CARE IN GENERAL HEALTH CARE

A powerful case has been made for the potential of mental health care to increase the efficacy of physical health care (Adler, 1992; Tulkin & Frank, 1985). Up to 60% of all visits to general physicians reveal no physical pathology (Adler, 1992). This sets the stage for the so-called “medical offset effect,” which, by now, is a well-established phenomenon. VandenBos and DeLeon (1988) summarized this literature, indicating that research consistently shows that individuals in emotional distress who seek medical services tend to be higher users of all medical services before they receive psychological intervention. Indeed, they are twice as likely to visit their primary care physician as other people (Adler, 1992). The expenditures on mental health care for these patients are more than offset by the savings accrued through the consequent decline in the use of physical health-care services. Bryant Welch, the American Psychological Association’s (APA) executive director for professional affairs, recently told the Senate Finance Committee that 7 of the 10 leading causes of death and disabilities have significant psychological components (Adler, 1992). Mental health practitioners are developing circumscribed interventions to be used in treating the psychological elements of specific medical conditions. Such interventions include stress management, pain management, smoking cessation, weight control, medication and diet compliance, and recovery and adaptation to surgery and chronic illness (Adler, 1992; Austad & Berman, 1991).

Trends in Legislation

HEALTH LEGISLATION

The same desire to stem the rapid escalation of health-care costs while guaranteeing all Americans access to quality medical care, which led to the dramatic increase in managed health care, also has given rise to legislative efforts to accomplish those goals (Roemer, 1985; Tolchin, 1989). At least a dozen health-care reform bills are being discussed in Congress as this chapter is being written in mid-1992 (Adler, 1992). These bills include serious proposals for national health insurance programs put forth by both Democratic and Republican legislators.

How might these legislative proposals affect psychological assessment? It is impossible to know for certain, because the proposals do not reach that level of detail. However, it is possible to make some educated predictions by examining legislation that is under active consideration and regulations recently developed to implement related federal legislation. APA official Welch recently told a Senate Finance Committee hearing that Senator Lloyd Bentsen’s national health-care bill, “The Better Access of Affordable Health Care Act” (S. 1872), provides inadequate mental health-care coverage (Adler, 1992). Although we have no data, we fear that test-based psychological assessment may be one of the first aspects of mental health care (and probably health care in general) to be jettisoned in the face of inadequate levels of funding. Informal discussions with colleagues who are nationally recognized experts on managed care (settings in which costs are scrutinized perhaps most closely at this point) suggest that testing rarely is employed in those settings even now. The logic seems to go: “Most of our staff do not believe testing is very helpful in most cases” (cf. Berg, 1984, 1986; Garfield & Kurtz, 1976; Lewandowski & Sacuzzo, 1976; Smyth & Reznikoff, 1971). “Most of our staff believe they can find out what they need to know about patients via interviews. Interviews will be conducted, testing or no testing. Testing adds to our costs. Ergo, why test?” Increasing attention to costs may lead other health-care providers to the same kind of thinking.

Even more disquieting is the prospect that test-based psychological assessment, at least as we now know it, might be curtailed explicitly by regulations emanating from legislation. At this writing, the federal Social Security Administration (SSA) is poised to announce new rules for determining eligibility for mental disability benefits (Freiberg, 1992).¹ The proposed new rules emphasize medical data and what might be fairly characterized as an informal assessment of what Sundberg, Snowden, and Reynolds (1978) termed “personal competence . . . in life situations” (p. 179). The current rules refer to the Minnesota Multiphasic Personality Inventory (MMPI) as a well-standardized test and indicate that the Rorschach may be useful in establishing the existence of a mental disorder. Heretofore, testing has played a central role in determining mental disability. On the other hand, the proposed rules state that personality measures have limited applicability in the determination of mental disability. They comment that projective tests are of uncertain reliability and validity. To assessment professionals, disability determination is not the same thing as treatment planning and outcome measurement. However, the language of the proposed SSA rules strikes the present writers as a sweeping indictment of personality tests. If formally promulgated, the SSA rules might be hard for those charged with implementing national health-insurance programs to ignore.

A less harsh, but still extremely problematic, outcome might be the reduction of assessment to the status of mere “testing” (cf. Matarazzo, 1990). This distinction is made in the controversial new Medicare fee schedule, due to go into effect in 1992. In that fee schedule, test-based psychological assessment is reimbursed as a “technical service,” like a blood test or urinalysis, rather than as a “professional service” requiring expert knowledge to perform and interpret (DeAngelis, 1992; cf. Kiesler & Morton, 1988).

RELATED LEGISLATION

Two recent pieces of legislation, dealing, in part, with the use of test-based psychological assessment in the world of work, are worth considering briefly for the insight they provide into the federal government’s thinking about testing.

The Americans with Disabilities Act (ADA) that went into effect in July 1992 was designed to decrease discrimination in the workplace. However, it agrees with the proposed SSA rules’ emphasis on personal competence (Youngstrom, 1992). Employment only may be denied if the individual cannot perform specific, core job functions without unreasonable accommodations by the employer. Among other provisions, the ADA explicitly forbids denying individuals employment simply because they suffer from a mental disability, including, for example, alcohol or drug abuse that is under treatment. Medical examinations that might reveal such a disability may be performed only after a tentative job offer has been made.² Ironically, in view of the SSA’s proposed rules, at this writing it appears that data from instruments like the MMPI may be classified as “medical” for purposes of the ADA (Youngstrom, 1992).

One of the provisions of the Civil Rights Act of 1991 “bars employers from adjusting the scores of, using different cutoff scores for, or otherwise altering the results of employment-related tests on the basis of race, color, religion, sex or national origin” (Biskupic, 1991). This provision was informed by two extensive studies of aptitude and ability tests by the National Academy of Sciences (Hartigan & Wigdor, 1989; Wigdor & Garner, 1982). Although measures of personality and psychopathology appear not to have been considered in drafting the law, they are subject to its provisions. This presents a problem for the authors and publishers of most personality measures, because most are normed separately for men and women. We can only hope that a lawsuit alleging test-based discrimination does not cause similar strictures to be brought to bear in the clinical arena without due scientific consideration.

¹ The American Psychological Association is vigorously opposing implementation of these rules (DeAngelis, 1992; Freiberg, 1992).

² Employers have a legitimate interest in many medical conditions. One would not want to hire a person with heart problems for a job consisting primarily of heavy manual labor.

Implications

The major implication of all these trends for test-based psychological assessment in treatment planning and outcome assessment is the same: The heat is on in a way it never has been before. The heat is on from those who pay most medical bills (government and business) to reduce those bills, or at least to slow their growth. Test-based psychological assessment will continue to be viewed as useful to the degree that it is proved to help contain health-care costs.

We use the example of alcoholism to illustrate what we believe to be the future of test-based psychological assessment. Alcoholism is a good exemplar because of the scope and complexity of the problem. Alcohol abuse cost the United States $117 billion in 1983 via mechanisms as diverse as cirrhosis of the liver (the ninth leading cause of death in the United States), motor vehicle and other accidents, homicide, suicide, cancer, nutritional deficiencies, and fetal alcohol syndrome (Winett, King, & Altman, 1989). Its complexity is suggested by its responsiveness to treatments as diverse as social skills training and antidrinking medication in settings as diverse as self-help groups and inpatient units (cf. Hester & Miller, 1989). Of course, we believe the observations we make about the assessment of alcohol abusers likely apply to the assessment of those suffering from most other clinical problems.

IMMEDIATE IMPLICATIONS FOR CLINICAL PRACTICE

In what ways might test-based psychological assessment help reduce the cost of treatment for alcoholism? How about selecting the least intensive appropriate level of care for a given patient? Inpatient treatment accounts for 70% of all mental health expenditures, even though there is no evidence that it is superior to less intensive, less expensive levels of care (Kiesler & Sibulkin, 1987). The task, of course, is not simply choosing inpatient versus outpatient treatment, but rather choosing along a continuum of intervention from no formal intervention to residential care. All levels of treatment intensity are known to be efficacious in some cases of alcoholism (Miller, 1989) and some predictors of success at the various levels have been identified (see Table 23.1). However, evidence suggests that persons referred to a particular type of treatment program are most likely to be judged as needing the services of that program (Hansen & Emrick, 1983). This method of decision making is hardly credible, given the lack of evidence for the incremental efficacy of more intensive levels of treatment for alcoholism (given the current lack of patient/treatment matching) (Miller & Hester, 1986).

Psychological testing undoubtedly can be useful in assigning alcohol abusers to the least

expensive appropriate level of care. Well-developed psychological tests are available to measure each of the constructs listed in Table 23.1. However, does the typical clinical assessment include well-developed measures of all these variables? Even discounting religiosity, which is felt by many to be an illegitimate area for clinical inquiry (Butcher & Tellegen, 1966), and social stability, which often is assessed (albeit probably suboptimally) by simply asking whether the patient is married and has a job, the answer is no. Amazingly, the typical clinical assessment includes no well-developed measure of degree of alcohol dependence (Sweeney et al., 1987).

TABLE 23.1
A Proposed Hierarchy of Levels of Care and Their Known Outcome Predictors for Alcohol Abuse

Level of Care: Outcome Predictors

No formal intervention: Low alcohol dependence (ADS); High social stability (SS) (e.g., married, employed)
Self-help groups: Low ADS; High SS; Authoritarian personality style; Conforming; Religious; Not depressed
Brief intervention (psychoeducation): Low ADS; High SS
Outpatient group psychotherapy: Any comorbid psychiatric disorder low in severity; High conceptual level¹; High self-esteem¹
Outpatient individual psychotherapy: Low ADS; High SS; Any comorbid psychiatric disorder low in severity; Depressed¹; Not antisocial
Intensive nonresidential day-treatment: ?
Residential care in a hospital or specialized treatment setting: Mid to high ADS; Mid to high SS; Any comorbid psychiatric disorder low to mid in severity; Not antisocial

Note. From Improving Drug Abuse Research (Tables 1 and 2) by R. W. Pickens, C. G. Leukefeld, and C. R. Shuster (Eds.), 1991, Rockville, MD: National Institute on Drug Abuse.
¹ These predictors may not generalize to all forms of psychotherapy.

Why is this so? We suggest three possible explanations, and all are probably correct in part. First, assigning levels of care typically has not been a focus of assessments of alcohol

abusers (Hansen & Emrick, 1983). Second, many firing-line clinical practitioners probably are unaware that level of alcohol dependence has been found to be useful in predicting response to various levels of treatment. In particular, we fear that many clinicians do not know that individuals with less severe dependence tend to fare worse in more intensive treatment (Miller & Hester, 1986). Two hard-nosed observers of the field feel that it tends to be pervaded by belief in one of three myths about treatment (i.e., nothing works, one approach is superior, all treatments are equal), rather than knowledge of the empirical literature (Miller & Hester, 1986). Finally, someone involved in a patient’s care probably does evaluate level of alcohol dependence via an unstructured clinical interview that leads to the assignment of a DSM-III-R diagnosis of Alcohol Abuse or Alcohol Dependence. The latter may be (but in our experience infrequently is) further described as mild, moderate, or severe.

We believe concerns about rising health-care costs will bring about changes in the assessment practices just illustrated. Specifically, there already is evidence that concern with assigning individuals to the least intensive appropriate level of care is on the rise (McClellan, 1985; Wisconsin State Medical Society, 1981). More generally, third-party payors appear to be increasingly demanding that assessors know the relevant nontesting research literature so that their assessments are focused on constructs (like those in Table 23.1) of demonstrated treatment relevance, rather than viewing their task, as Sundberg and Tyler (1962) suggested years ago, as formulating a working image or model of the person (Hymowitz & Sweeney, 1985). The third-party payors’ demands undoubtedly are fueled by the complaints of other mental health professionals that test findings are no longer relevant to their clinical needs and entail overly lengthy time delays (Berg, 1984; Lewandowski & Sacuzzo, 1976; Smyth & Reznikoff, 1971) and, perhaps, by the decline in requests for traditional psychological assessment (Berg, 1986; Garfield & Kurtz, 1976). Clearly, the eminently sensible assertion that interventions are faster (and therefore cheaper) and more effective if based on prior assessment of the problem is no longer good enough.

Third-party payors also appear to be increasingly demanding that proved instruments be used to measure treatment-relevant constructs. We have colleagues who reported that payment was denied for assessment with one of psychology’s most popular tests (discussed in this book), because, the payors claimed, use of the instrument has not been shown to influence treatment. Test selection based on “personal clinical experience with a test . . .
[rather] than pragmatic or psychometric experience” (Wade & Baker, 1977, p.) and “test usage [that] is not primarily a function of test quality” (Reynolds, 1979, p.) certainly must go the way of the dinosaur. Clearly, we believe that knowledge of data like those in Table 23.1 and selection of tests because they have proved utility in measuring such treatment-relevant constructs will be trends in the clinical practice of the immediate future. Note that these same comments generalize straightforwardly to the succeeding stages of the assessment process, the first of which is the selection of the specific treatment(s) the patient is to receive. Table 23.2 illustrates the complexity of this problem when it comes to the treatment of alcoholism (Miller, 1989). Are drinking-focused interventions needed? Are interventions

dealing with other alcohol-related life problems needed (Bedi, 1987)? Are both types of interventions needed? The assessor’s task is not nearly completed at this point. For example, suppose an individual is deemed most appropriate for outpatient individual psychotherapy according to the criteria in Table 23.1, including the presence of depression. In that case, practitioners need to refer to criteria like those in Table 23.3 to recommend one of the several psychotherapies demonstrated to alleviate depression (Karasu, 1990). Once intervention modalities have been selected, what of dosage? How many sessions of therapy does a depressed person need (Howard, Kopta, Lebow, & Orlinsky, 1982)? Finally, are special efforts at relapse prevention likely to bear fruit once the drinking behavior is under control and therapies for related problems have taken hold (Annis & Davis, 1988)?

TABLE 23.2
Interventions Known to Be Beneficial in the Treatment of Alcohol Abuse

Drinking-Focused Interventions: Antidrinking medication; Aversion therapy; Self-control training
Interventions for Other Problems: Marital therapy; Family therapy; Social skills training; Stress/anxiety management; Psychotropic medication; Cognitive rehabilitation

Note. From Handbook of Alcoholism Treatment Approaches: Effective Alternatives (Fig. 17.4, p. 268) by R. K. Hester and W. R. Miller (Eds.), 1989, Needham Heights, MA: Allyn & Bacon. Copyright © 1989 by Allyn & Bacon. Adapted and modified by permission.

TABLE 23.3
Selective Patient Variables for Psychotherapies for Depression

Psychodynamic: Chronic sense of emptiness; Chronic underestimation of self-worth; Childhood separation or loss; Conflicts in past relationships; Capacity for insight; Ability to modulate regression; Access to dreams and fantasy; High self-direction; Social stability
Cognitive: Distorted thoughts about self, world, future; Logical thinking; Real inadequacies; Low to moderate self-direction; High self-control
Interpersonal: Recent, focused dispute; Social or communications problems; Recent life change; Abnormal grief; High to moderate self-direction; Available support network

Note. From “Toward a Clinical Model of Psychotherapy for Depression, II: An Integrative and Selective Treatment Approach” by T. B. Karasu, 1990, American Journal of Psychiatry, 147, Table 3, p. 275. Copyright © 1990 by the American Psychiatric Association. Adapted and modified by permission.

Although we can find no data on this point, we suspect that test-based outcome assessment is not typical of current clinical practice. We also suspect that financial pressures may change this. By this point, many practitioners probably have had an honest disagreement with a well-intentioned, but hard-nosed third-party payor about a client’s need to continue treatment. Test-based psychological assessment of the sort described earlier offers an immediate means of short-circuiting such disputes by reducing the subjectivity that undoubtedly fuels them—a cost-reducing measure. In fairness to the clinical practitioner, we are by no means asserting that the changes so far suggested will produce an optimal state of affairs. Careful examination of the issues raised previously sets a large agenda of needed basic research and application development.

Implications for Basic Research

TREATMENT PLANNING

The studies relied on in compiling Table 23.1 are less than completely helpful for several reasons. Most of those data were generated in studies called “post hoc identification of dimensions” by Hayes et al. (1987).³ In that type of study, a large number of persons is administered a treatment and, at the conclusion of treatment, a search is conducted for those pretreatment measures that help predict who responded to the treatment. Hayes et al. (1987) pointed out several problems with such studies, and proposed solutions that are described in the context of Table 23.1.

³ Most studies bearing on the usefulness of test-based psychological assessment are still of this sort; see, for example, the studies cited in Butcher’s (1990) recent text on using the MMPI-2 in Psychological Treatment.

The studies that informed Table 23.1 examined the efficacy of the various levels of care one level at a time. In other words, we know that individuals with low alcohol dependence

and high social stability tend to improve with no formal intervention and with brief intervention, but we do not know who is likely to benefit most from one rather than the other. Obviously, comparative research is needed.

The data in Table 23.1 are also nonparametric. What, precisely, is low alcohol dependence? Perhaps 75% of those with initial scores below 8 on the Alcohol Dependence Scale (ADS; Skinner, 1984) can achieve scores of 0 a year later with no formal intervention, whereas 75% of those scoring from 8 to 13 will need brief intervention to reach the same success criterion. At present, we do not know (Heather, 1989). Obviously parametric comparative research is needed.

It has been suggested that, owing to the poor prognosis associated with low social stability (e.g., being single), running a dating service might be an excellent way to treat alcoholism (McLellan & Alterman, 1991). The serious point underlying this tongue-in-cheek suggestion is that predictors like those in Table 23.1 are often of no further help, once treatment level is determined, in deciding how to help a patient. All too often, they are variables discovered post hoc that have no further differential therapeutic implications (e.g., level of alcohol dependence), variables we do not know how to change (e.g., authoritarianism), variables that

may not be related causally to alcohol abuse (e.g., conformity), or variables that we should not try to change (e.g., religiosity). This need not be the case. Careful theorizing should lead to a priori hypotheses involving measures that have useful implications beyond assigning patients to treatment levels. Self-efficacy comes immediately to mind as such a variable (Bandura, 1989). It is sensible to hypothesize that individuals high in self-efficacy would do well without formal treatment or in low-intensity treatments like brief intervention. However, such persons might chafe in some of the self-help groups like Alcoholics Anonymous, which are predicated on the admission of a lack of self-efficacy, or in inpatient treatment where a lack of self-efficacy is implied. Conversely, those low in self-efficacy probably would fare poorly without formal intervention of some sort and might do particularly well in inpatient treatment. Well-validated instruments for the assessment of alcohol-related self-efficacy are available (e.g., Annis & Graham, 1988), and there is some evidence that interventions aimed at the development of self-efficacy are effective in preventing relapse among those treated for alcoholism (Annis & Davis, 1988). Obviously theory-driven parametric comparative research is needed.

As usual in psychological research, the predictors in Table 23.1 were developed exclusively using group methods.

Hayes et al. (1987) pointed out that, in such studies, correlations found between improvement and pretreatment variables mix variability owing to treatment, extraneous variables, and measurement inconsistencies. Hayes and Leonhard (1991) have suggested using individual time-series designs to overcome this difficulty. Research on the use of test-based psychological assessment to plan different intensities of treatment for alcoholism should be theory-driven, parametric, and comparative, and should employ experimental designs that maximize the ability to detect valid predictors of responsiveness to different levels of treatment.

As complicated as the successive iterations of the preceding sentence have become, we have specified only the conditions necessary to demonstrate that test-based psychological assessment can be useful in treatment planning. We have yet to specify the conditions

necessary to show that it is useful in treatment planning. Hayes et al. (1987) described two research designs for demonstrating the latter. The simplest is what they referred to as the “manipulated assessment” study, wherein test data would be used to assign treatment level for some patients, but not for others. Alternatively, one can manipulate use by assigning some patients to treatment levels based on their test results, while assigning others to treatment levels other than those indicated by the test(s). These basic designs can be elaborated to permit comparisons of the relative usefulness of different assessment procedures.

Taking us full circle to the question with which we began this section (“In what ways might test-based psychological assessment help reduce the cost of treatment for alcoholism?”), testing that is demonstrably useful (i.e., valid) in treatment planning must also be demonstrably cost-effective. That is, research on the use of test-based psychological assessment in treatment planning should consider formally the costs and benefits of such test use (Newman & Sorensen, 1985). Researchers need to determine the cost of a particular assessment procedure using a formula like the following:

$$\text{Cost} = \sum \{[(\text{salary/hour} + \text{fringe benefits of personnel}) \times \text{hours personnel involved}] + \text{costs of materials used} + \text{allocated overhead costs}\}$$

The cost of testing then must be compared to the cost of not testing. There are two elements to this comparison. First, one must determine whether it is worthwhile to test at all. In practice, this means examining the cost-effectiveness of different assessment procedures. Remember the comments of the hypothetical cost-conscious administrator: “Most of our staff believe they can find out what they need to know about patients via interviews. Interviews will be conducted, testing or no testing. Testing adds to our costs.
Ergo, why test?” Consider a realistic scenario: it does not take a refined cost accounting to decide it will not be cost-effective to administer and score a social stability questionnaire (see Table 23.1) unless it is substantially more valid than glancing at the patient’s intake sheet to see if he or she is married and employed.

Researchers also need to balance the costs of even the most valid, cost-effective evaluations against the cost of errors that would occur if test-based psychological assessment were not employed. For example, if based on present evidence (Miller & Hester, 1986), a managed-care group decides not to fund day or inpatient treatment, it is not likely to find it cost-effective to employ the current typical clinical battery of a WAIS-R, Bender, MMPI, and projective techniques (Sweeney et al., 1987) to decide among the various less intensive treatment modalities listed in Table 23.1. By our calculations, the fee for the test battery would be roughly $750 (Reimbursement Survey Results, 1991; Willcockson cited in DeAngelis, 1992). Because the median outpatient is seen for five or six individual sessions (Taube, Burns, & Kessler, 1984), prescribing the most expensive outpatient treatment for all clients would only cost about as much as the test battery.

So why administer the test battery? Because there is a linear relationship between number of psychotherapy sessions and amount of improvement (Howard, Kopta, Krause, & Orlinsky, 1986), the managed-care group would be better off forgoing the test battery and using the $750 to fund more therapy sessions for patients who would take advantage of them.

The foregoing methodological points made in the context of choosing level of treatment intensity generalize straightforwardly to the succeeding stages of the assessment process. In summary, the optimal study of the usefulness of test-based psychological assessment in treatment planning should: (a) be theory-driven, (b) be parametric, (c) compare different treatments, (d) employ individualized designs, (e) compare different assessment approaches, and (f) assess cost-effectiveness.
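The cost formula given earlier in this section, and the comparison of a $750 test battery with additional therapy sessions, can be illustrated with a minimal sketch. The class and function names, the staffing figures, and the $125 per-session fee are hypothetical values chosen only for illustration; the $750 battery fee is the only number taken from the text, and the reading of the formula (personnel cost summed over staff, plus materials, plus allocated overhead) is one plausible interpretation.

```python
from dataclasses import dataclass


@dataclass
class StaffTime:
    """One person's involvement in an assessment (all figures hypothetical)."""
    hourly_salary: float  # salary per hour
    hourly_fringe: float  # fringe benefits per hour
    hours: float          # hours this person spends on the assessment


def assessment_cost(staff, materials, overhead):
    """Personnel cost summed over staff, plus materials and allocated overhead."""
    personnel = sum((s.hourly_salary + s.hourly_fringe) * s.hours for s in staff)
    return personnel + materials + overhead


def sessions_forgone(battery_fee, fee_per_session):
    """Number of therapy sessions the battery fee could have funded instead."""
    return battery_fee / fee_per_session


# Hypothetical example: a psychologist for 3 hours and a technician for 1 hour.
staff = [StaffTime(40.0, 10.0, 3.0), StaffTime(15.0, 5.0, 1.0)]
total = assessment_cost(staff, materials=25.0, overhead=30.0)
print(total)                            # (50 * 3) + (20 * 1) + 25 + 30 = 225.0
print(sessions_forgone(750.0, 125.0))   # 6.0 sessions at an assumed $125 each
```

Under the assumed session fee, the battery costs about as much as the five or six sessions the median outpatient receives, which is the comparison the argument above turns on.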


OUTCOME ASSESSMENT

Up to this point in the chapter, we have taken outcome assessment for granted, speaking blithely of “amount of improvement” and the like. Now we need to consider formally the implications of the push for cost-effectiveness for the future of outcome assessment (Linden & Wen, 1990). An immediately obvious question is what, exactly, do we mean by outcome? What is the criterion (are the criteria?) by which we should judge treatment to have been successful or unsuccessful? Traditionally, a statistically significant reduction in symptoms from the beginning to the end of treatment has been the criterion of success. Recently, this definition of success has come under attack on several fronts.

Practitioners who are perfectly comfortable accepting symptom reduction as a reasonable criterion of success have long been uncomfortable accepting statistically significant reduction as a reasonable criterion of success. Kazdin (1977) was the first to tackle this long-noted problem. He proposed that interventions be judged successful to the degree to which they produced positive changes perceptible to significant others in patients’ everyday lives. Others have suggested elimination of the presenting problem or return to normative levels of functioning as appropriate indexes of the clinical significance of changes (Hugdahl & Ost, 1981; Jacobson, Follette, & Revenstorf, 1984). Jacobson et al. (1984) proposed three criteria, any one of which might be used to operationalize the “return to normative levels” criterion: (a) change of more than two standard deviations (in the functional direction) resulting from treatment; (b) posttreatment scores not more than two standard deviations from the functional population mean; and (c) the client is more likely to be a member of the functional distribution of scores than of the dysfunctional distribution. Jacobson et al. (1984) also provided an index to ensure that observed changes indexed by their third method were reliable (corrected in Jacobson & Revenstorf, 1988).

The article by Jacobson et al. (1984) generated a flurry of interest in the topic of clinical significance on the part of methodologists. Alternatives to the methods proposed by Jacobson and his colleagues have been developed (Kendall & Grove, 1988), and the methods have been applied (Alden, 1989). Unfortunately,

Kazdin (1991) recently commented, almost 15 years after his initial work, that “relatively few psychotherapy studies [and, we feel safe in adding, studies of other mental health treatments] incorporate any of the measures designed to evaluate the clinical significance of change” (p. 305). We find this to be astonishing. To be sure, scholars have discussed many scientific and pragmatic problems with the concept of clinical significance, such as the definitions of functional and dysfunctional (Saunders, Howard, & Newman, 1988). However, no one to our knowledge has advocated statistical significance as the sine qua non of treatment success. Indeed, despite its near-universal continued use, null hypothesis testing has been condemned widely as a bankrupt epistemology for many years (Meehl, 1978).

As scientists, we have no desire to foreclose honest scientific debate. However, with regard to this issue, it is our belief that professionals need to settle on a criterion for successful treatment that we recommend for policy-making purposes before legislators, insurance companies, and other bodies that control health-care dollars do so for us. We suggest that the third of the Jacobson et al. (1984) criteria be considered acceptable for this purpose, subject to revision as theory and data warrant. (We strongly suggest that treatment

outcome studies not be published unless they address the clinical significance of changes owing to treatment by some defensible method [Jacobson, 1988].) Thus, the following formula should be used to establish the cutoff point to determine whether treatment has moved a patient into the functional distribution:

$$C = \frac{(S_f)(M_d) + (S_d)(M_f)}{S_f + S_d}$$

$S_f$ is the standard deviation of the functional sample, $S_d$ is the standard deviation of the dysfunctional sample, $M_f$ is the mean of the functional sample, and $M_d$ is the mean of the dysfunctional sample. In addition, the reliable change index should be calculated to ensure that the change is greater than might reasonably be expected owing to measurement error:

$$RC = \frac{P_r - P_s}{SE_c}$$

$P_r$ is the pretreatment score, $P_s$ is the posttreatment score, and $SE_c$ is the standard error of the change score. When RC exceeds 1.96, it is unlikely (p < .05) that the magnitude of change would be an artifact of an unreliable measuring instrument.

If we can settle on a return to normative levels of functioning as a generally acceptable criterion of successful treatment, the next question becomes “What kind of normative functioning?” Most outcome studies focus on the symptom(s) that caused an individual to enter treatment. In the assessment of treatments for alcoholism, one might compare the posttreatment scores of a treated group to a normative sample using a measure of drinking behavior. Obviously everyone would agree that remission of the primary symptom(s) leading to treatment is a necessary condition for successful treatment. However, it should be clear from our discussion of treatment planning that it is frequently not a sufficient condition for labeling treatment “successful” (see Table 23.2). For example, if alcohol abuse has resulted in neuropsychological deficits (Parson, Butters, & Nathan, 1987), the efficacy of the cognitive rehabilitation component of the treatment program obviously would need to be evaluated (Hansen, 1980; Loberg, 1980). However, even when treatment has focused only on drinking behavior per se, a multivariate assessment of outcome is desirable. Pattison (1966) noted that abstinence, at least in the case of alcohol abuse, is not correlated necessarily with improved life functioning. Indeed, in some cases, desired drinking outcomes may be associated with decrements in more general life functioning (Pattison, 1976).

Another critical point for measuring outcome is the realization that treatment is only one environmental event that interacts with the patient and other elements of his or her environment.
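To make the cutoff point and reliable change index described earlier in this section concrete, the following minimal sketch computes both and combines them into a single judgment. The function names, the example scores, and the rule that a patient must both cross the cutoff and show reliable change are illustrative assumptions; the standard error of the change score is taken as a given input because the text does not say how it is obtained.

```python
def functional_cutoff(mean_f, sd_f, mean_d, sd_d):
    """Cutoff between distributions: (Sf*Md + Sd*Mf) / (Sf + Sd)."""
    return (sd_f * mean_d + sd_d * mean_f) / (sd_f + sd_d)


def reliable_change_index(pre, post, se_change):
    """RC = (pretreatment score - posttreatment score) / SE of the change score."""
    return (pre - post) / se_change


def moved_into_functional_range(pre, post, mean_f, sd_f, mean_d, sd_d,
                                se_change, critical=1.96):
    """True if the patient crossed the cutoff and the change is reliable.

    Assumption for this sketch: whichever direction the functional mean lies
    in (lower or higher than the dysfunctional mean) is the healthy direction.
    """
    cutoff = functional_cutoff(mean_f, sd_f, mean_d, sd_d)
    rc = reliable_change_index(pre, post, se_change)
    if mean_f < mean_d:           # lower scores are healthier
        crossed = post < cutoff <= pre
    else:                         # higher scores are healthier
        crossed = pre <= cutoff < post
    return crossed and abs(rc) > critical


# Hypothetical symptom scale: functional mean 5 (SD 4), dysfunctional mean 25 (SD 6).
cutoff = functional_cutoff(mean_f=5.0, sd_f=4.0, mean_d=25.0, sd_d=6.0)
rc = reliable_change_index(pre=24.0, post=9.0, se_change=4.0)
print(cutoff)  # (4 * 25 + 6 * 5) / (4 + 6) = 13.0
print(rc)      # (24 - 9) / 4 = 3.75
print(moved_into_functional_range(24.0, 9.0, 5.0, 4.0, 25.0, 6.0, 4.0))   # True
print(moved_into_functional_range(24.0, 12.0, 5.0, 4.0, 25.0, 6.0, 8.0))  # False
```

In the second call the posttreatment score crosses the cutoff, but the change (RC = 1.5) is within what measurement error alone could produce, so the sketch does not count it as movement into the functional distribution.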

Ogborne (1980), in particular, noted that events outside of treatment (e.g., change in marital status, loss of job) are likely to affect outcome. Cronbach (1982) succinctly summarized all these points, noting that “a treatment effect results from the interaction of population, treatment, and setting; therefore, the quest for an effect ‘free and clear’ of other effects is unrealistic” (p. 32).

We propose that at least three broad classes of measures be added routinely to the outcome measures dictated by the specific nature of the treatment program: personal competence, well-being, and environmental measures.

The first class of proposed additional outcome measures is implied in SSA’s proposed new rules for determining eligibility for mental disability benefits (Freiberg, 1992), in the ADA (Youngstrom, 1992), and in Pattison’s (1966, 1976) discussion of outcomes of treatment for alcoholism. Both the federal initiatives and the alcoholism research emphasize the importance of measuring personal competence in adapting to everyday life situations. What Pattison (1966, 1976) referred to when he spoke of improved life functioning was things like the ability to obtain and hold a job. The ability to abstain from alcohol use does not imply directly the initiative needed to search for a job, the writing skills required to compile a resume, the social skills necessary to favorably impress an interviewer, the reliability needed to get to the job on time every working day, or the task-related skills necessary to perform well on the job. Of course it is also important to assess an individual’s ability to manage the tasks of daily living important in their nonwork life. Someone who has recovered from an episode of schizophrenic disorder, in the sense of no


longer displaying hallucinations and delusions, still may have difficulty renting an apartment, getting along with neighbors, purchasing necessities, and paying bills. Obviously measures of personal competence are especially important in assessing the outcome of efforts to deal with arguably our most expensive mental health problem: treatment of those labeled “chronically mentally ill” (Austad & Berman, 1991; Kiesler, Simpkins, & Morton, 1991). However, others (and we join them) have advocated the use of such measures to assess the outcome of medical and psychosocial interventions with a variety of other populations (Matarazzo, 1992; Sundberg et al., 1978; Tupper & Cicerone, 1990; Weissman, 1975).

In a recent Annual Review of Psychology chapter, Rodin and Salovey (1989) said: “[T]he absence of disease does not necessarily equal good health. In both psychology and medicine, the major focus of research and theory has been on the abnormal; states of normality have been defined as the absence of pathology” (p. 563). In other words, answering “no” to “Do you feel sad and blue?” is a far cry from saying “yes” when asked “Do you feel terrific?” Rodin and Salovey (1989) recommended asking more of the latter type of questions. Someone concerned with containing health-care costs might ask, “Why should we be concerned with going beyond curing pathology to promoting ‘good health’? Isn’t ridding a patient of pathology a sufficient (and sufficiently expensive) goal of health care?” Rodin and Salovey (1989) went on to address those questions: “Studies . . . of individuals’ subjective perceptions of their own health . . . suggest that people’s sense of their own health is not only a reflection of their psychological and physical well-being but also a predictor of subsequent physical health” (pp. 563-564; italics added). Data also suggest that high subjective well-being may help insulate people from future psychological difficulties (Taylor, 1989). We have little doubt that promoting a sense of subjective well-being, either as part of a tertiary care program such as psychotherapy or as part of a primary prevention program, will prove cost-effective, because it will reduce the need for subsequent tertiary care (Kaplan, 1984). Thus, we recommend that future outcome studies include measures of subjective well-being along the dimensions in Table 23.4 that are appropriate for a given case, in addition to the symptom measures ordinarily employed (Schlosser, 1990).⁴

Measures of the client’s environment ordinarily will not be useful as outcome measures per se. Rather, they will be useful as a means to gauge happy accidents (e.g., marrying a wonderful spouse), uncontrollable tragedies (e.g., the death of a wonderful spouse), and other extra-treatment factors that help or hinder treatment. The collection and proper interpretation of such data will be particularly critical once treatment has ended. Should we deem treatment a failure because a man relapses after a year of abstinence from drinking during which his wife died, his children contested their mother’s will, he lost his job, and he was diagnosed as having cancer? Probably not. Thus, we feel that it is important that studies of treatment efficacy include assessments of the general life context in which treatment and recovery are taking place. In the absence of such measures, there is a real danger that cost-cutters will deem useful treatment(s) ineffective. Although so far they have been used only infrequently in investigations of mental health and medical treatments, a variety of well-constructed measures of an individual’s psychosocial milieu are available (Moos, 1982).

Once one has compiled measures of symptoms, personal competence, well-being, and the environment, when does one administer them? There seems to be general agreement that the

4 Measures of well-being also should prove useful in treatment planning. They should be useful in identifying high-risk populations where early intervention may prevent deterioration of physical or mental health. They also should be helpful should tertiary care become necessary. They should help identify strengths that can be leveraged in a patient’s treatment (Frisch et al., 1992).


MORELAND, FOWLER, HONAKER

TABLE 23.4
Sources of Subjective Well-Being

Social Participation
    Family life
    Marriage
    Friendships
Surroundings
    Housing
    Neighborhood
    Community
Life Circumstances
    Education
    Finances
    Health
    Positive life events
Activities
    Job
    Leisure activities
Dispositions
    Extraversion

Sources: Bradburn (1969); Campbell, Converse, and Rodgers (1976); Frisch, Cornell, Villanueva, and Retzlaff (1992); Headey, Holmstrom, and Wearing (1984); Schlosser (1990); and Warr, Barter, and Brownbridge (1983).

timing of outcome assessment is a critical variable in treatment studies, but, beyond (immediate) posttreatment assessment, there appears to be no general agreement about when outcome measures should be gathered (Kazdin, 1991). Although there are no data on this point, we believe that the collection of outcome data has been driven by two factors: one practical and one theoretical. First, common experience suggests that the time and effort involved in manually collecting outcome data have motivated even the most well-funded and dedicated investigators to collect those data only a handful of times through the course of a study. Second, the conventional wisdom about some therapies is that it is unrealistic to expect them to have much of a lasting impact in the short run (say, less than a year), so it makes no sense to measure outcome sooner. We believe the first factor is no longer important and the second is an assumption overdue for examination (Howard et al., 1986).

In general, we believe outcome data or clinical change data should be collected far more frequently through the course of treatment and after the conclusion of treatment than has heretofore been the case. McCullough, Farrell, and Longabaugh (1986) described a Computerized Assessment System for Psychotherapy Evaluation and Research (CASPER) they have been developing since 1982 to automate much of such data collection. CASPER is a microcomputer-based system that includes, among other features, an initial assessment module and modules for assessing clinical change after each therapy session, at termination, and on follow-up. As suggested by the subtitle of the article by McCullough et al. (1986), “A Potential Tool for Bridging the Scientist-Practitioner Gap,” CASPER is being developed with careful attention to practical as well as scientific concerns. Thus, it takes only 2 or 3 minutes for the patient and therapist to complete the presenting symptom-focused clinical change ratings they are asked to make after each treatment session. These objectively recorded and evaluated postsession data can be used to determine when a more comprehensive CASPER outcome assessment should be undertaken, which may, in turn, suggest that therapy be terminated. Use of CASPER or a similar system to continuously monitor progress

23

FUTURE DIRECTIONS


in therapy should permit researchers and clinicians to determine (in the scientific, but individualized way we have advocated) within a few sessions’ error how much treatment is enough. Moreover, such a system will facilitate the development of statistical formulae to help clinicians make these decisions more accurately and with reduced expenditures of their time and effort. Kleinmuntz (1990) indicated that, 35 years after Meehl’s (1956) famous call for “a good cookbook” (of statistical formulas), clinicians still must use their heads rather than formulas, because formulas are usually unavailable and their usefulness cannot be evaluated even when they are available. CASPER would change that. CASPER’s developers are extraordinarily careful. Even after 10 years of research and development, they are not yet completely satisfied with their system and have not released it for general research or clinical use.
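The kind of decision rule that brief post-session ratings make possible can be sketched in a few lines of Python. This is a hypothetical illustration only and does not reproduce CASPER's actual logic, which the chapter does not describe. The 0 to 10 severity scale, the three-session averaging window, the 50% improvement threshold, and the function names are all assumptions made for the example.

```python
from statistics import mean


def needs_full_assessment(ratings, baseline, window=3, improvement=0.5):
    """Return True when the average of the last `window` post-session
    severity ratings has fallen by at least `improvement` (a fraction
    of the baseline severity). All thresholds are illustrative."""
    if baseline <= 0:
        raise ValueError("baseline severity must be positive")
    if len(ratings) < window:
        return False  # too few sessions to smooth out session-to-session noise
    recent = mean(ratings[-window:])
    return (baseline - recent) / baseline >= improvement


def first_trigger_session(ratings, baseline, window=3, improvement=0.5):
    """Return the 1-based session number at which the rule first fires,
    or None if it never does."""
    for n in range(1, len(ratings) + 1):
        if needs_full_assessment(ratings[:n], baseline, window, improvement):
            return n
    return None


# Hypothetical weekly severity ratings (0 = no symptoms, 10 = severe).
weekly = [8, 8, 7, 6, 5, 4, 4, 3]
print(first_trigger_session(weekly, baseline=8))  # prints 8
```

Averaging over several sessions before triggering a comprehensive outcome assessment is one simple way to avoid reacting to a single unusually good or bad week; an empirically derived formula of the sort Meehl called for would replace the arbitrary threshold used here.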

Implications for Applied Development

As expensive a public health problem as mental illness is, the most accurate test-based psychological assessment will not help reduce costs if the assessment itself costs too much. Current technology for the delivery of test-based psychological assessment, the test-based cure embodied in the comprehensive, staged assessment described earlier (see “Immediate Implications for Clinical Practice”), could turn out to be worse than the current disease. To our knowledge, no test currently available or under development assesses all the predictors listed in Table 23.1, nor all those in Table 23.3, nor those we could have listed in Table 23.2. Hence, implementing our suggested assessment strategy probably would mean administering more tests than comprise the typical clinical battery (Sweeney et al., 1987). Even if the instruments were largely self-administered paper-and-pencil questionnaires, the time, effort, and expense of using them might exceed that of the typical clinical battery: multiple scientific sources (e.g., the Mental Measurements Yearbook) would need to be perused to identify the best tests, many test publishers’ catalogs would need to be consulted, several purchase orders would have to be completed, many checks would have to be cut, and a variety of test materials stored, not to mention the activities involved in actually using the tests such as scoring and interpretation (Reichelt, 1984). Test developers are beginning to recognize the need to develop measures that go beyond symptoms to include other psychological variables that are important in treatment planning, such as the patient’s self-reported motivation for and amenability to psychological treatment (see Archer’s and Greene’s chapters in this volume) and perceived social support (see Morey, this volume). This trend should accelerate.

The comprehensive assessment of a disorder like alcoholism, whose potential complexity is amply illustrated in Tables 23.1 through 23.3, cannot be accomplished efficiently with traditional tests, no matter how well constructed. There are simply too many items that would have to be presented in fixed-format tests. What is needed is a delivery system that presents all relevant items in what is known as “computer-adaptive” fashion. In this approach, the computer selects items based on the examinee’s responses to previous items. For example, if an alcoholic woman indicates that she is married, the computer might ask her a series of questions about her marriage to determine whether her husband is a source of support to be enlisted in the treatment program or contributes to the patient’s abuse in a way that must be addressed for treatment to be successful. Naturally, such questions would not be addressed to patients reporting no significant other in their life. There are commercially published nonpsychometric tools such as patient-completed psychosocial history programs (Giannetti, 1987) and clinician-completed structured interview programs (First, Gibbon, Williams, & Spitzer, 1992) that function in this way. Researchers are working to optimize administration of traditional, fixed-format questionnaires like the MMPI-2 in this fashion (Ben-Porath, Slutske, & Butcher, 1989; Roper, Ben-Porath, & Butcher, 1991). This latter endeavor is

complex because the succession of items is based not on rational considerations, as in our example, but on statistical psychometric models. The most sophisticated of these models are based on Item Response Theory (IRT). The beauty of IRT models is that they have the potential to allow one to measure a construct with twice the precision of a fixed-format test, while using only half as many items as the fixed-format test (Weiss & Vale, 1987). The computer-adaptive approach needs to be extended across different types of assessment tools (e.g., psychosocial histories, personality tests) to streamline the assessment process as much as possible. For example, after determining, on the basis of questions whose answers are taken at face value, that our hypothetical alcoholic patient also has been hospitalized for what she understood to be schizophrenia, the computer might employ videodisk technology to administer the Holtzman Inkblots (Holtzman, Thorpe, Swartz, & Herron, 1961) computer-adaptively according to an IRT model until evidence of a thought disorder is obtained or it is clear that thought disorder will not be manifest on the Holtzman. Increased availability of computer-adaptive assessment systems not only will spare the weary patient, but also will free up more of the clinician’s time for tasks only a human can perform (thereby reducing costs).
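To make the statistical form of item selection concrete, the following Python sketch shows one common way an IRT-based adaptive test can choose its next item: estimate the examinee's trait level from the answers so far, then administer the unused item that is most informative at that level. It is a minimal sketch under stated assumptions and is not the procedure used in the MMPI-2 studies cited above. The two-parameter logistic (2PL) model, the grid-based posterior-mean estimate with a standard normal prior, the five-item bank, and its parameter values are all hypothetical choices made for illustration.

```python
import math

# Hypothetical item bank: item id -> (discrimination a, difficulty b).
BANK = {
    "q1": (1.2, -1.5),
    "q2": (1.0, -0.5),
    "q3": (1.5, 0.0),
    "q4": (1.1, 0.8),
    "q5": (1.4, 1.6),
}

# Trait levels from -4.0 to 4.0 in steps of 0.1, used for estimation.
GRID = [g / 10.0 for g in range(-40, 41)]


def p_endorse(theta, a, b):
    """2PL probability that a person at trait level theta endorses the item."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a, b):
    """Fisher information of a 2PL item at trait level theta."""
    p = p_endorse(theta, a, b)
    return a * a * p * (1.0 - p)


def estimate_theta(responses, bank):
    """Posterior mean of theta on GRID, using a standard normal prior.
    `responses` maps item id -> True (endorsed) or False (not endorsed)."""
    weights = []
    for t in GRID:
        w = math.exp(-0.5 * t * t)  # unnormalized normal prior
        for item, endorsed in responses.items():
            p = p_endorse(t, *bank[item])
            w *= p if endorsed else 1.0 - p
        weights.append(w)
    return sum(t * w for t, w in zip(GRID, weights)) / sum(weights)


def next_item(theta, bank, administered):
    """Pick the unused item with the most information at the current estimate."""
    candidates = [i for i in bank if i not in administered]
    return max(candidates, key=lambda i: item_information(theta, *bank[i]))


responses = {}
theta = estimate_theta(responses, BANK)        # prior mean, about 0
first = next_item(theta, BANK, responses)      # "q3" for this bank
responses[first] = True                        # suppose the item is endorsed
theta = estimate_theta(responses, BANK)        # estimate moves above 0
second = next_item(theta, BANK, responses)     # chosen at the updated estimate
print(first, round(theta, 2), second)
```

Because each item is chosen where it discriminates best for the individual examinee, uninformative items are skipped, which is the source of the efficiency gains the text attributes to IRT-based testing. A practical system also would need a stopping rule, for example ending the test once the estimate is sufficiently precise.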

Obviously, another way to reduce costs is to use human skills optimally. In an authoritative discussion of national mental health policy, Kiesler et al. (1991) stated:

It is unrealistic to expect the supply of doctoral mental health professionals to meet more than a fraction of the mental health needs in this country. A top-down analysis shifts attention to making the best possible use of specialists, utilizing providers and systems (e.g., families, social networks, volunteers, self-help groups) [and] maximizing the mental health benefits of coexisting systems (e.g., health, education, and welfare). . . . (p. 95)

Combined with the increasing emphasis on cost-effectiveness, one implication of this is that non-mental-health professionals such as family medical practitioners, nurse practitioners, and others increasingly will become de jure purveyors of mental health-care services (Austad & Berman, 1991). (That such professionals have long been de facto providers of mental health care is well known.) Thus, special efforts should be made to make sure that our assessment results are comprehensible and useful to professionals whose primary training is not in mental health services. Meyer, Fink, and Carey (1988) provided a starting point for this effort in their informative article on communication problems between nonpsychiatric physicians and psychologists.

Use of Tests as Part of Treatment Proper

Among the many telling points they made, Hayes et al. (1987) noted that “assessment is often not integrated into the therapy process” (p. 964). We would go further and substitute “usually” or “almost always” for “often.”5 We feel this is quite unfortunate. The recent literature is replete with anecdotal reports of the value of incorporating the discussion of test results into the treatment process per se (Finn & Tonsager, 1992). This should come as no surprise. For example, because we know that some persons with alcohol problems get better without formal treatment, whereas others do well with self-help materials, can there be great doubt that there is a group that would benefit from test feedback, perhaps to guide their self-help efforts (formal or informal)? There seems to be no reason to doubt that test feedback also would be a useful therapeutic ingredient in more traditional treatment programs. Finn and Butcher (1991) listed the putative benefits of incorporating test feedback into the therapeutic process as including (a) an increase in self-esteem, (b) increased hope, (c) decreased symptomatology, (d) reduced feelings of isolation, (e) greater self-awareness and understanding, and (f) increased motivation for treatment. In a recent study of MMPI-2-based feedback, Finn and Tonsager (1992) provided empirical verification of benefits (a) through (c); they did not study (d) through (f). We have little doubt that future empirical studies will prove Finn and Butcher (1991) right about benefits (d) through (f) as well.

5 For excellent discussions of the reasons for this state of affairs and detailed arguments refuting them, see Hayes et al. (1987) and Finn and Tonsager (1992).

We have raised this issue at this point in the chapter because it does not fit neatly elsewhere. We believe that the rationale for the therapeutic value of test feedback per se is so strong that it should occur routinely. Of course, we recognize that this belief is in need of far greater efforts at empirical verification. We also realize that conceptual frameworks for providing test feedback are either absent or quite rudimentary for most tests. However, the major reason for discussing this issue now is to give it the emphasis we believe it deserves. If cost-effectiveness is the driving force of mental health practice in the 21st century, there is no surer way to justify testing than to go straight to the bottom line: does test feedback have a direct positive impact on the client, and is it ultimately cheaper than simply forging ahead with therapy in the absence of testing? As Hayes et al. (1987) trenchantly noted, “To take this point to its logical extreme, it is useful to recognize that there may even be times when an assessment process could have treatment utility without the assessment report having any reliability or validity whatsoever” (p. 972).

Later in the 21st Century

As this chapter was being prepared, Matarazzo (1992) published a prescient article entitled “Psychological Testing and Assessment in the 21st Century.” We are pleased to note that Matarazzo anticipated several trends we also foresee (e.g., increasing emphasis on personal competence and quality of life). Matarazzo also predicted increased use of biological measures of intelligence and other aspects of brain functioning. We agree with him in that respect as well. If anything, although forecasting the future is a hazardous enterprise at best, we feel that Matarazzo probably was not bold enough. Granted, the 21st century is less than a decade away. However, most of the techniques discussed by Matarazzo (e.g., positron emission tomography) exist now. For purposes of closing this chapter, we let Schlosser (1991) be our oracle. He envisioned a future a bit more distant than Matarazzo’s.

Schlosser (1991) envisioned a future in which computers present test takers with stimuli ranging from today’s verbal items to moving projective stimuli to synthesized smells. Ultimately, much assessment in the latter half of the 21st century may involve the use of “virtual reality.” Virtual reality refers to computer-generated simulations similar to today’s flight training simulators, in which images, sounds, and even tactile sensations are produced to create a synthetic, three-dimensional representation of reality. With a system of this sort, assessment of social skills will no longer involve asking questions, ad hoc role playing, or in


vivo observation (unless that observation is carried out via video/computer telemetry). Rather, a clinician will observe a client’s interactions in a situation that has the dual advantages of seeming quite real to the client while having all the psychometric virtues of today’s much simpler tests (e.g., standardized “administration,” reliability). Observations of clients’ test behavior also will be much more sophisticated than at present. The computer will capture a wide variety of reactions to test stimuli: verbalizations, body movements, facial expressions, infrared heat changes, brain images, and physicochemical changes obtained using microscopic machines seeded into the body to make bioassays. How does our alcohol abuser react to the smell of bourbon? With psychophysiological arousal and self-reported feelings of guilt? Or indifference manifested both biologically and psychologically? Furthermore, the computer will analyze and synthesize all these data instantly, providing the ultimate in computer-adaptive assessment. The computer would instantly hypothesize that our aroused, guilt-reporting client uses alcohol as an anxiolytic and direct the next portion of the assessment accordingly. On the other hand, the computer would conclude that perhaps the indifference of the second client indicates an antisocial personality, of which excess drinking is but one small feature. The second assessment would then proceed differently from the first. Sound like science fiction? Prototypes of almost all this technology (some already quite sophisticated, some still rather crude) are currently under development.

Acknowledgments

The views expressed in this chapter are the private views of the authors, not official statements on behalf of the American Psychological Association. We would like to thank David Glenwick and Marvin Reznikoff for their comments on an earlier version of this manuscript. Of course, we are responsible for any remaining errors or omissions.

References

Adler, T. (1992, April). Mental health care is key to low medical costs. APA Monitor, p. 24.
Affleck, D. C., & Strider, F. D. (1971). Contribution of psychological reports to patient management. Journal of Consulting and Clinical Psychology, 37, 177-179.
Alcohol, Drug Abuse, and Mental Health Administration. (1987). ADAMHA update fact sheet. Washington, DC: Public Health Service.
Alden, L. (1989). Short-term structured treatment for personality disorder. Journal of Consulting and Clinical Psychology, 57, 756-764.
American Hospital Association. (1987). Hospital statistics. Chicago: Author.
Annis, H. M., & Davis, C. S. (1988). Self-efficacy and the prevention of alcoholic relapse: Initial findings from a treatment trial. In T. B. Baker & D. Cannon (Eds.), Addictive disorders: Psychological research on assessment and treatment (pp. 84-111). New York: Praeger.
Annis, H. M., & Graham, J. M. (1988). Situational Confidence Questionnaire (SCQ) user’s guide. Toronto, Canada: Addiction Research Foundation of Ontario.
Austad, C. S., & Berman, W. H. (1991). Managed health care and the evolution of psychotherapy. In C. S. Austad & W. H. Berman (Eds.), Psychotherapy in managed health care: The optimal use of time and resources (pp. 3-18). Washington, DC: American Psychological Association.

Bandura, A. (1989). Human agency in social cognitive theory. American Psychologist, 44, 1175-1184.
Bedi, A. (1987). Alcoholism, drug abuse, and other psychiatric disorders. In R. E. Herrington, G. R. Jacobson, & D. G. Benzer (Eds.), Alcohol and drug abuse handbook (pp. 346-384). St. Louis: Warren H. Green.
Ben-Porath, Y. S., Slutske, W. S., & Butcher, J. N. (1989). A real-data simulation of computerized adaptive administration of the MMPI. Psychological Assessment, 1, 18-22.
Berg, M. (1984). Teaching psychiatric residents about psychological testing. Professional Psychology: Research and Practice, 15, 343-352.
Berg, M. (1986). Toward a diagnostic alliance between psychiatrist and psychologist. American Psychologist, 41, 52-59.
Biskupic, J. (1991, December 7). Civil Rights Act of 1991. Congressional Quarterly, 3620-3622.
Blank, L. (1965). Psychological evaluation in psychotherapy: Ten case histories. Chicago: Aldine.
Bradburn, N. M. (1969). The structure of psychological well-being. Chicago: Aldine.
Breger, L. (1968). Psychological testing: Treatment and research implications. Journal of Consulting and Clinical Psychology, 32, 178-181.
Butcher, J. N., & Tellegen, A. (1966). Objections to MMPI items. Journal of Consulting Psychology, 30, 527-534.
Butcher, J. N. (1990). MMPI-2 in psychological treatment. New York: Oxford University Press.
Callan, M. F., & Yeager, D. C. (1991). Containing the health care cost spiral. New York: McGraw-Hill.

Campbell, A., Converse, P. E., & Rodgers, W. L. (1976). The quality of American life. New York: Russell Sage Foundation.
Cerney, M. S. (1978). Use of the psychological test report in the course of psychotherapy. Journal of Personality Assessment, 42, 457-463.
Cole, J. K., & Magnussen, M. J. (1966). Where the action is. Journal of Consulting and Clinical Psychology, 30, 539-543.
Cronbach, L. J. (1982). Designing evaluations of educational and social programs. San Francisco: Jossey-Bass.
DeAngelis, T. (1992, July). APA aims for equitable spot in new HCFA fee schedule. APA Monitor, pp. 22-24.
DeCourcy, P. (1971). The hazards of short-term psychotherapy without assessment: A case history. Journal of Personality Assessment, 35, 285-288.
Dorwart, R. A., & Chartock, L. R. (1988). Psychiatry and the resource-based relative value scale. American Journal of Psychiatry, 145, 1237-1242.
Durenberger, D. (1989). Providing mental health care services to Americans. American Psychologist, 44, 1293-1297.
Finn, S. E., & Butcher, J. N. (1991). Clinical objective personality assessment. In M. Hersen, A. E. Kazdin, & A. S. Bellack (Eds.), The clinical psychology handbook (2nd ed., pp. 362-373). New York: Pergamon.
Finn, S. E., & Tonsager, M. E. (1992). Therapeutic effects of providing MMPI-2 test feedback to college students awaiting therapy. Psychological Assessment, 4, 278-287.
First, M. B., Gibbon, M., Williams, J. B. W., & Spitzer, R. L. (1992). AutoSCID I [Computer program]. Toronto: Multi-Health Systems.
Freiberg, P. (1992, July). SSA may postpone rules announcement. APA Monitor, p. 24.
Frisch, M. B., Cornell, J., Villanueva, M., & Retzlaff, P. J. (1992). Clinical validation of the Quality of Life Inventory: A measure of life satisfaction for use in treatment planning and outcome assessment. Psychological Assessment, 4, 92-101.
Garfield, S. R., & Kurtz, R. (1976). Clinical psychologists in the 1970s. American Psychologist, 31, 1-9.
Giannetti, R. A. (1987). The GOLPH psychosocial history: Response-contingent data acquisition and reporting. In J. N. Butcher (Ed.), Computerized psychological assessment: A practitioner’s guide (pp. 124-144). New York: Basic Books.
Gibson, R., & Waldo, D. (1982). National health expenditures, 1981. Health Care Financing Review, 4, 1-36.
Goldsmith, J. C. (1984, Fall). Death of a paradigm. Health Affairs, pp. 5-19.
Hansen, L. (1980). Treatment of reduced intellectual functioning in alcoholics. Journal of Studies on Alcohol, 41, 156-158.
Hansen, J., & Emrick, C. D. (1983). Whom are we calling “alcoholic”? Bulletin of the Society of Psychologists in Addictive Behaviors, 2, 164-178.
Hartigan, J. A., & Wigdor, A. K. (1989). Fairness in employment testing: Validity generalization, minority issues and the General Aptitude Test Battery. Washington, DC: National Academy Press.
Hartlage, L. C., Freeman, W., Horine, L., & Walton, C. (1968). Decisional utility of psychological reports. Journal of Clinical Psychology, 24, 481-483.
Hayes, S. C., & Leonhard, C. (1991). The role of the individual case in clinical science and practice. In M. Hersen, A. E. Kazdin, & A. S. Bellack (Eds.), The clinical psychology handbook (2nd ed., pp. 223-238). New York: Pergamon.

Hayes, S. C., Nelson, R. O., & Jarrett, R. B. (1987). The treatment utility of assessment: A functional approach to evaluate assessment quality. American Psychologist, 42, 963-974.
Heather, N. (1989). Controlled drinking treatment: Where do we stand today? In T. Loberg, W. R. Miller, P. E. Nathan, & G. A. Marlatt (Eds.), Addictive behaviors: Prevention and early intervention (pp. 31-50). Amsterdam: Swets & Zeitlinger.
Headey, B. W., Holmstrom, E. L., & Wearing, A. J. (1984). The impact of life events and changes in domain satisfactions on well-being. Social Indicators Research, 15, 203-227.
Hester, R. K., & Miller, W. R. (Eds.). (1989). Handbook of alcoholism treatment approaches: Effective alternatives. New York: Pergamon.
Holtzman, W. H., Thorpe, J. S., Swartz, J. D., & Herron, E. W. (1961). Inkblot perception and personality. Austin, TX: University of Texas Press.
Howard, K. I., Kopta, S. M., Krause, M. S., & Orlinsky, D. E. (1986). The dose-effect relationship in psychotherapy. American Psychologist, 41, 159-164.
Howard, K. I., Kopta, S. M., Lebow, J. L., & Orlinsky, D. E. (1982, June). Pattern of symptomatic change in psychotherapy. Paper presented at the annual meeting of the Society for Psychotherapy Research, Aspen, Colorado.
Hugdahl, K., & Ost, L. (1981). On the difference between statistical and clinical significance. Behavioral Assessment, 3, 289-295.
Hymowitz, P., & Sweeney, J. A. (1985). Focal diagnostic psychological assessment. The Psychiatric Hospital, 16, 91-95.

Jacobson, N. S. (Ed.). (1988). Defining clinically significant change [Special issue]. Behavioral Assessment, 10.
Jacobson, N. S., Follette, W. C., & Revenstorf, D. (1984). Psychotherapy outcome research: Methods for reporting variability and evaluating clinical significance. Behavior Therapy, 15, 336-352.
Jacobson, N. S., & Revenstorf, D. (1988). Statistics for assessing the clinical significance of psychotherapy techniques: Issues, problems, and new developments. Behavioral Assessment, 10, 133-145.
Kaplan, R. M. (1984). The connection between clinical health promotion and health status: A critical overview. American Psychologist, 39, 755-765.
Karasu, T. B. (1990). Toward a clinical model of psychotherapy for depression, II: An integrative and selective treatment approach. American Journal of Psychiatry, 147, 269-278.
Kazdin, A. E. (1977). Assessing the clinical or applied importance of behavior change through social validation. Behavior Modification, 1, 427-452.
Kazdin, A. E. (1991). Treatment research: The investigation and evaluation of psychotherapy. In M. Hersen, A. E. Kazdin, & A. S. Bellack (Eds.), The clinical psychology handbook (2nd ed., pp. 293-312). New York: Pergamon.
Kendall, P. C., & Grove, W. M. (1988). Normative comparisons in therapy outcome. Behavioral Assessment, 10, 147-158.
Kessler, K. A. (1986). Benefit design, utilization review, case management, PPOs contain costs. Benefits Today, 3, 1-2.
Kiesler, C. A., & Morton, T. L. (1988). Prospective payment system for inpatient psychiatry: The advantages of controversy. American Psychologist, 43, 141-150.
Kiesler, C. A., & Sibulkin, A. (1987). Mental hospitalization: Myths and facts about a national crisis. Newbury Park, CA: Sage.
Kiesler, C. A., Simpkins, C. G., & Morton, T. L. (1991). Research issues in mental health policy. In M. Hersen, A. E. Kazdin, & A. S. Bellack (Eds.), The clinical psychology handbook (2nd ed., pp. 78-101). New York: Pergamon.
Kleinmuntz, B. (1990). Why we still use our heads instead of formulas: Toward an integrative approach. Psychological Bulletin, 107, 296-310.
Klopfer, W. G. (1964). The blind leading the blind: Psychotherapy without assessment. Journal of Projective Techniques and Personality Assessment, 28, 387-392.
Korchin, S., & Schuldberg, D. (1981). The future of clinical assessment. American Psychologist, 36, 1147-1158.
Lambley, P. (1974). The dangers of therapy without assessment: A case study. Journal of Personality Assessment, 38, 263-265.
Lewandowski, D., & Sacuzzo, D. (1976). The decline of psychological testing. Professional Psychology, 7, 177-184.
Linden, W., & Wen, F. K. (1990). Therapy outcome research, health care policy, and the continuing lack of accumulated knowledge. Professional Psychology: Research and Practice, 21, 482-488.
Loberg, T. (1980). Alcoholism and neuropsychological deficits in men. Journal of Studies on Alcohol, 41, 119-128.
Lovitt, R. (1987). A conceptual model and case study for the psychological assessment of hysterical pseudo-seizures with the Rorschach. Journal of Personality Assessment, 51, 207-219.
Matarazzo, J. D. (1990). Psychological assessment versus psychological testing. American Psychologist, 45, 999-1017.
Matarazzo, J. D. (1992). Psychological testing and assessment in the 21st century. American Psychologist, 47, 1007-1018.
McClellan, K. (1985, August). The changing nature of EAP practice. Personnel Administrator, pp. 29-37.
McCullough, L., Farrell, A. D., & Longabaugh, R. (1986). The development of a microcomputer-based mental health information system: A potential tool for bridging the scientist-practitioner gap. American Psychologist, 41, 207-214.
McLellan, A. T., & Alterman, A. I. (1991). Patient-treatment matching: A conceptual and methodological review with suggestions for future research. In R. W. Pickens, C. G. Leukefeld, & C. R. Shuster (Eds.), Improving drug abuse research (NIDA Research Monograph 106) (pp. 114-135). Rockville, MD: National Institute on Drug Abuse.
Meehl, P. E. (1956). Wanted—a good cookbook. American Psychologist, 11, 263-272.
Meehl, P. E. (1978). Theoretical risks and tabular asterisks: Sir Karl, Sir Ronald, and the slow progress of soft psychology. Journal of Consulting and Clinical Psychology, 46, 806-834.
Meyer, J. D., Fink, C. M., & Carey, P. F. (1988). Medical views of psychological consultation. Professional Psychology: Research and Practice, 19, 356-358.
Miller, W. R. (1989). Matching individuals with interventions. In R. K. Hester & W. R. Miller (Eds.), Handbook of alcoholism treatment approaches: Effective alternatives (pp. 261-271). New York: Pergamon.
Miller, W. R., & Hester, R. K. (1986). Inpatient alcoholism treatment: Who benefits? American Psychologist, 41, 794-805.

Moos, R. H. (1982). Introduction to the Social Climate Scales. Palo Alto, CA: Consulting Psychologists Press.
Newman, F. L., & Sorensen, J. E. (1985). Integrated clinical and fiscal management in mental health. Norwood, NJ: Ablex.
Ogborne, A. C. (1980). Controlled evaluative studies of treatment for alcohol and drug abuse. Acta Psychiatrica Scandinavica, 62, 66-76.
Parsons, O. A., Butters, N., & Nathan, P. E. (1987). Neuropsychology of alcoholism: Implications for diagnosis and treatment. New York: Guilford.
Pattison, E. M. (1966). A critique of alcoholism treatment concepts with special reference to abstinence. Quarterly Journal of Studies on Alcohol, 27, 49-71.
Pattison, E. M. (1976). A conceptual approach to alcoholism treatment goals. Addictive Behaviors, 1, 177-192.
Reichelt, P. A. (1984). Localization and utilization of available behavioral measurement instruments. Professional Psychology: Research and Practice, 14, 341-356.
Reimbursement Survey Results. (1991, Summer). Assessment Applications, p. 1. (Available from National Computer Systems, 5605 Green Circle Drive, Minnetonka, MN 55343)
Reynolds, W. M. (1979). Psychological tests: Clinical usage versus psychometric quality. Professional Psychology, 10, 324-329.
Rodin, J., & Salovey, P. (1989). Health psychology. Annual Review of Psychology, 40, 533-579.
Roemer, M. I. (1985). National strategies for health care organization. Ann Arbor: Health Administration Press.

Roper, B. L., Ben-Porath, Y. S., & Butcher, J. N. (1991). Comparability of computerized adaptive and conventional testing with the MMPI-2. Journal of Personality Assessment, 57, 278-290.
Saunders, S. M., Howard, K. I., & Newman, F. L. (1988). Evaluating the clinical significance of treatment effects: Norms and normality. Behavioral Assessment, 10, 207-218.
Schlosser, B. (1990). The assessment of subjective well-being and its relationship to the stress process. Journal of Personality Assessment, 54, 128-140.
Schlosser, B. (1991). The future of psychology and technology in assessment. Social Science Computer Review, 9, 575-592.

Skinner, H. A. (1984). Instruments for assessing alcohol and drug problems. Bulletin of the Society of Psychologists in Addictive Behaviors, 3, 21-33. Smyth, R., & Reznikoff, M. (1971). Attitudes of psychiatrists toward the usefulness of psychodiagnostic reports. Professional Psychology, 2, 283-288. Sullivan, S., Flynn, T. J., & Lewin, M. E. (1987). The quest to manage mental health costs. Business and Health, 4, 24-28. Sundberg, N. D., Snowden, L. R., & Reynolds, W. M. (1978). Toward assessment of personal competence and incompetence in life situations. Annual Review of Psychology, 29, 179-221. Sundberg, N. D., & Tyler, L. E. (1962). Clinical psychology. New York: Appleton-Century-Crofts. Sweeney, J. A., Clarkin, J. F., & Fitzgibbon, M. L. (1987). Current practice of psychological assessment. Professional Psychology: Research and Practice, 18, 377-380. Taube, C. A., Burns, B. J., & Kessler, L. (1984). Patients of psychiatrists and psychologists in office based practice: 1980. American Psychologist, 39, 1435-1447. Taylor, S. E. (1989). Positive illusions: Creative self-deception and the healthy mind. New York: Basic Books. Tolchin, M. (1989, September 24). Sudden support for national health care. New York Times, p. E4.

Tulkin, S., & Frank, G. W. (1985). The changing role of psychologists in health maintenance organizations. American Psychologist, 40, 1125-1129. Tupper, D. E., & Cicerone, K. D. (Eds.). (1990). The neuropsychology of everyday life. Boston: Kluwer. VandenBos, G., & DeLeon, P. H. (1988). The use of psychotherapy to improve physical health. Psychotherapy, 25, 335-343. Wade, T. C., & Baker, T. B. (1977). Opinions and use of psychological tests: A survey of clinical psychologists. American Psychologist, 32, 874-882. Warr, P. B., Barter, J., & Brownbridge, G. (1983). On the independence of positive and negative affect. Journal of Personality and Social Psychology, 44, 644-651. Weiss, D. J., & Vale, C. D. (1987). Computerized adaptive testing for measuring abilities and other psychological variables. In J. N. Butcher (Ed.), Computerized psychological assessment: A practitioner’s guide (pp. 325-343). New York: Basic Books. Weissman, M. M. (1975). The assessment of social adjustment: A review of techniques. Archives of General Psychiatry, 32, 357-366.

Wigdor, A. K., & Garner, W. R. (1982). Ability testing: Uses, consequences, and controversies. Washington, DC: National Academy Press. Winett, R. A., King, A. C., & Altman, D. G. (1989). Health psychology and public health: An integrative approach. New York: Pergamon. Winslow, R. (1989, December 13). Spending to cut mental health costs. Wall Street Journal, p. A1. Wisconsin State Medical Society, Committee on Alcohol and Other Drug Abuse. (1981). Guideline admission criteria for chemical dependency treatment services. Wisconsin Medical Journal, 80, 5-7.

Youngstrom, N. (1992, July). ADA is super advocate for those with disabilities. APA Monitor, p. 26.

Author Index

A

Aaronson, S. T., 236, 247 Abeles, N., 67, 72 Abeloff, M. D., 221, 241, 244

Alovis, N., 233, 247

Alp, I. E., 256, 277 Alpher, V. S., 257, 277 Alterman, A. I., 598, 601 Althof, S. E., 241, 243, 247

Abraham, K. R., 379, 397

Altman, D. G., 585, 602

Abrams, D. B., 81, 96

Alvarez, W., 239, 245 Amick, B. C., 33, 34, 51

Abrams, R., 380, 397 Achenbach, T. M., 100, 101, 109, 444, 449, 490, 511, 517, 518, 520, 522, 523, 526, 528, 530, 531, 532, 533, 534, 536, 537, 538, 539, 547, 548, 549 Achim, A., 379, 400

Ammerman, R. T., 284, 288

Anastasi, A., 424, 449 Andersen, A. E., 64, 72, 152, 157 Andersen, J., 384, 397 Anderson, D. J., 222, 231, 236, 246, 281, 290

Ackerman, B. J., 483, 513

Anderson, M. D., 10, 19

Ackerman, P. T., 560, 574

Anderson, R. L., 33, 34, 50, 51

Adams, K. M., 355, 368 Adler, G., 301, 302 Adler, T., 582, 583, 598 Ae Lee, M., 239, 243 Affleck, D. C., 581, 582, 598 Agdeppa, J., 381, 397 Ahern, D. K., 81, 96

Anderson, S. M., 33, 34, 35, 48

Ainsworth, M. D., 252, 278

Akiskal, H. S., 281, 288 Albert, M., 42, 48 Albott, W. L., 151, 156 Alden, L., 591, 598 Alexander, F. G., 305, 315 Ali, R. M., 365, 368 Allan, J., 560, 577 Allen, C., 41, 48

Allen, J. P., 151, 156 Allen, R., 41, 48 Allred, L. J., 231, 244

Anderson, W., 137, 138, 139, 156 Andreason, N. C., 380, 381, 397

Andrews, F. M., 407, 408, 409, 418 Aneshensel, C. S., 24, 54 Angelone, J. V., 151, 159 Angle, R. S., 104, 109 Angold, A., 490, 511 Angst, J., 230, 238, 243, 248 Annable, L., 392, 398 Annis, H. M., 588, 589, 598 Anthony, J. C., 32, 40, 51, 52 Anton, W. D., 313, 320 Antonuccio, D. O., 282, 288

Appel, M., 311, 317 Applebaum, S. A., 11, 12, 19 Apter, A., 490, 511

Aragona, J. C., 312, 315 Arana, G. W., 44, 48

Arbisi, P., 37, 49 Archer, R. P., 425, 426, 427, 428, 429, 430, 431, 432, 435, 436, 437, 438, 439, 440, 441, 442, 443, 444, 446, 448, 449, 450, 451, 452

Balla, D., 499, 515, 516 Ballard, J. E., 560, 574 Ballenger, J. C., 222, 236, 243

Balleweg, B. J., 281, 290 Ban, T. A., 196, 197, 215 Bandura, A., 559, 599

Areshensel, C. S., 30, 50 Arkowitz, H., 57, 58, 64, 71 Armentrout, D. P., 151, 158 Armstrong, D., 334, 349 Arnold, L. E., 558, 574 Asnis, G. M., 222, 238, 248 Aston, L., 560, 577 Atilano, R. B., 88, 97 Atkeson, B. M., 281, 288 Atkin, A., 561, 574 Atkins, T. E., 483, 515

Barack, R. S., 555, 561, 578 Barad, S. J., 479, 511 Barber, J. P., 63, 71 Barbrack, C. R., 88, 96 Barker, B. M., 297, 315 Barker, H. R., 297, 315 Barker, L. R., 297, 302, 315, 316, 320 Barkham, M., 238, 247 Barkley, R. A., 486, 489, 511, 512

Banzett, L., 379, 400

Atkinson, J., 294, 315

Barlow, D. H., 313, 316

Atkinson, L., 256, 277

Barlow, S., 90, 94

Attkisson, C. C., 103, 109, 118, 133, 402, 403, 404, 405, 406, 407, 408, 409, 410, 411, 412, 413, 414, 417, 418, 419, 420 Auerbach, S. M., 297, 318 August, G. J., 560, 574

Barnebey, N. S., 558, 574 Barnes, R. A., 282, 289

Ausaman, J. I., 355, 368 Austad, C. S., 6, 7, 9, 10, 20, 583, 593, 596, 598 Averill, J. R., 305, 315 Avner, R., 64, 74, 124, 133, 134 Ayoub, C. C., 129, 132, 134

B

Babor, T. E., 151, 158

Badger, L. W., 33, 34, 50, 51, 52

Baer, B. A., 62, 73, 219, 229, 245

Bailey, D. B., 88, 94 Bailey, J., 281, 288 Baird, A. D., 355, 368 Bairnsfather, L. E., 33, 34, 49 Baker, E., 561, 577, 40, 41, 48 Baker, G., 257, 278 Baker, T. B., 582, 587, 602 Bala, S. P., 561, 574 Baldessarini, R. J., 44, 48

Baron, D., 236, 246

Barrett, C. L., 536, 549 Barrett, J. A., 24, 32, 48 Barrett, J. E., 24, 32, 48 Barron, F., 64, 71

Barsky, A. J., 47, 53 Bartels, R. L., 311, 316 Barter, J., 594, 602

Bartlett, D., 235, 246 Bascue, L. O., 348

Bass, D., 92, 95 Basta, S. M., 483, 511 Bastani, B., 394, 400 Baucom, D. H., 333, 348 Baum, A., 240, 245 Bayles, K., 41, 48 Beach, S. R. H., 334, 347, 339 Beasley, C., 382, 383, 384, 399 Bech, P., 31, 32, 48, 219, 230, 247, 384, 397 Beck, A. T., 31, 48, 52, 57, 59, 69, 71, 73, 93, 95, 237, 248, 196, 215, 279, 280, 281, 282, 283, 284, 285, 286, 288, 290, 291, 379, 397, 314, 316 Beck, R. W., 31, 48, 280, 284, 288, 290

Beck, T., 36, 53 Becker, B., 197, 215 Becker, J. M. T., 392, 401 Becker, R. W., 379, 381, 382, 397 Beckwith, L., 558, 577 Bedell, J., 313, 320 Bedell, R., 104, 110, 118, 125, 133 Bednar, R. L., 82, 94 Beitman, B. D., 58, 61, 72 Belevich, J. K. S., 441, 449 Bell, M. J., 197, 215, 381, 382, 397 Bellack, L., 282, 286, 288 Beller, S. A., 372, 387, 397 Belluomini, J., 281, 288 Bem, S. L., 328, 349 Ben-Porath, Y. S., 138, 156, 426, 427, 428, 429, 431, 434, 437, 438, 440, 441, 442, 443, 444, 450, 452, 461, 478, 596, 599, 602

Bendig, A. W., 301, 316


Berman, K., 5, 19 Berman, W. H., 583, 593, 596, 598 Bernard, H. S., 355, 368 Berndt, D. J., 281, 288, 466, 477 Berndt, S. M., 281, 288 Bernstein, D. A., 295, 316 Berthot, B. D., 385, 397 Berwick, D. M., 17, 19, 46, 52, 47, 53 Berzins, J. I., 82, 94 Beutler, L. E., 55, 58, 60, 61, 62, 64, 65, 66, 69, 72, 73, 74, 82, 94, 99, 109, 124, 124, 129, 131, 132, 230, 243, 506, 511, 466, 477 Bhardwaj, S., 242, 247 Biaggio, M. K., 301, 316 Bialow, M., 281, 290 Biase, D. V., 295, 321

Bielawska, C., 40, 49 Bigelow, L. B., 385, 387, 397, 399

Benson, F., 39, 49

Bigler, E. D., 483, 515 Billington, R., 197, 215, 381, 382, 397 Binder, J. L., 57, 62, 74 Binder, R. L., 405, 419, 420 Birdwell, J., 101, 109 Birdwell, T., 87, 95 Birtchnell, J., 47, 48

Benjamin, A. H., 227, 243

Benjamin, G., 37, 48 Benjamin, L. S., 66, 67, 72 Benkert, O., 222, 243 Bennun, I., 334, 349 Benson, H., 238, 243

Bishop, S., 282, 286, 288

Bentler, P. M., 128, 132 Berchick, R. J., 283, 284, 286, 288 Bereiter, C., 115, 134 Berenson, G. S., 561, 576

Biskupic, J., 584, 599 Bisoffi, G., 410, 411, 412, 417, 420 Bitter, I., 380, 381, 397, 398 Black, R. W., 106, 110

Berg, G., 40, 48

Blackburn, I. M., 282, 286, 288

Berg, I., 554, 574

Blackstock, J., 40, 49

Berg, L., 40, 48

Blain, J., 237, 248

Berg, M., 583, 587, 599

Blair-West, S., 236, 245

Berg, P., 327, 349

Blakelock, E. H., 306, 308, 317

Bergan, J., 60, 62, 64, 65, 69, 72, 74, 230, 243 Berger, M., 84, 94 Berger, P. A., 380, 381, 394, 401 Bergin, A. E., 60, 73, 74, 84, 95, 470, 477 Bergman, E., 284, 290 Berhard, M., 39, 40, 50 Berman, J. S., 60, 74, 86, 96, 101, 110

Blanchard, E. B., 59, 72

Blanchard, R., 149, 158 Bland, R. C., 528, 549 Blank, L., 582, 599 Blashfield, R. K., 148, 150, 156, 158, 196, 216 Blashki, T. G., 281, 289 Blatt, S. J., 256, 260, 277, 281, 288 Blazer, D., 32, 48

Bleeker, J. A. C., 380, 381, 382, 386, 398 Blessed, G., 41, 48

Blisk, D., 507, 514 Blodgett, C., 151, 158

Blouin, A. G., 552, 553, 558, 574, 578 Blouin, J., 552, 574 Blumenthal, J., 309, 321 Bobula, J. A., 34, 51 Bock, J. C., 64, 73 Bodner, D. R., 241, 243 Boen, D. L., 348, 349 Boerger, A. R., 382, 397, 398 Boersma, D. C., 483, 486, 487, 489, 500, 501, 503, 514, 515

Brent, D. A., 491, 512 Bridges, K., 35, 48 Briggs, P. F., 428, 429, 438, 444, 451 Brito, G. N., 561, 574 Britt, B. C., 334, 350 Broadhurst, A. D., 282, 289

Broderick, J. E., 334, 349 Brodie, H., 34, 52 Brodman, K., 217, 248 Broman, C. L., 311, 318 Bromet, E., 312, 316

Brook, R. H., 16, 19 Brooks, M. L., 311, 316 Broskowski, A., 5, 6, 7, 9, 19 Brown, B., 226, 244

Bohachick, P., 237, 243

Brown, C. C., 151, 158

Boileau, R. A., 560, 574

Brown, E., 483, 513 Brown, G. G., 355, 368

Boleloucky, Z., 221, 243

Bollen, K. A., 128, 132 Bonato, R. R., 235, 244

Brown, G. L., 569, 574

Boodoosingh, L., 387, 401

Brown, G., 31, 52, 283, 284, 286, 288 Brown, J. M., 281, 290

Booarem, C. D., 88, 95

Booth-Kewley, S., 311, 316

Brown, J. S., 531, 537, 548

Borden, K. A., 566, 574

Brown, M. S., 303, 307, 318

Bordin, E. S., 409, 414, 419 Borduin, C. M., 240, 248

Brown, R. L., 231, 243

Borison, R. L., 379, 398 Borkovec, T. D., 295, 316 Boronow, J., 380, 398

Borrelli, D. J., 238, 243 Borus, J. F., 46, 52 Bowen, L. L., 379, 400

Brown, R. T., 554, 564, 566, 574, 578 Brown, T. R., 19, 98, 99, 101, 102, 109, 149, 156, 339, 349, 390, 391, 392, 398, 415, 417, 418, 419, 439, 440, 450, 536, 537, 548 Brownbridge, G., 594, 602

Bowlby, J., 283, 290 Boyd, J. H., 24, 32, 47, 52 Bradburn, N. M., 594, 599 Braiman, S., 31, 37, 54 Brandon, A. D., 313, 314, 317 Brandt, L., 87, 95, 101, 109 Breckenridge, J., 227, 229, 247 Breen, M. J., 486, 489, 511, 512

Budman, S. H., 46, 52

Breger, L., 581, 582, 599

Bugental, D. B., 567, 574, 575

Bregman, Z., 379, 398

Bulik, C. M., 231, 243 Buller, R., 222, 243 Bullock, D., 489, 512 Bulpitt, C. J., 226, 244

Breier, A., 379, 380, 398, 401 Brengelmann, J. C., 334, 349 Brennan, J., 281, 290

Browne, R. C., 138, 157 Brunsting, D. W., 392, 401

Bryant, B. E., 368, 369

Bryant, D., 114, 115, 133 Bryer, J. B., 238, 240, 243 Bryk, A. S., 129, 130, 132 Buck, D. K., 312, 316

Bulter, A., 554, 574 Bunin, G. R., 483, 515 Burdock, E. I., 360, 369 Burgess, E. W., 353, 369

Burgess, P. M., 281, 289 Burisch, M., 332, 349 Burke, A., 470, 478 Burke, J. D., 24, 32, 33, 47, 48, 52, 53 Burke, J., 33, 34, 37, 53 Burlingame, G., 58, 73, 90, 91, 92, 94, 97 Burnam, M. A., 24, 33, 40, 47, 48, 49, 54 Burnett, S., 394, 400 Burns, B. J., 33, 34, 37, 48, 52, 53, 590, 602 Burrows, G. D., 222, 236, 243, 245, 281, 292, 289

Calsyn, R. J., 88, 94, 95 Calvert, S. J., 65, 72 Cameron, O. G., 222, 239, 243 Campbell, A., 594, 599 Campbell, D. T., 294, 316 Campbell, J. L., 309, 310, 316 Campbell, M. M., 281, 289 Campbell, W., 392, 398 Canger, J. M., 311, 316 Canino, G. J., 528, 549

Caputo, J., 536, 549, 559, 561, 568, ilo, Se Capwell, D. F., 434, 450 Carey, K. B., 232, 243 Carey, M. P., 232, 243 Carey, P. F., 596, 601

Carlson, G., 280, 281, 290 Carlson, J. G., 66, 72

Burton, J., 229, 245

Carlson, W. A., 281, 289

Buss, A. H., 300, 303, 316

Carnicke, C. L. M., 32, 49, 224, 244 Carpenter, B. D., 311, 316 Carpenter, J. T., 231, 243 Carpenter, W. T., Jr., 380, 389, 394, 398, 399 Carr, J. E., 311, 321 Carr, S. J., 229, 245

Butcher, J. N., 11, 12, 19, 56, 65, 69, 72, 137, 138, 139, 156, 157, 168, 184, 423, 426, 427, 428, 429, 431, 432, 434, 437, 438, 440, 441, 442, 443, 444, 450, 451, 452, 461, 478, 586, 588, 596, 597, 599, 602 Butkus, M., 483, 487, 513, 515 Butters, N., 592, 601 Byars, W. D., 281, 288 Byerly, F. C., 281, 289 Bystritsky, A., 236, 246

Carrington, P., 237, 243

Carroll, B. J., 281, 289 Carroll, I. G., 410, 413, 416, 419 Carter, D. E., 102, 109, 118, 132 Cartwright, D. S., 81, 82, 95

Carver, C., 238, 248

C

Caddell, J. M., 196, 216 Cahn, T. S., 227, 243 Cahn, W., 222, 238, 248 Caine, T. M., 300, 316 Cairns, V. E., 368, 369 Calatta, B., 560, 568, 578 Calder, P., 483, 512 Caldwell, A. B., 138, 156, 431, 450

Calhoun, K. S., 281, 288 Califano, J., 5, 19 Callan, M. F., 582, 599

Casat, C. D., 387, 398 Case, W. G., 237, 246 Catalano, R. F., 77, 97 Cattell, R. B., 221, 245, 294, 295, 296, 298, 316, 406, 419 Caudrey, D. J., 88, 95 Cavan, R. S., 353, 369

Cavanaugh, D. J., 311, 316 Cavanaugh, S. V., 281, 289 Cerney, M. S., 382, 383, 384, 399, 582, 599 Chadda, R., 380, 400 Chaney, L. A., 567, 574, 575

Chape, C., 306, 308, 318 Chapman, J., 233, 245 Charap, P., 38, 53 Chartock, L. R., 582, 599 Chelst, M. R., 366, 370 Chen, M. K., 368, 369 Chesney, A. P., 306, 308, 317

Chevron, E. S., 281, 288 Chiles, J. A., 227, 243, 490, 512 Choca, J. P., 11, 12, 19, 165, 166, 184, 466, 467, 478 Chong, M., 47, 48

Chouinard, G., 392, 398 Chrisholm, B. J., 307, 318 Christ, S. L., 281, 290 Christenfeld, R., 281, 289 Christensen, E. R., 84, 96, 101, 109

Clarkin, J. F., 55, 56, 58, 61, 62, 66, 72, 73, 99, 109, 201, 215, 582, 586, 590, 595, 602 Cleary, P. A., 220, 221, 244 Cleary, P. D., 15, 19, 34, 50 Cleary, P., 355, 358, 370 Clement, P. W., 81, 96 Cloninger, C. R., 26, 51, 490, 512 Clum, G. A., 101, 109, 362, 363, 367, 369 Clyde, D. J., 360, 369 Cochran, C. D., 30, 49, 225, 243 Cody, J. J., 489, 513 Coggins, D. R., 33, 34, 52 Cohen, B., 561, 574 Cohen, C., 240, 247

Christiansen, J., 231, 246

Cohen; DrI.92299243 355,370; 557, 370, 557

Christie, D., 555, 575

Cohen, J., 540, 548

Christie, W., 282, 286, 288

Cohen, L. J., 231, 243

Christoph, P., 82, 96

Christopher, T., 92, 95

Christy, W., 32, 33, 34, 53 Chu, C. C., 355, 369 Ciarlo, J. A., 19, 98, 99, 101, 102, 109, 149, 156, 339, 349, 390, 391, 392, 398, 415, 417, 418, 419, 439, 440, 450, 536, 537, 548 Cicchetti, D., 499, 515, 516 Cicerone, K. D., 593, 602

Claghorn, J. L., 379, 393, 398 Clancy, J., 222, 231, 236, 246 Clare, A. W., 368, 369 Clark, B. L., 328, 329, 330, 334, 349, 351 Clark, C. A., 24, 54 Clark, C., 152, 157 Clark, E., 41, 45, 481, 489, 512 Clark, L. A., 239, 243 Clark, L., 38, 49 Clark, M. S., 88, 95 Clark, V. A., 30, 50 Clarke, D. C., 281, 289 Clarke, M. A., 64, 73 Clarke, M., 281, 289

Cohen, M. J., 151, 158 Cohen, M., 556, 557, 558, 575 Cohen, N. J., 552, 560, 568, 575, 578 Cohen, P., 490, 512

Cohen, S. E., 558, 577

Colbus, D., 490, 513 Cole, A., 151, 159

Cole, J. K., 581, 582, 599 Cole, J. O., 236, 247, 360, 361, 364, 368, 369 Cole, J. W., 237, 243 Coleman, R. E., 281, 289 Colligan, R. C., 138, 156 Collings, G. H., 237, 243 Collins, B. E., 555, 578 Collins, J. F., 59, 60, 72, 73 Collins, L. M., 112, 114, 132 Collins, L., 567, 574, 575 Collins, S., 567, 574, 575 Comrey, A. L., 405, 419 Comstock, G. W., 30, 49

Congden, R. J., 130, 132 Conners, C. K., 550, 551, 552, 553, 555, 557, 558, 559, 564, 565, 567, 570, 574, 575, 576

Connor, P. A., 555, 577 Conover, N. C., 491, 512, 561, 575 Converse, P. E., 594, 599 Conway, J. A., 151, 158 Cook, C., 558, 575 Cook, T. H., 393, 398 Cook, W. W., 300, 303, 316 Cooney, N. L., 65, 72, 73 Coopen, A., 281, 282, 288 Cooper, A. F., 49, 50 Cooper, A., 63, 72


Crook, T., 41, 45, 359, 364, 367, 368, 369 Crough, M. A., 33, 34, 49 Crowe, M. J., 334, 349 Crowe, R. R., 222, 236, 246 Croyle, R. T., 311, 316 Cruickshank, W. M., 489, 516 Csernansky, J. G., 40, 50, 372, 378, 379, 380, 381, 386, 392, 393, 398, 399, 400, 401 Cubberly, W. E., 560, 577 Cull, J. G., 198, 215

Cooper, B., 32, 51 Cooper, J. E., 221, 248 Copeland, D. R., 483, 515 Coppard, N., 379, 401 Cornelius, J. R., 236, 243, 244 Cornell, D. G., 483, 512 Cornell, J., 593, 594, 599 Coryell, W., 231, 244 Costa, L. D., 355, 358, 370 Costa, P. T., 196, 197, 198, 215 Costello, A. J., 491, 512, 528, 536, 548, 549 Costello, R. M., 151, 156 Covi, L., 27, 30, 49, 217, 244 Cox, G. B., 490, 512 Coyne, L., 382, 383, 384, 386, 398, 399 Crago, M., 64, 65, 72, 73, 506, 511 Craig, R. J., 151, 156 Craig, T. J., 379, 398 Cramer, P., 260, 277 Crane, R. J., 300, 305, 306, 307, 308, 311, 313, 320 Crane, R. S., 303, 311, 316 Craske, M., 36, 49

D’Angio, G., 483, 515 D’Elia, L., 41, 51 Dahl, L., 63, 72 Dahlstrom, L. E., 137, 138, 139, 156, 481, 512 Dahlstrom, W. G., 137, 138, 139, 156, 442, 451, 481, 512 Daldrop, R. J., 64, 65, 69, 72, 230, 243 Dall’Agnola, R., 404, 409, 410, 411, 412, 414, 417, 420 Danahy, S., 149, 151, 157

Craven, J. L., 31, 49

Dandoy, A. C., 483, 513

Cristol, A. H., 206, 216 Crits-Christoph, P., 63, 65, 72, 74, 229, 237, 244

Danzinger, W., 40, 48

Cromwell, R. E., 322, 349

Cronbach, L. J., 100, 109, 114, 115, 125, 132, 133, 298, 316, 592, 599 Croog, S. H., 226, 244 Crook, T. H., 379, 401

Cummings, J., 39, 49

Cummings, N. A., 9, 10, 13, 14, 15, 19, 20 Curran, J. P., 81, 95 Curtis, G. C., 222, 239, 243 Curtis, G., 311, 316, 318 Curtis, N., 300, 301, 316 Cyr, J. J., 256, 277 Cytrynbaum, S., 87, 95, 101, 109

Czobor, P., 380, 398

D

Dar, R., 236, 246 Darwin, C., 293, 294, 317 Dasinger, E. M., 366, 370 Davidson, E. R., 483, 515 Davidson, J. R. T., 240, 244 Davidson, K. C., 114, 116, 125, 129, 130, 133


de la Torre, J., 312, 317

DeRubeis, R. J., 52, 72 Deshields, T. L., 311, 317 Deutsch, C. P., 489, 516 Devlin, M. J., 236, 248 Dewald, P., 466, 478 Dewitt, K. N., 237, 245 Deykin, E., 36, 54 Diamond, B. I., 379, 398 Diamond, J. M., 565, 568, 575 Diaz, F. G., 355, 368 Dick, J., 40, 49 DiClemente, C. C., 61, 69, 74 Dietzel, C. S., 67, 72 DiLeo, F. B., 393, 401 Dillon, E. A., 382, 401

de Rael, C. W., 561, 577

DiMascio, A., 60, 73

de Witt, R. A., 555, 575 Deahl, M., 47, 48 Deane, F. P., 565, 568, 575

Dingemans, P. M., 380, 381, 382, 386, 398 Dismuke, S. E., 24, 53

DeAngelis, T., 584, 590, 599

DiVittis, A., 88, 96

DeCourcy, P., 581, 582, 599 Deering, C. D., 382, 383, 384, 399 Deese, J., 293, 318, 319 Deffenbacher, J. L., 304, 310, 312, 313, 314, 317, 318, 305, 320, 321 DeJulio, R., 101, 109

Dixon, D. N., 348, 349

Davidson, W. S., 88, 94, 95 Davies, B., 281, 289 Davies, H., 282, 289 Davies, M., 31, 37, 54, 528, 549 Davies, S. O., 238, 246 Davies-Avery, A., 16, 19 Davis, C. S., 64, 73, 588, 589, 598 Davis, G. C., 381, 401 Davis, T. C., 33, 34, 49 Davison, J. G., 31, 53 Dawson, E., 151, 158

De Freitas, B., 379, 398 De Pauw, K., 37, 53 de Brey, H., 406, 408, 419

DeJulio, S. S., 84, 96

DeKrey, S. J., 489, 512 Delay, J., 282, 289 DeLeon, P. H., 583, 602 deLeon, M., 41, 55 Delgado, A., 39, 40, 50 DellaPietra, L., 37, 49 Demb, H. B., 483, 512

Dixon, L., 392, 398 Dobie, R., 233, 247 Dobler-Mikola, A., 238, 243

Dobson, K., 337, 349

Docherty, J. P., 59, 60, 72 Doherty, M. E., 557, 578 Dohrenwend, B. P., 24, 49 Dohrenwend, B. S., 24, 49

Dolan, S., 231, 247 Dolinsky, A., 31, 37, 54 Dolinsky, Z. S., 151, 158 Dollinger, S. J., 489, 513 Donabedian, A., 17, 19

Demm, P. M., 313, 314, 317

Donham, G. W., 297, 317 Doran, A. R., 379, 380, 398, 401

Demorest, A., 63, 72

Dorwat, R. A., 582, 599

DeMotts, J., 281, 290 den Boer, J. A., 379, 398 Depue, R., 37, 49 Derogatis, L. R., 27, 30, 32, 33, 35, 37, 47, 49, 59, 72, 90, 95, 217, 218, 219, 220, 22Wed 2249 220.2 30,38, 241, 244

Dotemoto, S., 555, 578

Dembroski, T. M., 309, 317

Dertouzos, M. L., 17, 19

Dowd, E. T., 64, 72

Downey, A. M., 561, 576 Doxey, N. C., 64, 75

Doyle, G., 40, 49 Drachman, D. A., 151, 157, 233, 245 Drolette, M. E., 305, 306, 317 Drummond, C., 36, 54

Duckworth, J. C., 137, 138, 139, 156 Duffy, F., 292, 317 Duke, B. J., 507, 512 Duker, J., 137, 138, 139, 157 Dulcan, M. K., 491, 512, 528, 548, 549 Dunn, L. M., 499, 501, 512 Dunn, S., 40, 49 Dupont, R. L., 222, 236, 243 DuRant, R. H., 558, 575 Durenberger, D., 582, 599 Durkee, A., 300, 303, 316 Dworkin, S. F., 32, 53 Dyck, D. G., 311, 318 Dykman, R. A., 560, 574 Dysken, W. W., 236, 246 Dyson, W. L., 281, 282, 286, 290

E

Eames, K., 36, 54 Earls, F., 490, 512 Ebert, M. H., 569, 574 Edelbrock, C. S., 100, 101, 109, 444, 449, 491, 512, 519, 528, 530, 533, 534, 536, 548, 549, 555, 556, 561, 575 Eden, D. T., 30, 52 Edguer, N., 311, 318 Edinger, J. D., 152, 157 Edwards, B. C., 85, 96 Edwards, D. W., 19, 40, 48, 98, 99, 101, 102, 109, 149, 156, 339, 349, 390, 391, 392, 398, 415, 417, 418, 419, 439, 440, 450 Edwards, N. B., 483, 513 Edwin, D., 64, 72, 152, 157 Egan, B., 311, 319 Egan, D., 557, 578 Ehly, S. W., 489, 512 Ehrenworth, N. V., 438, 450

Elkins, D. E., 441, 449 Ellenberger, H. F., 249, 250, 251, 277 Ellertsen, B., 483, 512 Ellis, E. M., 281, 288 Ellison, R., 37, 50

Ellsworth, R. B., 101, 109 Ellwood, P. M., 5, 9, 15, 19 Elstein, A. S., 46, 54

Emery, G., 279, 288 Emery, R. E., 337, 349 Emrick, C. D., 585, 586, 599, 600 Endicott, J., 355, 370, 394, 395, 398, 401 Engle, D., 60, 62, 64, 65, 69, 72, 74, 230, 243 Ennis, J., 282, 289 Epstein, M. H., 561, 575 Epstein, N. B., 88, 97, 333, 348 Erbaugh, J., 59, 69, 72, 279, 281, 288, 379, 397 Erdman, H. P., 237, 247 Erfurt, J. C., 306, 308, 318 Erikkson, J., 31, 49 Erwin, R., 380, 399 Escobar, J., 40, 49, 50 Esveldt-Dawson,

K., 490, 513, 554,

556, 576 Eth, S., 393, 399 Evans, C., 47, 48 Evans, D. R., 301, 317

Evans, M. D., 62, 72 Evans, W. R., 536, 549

Everitt, B., 563, 578 Exner, J. E., Jr., 249, 250, 251, 252, 254,255, 256, 26022623 2678275, 276, 278, 436, 450 Eysenck, H. J., 304, 317

Eysenck, S. B. G., 304, 317

F

Eidelman, B. H., 59, 73

Eilenberg, M. D., 31, 53 Eisner, W. H., 379, 401 Elardo, P. T., 560, 574 Elkin, I., 59, 60, 72, 73


Faibish, G. M., 392, 401 Fairburn, C. G., 229, 245 Fairweather, G. W., 82, 95 Faltico, G., 236, 246

Farmer, M. E., 365, 370

Farmer, R., 460, 478 Farnbach, P., 236, 245 Farrell, A. D., 81, 95, 594, 601 Farrington, D. P., 491, 515

Faschingbauer, T. R., 462, 478 Faull, K. F., 380, 392, 393, 398 Faulstich, C., 328, 338, 350 Fauman, M. A., 34, 50 Faust, D., 39, 40, 50 Faustman, W. O., 40, 50, 379, 380, 386, 393, 398, 399, 400 Fava, J., 61, 69, 74

Feighner, J. P., 379, 398

Fitzgibbon, M. L., 582, 586, 590, 595, 602 Fjetland, O. K., 196, 197, 215 Fleetham, J., 152, 157 Flegenheimer, W., 229, 248 Fleiss, J. L., 355, 370 Flemenbaum, A., 382, 383, 398, 399 Fleming, R., 240, 245 Fletcher, J. M., 114, 125, 129, 130, 133 Fletcher, K. E., 229, 245 Flett, G. L., 280, 291 Fleuridas, C., 88, 95

Flowers, J. V., 88, 95 Flynn, T. J., 582, 602

Feister, S. J., 59, 60, 72

Foavl 3B

Ferris, S., 41, 50 Fetting, J. H., 32, 49, 224, 244 Ficken, R. P., 33, 34, 50, 51 Fiedler, N., 557, 578 Fielding, J. M., 281, 289 Filstead, W. J., 151, 157 Finch, A. J., Jr., 297, 318 Finch, S. J., 281, 289 Finck, D., 555, 578 Fineberg, H. V., 46, 54 Fink, C. M., 596, 601

Foch sRAl

Finkelstein, S., 44, 48

Finn, S. E., 596, 597, 599 First, M. B., 596, 599 Firth, J., 90, 96, 222, 238, 247

2125 3493 1

sty

557

Foenander, G., 282, 289 Fogel, B., 39, 40, 50

Fogelman, B. S., 24, 53 Folkman, S., 293, 319

Follett, G. M., 479, 513 Follette, W. C., 90, 94, 95, 282, 289, 290, 334, 339, 591, 600 Folstein, M., 39, 40, 50, 53 Folstein, S., 39, 50 Fons, B. J., 471, 472, 478

Forbes, G. B., 480, 483, 486, 489, 512, 513 Ford, H., 392, 400

Fordyce, D. J., 355, 369

Firth-Cozens, J., 238, 247

Foreman, M., 40, 50

Fischer, J., 557, 578

Forness, S. R., 489, 513

Fischer, P. C., 309, 310, 317

Fischoff, J., 483, 514 Fishback, D., 41, 50 Fisher, P., 528, 549

Forsyth, R. P., 82, 95 Forsythe, A., 40, 49, 50 Fortman, J. B., 557, 559, 575 Foss, D. A., 558, 564, 577

Fisher, R. S., 355, 358, 370

Foster, S. A., 366, 370

Fishman, D. B., 118, 123, 133, 260, 278 Fisk, J. L., 483, 513 Fiske, D. W., 81, 82, 95, 360, 367, 369 Fiszbein, A., 380, 384, 399 Fitt, D. X., 104, 109 Fitzgerald, G., 554, 576

Foster-Higgins, 5, 7, 19 Foulds, G. A., 300, 316 Fournier, D. G., 322, 349 Fox, P. D., 10, 19 Frances, A., 56, 73, 201, 215, 392, 398

Francis, D. J., 114, 116, 125, 129, 130, 133 Frank, E., 231, 243

Frank, G. W., 582, 602 Frank, J. D., 217, 246, 354, 370 Frank, L. K., 252, 278 Frank, R., 32, 33, 35, 53

Frerichs, R. R., Freud, S., 293, 294, 295, 313, 317 Fricchione, G. L., 233, 242, 245 Fridman, R., 237, 246 Friedman, A. F., 137, 138, 139, 157 Friedman, H. S., 311, 316 Friedman, I., 361, 362, 365, 366, 369, 392, 399 Frisch, M. B., 593, 594, 599 Froyd, J., 77, 80, 95 Fruchtman, L., 327, 350 Fruzzetti, A. E., 337, 349 Fry, E., 491, 513 Fry, R., 238, 248

Ganmon, D., 490, 516 Gansereit, K. H., 354, 362, 365, 366, 368, 370 Gany, F., 38, 53 Garbin, M. A., 280, 281, 282, 285, 288 Garcia-Espana, F., 237, 246 Garfield, S. L., 62, 74, 84, 87, 95, 124, 133 Garfield, S. R., 583, 587, 599 Garfinkel, B. D., 560, 574 Garner, D. M., 238, 246 Garner, W. R., 585, 602 Garrison, B., 283, 284, 285, 286, 288 Garvey, M. J., 52, 72, 231, 246 Garvin, R. D., 150, 157 Garwood, J., 257, 278 Gatchel, R. J., 240, 245 Gauch, R. R., 560, 568, 576 Gaudry, E., 297, 317 Gaynor, J. A., 118, 132, 133 Gdowski, C. L., 327, 328, 338, 350, 481, 483, 484, 485, 486, 491, 505, 507, 513, 516 Geary, D. C., 260, 278 Geigle, R., 14, 15, 19

Fuerst, D. R., 483, 513

Geller, M. H., 66, 74

Fuhriman, A., 58, 73

Fulop, G., 24, 38, 50, 53

Gendreau, P., 406, 408, 420 Gent, C. L., 528, 530, 531, 549

Funkenstein, D. H., 305, 306, 317

Genthner, R. W., 366, 369

Fuqua, D. R., 309, 310, 317 Furby, L., 114, 115, 133 Furlong, M. J., 557, 559, 575

Gentry, W. D., 306, 308, 317

Freeman, H. E., 353, 369

Freeman, W., 581, 582, 600 Freiberg, P., 584, 592, 599 Freiman, K. E., 328, 329, 330, 350 French, T. M., 305, 315 Frenette, L., 406, 408, 420 Frensch, P., 36, 53

G

George, D., 34, 54

George, G. K., 32, 48 George, L. K., 24, 34, 47, 52 Gerardi, M. A., 59, 72 Gerber, P. D., 24, 32, 48

Gabbard, G. O., 382, 383, 384, 399 Gallagher, D., 227, 229, 247, 281, 283, 284, 285, 289 Gallucci, N. T., 149, 151, 157, 436, 440, 450, 451 Galton, F., 25, 50 Gammon, G. D., 483, 490, 511, 513 Gandhi, S., 40, 51 Gandhy, P. R., 491, 515 Ganguli, M., 32, 33, 34, 53

German, P., 33, 34, 37, 51, 53 Gerstle, R. M., 260, 278 Geschwind, N., 42, 51 Getsinger, S. H., 151, 157

Getter, H., 60, 65, 72, 73, 74 Ghoneim, M. M., 222, 236, 246 Ghonheim, M. M., 30, 52 Giambra, L. M., 281, 289 Giannetti, R. A., 595, 596

Gibbon, M., 596, 599


Glasser, J. H., 9, 20

Gordon, R. A., 426, 428, 435, 442, 443, 446, 449, 450, 451, 452 Gorham, D. R., 361, 370, 372, 375, 376, 382, 383, 399, 400 Gorkin, L., 311, 317 Gorsuch, R. L., 59, 67, 74, 295, 296, 298, 299, 320, 405, 419, 498, 516 Gotlib, I. H., 333, 349 Gottlieb, G. L., 382, 399 Gottlieb, H. J., 151, 158 Gould, J. W., 31, 52, 280, 281, 290 Goyette, C. H., 552, 553, 559, 561, 575 Grady-Fletcher, A., 334, 351 Graham, J. M., 589, 598 Graham, J. R., 56, 64, 73, 137, 138, 139, 150, 156, 157, 361, 362, 365, 366, 367, 369, 382, 392, 397, 398, 399, 400, 426, 427, 428, 429, 431, 434, 435, 437, 438, 440, 441, 442, 443, 450, 461, 478

Glazer, W., 232, 247

Grant, F. B., 47, 50

Glen, A. I. M., 282, 286, 288 Gleser, G. C., 83, 84, 95 Glod, C. A., 236, 247 Glow, P. H., 559, 561, 575 Glow, R. A., 559, 561, 575

Grawe, K., 65, 72

Gibbons, R. D., 281, 289 Gibson, R. L., 82, 95 Gibson, R., 582, 599 Gift, A. G., 227, 245 Gilbar, O., 230, 245 Gilberstadt, H., 137, 138, 139, 157 Gill, W. S., 198, 215 Ginath, T., 101, 109 Ginath, Y., 87, 95 Ginsberg, B., 38, 53 Gioia, P., 528, 549

Girodo, M., 240, 245 Gispen-De Wied, C. C., 236, 245 Gisriel, M. M., 240, 245 Gittelman, R., 557, 561, 574, 577

Gladis, M., 236, 248 Glaister, B., 81, 95 Glass, D. R., 59, 60, 72, 73

Glass, G. V., 83, 86, 97

Greden, J. F., 239, 243

Green, C. J., 456, 461, 462, 463, 466, 478 Green, J., 280, 281, 290

Glynn, S. M., 379, 393, 399, 400

Green, M. F., 392, 399

Goffaux, J., 311, 317 Goh, D. S., 489, 513 Gohs, D. E., 560, 568, 578 Goldberg, D., 29, 30, 35, 48, 50 Goldberg, I. D., 24, 33, 35, 52 Goldberg, L. R., 332, 349, 442, 451 Goldberg, S. C., 365, 369 Goldfarb, A., 39, 40, 51 Goldfried, M. R., 63, 73, 252, 257, 278 Goldgerger, E., 231, 247 Goldhamer, H., 353, 369 Golding, J. M., 24, 33, 40, 49, 50, 54 Goldman, E., 281, 290 Goldman, P. A., 47, 53

Green, R. S., 103, 109

Goldsmith, J. C., 582, 599 Goldstein, G., 67, 73

Goodwin, F. K., 32, 52, 365, 370

Green, S. M., 490, 515, 491 Green, W. J., 282, 289 Greenbaum, R., 561, 575

Greenberg. Ex, 82, 5929.53

Greenberg, L. S., 333, 349 Greene, B. C., 83, 84, 95 Greene, R. L., 137, 138, 139, 150, 155, 157, 431, 451 Greenfield, T. K., 402, 403, 404, 405, 406, 407, 408, 409, 410, 411, 412, 413, 414, 415, 417, 418, 419, 420 Greist, J. H., 236, 237, 246, 247

Gretter, L., 60, 74 Gretter, M. A., 93, 96

Griffin, B. P., 106, 110 Griffith, E. H., 237, 245 Griger, M. L., 242, 246

Grohmann, R., 377, 400 Grosby, H., 31, 48 Grossberg, I. N., 483, 512 Grove, W. M., 52, 72, 90, 95, 591, 600 Gruber, C. P., 491, 507, 512 Gualtieri, C. T., 556, 576 Guck, T. P., 151, 157

Hannum, T. E., 281, 290 Hanon, T. E., 389, 394, 399 Hansen, L., 585, 586, 592, 599, 600 Hanson, R. K., 256, 278 Harberg, E., 306, 308, 317, 318 Hardesty, A. S., 360, 369 Hardy, G. E., 238, 247

Gudeman, H., 355, 369

Hare, R. D., 197, 215

Guelfi, G. P., 380, 386, 399 Guiloff, R., 40, 49 Gunter, P. A., 236, 247 Gur, R. C., 380, 382, 399 Gur, R. E., 380, 382, 399 Guthrie, M., 242, 248

Harford, T. C., 47, 50 Hargreaves, W. A., 118, 133

H

Haas, G. L., 88, 96, 392, 398 Haas, L. J., 9, 19 Haber, J. D., 152, 159


Harrington, R. G., 479, 513

Harris, C. W., 405, 419 Harris, R. E., 426, 451 Harrison, W., 238, 246 Harrop-Griffiths, J., 233, 247 Harrow, M., 365, 369

Hartfield, M. T., 311, 318 Harthorn, B. H., 33, 34, 35, 48 Hartigan, J. A., 585, 600 Hartlage, L. C., 581, 582, 600

Hadley, S. W., 82, 97, 101, 110

Hase, H. D., 332, 349 Hasin, D. S., 47, 50

Haglund, R., 40, 50

Hasset, T., 242, 247

Hahlweg, K., 334, 349

Hatch, D. R., 85, 96

Haimes, P. E., 444, 452 Hajek, V., 39, 50 Hakerem, G., 360, 369 Hale, W. D., 30, 49, 225, 243 Hall, R. P., 306, 308, 317 Hallahan, D. P., 489, 513 Haller, D. L., 137, 138, 139, 158, 444, 451 Halperin, J. M., 554, 555, 556, 557, 575, 576 Hamblin, D. L., 82, 94 Hamilton, M., 29, 31, 50, 59, 73, 77, 95, 196, 215, 280, 289, 294, 318, 379, 399

Hathaway, S. R., 137, 138, 139, 157, 196, 215, 328, 349, 434, 435, 451 Hatzenbuchler, L. C., 281, 289 Hauenstein, L. S., 306, 308, 318 Haverstock, S., 379, 398

Hadigan, C. M., 236, 248

Havighurst, R. J., 353, 369 Hawkins, J. D., 77, 97

Hawkins, W., 36, 37, 51 Hawthorne, D., 552, 558, 559, 578 Hawton, K., 24, 50

Hayduck, L. A., 128, 133 Hayes, S. C., 581, 582, 588, 589, 590, 596, 597, 600 Hays, K., 67, 74

Hammen, C. I., 93, 95, 281, 289

Hazaleus, S. L., 313, 314, 318

Hampe, E., 536, 549 Haney, T. L., 309, 317, 321 Hankin, J., 33, 52 Hanley, J. A., 46, 50 Hanna, P. S., 490, 513 Hannah, M. T., 64, 74

Headey, B. W., 594, 600 Healey, J. M., 554, 555, 556, 557, 575, 576 Healy, B. J., 254, 278 Heather, N., 589, 600 Hechtman, L. T., 567, 578

Hedgepeth, B. E., 225, 245 Hedlund, J. L., 31, 50, 382, 386, 399, 435, 451 Heikkinen, C., 37, 50

Heim, C. R., 311, 317 Heim, S. C., 337, 349 Heimberg, L., 283, 289 Heinemann, S. H., 392, 399 Heinrich, J. V., 30, 52 Heinrichs, D. W., 389, 394, 399 Heller, K., 198, 216 Helmreich, R., 328, 351

Helsing, K. J., 30, 49 Helzer, J. E., 32, 52, 53, 528, 549 Hemmings, K. A., 60, 74, 93, 96

Hendrichs, M., 224, 244 Hendricks, V. M., 232, 247 Henker, B., 555, 578

Henly, G. A., 498, 516 Henrichs, M., 32, 49 Henrichs, T. F., 442, 451 Henry, B. W., 392, 400 Henry, W. P., 67, 73, 87, 97, 257, 277 Hensley, V. R., 490, 514

Herbert, J. D., 87, 95 Herjanic, B., 491, 515 Herkov, M. J., 443, 451 Herman, I., 237, 248 Herman, R. D., 507, 513 Herron, E. W., 596, 600

Hippius, H., 377, 400

Hirschfeld, R. M., 32, 52 Hirsh, S. R., 283, 290 Hoaken, P. C. S., 281, 282, 290 Hodges, W. F., 295, 318 Hodgson, R. J., 196, 216

Hoelscher, T. J., 152, 157 Hoeper, E. W., 33, 34, 50, 52

Hoffman, H., 150, 151, 157, 392, 399 Hoffman, N. H., 151, 157 Hogarty, G. E., 358, 359, 364, 365, 367, 368, 369 Hogg, J. A., 314, 317, 318 Hole, A., 237, 248 Hollister, L. E., 372, 378, 379, 382, 385, 386, 399, 400, 401 Hollon, D. S., 52, 57, 72, 73, 93, 95, 282, 286, 290 Holmes, S., 60, 74 Holmes, T. H., 198, 215 Holmstrom, E. L., 594, 600

Holroyd, K. A., 311, 317, 318 Holsopple, J. Q., 372, 400 Holt, R. R., 252, 256, 260, 278 Holtzer, C. E., 281, 290 Holtzman, W. H., 596, 600 Holzer, III, C. E., 32, 52

Homatidis, S., 554, 575, 576 Hommer, D. W., 379, 380, 398, 400

Heverly, M. A., 104, 109, 110, 118, 125, 133 Heyman, R. E., 333, 351 Hill, R. D., 283, 284, 285, 289 Hillier, V. F., 30, 50 Hillman, S. B., 487, 515

Honigfeld, G., 379, 381, 399 Hooke, J. F., 297, 318 Hoover, D. K., 328, 329, 332, 349, 350, 351 Hope, A., 229, 245 Hope, K., 300, 316 Horine, L., 581, 582, 600 Horn, J. L., 112, 114, 132 Horn, W. F., 555, 576 Horowitz, L. M., 62, 73, 219, 229, 245 Horowitz, M. J., 237, 239, 245, 389, 390, 392, 393, 399 Horvath, A., 560, 577 Horvath, M., 221, 243

Himmelstein, P., 260, 278

Houpt, J. L., 34, 52

Hinrichs, J. V., 222, 236, 246

Howanitz, E., 233, 242, 245

Herron, L. D., 151, 157 Herschberger, P., 311, 318 Hersen, M., 67, 73

Hersh, S., 365, 370 Hesselbrock, M. M., 280, 281, 289 Hesselbrock, V. M., 280, 281, 289 Hester, R. K., 585, 586, 587, 600, 601

Howard, K. I., 101, 102, 104, 105, 106, 108, 109, 110, 118, 120, 125, 133, 588, 590, 594, 600 Howard, M. T., 281, 290 Howe, B. A., 240, 248 Howell, C. T., 490, 511, 517, 548 Hubbard, J. W., 379, 400 Huber, N. A., 149, 151, 157 Hudson, C. J., 239, 243

Hugdahl, K., 591, 600 Hughes, H. M., 479, 511 Hughes, J. N., 491, 513


Jacobs, G. A., 300, 303, 305, 306, 307, 308, 311, 313, 318, 320 Jacobs, J., 39, 40, 50 Jacobson, J. M., 432, 449 Jacobson, N. S., 90, 94, 95, 97, 282, 289, 290, 333, 334, 337, 340, 341, 342, 349, 591, 600 Jaeger, J., 381, 397 Jaffee, C. L., 9, 20 Jandorf, L., 233, 242, 245 Janisse, M. P., 311, 318 Jansen, D. G., 150, 151, 157

Huisman, J., 379, 398 Hulbert, T. A., 481, 513 Hullin, R., 554, 574

Jarrett, F. J., 281, 282, 290 Jarrett, R. B., 581, 582, 588, 589, 590, 596, 597, 600 Jarvik, L., 41, 51

Hulsey, T. L., 151, 156

Jefferson, J. W., 47, 50, 236, 246

Humphrey, F. J., 554, 576

Jemmott, III, J. B., 311, 316 Jenkins, C. D., 226, 244 Jenkins, R. L., 372, 400

Hughson, A. V. M., 47, 50

Hunsley, J., 256, 278

Hunt, R. D., 569, 574 Hunter, R. H., 102, 110 Hunter, S. H., 352, 361, 370 Hunter, S. M., 561, 576

Jenson, W. R., 489, 512

John, K., 490, 511, 513, 516 Johnson, C., 435, 451

Hurley, J. R., 221, 245

Johnson, D. A., 355, 370

Husum, B., 31, 48

Johnson, E. H., 300, 305, 306, 307, 308, 309, 311, 313, 318, 319, 320

Hwu, H. G., 528, 549 Hyde, T. S., 483, 516

Johnson, J., 31, 37, 54

Hymowitz, P., 587, 600

Johnson, M., 355, 358, 369 Johnson, P. L., 17, 20 Johnson, R., 37, 50

Imber, S. D., 59, 60, 72, 73, 82, 96 Imhof, E. A., 432, 433, 434, 446, 450 Ingram, R. E., 93, 95 Irving, D., 102, 110 Ishida, T., 283, 284, 285, 289

Johnson, S. M., 333, 349 Johnston, P., 359, 364, 368, 370 Johnstone, B. G. M., 233, 245 Johnstone, E. E., 393, 398 Jones, D. L., 151, 158

Jones, L. R., 33, 34, 50, 51 Jones, R., 229, 245

Itschner, L., 393, 398

Jones, S. B., 14, 15, 19

Ives, J. O., 235, 246

Jordan, K., 32, 48 Joreskog, K. G., 128, 133 Judd, B., 40, 51 Judd, F. K., 236, 245 Judd, L. L., 32, 52, 365, 370 Jung, K. G., 490, 512

J Jacewitz, J., 129, 132

Jackson, D. N., 185, 215, 460, 478 Jacob, R. G., 59, 73, 560, 576

Junge, A., 367, 370

Jungner, F., 47, 54 Jurencec, G., 557, 578

K

Katz, M. M., 352, 353, 354, 355, 357, 358, 359, 360, 361, 364, 367, 368, 369 Kauffman, J. M., 489, 513 Kay, D. C., 149, 151, 157, 440, 451 Kay, S. R., 380, 384, 399 Kazdin, A. E., 90, 92, 95, 490, 513, 554, 556, 576, 591, 594, 600

Kabat-Zinn, J., 229, 245 Kadden, R. M., 65, 72, 73 Kaemmer, B., 426, 427, 428, 429, 431, 434, 437, 438, 440, 441, 442, 443, 444, 450

Kearns, W. D., 311, 318 Kedward, H. B., 32, 51

Kagen, E., 560, 578

Keenan, P. A., 501, 513

Kahn, R. S., 39, 40, 51, 222, 236, 238, 245, 248

Keens, S., 560, 568, 575 Kehle, T. J., 489, 512, 554, 556, 576 Keiser, T. W., 329, 330, 351 Keith, S. J., 365, 370 Keith, T. M., 489, 512 Kellams, J. J., 379, 389, 401

Kaiser, H. E., 222, 245

Kaiser, M. K., 129, 133 Kalas, R., 491, 512, 528, 549 Kales, S. N., 554, 576 Kalichman, S., 311, 316, 318 Kalikow, K., 31, 37, 54 Kaltenbach, P., 555, 575

Kaltreider, N. B., 239, 245, 389, 390, 392, 393, 399 Kaman, H. C., 217, 246 Kameoka, V. A., 281, 290

Kamerbeek, D. W., 236, 245 Kamerow, D. B., 24, 51

Kammeier, M. L., 137, 138, 151, 156, 157 Kane, J. M., 379, 384, 399, 401 Kane, M. T., 31, 51 Kane, R. A., 101, 109 Kane, R. L., 101, 109 Kanonchoff, A. D., 311, 316 Kaplan, H. B., 392, 401 Kaplan, R. M., 593, 600 Kaplin-Denour, A. K., 230, 245

Karasu, T. B., 588, 600 Karno, M., 24, 40, 47, 49, 50, 52 Karson, C. N., 387, 399 Kashani, J. H., 483, 513 Kaszniak, A. W., 37, 41, 48, 60, 62, 74 Kates, W., 561, 574

Katon, W., 24, 33, 35, 47, 51, 233, 238, 239, 245, 247

Keane, T. M., 196, 216

Keller, L. S., 139, 151, 157

Kellner, R., 379, 399 Kelly, E. J., 489, 513 Kelly, S. J., 240, 245 Kelman, H. C., 354, 370

Kelsoe, J., 379, 400, 401 Kempf, E. J., 25, 51 Kempt, J., 232, 246 Kendall, P. C., 31, 51, 90, 93, 95, 297, 318, 591, 600 Kennedy, L. L., 382, 383, 384, 399 Kennedy, S., 282, 289 Kenny, D. A., 126, 127, 128, 133 Kessler, K. A., 582, 600

Kessler, L. G., 33, 34, 51, 590, 602 Kestenbaum, R., 229, 248

Khavin, A. B., 64, 73 Kiesler, C. A., 5, 6, 7, 8, 10, 20, 584, 585, 593, 596, 600 Kiesler, D. J., 66, 67, 73, 100, 103, 109 Kiessling, L. S., 559, 578 Kilroy, V., 37, 49 Kim, J-O, 405, 406, 414, 419 Kim, S. W., 236, 246 Kinder, B., 311, 316, 318 King, A. C., 585, 602 King, C. A., 483, 513

King, C., 554, 576 King, D., 281, 288 King, K. M., 222, 241, 244 King, R. J., 380, 398 King, S. H., 305, 306, 317


Koch, R., 230, 247, 248 Kolb, D. L., 64, 73 Kong, Y., 309, 321 Konick, D. S., 361, 362, 365, 366, 369, 392, 399

Kingston, M. D., 85, 96

Konstantareas, M. M., 554, 575, 576

Kinney, N., 38, 49

Kopta, A. M., 101, 104, 105, 106, 109, 110 Kopta, S. M., 118, 120, 125, 133, 588, 590, 594, 600

Kirby, S., 40, 53 Kiresuk, T. J., 19, 87, 95, 96, 98, 99, 101, 102, 109, 149, 156, 339, 349, 390, 391, 392, 398, 415, 417, 418, 419, 439, 440, 450 Kirkner, F. J., 257, 278 Kirtner, W. L., 81, 82, 95 Kiser, L. J., 483, 513 Kiviahan, D. R., 151, 158 Klaric, S. H., 528, 549 Kleber, H. D., 232, 247, 281, 290 Klein, A. R., 563, 576 Klein, H. E., 355, 369

Kraemer, H. C., 383, 393, 400

Klein, H., 490, 515

Krakauer, S. Y., 428, 451

Klein, L., 33, 34, 37, 40, 53 Klein, M. A., 328, 338, 350 Klein, M. H., 237, 247 Klein, R. H., 240, 245, 355, 368 Kleinman, P. H., 232, 246

Kramer, M., 24, 32, 34, 35, 37, 40, 47, 51, 52

Kleinmuntz, B., 595, 600, 601

Klerman, G. L., 36, 46, 52, 54, 60, 73, 161, 184, 222, 226, 244, 246 Klett, C. J., 355, 360, 370 Klett, J. C., 381, 372, 376, 377, 382, 386, 399, 400 Kline, R. B., 481, 483, 486, 487, 489, 499, 500, 501, 503, 513, 514, 515 Klinedinst, J. K., 327, 351, 481, 483, 490, 516 Klinefelter, D., 429, 441, 449, 450 Klonoff, H., 152, 157 Klonoff, P. S., 355, 358, 370 Klopfer, B., 252, 257, 278

Klopfer, W. G., 252, 278, 581, 60 Knight, D., 490, 514 Knight, R. G., 307, 318 Knight-Law, A., 64, 73 Knoff, H. M., 481, 514

Korchin, S. J., 10, 20, 581, 601

Koriath, U., 556, 576 Kornetsky, C., 238, 243 Koss, M. P., 432, 451 Kostyniuk, A., 483, 512 Kotik, D., 460, 478

Kovacs, M., 282, 283, 284, 285, 286, 288, 290, 291, 491, 514 Kozak, M. J., 313, 317

Krames, I., 280, 291 Krasner, S. S., 308, 311, 318, 320 Krasnoff, A., 151, 157

Krause, M. S., 101, 109, 118, 120, 125, 133, 588, 590, 594, 600 Krauss, S., 37, 49

Krishnamurthy, R., 436, 450 Kristeller, J., 229, 245

Kroll, P. A., 240, 243 Krosesler, D., 233, 242, 245 Krueger, M., 36, 49 Kruger, A., 32, 53

Kudler, H. S., 240, 244 Kuehne, C., 554, 556, 576 Kulhara, P., 379, 380, 397, 400 Kupfer, D. J., 231, 243

Kuppenheimer, M., 231, 247 Kurdek, L. A., 507, 514 Kurland, A. A., 354, 362, 365, 366, 368, 370, 393, 401 Kurpnick, J., 392, 399 Kursh, E. D., 241, 243

Kurtz, J., 198, 216 Kurtz, L. F., 406, 416, 419 Kurtz, R., 583, 587, 599 Kutcher, M., 196, 197, 215

L La Combe, J., 328, 338, 350 Lachar, D., 137, 138, 139, 156, 157, 327, 328, 329, 330, 350, 351, 432, 442, 448, 451, 480, 481, 483, 484, 485, 486, 487, 489, 490, 491, 499, 500, 501, 503, 505, 507, 513, 514, 515, 516 Lacks, P., 90, 94, 96 LaCombe, J. A., 487, 515 LaCroix, J. M., 64, 73 Lader, M., 295, 318 LaForge, R., 361, 370 Lahey, B. B., 490, 491, 515, 557, 577 Lambert, M. J., 60, 73, 74, 76, 77, 80, 81, 84, 85, 86, 89, 90, 91, 92, 95, 96, 97, 101, 109, 201, 216, 280, 290, 470, 477 Lambert, N. M., 555, 576 Lambley, P., 581, 582, 601 Landau, S., 552, 559, 576 Landerman, R., 32, 48

Landsverk, J., 40, 47, 48, 49, 50 Lang, P., 196, 216 Langs, R., 466, 478 Lanyon, R. I., 348, 349 Laprade, K., 552, 553, 558, 561, 578 Larsen, D. L., 402, 405, 406, 407, 411, 419 Larsen, J. K., 382, 383, 384, 397, 399 Larson, E., 40, 53 LaRue, A., 41, 51 Laseg, M., 490, 511 Lasky, J. J., 355, 360, 370 Latham, L. E., 303, 307, 318 Pazar. Bers2, 52 Lazarus, A. A., 58, 74 Lazarus, R. S., 293, 318, 319

Leaf, P. J., 32, 52 Leake, B., 47, 48 Leary, T., 66, 74 Lebell, M., 236, 246 Leber, W. R., 59, 60, 72, 73 Lebovits, A., 38, 53, 242, 247 Lebow, J. L., 406, 409, 411, 413, 419, 588, 600 Lee, C. K., 528, 549 Lee, H. B., 405, 419 Lee, K. L., 309, 321 Lee, Pil, 5595576 Leepek, J. D., 33, 34, 50, 51 Lehman, A. F., 407, 408, 409, 419 Lehrer, P. M., 237, 243 Leigh, G. K., 88, 95 LeighehsE:, 68795 Leltieri, D. J., 415, 419 Lemmi, H., 281, 288

Lemperiere, T., 282, 289 Lenderking, W. R., 229, 245

Lenihan, P., 40, 49 Leonard, E., 309, 310, 317 Leonard, K., 312, 316

Leong, A., 392, 393, 399 Leong, G. B., Leonhard, C., 589, 600

Leppig, M., 377, 400 LeResche, L., 32, 53 Lerner, P. M., 254, 278 Leserman, J., 233, 245 Lesher, E., 40, 51

Lesser, I. M., 222, 236, 243 Lester, R. K., 17, 19 Leung, P. W., 559, 576

Levick, S., 380, 399 Levin, B. L., 9, 20 Levin, S., 88, 97 Levine, E. G., 233, 246 Levine, M., 38, 49

Levine, S., 226, 236, 241, 243, 244, 246 Levitt, E. E., 295, 319 LeVois, M., 402, 404, 406, 407, 416, 419

Lewak, R. W., 137, 138, 139, 149, 157, 158, 437, 442, 451 Lewandowski, D., 382, 400, 583, 587, 601 Lewin, M. E., 582, 602 Lewinsohn, P. M., 282, 288 Lewis, A. B., 88, 96 Lewis, P., 82, 96

Lewis, S., 490, 512 Liberman, R. P., 379, 388, 389, 400 Libow, L., 41, 51 Liebowitz, M. R., 238, 246 Lieh, M. F., 559, 576 Lilly, R. S., 361, 362, 365, 366, 369, 382, 397, 398 Lin, E., 24, 33, 35, 47, 51 Linden, W., 13, 20, 591, 601 Linder-Pelz, S., 409, 419 Lingoes, J. C., 426, 451 Linn, L. S., 24, 33, 34, 37, 51 Lipman, R. S., 27, 30, 49, 217, 244 Lipscomb, P., 24, 33, 35, 47, 51 Lipsedge, M. S., 282, 286, 290 Lipton, D. S., 232, 246 Liskow, B., 231, 232, 246 Litt, M. D., 65, 72, 73 Littlefield, C., 31, 49 Littlefield, E., 36, 37, 51 Lizando, E., 379, 400

Loar, L. L., 554, 556, 576 Loberg, T., 592, 601 Locke, B. Z., 24, 30, 32, 47, 48, 52, 238, 248, 365, 370 Locke, H. J., 77, 96, 327, 349 Loeber, R., 490, 491, 515 Loevinger, J., 185, 216, 458, 460, 478 Lohr, J. B., 377, 378, 400 Loney, J., 536, 549, 552, 557, 559, 568, 576, 577 Long, C. J., 151, 158 Longabaugh, R., 594, 601

Loper, R. G., 151, 157 Lopes, C. E., 231, 244 Lord, F. M., 114, 133 Lorr, M., 355, 360, 361, 370, 372, 400


Love, A. W., 151, 158 Lovitt, R., 582, 601 Lowery, H. A., 358, 359, 360, 361, 364, 368, 369 Lowman, J. C., 327, 350 Lubin, B., 280, 281, 291, 295, 321, 453, 478 Luborsky, L., 63, 65, 72, 74, 75, 82, 96, 237, 248, 417, 419 Lucas, J., 36, 53

Lucas, P. B., 379, 400, 401 Luce, R. D., 26, 51

Luk, S. L., 559, 576 Lukoff, D., 388, 389, 400 Lum, D., 231, 247 Lushene, R. E., 59, 74, 295, 296, 298, 320, 498, 516 Lyerly, S. B., 352, 353, 354, 355, 357, 360, 362, 364, 367, 368, 369 M MacAndrew, C., 149, 158, 426, 451

Macchitelli, F. J., 151, 158 MacDonald, D. I., 24, 51 MacDonald, R., 65, 72

MacDougall, J. M., 309, 317 Machado, P. P. P., 65, 74 Madri, J. J., 46, 51 Maggs, R., 282, 289 Magill, M., 34, 54 Magnussen, M. J., 581, 582, 599 Maguire, M., 355, 358, 369 Maher, C. A., 88, 96

Mahy, G. E., 237, 245 Maier, W., 222, 243 Main, A., 92, 97 Maiuro, R. D., 301, 316, 311, 321 Majovski, L. V., 560, 568, 576 Malachowski, B., 490, 512 Malec, J., 233, 246 Malt, U. F., 30, 51 Maltz, A., 483, 514 Mangen, D. J., 101, 109 Mangrum, L. F., 334, 350

Manicavasagar, V., 60, 74 Mann, N. A., 379, 401 Mannuzza, S., 384, 401

Mantor, K. G., 32, 48 Marcus, M., 560, 578 Marder, S. R., 236, 246 Marengo, J., 365, 369 Margalit, M., 554, 557, 559, 576 Margolin, G., 333, 349, 351

Mark, D., 63, 72 Markowitz, J., 238, 246 Marks, P. A., 137, 138, 139, 149, 157, 158, 428, 429, 431, 437, 438, 442, 444, 448, 451 Markwardt, F. C., Jr., 499, 512 Marmar, C., 237, 245, 389, 390, 392, 399 Marriott, P. F., 236, 245 Marsden, C., 40, 49 Marsella, A. J., 281, 290

Marshall, B. D., 379, 400 Martin, I., 295, 319 Martin, P. J., 393, 400 Martin, R. P., 491, 515 Maruish, M., 4, 10, 11, 20, 432, 433, 434, 446, 450 Maser, J. D., 26, 51

Mason, B., 255, 278 Masser, B. H., 560, 574 Massimo, J. L., 82, 97 Massion, A. O., 229, 245 Master, N., 47, 48 Masters, M. A., 309, 310, 317

McBeth, C. D., 221, 241, 244 McCabe, S. B., 337, 349 McCann, J. T., 148, 158, 166, 172, 173, 184 McCarthy, D., 501, 515 McCartney, J., 40, 51 McClellan, K., 587, 601

McClelland, M., 32, 33, 34, 53 McColgan, E., 483, 513 McConaughy, S. H., 490, 511, 517, 518, 522, 528, 530, 531, 532, 534, 538, 548, 549 McCrae, R. R., 196, 197, 198, 215 McCreary, C., 151, 158 McCullough, L., 229, 248, 594, 601 McDermott, P. A., 489, 515 McDermott, R., 36, 37, 51

McDonald, C., 281, 288, 365, 369 McDougall, G., 39, 51 McDowell, I., 368, 370 McElroy, M. G., 280, 281, 290 McGovern, M. P., 101, 102, 104, 109, 110 McGrath, P. J., 238, 246 McHugh, P., 39, 50

McKinley, J. C., 137, 157, 196, 215, 236, 246, 328, 349 McLarnon, M. C., 379, 398 McLellan, A. T., 589, 601

Mavissakalian, M., 90, 96

McLellan, T., 237, 248 McLemore, C. W., 67, 74 McMahon, W., 554, 556, 576 McMillian, S. C., 311, 319 McNair, D. M., 355, 360, 370 McNairy, R. M., 393, 400 McNeil, B. J., 15, 19, 46, 50, 54 McNeilly, C., 101, 104, 110 McNiel, D. E., 405, 419, 420 McReynolds, P., 295, 319 McRoberts, C. H., 77, 81, 96

May, A. E., 280, 281, 290

McSweeney, A. J., 355, 368

May, P. R. A., 236, 246

McWilliams, J., 151, 158

McArdle, C. S., 47, 50

McArthur, D. L., 151, 158 McArthur, J., 40, 53

Meagher, R. B., Jr., 456, 461, 462, 463, 466, 478 Mechanic, D., 7, 20

Masterson, J., 151, 159

Matarazzo, J. D., 584, 593, 597, 601 Mathews, L., 281, 289 Matthews, E. J., 238, 243 Mattison, R. E., 554, 576

Maurer, H. S., 151, 158

Medenis, R., 566, 574 Medley, D. M., 300, 303, 316 Meehl, P. E., 43, 51, 137, 138, 139, 157, 442, 451, 459, 461, 478, 595, 601 Meese, M., 242, 247 Megargee, E. I., 3, 20 Meilman, P. W., 151, 157 Meisler, A. W., 232, 243 Melges, F., 283, 290 Melisaratos, N., 30, 49, 90, 95, 217, 244 Mellergard, M., 219, 230, 247

Mellon, J., 63, 72, 74 Meltzer, H., 379, 394, 399, 400 Melville, M. L., 32, 48 Mendels, J., 281, 282, 286, 290 Mendelsohn, G. A., 66, 74 Mendelson, M., 31, 48, 59, 69, 72, 279, 281, 288, 379, 397 Mendias, R. M., 406, 407, 420 Merbaum, M., 281, 290 Meredith, K., 64, 65, 69, 72, 230, 243 Merideth, C. H., 379, 398 Merikangas, K. R., 32, 54, 490, 511, 516 Merry, W., 64, 65, 69, 72, 230, 243


Miller, A. B., 232, 246 Miller, A. G., 281, 289 Miller, D., 435, 451 Miller, J. B., 240, 243 Miller, L. C., 536, 549 Miller, M. L., 490, 512 Miller, R. C., 86, 96 Miller, S. T., 24, 53 Miller, T. I., 83, 86, 97 Miller, W. R., 585, 587, 590, 600, 601 Millman, R. B., 232, 246 Millon, T., 148, 158, 163, 164, 165, 166, 167, 168, 174, 184, 456, 459, 460, 461, 462, 463, 465, 466, 467, 468, 469, 473, 478 Mills, M. J., 392, 401 Milne, C. R., 64, 72 Milstein, V., 379, 389, 401 Minde, K., 560, 568, 575 Minkoff, K., 284, 290 Mintz, J., 82, 96, 100, 103, 109, 236, 246, 379, 392, 399, 400 Mirouze, R., 282, 289

Mesulam, M., 42, 51

Mitchell, C. M., 233, 245 Mitchell, E. S., 311, 321 Mitchell, J. B., 35, 53 Mock, J., 59, 69, 72, 279, 281, 288, 379, 397 Mohr, D. C., 60, 62, 64, 65, 69, 72, 74, 230, 243

Metcalfe, M., 281, 290

Moldawsky, S., 5, 20

Metha, M. P., 30, 52 Metz, C. E., 46, 51

Monachesi, E. D., 434, 451

Metzger, D. S., 232, 247

Monsma, B., 327, 350

Meyer, J. D., 596, 601

Monti, P. M., 81, 95, 96

Meyer, J. K., 222, 241, 244 Meyer, J., 40, 51

Moore, J. E., 151, 158, 393, 400

Messer, S. B., 87, 96 Messick, S., 221, 246

Meyer, R. E., 280, 281, 289

Michaux, M. H., 354, 362, 365, 366, 368, 370 Midha, K. K., 379, 400 Mikail, S. F., 483, 515 Mikulka, P. J., 297, 318 Milich, R., 536, 549, 552, 554, 559, 561, 568, 576, 577

Monetti, C. H., 558, 564

Moore, A. D., 355, 370

Moore, J. T., 34, 51, 54, 379, 389, 401 Moos, R. H., 593, 601 Moran, P. W., 282, 290 Moreland, K. L., 435, 452 Morency, A., 489, 516

Morey, L. C., 148, 158, 185, 192, 193, 194, 195, 196, 197, 196, 197, 198, 200, 201, 211, 213, 214, 215, 216

Morganstein, A., 554, 555, 556, 557, 575, 576 Morris, A. G., 561, 574 Morrison, L. A., 238, 247 Morrow, G. R., 32, 49, 224, 244 Mortel, K., 40, 51 Morton, T. L., 5, 6, 7, 9, 10, 20, 584, 593, 596, 600 Moses, J. A., Jr., 40, 50, 310, 315, 319, 380, 393, 398, 400 Moses, T., 490, 511 Mozdzierz, G. J., 151, 158

Neimeyer, R. A., 60, 74, 101, 110, 233, 246 Nelson, B. A., 240, 243 Nelson, G. E., 149, 156, 157, 437, 442, 451 Nelson, J. E., 415, 419

Nelson, P., 36, 53 Nelson, R. O., 581, 582, 588, 589, 590, 596, 597, 600 Nelson-Gray, R. O., 460, 478 Nesse, R. M., 222, 239, 243 Neuhauser, D., 46, 54

Mozley, D., 380, 399

Neutra, R. R., 46, 54

Mueller, C. W., 405, 406, 414, 419

Newcomer, J. W., 379, 380, 386, 393, 400

Mueser, K. T., 87, 95

Newcorn, J. H., 554, 555, 556, 557, 575, 576 Newell, C., 368, 370 Newfield, N., 334, 349

Mulaik, S. A., 405, 420 Muldawer, M. D., 379, 399 Mungas, D., 46, 52

Munroe, S. M., 81, 96 Murphy, G. E., 62, 74 Murphy, J. M., 46, 48, 52, 53 Murphy, R. W., 151, 159 Murray, S., 36, 37, 51 Myers, J. K., 24, 32, 47, 52, 54 Myers, J. L., 126, 129, 133 Myerson, P. G., 301, 321

Mylar, J. L., 81, 96

Newhouse, A., 220, 246

Newman, F. L., 13, 15, 17, 19, 20, 98, 99, 100, 101, 102, 103, 104, 105, 106, 108, 109, 110, 118, 125, 132, 133, 134, 149, 156, 339, 349, 390, 391, 392, 398, 415, 417, 418, 419, 439, 440, 450, 590, 601 Newman, S., 528, 549 Newson-Smith, J. G. B., 283, 290 Newton, A., 355, 370

N Naber, D., 377, 400 Nael, S., 242, 248 Nailboff, B. D., 151, 158 Nair, N. P. V., 379, 400 Narens, L., 26, 51

Nathan, P. E., 150, 158, 592, 601 Nathan, R. G., 33, 34, 49 Nayak, R., 379, 400 Nebeker, H., 404, 410, 413, 415, 416, 420

Nguyen, T. D., 103, 109 Nickel, E. J., 231, 232, 246 Nielson, A. C., 27, 52 Nieminen, G. S., 561, 575 Nies, A., 235, 246 Nies, G., 281, 289 Nietzel, M. T., 60, 74, 92, 93, 96, 97 Nimorwicz, P., 240, 245

Noble, H., 536, 549 Norris, W. R., 483, 516 Novaco, R. W., 301, 314, 319

Novak, C., 560, 568, 575

Neeman, K., 64, 74

Noyes, R., 222, 236, 243

Neeman, R., 124, 133, 134 Neff, D. F., 59, 72 Neider, J., 392, 400

Nuechterlein, K. H., 388, 389, 392, 399, 400 Nuguera, R., 282, 289

Nunnally, J. C., 27, 52, 100, 104, 110 Nussbaum, M., 483, 515

Nyczi, G. R., 33, 34, 50, 52

O O’Brien, J. D., 554, 555, 556, 557, 575, 576

Pass, R., 379, 398 Pathak, D., 379, 399 Pato, C., 379, 400 Pattison, E. M., 592, 601 Patton, M. J., 392, 401 Paul, R., 557, 577 Paul, S. M., 379, 401 Paz, G. G., 393, 399

O’Connor, P., 490, 512

Pearson, D. A., 499, 515

O’Donnell, J. P., 554, 577

Pendleton, D., 409, 420 Perlin, S., 355, 358, 370 Perlmutter, F., 392, 399

O’Leary, K. D., 337, 349, 560, 576 O’Leary, S. G., 566, 577 Oettinger, L., 560, 568, 576 Ogborne, A. C., 592, 601 Ohrvik, J., 379, 398 Olson, D. H., 322, 349 Omizo, M. M., 560, 577 Omizo, S. A., 560, 577 Opipari, L., 483, 513 Opler, L. A., 380, 384, 399 Orlinsky, D. E., 588, 590, 594, 600 Orvaschel, H., 490, 511

Orvin, G. H., 435, 450 Ost, L., 591, 600 Overall, J. E., 361, 370, 372, 373, 375, 376, 377, 378, 382, 383, 385, 386, 391, 392, 399, 400, 401

Peter, D., 557, 577

Peters, L. G., 355, 370 Peterson, R. F., 483, 511 Pfefferbaum, B., 387, 401 Piacentini, J., 528, 549

Pichot, P., 385, 386, 400 Pickar, D., 379, 380, 398, 400, 401 Pierce, L., 490, 515 Piotrowski, C., 453, 478

Piotrowski, D. L., 432, 433, 434, 435, 436, 450 Pleasants, D. Z., 387, 398

P,Q

Pliske, R., 557, 578 Plomin, R., 557, 577 Pokorny, A. D., 379, 392, 399, 401 Poppen, R., 560, 577 Poscher, M. E., 380, 398

Packer, R. J., 483, 515 Pagel, M., 334, 349 Pancoast, D. L., 431, 435, 438, 442, 450, 452 Paolino, A. F., 361, 362, 365, 366, 369, 392, 399 Parker, F. C., 561, 576

Parker, G., 359, 364, 368, 370 Parler, D. W., 387, 398 Parloff, M. B., 354, 370, 506, 516 Parsons, O. A., 592, 601 Pascoe, G. C., 404, 406, 407, 409, 411, 418, 420 Pascualvaca, D. M., 554, 555, 556, 557, 575, 576


Posthuma, A., 355, 370

Prigatano, G. P., 355, 369 Prinz; FJ sO9o7 577 Prinz,;-Re.; 557.00 Pruim, R. J., 366, 370 Pruitt, D. B., 483, 513 Prusoff, B. A., 490, 511, 513, 516 Pugh, R., 483, 513 Pull, C. B., 391, 401 Quade, D., 556, 576

Quinn, P. O., 561, 577

R Radosh, A., 561, 577


Reed, D., 560, 577

Simpkins, C. G., 593, 596, 600 Skinner, H. A., 589, 602 Sleator, E. K., 552, 560, 574, 578 Slutske, W. S., 596, 599

Reichelt, P. A., 595, 601

Smeltzer, D. J., 558, 574

Retzlaff, P. J., 593, 594, 599 Revenstorf, D., 591, 600

Smith, R., 554, 574 Smyth, R., 583, 587, 602

Reynolds, W. M., 555, 577, 582, 587, 601 Reznikoff, M., 583, 587, 602 Rickard, K. M., 561, 577 Roberts, M. A., 559, 561, 568, 576, 577 Rodin, J., 593, 601

Snowden, L. R., 584, 593, 602

Solanto, M. V., 559, 578

Rancurello, M. D., 555, 556, 561, 575 Rapoport, J. L., 561, 569, 574, 577 Raymer, R., 560, 577

Sorenson, J. E., 590, 601

Rosenbaum, M., 561, 577

Spitzer, R. L., 596, 599 Sprague, R. L., 552, 560, 574, 578 Spunt, A. L., 566, 574 Stark, K. D., 555, 577 Stein, M. A., 554, 577 Steinkamp, M. W., 560, 568, 577 Strider, F. D., 581, 582, 598 Sullivan, J., 560, 568, 575

Rosenblad, C., 560, 576

Sullivan, M. A., 566, 577

Rump, E. E., 559, 575

Sullivan, S., 584, 593, 602 Sundberg, N. D., 584, 593, 602 Svard, J., 558, 564, 577 Swartz, J. D., 596, 600 Sweeney, J. A., 582, 586, 587, 590, 595, 600, 602

Roemer, M. I., 583, 601

Rogers, W. L., 594, 599 Rohrbeck, C. A., 557, 577 Roper, B. L., 596, 602

Rutter, M., 563, 578

Ss Sacuzzo, D., 583, 587, 601

Salovey, P., 593, 601 Sandberg, S. T., 554, 557, 559, 561, 563, 577 Sandoval, J. H., 555, 576, 577

T Taube, C. A., 590, 602

Sassone, D. M., 555, 576

Taylor, E., 554, 559, 563, 577, 578

Satin, M. S., 558, 564, 577 Schachar, R., 563, 578 Schaughency, E. A., 557, 577 Schlosser, B., 593, 594, 597, 602

Taylor, S. E., 593, 602 Tellegen, A., 426, 427, 428, 429, 431, 434, 437, 438, 440, 441, 442, 443, 444, 450, 586, 599

Schuldberg, D., 581, 601

Therrien, R. W., 557, 578

Scribanu, N., 561, 577 Seidel, W. T., 552, 574 Semands, S. G., 560, 577 Shaffer, D., 557, 561, 577 Sharma, V., 554, 555, 556, 557, 576 Shaw, D., 555, 576 Shaw, J. H., 560, 578 Sibulkin, A., 585, 600 Sigman, M., 558, 577 Simonds, J. F., 560, 577

Thompson, P., 557, 578 Thorley, G., 558, 559, 563, 578

Thornby, J. I., 440, 451

Thorpe, J. S., 596, 600 Tolchin, M., 583, 602

Tonsager, M. E., 596, 597, 599 Topinka, C., 558, 577 Tramontana, M. G., 470, 478, 506, 516

Trites, R. L., 552, 553, 558, 561, 578 Troland, K., 483, 512

Tryer, S., 554, 574 Tulkin, S., 582, 602 Tupper, D. E., 593, 602 Twentyman, C. T., 557, 577 Tyans, S., 490, 511 Tyler, F. T., 441, 452 Tyler, L. E., 587, 602

U, V


Weissbluth, M., 560, 578

Weissbrod, C. S., 561, 568, 575 Weissman, M. M., 490, 511, 516, 593, 602 Weithorn, C. J., 560, 578

Wells, K. C., 555, 567, 570, 575, 576 Welner, Z., 491, 515, 528, 549 Welsh, G. S., 426, 440, 441, 452 Wen, F. K., 591, 601 Werder, E. H., 559, 578 Wepman, J. M., 489, 516

Ullman, D. G., 552, 557, 578 Ullmann, L. P., 441, 452 Ulrich, R. F., 552, 553, 559, 561, 575 Vaidya, A. F., 483, 513 Vale, C. D., 596, 602 Van Bourgondien, M. E., 556, 576

Werry, J. S., 552, 556, 558, 559, 576, 578 Whalen, C. K., 555, 578 White, J. L., 435, 450 Wickramaratne, P., 490, 511, 516 Wicks, J., 528, 549

Van Denburg, E., 466, 467, 478

Wieselberg, M., 557, 561, 563, 577

Van Kammen, W. B., 491, 515

Wigdor, A. K., 585, 600, 602

van Reken, M., 466, 478

Voelker, S., 505, 507, 516

Wiggins, J. S., 426, 441, 452, 461, 478 Williams, C. L., 426, 427, 428, 429, 431, 432, 434, 437, 438, 440, 441, 442, 443, 444, 450, 452, 461, 478 Williams, J. B. W., 596, 599

von Baeyer, C. L., 483, 515

Williamson, G. D., 561, 576

VandenBos, G., 583, 602

Velez, C. N., 490, 512 Villanueva, M., 593, 594, 599

Wilson, C. C., 555, 577

Wade, T. C., 582, 587, 602

Wilson, J. M., 559, 578 Winett, R. A., 585, 602 Winsberg, B. G., 558, 564, 577

Wagner, W. G., 483, 516

Winslow, R., 582, 602

Waldo, D., 582, 599

Winters, K. C., 498, 516

Wallace, D. J., 554, 576

Wirt, R. D., 481, 483, 490, 516 Withey, S. B., 407, 408, 409, 418 Wolf, L. E., 554, 555, 557, 556, 575, 576 Won Cho, D., 435, 451 Wood, J. D., 435, 451 Wright, W. R., 409, 411, 420 Wrobel, T. A., 432, 451 Wynne, M. E., 554, 556, 578

W

Walton, C., 581, 582, 600 Ware, J. E., Jr., 409, 411, 420 Warner, V., 490, 511, 513, 516 Warr, P. B., 594, 602 Waskow, I. E., 506, 516 Waters, B., 490, 514

Wearing, A. J., 594, 600 Webber, L. S., 561, 576 Wechsler, D., 499, 516 Weintraub, M. D., 483, 512 Weiss, D. J., 596, 602 Weiss, G., 560, 578 Weiss, M., 470, 478

Y Yager, T. J., 24, 30, 33, 34, 37, 51, 53 Yang, K. C., 235, 244

Yao, K. N., 559, 578 Yates, B. T., 105, 110, 118, 134 Yeager, D. C., 582, 599 Yeh, E. K., 528, 549 Yeh, W., 379, 380, 386, 400

Z

Yetevanian, B., 281, 288

Zabora, J. R., 227, 248 Zackary, R. A., 118, 133 Zagami, E. A., 233, 245 Zelin, M. L., 301, 321 Zentall, S. S., 555, 560, 561, 568, 578 Zervas, I., 233, 242, 245

Yevzeroff, H., 217, 244

Zhiming, M., 233, 245

Yopenic, P. A., 24, 54

Zimet, C. N., 5, 7, 9, 10, 21 Zimmerman, D. W., 115, 134

Yenson, R., 393, 401 Yesavage, J. A., 392, 401

Yorkston, N. J., 206, 216

Young, J. G., 554, 555, 556, 557, 575, 576 Young, J. L., 237, 245 Young, ar. benzo? ese Young, R. D., 554, 576

Zimmermann, R. L., 282, 283, 352, 361, 370, 398, 399

Zinowski, M., 114, 115, 133

Youngstrom, N., 584, 592, 602 Yuan, H. A., 64, 74

Yudin, L. E., 392, 399

Zubin, J., 360, 369, 370 Zuckerman, M., 151, 159, 280, 291, 295, 321, 498, 516 Zung, E. M., 31, 54 Zung, W. K. W., 29, 31, 34, 54, 280, 291 Zuroff, D., 281, 288 Zwemer, W. A., 314, 321

Zwick, R., 406, 419 Zwick, W. R., 81, 95

Subject Index

MacAndrew Alcoholism Scale, 149, 155 MCMI-II and, 166, 175 Adaptive Category Test (ACAT), 180 Adolescent populations, tests for Child Behavior Checklist Teachers’ Report Form, 519-520 Youth Self-Report, 519-520 Conners Rating Scales (CRS) Millon Adolescent Clinical Inventory (MACI) Millon Adolescent Personality Inventory (MAPI) Millon Behavioral Health Inventory Millon Clinical Multiaxial Inventory Minnesota Multiphasic Personality Inventory Personal Inventory for Youth Personality Inventory for Children (PIC) Adolescents defense mechanisms, 469-472 hospital admissions, 6 identity problems, 520 Affect Adjective Check List (AACL), 295 Age norms and, 138, 520-521 testing effects on, 138 Agoraphobia, 77, 79, 84-85, 92-93 Fear Survey schedule, 84-85 Inderal and, 236 Phobic Anxiety and Avoidance Scale, 84-85 AIDS, health program, 226, 410

Alcohol abuse, 6-7, 227, 231, 232, 585, 587, 589, 595

MMPI and, 150-151, 155 Alcohol/drug dependence scales, 173, 175 see MacAndrew Alcoholism Scale, 149, 155 Anger anxiety and, 293, 300-301 control measures, 308-309 expression and control of, 305, 306 see Anger Expression Scale (AX) Anger-Expression Scale (AX), 293, 306-308 Anger-in/Anger-out, 306-307, see Anger Antidepressant drugs, 235 see Psychopharmacology Anxiety Scale Questionnaire (ASQ), DoS Anxiety as emotional state, 293 caffeine and, 239

exercise and, 239 measurement of, 293-294, 300-301

Anxiety/depression overlap, 68-69, 79 Anxiety/depressive disorder outcomes, 238-239 Asberg Rating Scale, 222 Assessment instruments compatibility with clinical theories, 107-108 criteria for, 98 easy/uncomplicated feedback, 107 investment for positive return, 104-106


multiple respondents, 101-102 objective referents, 100-101 process-identifying outcome measures, 102 psychometric strengths, 102-104 relevance to target group, 99-100 teachable methods, 100 understanding for nonprofessional audience, 106 use in clinical services, 107 see Outcome assessments, Instruments Assessment, full battery, 3-4, 10-11 Asthmatics, 227 Attention Deficit Disorder (ADD), 554, 558-559 Attention Problems syndrome, 524 Axis I (clinical) diagnosis, 232 Axis II (personality) diagnosis, 232

B Beck Depression Inventory (BDI), 13, 222 development of, 279-280 limitations, 282-283 research applications, 282 validity/reliability information, 280-282 Beck Hopelessness Scale (BHS), 13 development of, 279-280

treatment planning and, 185-186 validity/reliability of, 284-285 Bereavement, psychotherapy and, 237 Brief Hopkins Psychiatric Rating Scale (BHPRS), 218 Brief Psychiatric Rating Scale (BPRS) children and, 387

depressive symptoms and, 379 development, 371 feedback, 389, 394 interpretation, 385-387 interview procedures, 375-376 negative symptoms and, 380 norms, 378

outcome assessment, 390-392 problems with, 390 questions about, 376-378 rating scales, 373-375 reliability, 382-384 self-report measurement, 381-382 thought disorder and, 381 treatment planning, 387 validity, 378, 379 with other evaluation data, 314 Brief Psychiatric Rating Scale for Children (BPRS-C), 387 Brief Symptom Inventory (BSI), 217-218, 224 interpretation of, 227

norms, 225 reliability/validity, 225-227 treatment planning, 228-233 Bulimia, 221 Buss-Durkee Hostility Inventory (BDHI), 300, 301 C Caffeine, anxiety and, 239

Cancer (distress), 233 Child abuse, 240-241 Child Behavior Checklist (CBCL) assessment of competencies, 524 children and (CBCL/2-3), 526 clinical applications, 534 conflicting data, 544-547 coordinating ratings, 522, 523 cross-informant syndromes, 519-520 Direct Observation Form (DOF), 526-528 feedback, 535, 540 interpretation, 531-532 limitations, 535, 540 outcome assessment, 535-539 reliability/validity, 530-531 research findings, 533, 536 scoring, 520-521

treatment planning, 532-533 with other data, 534

Child abuse, SCL-90-R and, 240, 241 Children, assessment of self-report, 528 Semistructured Clinical Interview for Children, 528, 529 testing, see Conners Rating Scales (CRS) Chronic pain, MMPI and, 150-151 Client Satisfaction Questionnaire-8 (CSQ-8) administration of, 404-405 development of, 405-406 interpretation of, 408 norms, 406

outcome assessment, 425-427 research with, 408-409 scoring, 403

treatment planning, 418 validity/reliability of, 406-408 Computer-based scoring, 522-523 interpretation of MSI by, 332-333 MCMI-II and, 174 MMPI-A and, 432 psychological assessment, 138 therapy by, 238 Conners Teachers Rating Scale-39 (CTRS-39), 553 Conners Rating Scale (CRS) demographics, 553 description of, 551-552 development of, 550-551 interpretation, 562-565 limitations, 570 outcome assessment, 567-569 test restrictions, 553 treatment planning, 565-566 validity/reliability, 554-446, 559-561 Construct validity, of test, 220-221

Consumer-treatment interaction literature, 124-125 Continuous quality improvement (CQI), 16-18 Coping styles Eysenck Personality Inventory, 65 see Treatment planning

Core Conflictual Relationship Theme (CCRT), 63 Cornell Medical Index, 217 Cross-cultural research, 552-553 see Katz Adjustment Scales

D Data analysis, statistical measurement, 111-132 Depression Dysthymia scale, 173 MCMI-II and, 167 MMPI-2 and, 153-154 Diabetes, 221 Discomfort Scale, 217 Drug abuse, 6-7, 77 MCMI-II and, 166 MMPI and, 150-153 E Ego Strength Scale, 432 Emotion physiological/behavioral responses, 292-293 see State-Trait Anxiety Inventory (STAI) Ethnicity norms and, 138

testing and, 366 Exercise, anxiety and, 239

False positive measures, 68 Free-association, see Rorschach G Gender child effects on, 527 MCMI-II norms and, 174

norms, 553 scale affect on, 138

sex problems syndrome, 520-521 standardization by, 325, 330 Gender-keyed norms, SCL-90-R, 220


General Severity Index (GSI), 225 Geriatric norms, 234 Global Severity Index (GSI), 217 Group therapy, 170 H Hamilton Rating Scale for Depression (HRSD), 13, 77, 85-86, 294 Health legislation, 583, 584 Americans with Disabilities Act (ADA), 584 Civil Rights Act, 584-585 Health care crisis, 4-8

test specificity, 69 InterStudy, 13, 15-16 see Outcomes Management System Intervention studies, measurement in, 115-116 Inventory of Interpersonal Problems, 62

Joint Commission on Accreditation of Healthcare Organizations (JCAHO), 13-14, 18 K

Health Insurance Study, 16

Health maintenance organization (HMO), 8-9 Health Status Questionnaire, 16 Independent practice association (IPA), 8-9 orientation, SCL-90-R, 233 outcome issues, 14-15

Preferred provider organization (PPO), 8-9 psychiatric treatment and, 175 reimbursement limitations, 7

trends, 582-587 see Psychological assessment, future directions HIV research, 226, see AIDS Hopkins Psychiatric Rating Scale (HPRS), 218, 241-242 Hopkins Symptom Checklist (HSCL), 217 Hostility/aggression, see Anxiety, 300-301 Hyperactivity Index, 554-561

Katz Adjustment Scales cross-cultural research, 355 development, 352-355 diversity, 355-358

drug therapy, 365 ethnicity, 366 interpretation, 362-363 item content problems, 367 limitations, 364 norms, 358-359 outcome assessment, 364-365

patient feedback, 364, 367 reliability/validity, 359-362, 368 treatment planning, 363 M Manifest Anxiety Scale (MAS), 295

Marital Adjustment Inventory, 94 Marital Satisfaction Inventory (MSI) administration of/scoring, 325 clinical applications, 340-343 interpretation of, 330-333

I

limitations of, 338-339 norms, 325-326

Inderal, 236 Inpatient vs. outpatient treatment, 17 Instruments in treatment planning, 67-69

other assessment data and, 336-337 outcome assessment, 333-336, 339-340

overview, 323-325

patient feedback and, 337-338

reliability/validity, 326-330 treatment planning, 333 Measurement efforts of patient and, 118-124 patient stabilization, 131-132 see Outcome assessment, statistical procedures for Medicaid/Medicare, 5

Mellaril, 236 Middlesex Hospital Questionnaire, 221 Millon Adolescent Clinical Inventory (MACI) features of, 455, 457-458 norms/validity, 460-462 reliability, 463 scoring/interpretation, 463-465 theory, 458-460

treatment outcome, 470-472 treatment planning, 466-469 Millon Adolescent Personality Inventory (MAPI)


clinical applications, 139-147

computer interpretation of, 138 coping styles, 65 development of, 137 interpretation (codetypes) 138-139 limitations of, 152

norms of, 138 on cassette, 149 Personality Disorder Scales, 148

providing patient feedback, 148 research findings, 150-152 test validity, 138 treatment planning, 139, 149 treatment/evaluation outcome measures, 149-152 use with other evaluation data, 148, 152 Minnesota Multiphasic Personality Inventory-Adolescent (MMPI-A) clinical applications, 434, 435, 441-442 computer interpretation, 432

features of, 454-458

development, 423-425

norms/validity, 460-462

outcome assessment, 472-473

evaluation with other data, 436-437, 442 feedback, 437 interpretation of, 429-432 limitations, 437-439, 443

reliability, 463 scoring/interpretation, 463-465 theory, 458-460

treatment outcome, 470-472 Millon Clinical Multiaxial Inventory-II (MCMI-II), 11 basic scales in, 169-172

clinical syndrome scales, 173-174 interpretation of, 167-168 item weighting, 164 MCMI revision, 163

norms, 426-429

outcome assessment, 439, 442-433 research findings, 440-441

treatment planning, 432-435 validity/reliability, 429 Minnesota Multiphasic Personality Inventory/MCMI compared, 161

standardization of test scores, 164-165

N,O

test construction, 163, 180

test reliability/validity, 165-175 treatment outcome assessment, 174-175, 184 treatment planning, 168 Minnesota Multiphasic Personality Inventory-2 (MMPI-2), 13, 148

Nonverbal behavior, 528 Norms antidepressant drugs and, 235 Blacks and Hispanics, 165, 174 geriatric, 234 Obesity, treatment of, 87, 89

Outcome assessment between-group contrasts, 117 calculating clinical significance of, 89-94 case-formulation method, 87 collecting data, 117 difference scores, 115-116 Goal Attainment Scaling (GAS), 87-89 history of, 75-76 individualizing, 87-88 log-linear analysis, 120-123 measuring change, 80-86, 114, 128-130 measuring patient’s efforts, 118-124, 131-132 pre-posttreatment differences, 115 probit analysis, 120 regression/variance analysis, 123-125 selecting statistical questions, 113 Solomon four-group design, 117 specifying treatment goal, 114 statistical procedures for, 111-132 structural equation analysis, 127-128 testing for, 13-16, 75, 77

Outcome Management System (OMS), 15 Health Insurance Study, 16 Health Status Questionnaire, 16

see also Testing, Treatment planning

P Panic disorder, 231, 236

Patient-therapist matching Myers-Briggs Type Indicator (MBTI), 66-67 Personality Assessment Inventory (PAI) interpretation, 199

normative data, 186-193 outcome assessment, 209-211

patient feedback, 208 rationale and development, 185-186 reliability/validity, 193-197 suitability for psychotherapy, 204-205 target populations, 206-207 treatment planning, 201, 205, 207-208 treatment setting, 202-203 Personality Inventory for Children (PIC) interpretation, 483-487

intervention, 503-505 psychometric characteristics, 482 research applications, 507-509 special education needs, 489, 501 test norms, 480-482

treatment outcome, 506 treatment planning, 449-503 Personality Inventory for Youth (PIY) intervention, 503-505 norms, 495-496 overview, 490 profile validity, 494-495 psychometric characteristics, 496-498 research applications, 507-509 screening, 493 special education needs, 501-503 treatment outcome, 506 treatment planning, 499-501 Personality theory antisocial, 232, 469 intervention, 469

disorders of, 163, 176-177 prototypes, 163-165, 176 two-dimensional nature of, 161-163 Pharmacotherapeutic drugs, 234-237 see Psychopharmacology Positive and Negative Syndrome Scale (PANSS), 384 Positive Symptom Distress Index (PSDI), 217, 225 Positive Symptom Total (PST), 217, 225

Posttraumatic stress disorders (PTSD), 239-240 Present State Examination (PSE), 221 Prozac, 236 Psychiatric Symptom Assessment Scale (PSAS), 385 Psychological assessment, future directions, 4, 10-11 health-care trends, 585-588 mental illness predictors, 595 prediction for 21st century, 597-598 testing, 590-597 treatment planning, 588-591 virtual reality, 597-598 Psychological testing screening and, 12-13 Psychopharmacology, 365 drug-drug comparisons, 235-236 Psychotic Reaction Profile (PRP), 361 Psychotropic medications, use of, 3

R


child abuse profiles, 240-241 computer scoring/interpretation, 217-218 distress measure test, 222 drug effects and, 235-236

factorial variance, 219-220 gender keyed, 218 in translation, 217 interpretation of, 222-223 multidimensional symptom profile, 223 reliability, 219 screening and, 224 sexual dysfunction, 241 sleep disturbance, 242

stress disorder outcomes, 240 test-retest reliability, 219 treatment outcome measures, 234-235, 238 treatment planning, 228-233 validity, 220-222 Screening model, epidemiologic, 24-25 Screening tests, 28-32

Receiver operating characteristic analyses (ROC), 221 Rorschach inkblot test, 148, 180, 294 assessment of, 249-277

basics, 250-251 features of, 251-253

interpretive strategies, 253-255 patient feedback and, 259

protocol interpretation, 267-269, 275-277 reliability of, 255-256 treatment outcome assessment, 259-262 treatment planning with, 257-290

Beck Depression Inventory (BDI), 30-31 Center for Epidemiological Studies Depression Scale (CES-D), 30 General Health Questionnaire (GHQ), 30 Hamilton Anxiety Scale (HAS), 31 Hamilton Rating Scale for Depression (HRDS/HAM-D), 31-32 SCL-90-R/BSI, 30 Self-Rating Depression Scale (SDS), 31 Screening, academic setting, 35-37

community setting, 32 S SCL-90-Analogue Scale (SCL-90-A), 218, 241-242 SCL-90-R anxiety/depression disorder outcomes, 238-239

concept of, 23-24

for cognitive impairment, 38 geriatric populations, 41 history of, 25-26

medical setting, 32-34 Phipps Behavior Chart, 25 psychometric principles, 26-27

SCL-90-R, 224 self-report, 27, 28 test instruments, 39-41 see Psychological assessment Self-report scale Woodworth Personal Data Sheet, 217 Self-report Symptom Inventory, 217 Self-report anxiety level and, 294 Beck Depression Inventory (BDI), 77, 84-85 cancer patients and, 241-242 hostility and, 300-301 Locke-Wallace Marital Adjustment Inventory, 77, 94 MCMI-II and, 174 MMPI and, 77 over-/underexaggeration of psychopathology, 141 prescription drugs and, 236 Rotter Internal-External Locus of Control, 77 S-R Inventory of Anxiousness, 77 scales of outcome assessment, 76, 81 State-Trait Anxiety Inventory (STAI), 76-77 Symptom Checklist-90 (SCL-90), 77 Zung Self-Rating Scale for Depression (ZSRS), 85-86 see Beck Hopelessness Scale, Beck Depression Inventory, Marital Satisfaction Inventory Service Satisfaction Scale-30, 403-404

in Spanish, 409 interpretation, 414 norms, 410 outcome assessment, 415-417 research with, 408, 409

treatment planning, 418 validity/reliability, 411-413 Sexual dysfunction SCL-90-R and, 241 Shipley Institute of Living Scale (SILS), 180

Simulated Social Skills Test, 81 Sleep disturbance, MMPI and, 152 Smoking, tests for cessation of, 69 Society for Personality Assessment, 3 State-Trait Anger Expression Inventory (STAXI), 293, 309-310 assessing anger, 313-315 assessing anxiety, 313-315 interpretation guidelines, 310-311 treatment planning, 312 State-Trait Anger Scale (STAS), 293, 301-302 construct validity, 303-305 reliability of, 302-303 State-Trait Anxiety Inventory (STAI), 13, 295-297 construct validity, 298-300 reliability, 298-300 Statistical procedures for assessment, 111-132 Statistical procedures measuring patient effort, 119 Stress, 239

smokers/nonsmokers, 227 Suicide ideation, 155, 223, 231

T Technology of Patient Experience (TyPE), 16 condition-specific instruments, 16 see Instruments

Test reliability, alternate forms reliability, 226 Test validity, 220 Test variability, inpatient/outpatient, 165 Test-based assessment, see Assessment Test-retest reliability, MMPI-2 and, 138 Testing Continuous quality improvement (CQI), 16-18 demographic variability, 165 outcome assessment, 13-16

overlap, 461 see Psychological assessment, future directions Tests (see also, specific tests) for treatment planning, 11 objections to, 11-12 Stages of Change Questionnaire, 69 see also, Adolescent populations, tests for Thematic Apperception Test (TAT), 180, 294 Treatment assessment, MCMI-II and, 174 Treatment planning (MCMI-II), 55-71, 168 assessment for, 3-4, 10-11, 313, 315 coping styles, 65-66 instruments in, 67-69

MMPI-2 and, 139 patient variables, 58

patient-to-treatment matching, 66-67 predictive dimension, 56-58 problem complexity, 62-64 problem-solving phase, 61-62 resistance-prone patients, 64-65 symptom severity, 59-60 test specificity and, 69 Tryon’s cluster scales, 221


V Validity index, 167 Valium, 236 Variable Response Inconsistency Scale (VRIN), 140 W Wiggins content scales, 221 World Health Organization (WHO) health screening programs, 23
