
A Primer on Partial Least Squares Structural Equation Modeling (PLS-SEM) Third Edition

To the Academy of Marketing Science (AMS) and its members

Sara Miller McCune founded SAGE Publishing in 1965 to support the dissemination of usable knowledge and educate a global community. SAGE publishes more than 1000 journals and over 600 new books each year, spanning a wide range of subject areas. Our growing selection of library products includes archives, data, case studies and video. SAGE remains majority owned by our founder and after her lifetime will become owned by a charitable trust that secures the company’s continued independence.

Los Angeles | London | New Delhi | Singapore | Washington DC | Melbourne

A Primer on Partial Least Squares Structural Equation Modeling (PLS-SEM) Third Edition

Joseph F. Hair, Jr. University of South Alabama

G. Tomas M. Hult Michigan State University

Christian M. Ringle Hamburg University of Technology, Germany and University of Waikato, New Zealand

Marko Sarstedt Ludwig-Maximilians-University Munich, Germany and Babeș-Bolyai University, Romania

FOR INFORMATION:

SAGE Publications, Inc.
2455 Teller Road
Thousand Oaks, California 91320
E-mail: [email protected]

SAGE Publications Ltd.
1 Oliver’s Yard
55 City Road
London EC1Y 1SP
United Kingdom

SAGE Publications India Pvt. Ltd.
B 1/I 1 Mohan Cooperative Industrial Area
Mathura Road, New Delhi 110 044
India

SAGE Publications Asia-Pacific Pte. Ltd.
18 Cross Street #10-10/11/12
China Square Central
Singapore 048423

Copyright © 2022 by SAGE Publications, Inc.

All rights reserved. No part of this book may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying, recording, or by any information storage and retrieval system, without permission in writing from the publisher.

All trademarks depicted within this book, including trademarks appearing as part of a screenshot, figure, or other image are included solely for the purpose of illustration and are the property of their respective holders. The use of the trademarks in no way indicates any relationship with, or endorsement by, the holders of said trademarks.

Printed in the United States of America

Library of Congress Cataloging-in-Publication Data

Names: Hair, Joseph F., Jr., 1944- author. | Hult, G. Tomas M., author. | Ringle, Christian M., author. | Sarstedt, Marko, author.
Title: A primer on partial least squares structural equation modeling (PLS-SEM) / Joe F. Hair, Jr., G. Tomas M. Hult, Christian M. Ringle, Marko Sarstedt.
Description: Third edition. | Los Angeles : SAGE, [2022] | Includes bibliographical references and index.
Identifiers: LCCN 2021004786 | ISBN 9781544396408 (paperback) | ISBN 9781544396415 (epub) | ISBN 9781544396422 (epub) | ISBN 9781544396330 (pdf)
Subjects: LCSH: Least squares. | Structural equation modeling.
Classification: LCC QA275 .P88 2022 | DDC 511/.42—dc23
LC record available at https://lccn.loc.gov/2021004786

This book is printed on acid-free paper.

Acquisitions Editor: Leah Fargotstein
Editorial Assistant: Kenzie Offley
Production Editors: Natasha Tiwari, Gagan Mahindra
Copy Editor: Terri Lee Paulsen
Typesetter: C&M Digitals (P) Ltd.
Proofreader: Ellen Brink
Indexer: Integra
Cover Designer: Candice Harman
Marketing Manager: Victoria Velasquez

21 22 23 24 25 10 9 8 7 6 5 4 3 2 1

BRIEF CONTENTS

Preface
About the Authors
Chapter 1 • An Introduction to Structural Equation Modeling
Chapter 2 • Specifying the Path Model and Examining Data
Chapter 3 • Path Model Estimation
Chapter 4 • Assessing PLS-SEM Results—Part I: Evaluation of the Reflective Measurement Models
Chapter 5 • Assessing PLS-SEM Results—Part II: Evaluation of the Formative Measurement Models
Chapter 6 • Assessing PLS-SEM Results—Part III: Evaluation of the Structural Model
Chapter 7 • Mediator and Moderator Analysis
Chapter 8 • Outlook on Advanced Methods
Glossary
References
Index

DETAILED CONTENTS

Preface
About the Authors

Chapter 1 • An Introduction to Structural Equation Modeling
  Chapter Preview
  What Is Structural Equation Modeling?
  Considerations in Using Structural Equation Modeling
    Composite Variables
    Measurement
    Measurement Scales
    Coding
    Data Distributions
  Principles of Structural Equation Modeling
    Path Models With Latent Variables
    Testing Theoretical Relationships
    Measurement Theory
    Structural Theory
  PLS-SEM, CB-SEM, and Regressions Based on Sum Scores
  Considerations When Applying PLS-SEM
    Key Characteristics of the PLS-SEM Method
    Data Characteristics
      Minimum Sample Size Requirement
      Missing Value Treatment
      Nonnormal Data
      Scales of Measurement
      Secondary Data
    Model Characteristics
  Guidelines for Choosing Between PLS-SEM and CB-SEM
  Organization of Remaining Chapters
  Summary
  Review Questions
  Critical Thinking Questions
  Key Terms
  Suggested Readings

Chapter 2 • Specifying the Path Model and Examining Data
  Chapter Preview
  Stage 1: Specifying the Structural Model
    Mediation
    Moderation
    Control Variables
  Stage 2: Specifying the Measurement Models
    Reflective and Formative Measurement Models
    Single-Item Measures and Sum Scores
    Higher-Order Constructs
  Stage 3: Data Collection and Examination
    Missing Data
    Suspicious Response Patterns
    Outliers
    Data Distribution
  Case Study Illustration—Specifying the PLS-SEM Model
    Application of Stage 1: Structural Model Specification
    Application of Stage 2: Measurement Model Specification
    Application of Stage 3: Data Collection and Examination
    Path Model Creation Using the SmartPLS Software
  Summary
  Review Questions
  Critical Thinking Questions
  Key Terms
  Suggested Readings

Chapter 3 • Path Model Estimation
  Chapter Preview
  Stage 4: Model Estimation and the PLS-SEM Algorithm
    How the Algorithm Works
    Statistical Properties
    Algorithmic Options and Parameter Settings to Run the Algorithm
    Results
  Case Study Illustration—PLS Path Model Estimation (Stage 4)
    Model Estimation
    Estimation Results
  Summary
  Review Questions
  Critical Thinking Questions
  Key Terms
  Suggested Readings

Chapter 4 • Assessing PLS-SEM Results—Part I: Evaluation of the Reflective Measurement Models
  Chapter Preview
  Overview of Stage 5: Evaluation of Measurement Models
  Stage 5a: Assessing Results of Reflective Measurement Models
    Step 1: Indicator Reliability
    Step 2: Internal Consistency Reliability
    Step 3: Convergent Validity
    Step 4: Discriminant Validity
  Case Study Illustration—Evaluation of the Reflective Measurement Models (Stage 5a)
    Running the PLS-SEM Algorithm
    Reflective Measurement Model Evaluation
  Summary
  Review Questions
  Critical Thinking Questions
  Key Terms
  Suggested Readings

Chapter 5 • Assessing PLS-SEM Results—Part II: Evaluation of the Formative Measurement Models
  Chapter Preview
  Stage 5b: Assessing Results of Formative Measurement Models
    Step 1: Assess Convergent Validity
    Step 2: Assess Formative Measurement Models for Collinearity Issues
    Step 3: Assess the Significance and Relevance of the Formative Indicators
      Bootstrapping Procedure Concept
      Bootstrap Confidence Intervals
  Case Study Illustration—Evaluation of the Formative Measurement Models (Stage 5b)
    Extending the Simple Path Model
    Reflective Measurement Model Evaluation (Recap)
    Formative Measurement Model Evaluation
  Summary
  Review Questions
  Critical Thinking Questions
  Key Terms
  Suggested Readings

Chapter 6 • Assessing PLS-SEM Results—Part III: Evaluation of the Structural Model
  Chapter Preview
  Stage 6: Structural Model Results Evaluation
    Step 1: Assess the Structural Model for Collinearity
    Step 2: Assess the Significance and Relevance of the Structural Model Relationships
    Step 3: Assess the Model’s Explanatory Power
    Step 4: Assess the Model’s Predictive Power
      Number of Folds
      Number of Repetitions
      Prediction Statistic
      Results Interpretation
      Treating Predictive Power Issues
    Step 5: Model Comparisons
  Case Study Illustration—Evaluation of the Structural Model (Stage 6)
  Summary
  Review Questions
  Critical Thinking Questions
  Key Terms
  Suggested Readings

Chapter 7 • Mediator and Moderator Analysis
  Chapter Preview
  Mediation
    Introduction
    Measurement and Structural Model Evaluation in Mediation Analysis
    Types of Mediating Effects
    Testing Mediating Effects
    Multiple Mediation
    Case Study Illustration—Mediation
  Moderation
    Introduction
    Types of Moderator Variables
    Modeling Moderating Effects
    Creating the Interaction Term
      Product Indicator Approach
      Orthogonalizing Approach
      Two-Stage Approach
      Guidelines for Creating the Interaction Term
    Model Evaluation
    Results Interpretation
    Moderated Mediation and Mediated Moderation
    Case Study Illustration—Moderation
  Summary
  Review Questions
  Critical Thinking Questions
  Key Terms
  Suggested Readings

Chapter 8 • Outlook on Advanced Methods
  Chapter Preview
  Importance-Performance Map Analysis
  Necessary Condition Analysis
  Higher-Order Constructs
  Confirmatory Tetrad Analysis
  Examining Endogeneity
  Treating Observed and Unobserved Heterogeneity
    Multigroup Analysis
    Uncovering Unobserved Heterogeneity
  Measurement Model Invariance
  Consistent PLS-SEM
  Summary
  Review Questions
  Critical Thinking Questions
  Key Terms
  Suggested Readings

Glossary
References
Index

PREFACE

It has been almost a decade since the first edition of our book was published in 2014. In that period of time, the field of structural equation modeling (SEM), and particularly partial least squares structural equation modeling (PLS-SEM), has changed considerably. While some traditional statistical methods have continued to evolve and extend their capabilities, PLS-SEM has expanded rapidly to include numerous additional analytical options. Much of the focus has been on the development of methods for confirming the quality of composite measures as representations of theoretical concepts (using procedures similar to the traditional confirmatory factor analysis in common factor models) and for assessing a model’s out-of-sample predictive power. But many somewhat smaller analytical improvements have emerged as well.

When we wrote the first and second editions, we were confident that interest in PLS-SEM would increase. But even our wildest expectations were exceeded. We never anticipated that the interest in the PLS-SEM method would literally explode in a few years! The two previous editions of our book have been cited more than 25,000 times according to Google Scholar, and the books have been translated into seven other languages, including German (Hair et al., 2017b), Italian (Hair et al., 2020b), and Spanish (Hair et al., 2019a). Furthermore, the book now also comes in an R software edition (Hair et al., 2022). A Primer on Partial Least Squares Structural Equation Modeling (PLS-SEM) has been the premier text in the field of PLS-SEM for many years, and based on the advances included in this new edition, we are confident it will remain the leading text in the future (Hair, Hult, Ringle, & Sarstedt, 2014, 2017).

A review of major social sciences journals clearly demonstrates that applications of PLS-SEM have grown exponentially in the past decade, as evidenced in the popularity of the terms “partial least squares structural equation modeling,” “PLS-SEM,” and “PLS path modeling” in the Web of Science database (Exhibit 0.1).

EXHIBIT 0.1  ■  Number of PLS-SEM-Related Articles Per Year
[Figure: line chart of the number of articles per year, 2010 through 2020; the vertical axis (number of articles) runs from 0 to 1,600.]
Note: Number of articles returned from the Web of Science database for the search terms “partial least squares structural equation modeling,” “PLS-SEM,” and “PLS path modeling.”

Two journal articles published by our author team before the first edition also provide clear evidence of the popularity of PLS-SEM. The two articles have been the most widely cited in those journals since their publication. Our 2012 article in the Journal of the Academy of Marketing Science, “An Assessment of the Use of Partial Least Squares Structural Equation Modeling in Marketing Research,” cited over 5,000 times according to Google Scholar, has been the number-one highest-impact article published in the top 20 marketing journals, according to Shugan’s list of most cited marketing articles (http://www.marketingscience.org; e.g., Volume 2, Issue 3). It has also been awarded the 2015 Emerald Citations of Excellence award. Moreover, our 2011 article in the Journal of

Marketing Theory and Practice, “PLS-SEM: Indeed a Silver Bullet,” has surpassed the 10,000 citations mark. More recently, our 2015 Journal of the Academy of Marketing Science article “A New Criterion for Assessing Discriminant Validity in Variance-Based Structural Equation Modeling” was ranked as the top economic article in the Thomson Reuters Essential Science Indicators Ranking, which places it in the top 0.1% of the most cited research articles worldwide.

PLS-SEM has also enjoyed increasingly widespread interest among methods researchers. A rapidly growing number of scholars have gained interest in PLS-SEM, and they complement the initial core group of authors that have shaped the method (Khan et al., 2019). Their research papers offer novel perspectives on the method, sometimes sparking significant debates. Prominent examples include the rejoinders to Rigdon’s (2012) Long Range Planning article by Bentler and Huang (2014), Dijkstra (2014), Sarstedt, Ringle, Henseler, and Hair (2014), and Rigdon (2014b) himself. Under the general theme “rethinking partial least squares path modeling,” this exchange of thoughts offered the point of departure for some of the most important PLS-SEM developments in the last few years.

Other articles have further clarified the similarities and differences between PLS-SEM and covariance-based SEM, which has long been viewed as the default method for analyzing causal models. For example, Hair, Hult, Ringle, Sarstedt, and Thiele (2017) and Sarstedt, Hair, Ringle, Thiele, and Gudergan (2016) discuss the measurement philosophies underlying the two SEM methods and demonstrate the biases that occur when using PLS-SEM and covariance-based SEM (CB-SEM) on models that are inconsistent with what the methods assume.


Related to this debate, Rigdon, Sarstedt, and Ringle (2017) argue how differences in philosophy of science and different expectations about the research situation tend to induce a preference for one method over the other (see also Hair & Sarstedt, 2019). Similar discussions have emerged in psychology, where researchers increasingly acknowledge that reducing measurement to only the philosophy assumed by covariance-based SEM is a very restrictive view, which does not apply to nearly all constructs (Rhemtulla, van Bork, & Borsboom, 2020).

Shmueli, Ray, Velasquez Estrada, and Chatla (2016) have made a substantial contribution to the field by shifting researchers’ focus to the assessment of PLS path models’ predictive power. Bemoaning the emphasis on explanatory model assessment in applications of PLS-SEM, the authors introduced the PLSpredict procedure, which allows for evaluating a model’s out-of-sample predictive power. Their research has sparked a series of follow-up studies, offering guidelines on how to use PLSpredict (Shmueli et al., 2019) and introducing tests that allow comparing different models in terms of their predictive power (Liengaard et al., 2021).

Finally, Rönkkö and Evermann’s (2013) critique of the PLS-SEM method in Organizational Research Methods offered an excellent opportunity to show how uninformed and blind criticism of the PLS-SEM method leads to misleading, incorrect, and false conclusions (see the rejoinder by Henseler et al., 2014). While this debate also nurtured some advances in PLS-SEM (Rönkkö & Evermann, 2021)—such as the new heterotrait-monotrait (HTMT) criterion to assess discriminant validity (Franke & Sarstedt, 2019; Henseler, Ringle, & Sarstedt, 2015)—we believe it is important to reemphasize our previous call: “Any extreme position that (often systematically) neglects the beneficial features of the other technique and may result in prejudiced boycott calls [. . .] is not good research practice and does not help to truly advance our understanding of methods (or any other research subject)” (Hair, Ringle, & Sarstedt, 2012, p. 313; see also Petter, 2018; Sarstedt, Ringle, Henseler, & Hair, 2014).

Enhancement of the methodological foundations of the PLS-SEM method has been accompanied by the release of multiple new versions of SmartPLS 3 (Ringle, Wende, & Becker, 2015), which implement most of these latest extensions in this very user-friendly software (see https://www.smartpls.com). These updates are much more than just a simple revision. They incorporate a broad range of new algorithms and major new features that previously were not available or had to be executed manually (Sarstedt & Cheah, 2019).

In light of the developments in terms of the much more widespread utilization of PLS-SEM, and further enhancements and extensions of the method and software support, a new edition of the book is clearly timely and warranted. While there are numerous published articles on the method, until our first two editions and even today, there are very few other comprehensive books that explain the fundamental aspects of the method, particularly in a way that can be understood by individuals with limited statistical and mathematical training. This third edition of our book updates and extends the coverage of PLS-SEM for social sciences researchers and creates awareness of the most recent developments in an analytical tool that will enable scholars and practitioners to pursue research opportunities in many new and different ways.


The approach of this book is based on the authors’ many years of conducting and teaching research, as well as the desire to communicate the fundamentals of the PLS-SEM method to a much broader audience. To accomplish this goal, we have limited the emphasis on equations, formulas, Greek symbols, and so forth that are typical of most books and articles. Instead, we explain in detail the basic fundamentals of PLS-SEM and provide rules of thumb that can be used as general guidelines for understanding and evaluating the results of applying the method. We also rely on a single software package (SmartPLS 3; https://www.smartpls.com) that can be used not only to complete the exercises in this book but also in the reader’s own research. As a further effort to facilitate learning, we use a single case study throughout the book. The case is drawn from a published study on corporate reputation, and we believe it is general enough to be understood by many different areas of social science research, thus further facilitating comprehension of the method. Review and critical thinking questions are posed at the end of the chapters, and key terms are defined to better understand the concepts. Finally, suggested readings and extensive references are provided to enhance more advanced coverage of the topic.

We are excited to share with you the many new topics we have included in this edition. These include the following:

• An overview of the latest research on the nature of composite-based modeling, which is the conceptual foundation for PLS-SEM.
• More on the distinction between PLS-SEM and CB-SEM and the model constellations, which are favorable toward the use of PLS-SEM.
• Application of PLS-SEM with secondary (archival) data.
• Information on how to treat control variables in PLS path models.
• Extended discussion of model fit in PLS-SEM.
• Further coverage of internal consistency reliability using ρA and inference testing in discriminant validity assessment.
• Enhanced guidelines for generating and validating single-item measures for redundancy analyses.
• Improved guidelines for determining minimum sample sizes using the inverse square root method.
• Coverage of the weighted PLS-SEM algorithm.
• Latest research on bootstrapping settings and assessment.
• Analyzing a model’s out-of-sample predictive power using the PLSpredict procedure.
• Metrics for model comparisons and selection (e.g., the Bayesian information criterion), including cross-validation of alternative models.
• Revision and extension of the chapter on mediation, which now covers more types of mediation, including multiple mediation, and demonstrates why PLS-SEM is superior to PROCESS-based mediation analyses.
• Explanation and guidelines on moderated mediation.
• Latest research on specifying and estimating higher-order constructs.
• Updated recommendations for multigroup analysis.
• Extended coverage of advanced concepts and methods such as necessary condition analysis and endogeneity.
• Coverage of the latest literature on PLS-SEM.

All examples in this edition are updated using the newest version of the most widely applied PLS-SEM software—SmartPLS 3. The book chapters and learning support supplements are organized around the learning outcomes shown at the beginning of each chapter. Moreover, instead of a single summary at the end of each chapter, we present a separate concise summary for each learning outcome. This approach makes the book more understandable and usable for both students and teachers. The SAGE website for the book also includes other support materials to facilitate learning and applying the PLS-SEM method. Additionally, the PLS-SEM Academy (https://www.pls-sem-academy.com) offers video-based online courses based on this book and its earlier editions and also on advanced PLS-SEM topics following the explanations of Hair, Sarstedt, Ringle, and Gudergan (2018). Exhibit 0.2 explains how owners of this book can obtain discounted access to the courses offered by the PLS-SEM Academy.

EXHIBIT 0.2  ■  Discounted PLS-SEM Academy Access

The PLS-SEM Academy (www.pls-sem-academy.com) offers video-based online courses on the PLS-SEM method. The courses include contents such as:

•• An introduction to PLS-SEM,
•• Reflective and formative measurement model assessment,
•• Structural model assessment, goodness of fit, and predictive model assessment,
•• Advanced topics, like importance-performance map analysis (IPMA), mediation, higher-order models, moderation, measurement invariance, multigroup analysis, nonlinear effects, and latent class analysis.

Besides several hours of online video material presented by worldwide renowned instructors, the PLS-SEM Academy provides comprehensive lecturing slides and annotated outputs from SmartPLS that illustrate all analyses step by step. Registered users can claim course certificates after successful completion of each end-of-section exam.

The PLS-SEM Academy offers all owners of this book a 15% discount on the purchase of access to its course offerings. All you have to do is send a photo of yourself with the book in your hand, along with your name and address, to the e-mail address [email protected]. A short time later, you will receive a 15% discount code, which you can use on the website https://www.pls-sem-academy.com. We hope you enjoy perfecting your PLS-SEM skills with the help of these courses and wish you every success in obtaining the certificates.

We would like to acknowledge the many insights and suggestions provided by the reviewers: Maxwell K. Hsu (University of Wisconsin), Toni M. Somers (Wayne State University), and Lea Witta (University of Central Florida), as well as a number of our colleagues and students. Most notably, we thank Jan-Michael Becker (BI Norwegian Business School), Zakariya Belkhamza (Ahmed Bin Mohammed Military College), Charla Brown (Troy University), Roger Calantone (Michigan State University), Fabio Cassia (University of Verona), Gabriel Cepeda-Carrión (University of Seville), Jacky Jun Hwa Cheah (Universiti Putra Malaysia), Nicholas Danks (Trinity College Dublin), Adamantios Diamantopoulos (University of Vienna), Markus Eberl (Kantar), George Franke (University of Alabama), Anne Gottfried (University of Texas, Arlington), Siegfried P. Gudergan (University of Waikato), Saurabh Gupta (Kennesaw State University), Karl-Werner Hansmann (University of Hamburg), Dana Harrison (East Tennessee State University), Sven Hauff (Helmut Schmidt University), Mike Hollingsworth (Old Dominion University), Philip Holmes (Pensacola Christian College), Chris Hopkins (Auburn University), Lucas Hopkins (Florida State University), Heungsun Hwang (McGill University), Ida Rosnita Ismail (Universiti Kebangsaan Malaysia), April Kemp (Southeastern Louisiana University), David Ketchen (Auburn University), Ned Kock (Texas A&M University), Marcel Lichters (TU Chemnitz), Benjamin Liengaard (Aarhus Universitet), Chein-Hsin Lin (DaYeh University), Yide Liu (Macau University of Science and Technology), Francesca Magno (University of Bergamo), Lucy Matthews (Middle Tennessee State University), Jay Memmott (University of South Dakota), Mumtaz Ali Memon (NUST Business School), Adam Merkle (University of South Alabama), Ovidiu I. Moisescu (Babeş-Bolyai University), Zach Moore (University of Louisiana at Monroe), Arthur Money (Henley Business School), Christian Nitzl (Universität der Bundeswehr München), Torsten Pieper (University of North Carolina), Lacramioara Radomir (Babeş-Bolyai University), Arun Rai (Georgia State University), Sascha Raithel (Freie Universität Berlin), S. Mostafa Rasoolimanesh (Taylor’s University), Soumya Ray (National Tsing Hua University), Nicole Richter (University of Southern Denmark), Edward E. Rigdon (Georgia State University), Jeff Risher (Southeastern Oklahoma University), José Luis Roldán (University of Seville), Amit Saini (University of Nebraska-Lincoln), Phillip Samouel (University of Kingston), Francesco Scafarto (University of Rome “Tor Vergata”), Bruno Schivinski (University of London), Rainer Schlittgen (University of Hamburg), Manfred Schwaiger (Ludwig-Maximilians-University Munich), Pratyush N. Sharma (University of Alabama), Wen-Lung Shiau (Zhejiang University of Technology), Galit Shmueli (National Tsing Hua University), Donna Smith (Ryerson University), Detmar W. Straub (Georgia State University), Hiram Ting (UCSI University), Ramayah Thurasamy (Universiti Sains Malaysia), Ron Tsang (Agnes Scott College), Huiwen Wang (Beihang University), Sven Wende (SmartPLS GmbH), and Anita Whiting (Clayton State University) for their helpful remarks.

Also, we thank the team of doctoral students and research fellows at Hamburg University of Technology and Otto-von-Guericke-University Magdeburg—namely, Susanne Adler, Michael Canty, Svenja Damberg, Zita K. Eggardt, Lena Frömbling, Frauke Kühn, Benjamin Maas, Mandy Pick, and Martina Schöniger—for their kind support. In addition, at SAGE we thank Leah Fargotstein for her support and great work.

We hope this book will expand knowledge of the capabilities and benefits of PLS-SEM to a much broader group of researchers and practitioners. Last, if you have any remarks, suggestions, or ideas to improve this book, please get in touch with us. We appreciate any feedback on the book’s concept and contents!

Joseph F. Hair, Jr.
University of South Alabama

G. Tomas M. Hult
Michigan State University

Christian M. Ringle
Hamburg University of Technology, Germany and University of Waikato, New Zealand

Marko Sarstedt
Ludwig-Maximilians-University Munich, Germany and Babeș-Bolyai University, Romania

Visit the companion site for this book at https://www.pls-sem.net/

ABOUT THE AUTHORS

Joseph F. Hair, Jr. is Cleverdon Chair of Business and director of the PhD degree in business administration, Mitchell College of Business, University of South Alabama. He previously held the Copeland Endowed Chair of Entrepreneurship and was director of the Entrepreneurship Institute, Ourso College of Business Administration, Louisiana State University. Joe was recognized by Clarivate Analytics in 2018, 2019, and 2020 for being in the top 1% globally of all business and economics professors based on his citations and scholarly accomplishments, which exceed 238,000 over his career. He has authored more than 75 books, including Multivariate Data Analysis (8th edition, 2019; cited 140,000+ times), MKTG (13th edition, 2020), Essentials of Business Research Methods (2020), and Essentials of Marketing Research (4th edition, 2020). He also has published numerous articles in scholarly journals and was recognized as the Academy of Marketing Science Marketing Educator of the Year. As a popular guest speaker, Professor Hair often presents seminars on research techniques, multivariate data analysis, and marketing issues for organizations in Europe, Australia, China, India, and South America. He has a new book on Marketing Analytics (McGraw-Hill).

G. Tomas M. Hult is professor and Byington Endowed Chair at Michigan State University (USA), and holds a visiting chaired professorship at Leeds University Business School (United Kingdom) and a visiting professorship at Uppsala University (Sweden). Professor Hult is a member of the Expert Networks of the World Economic Forum and United Nations/UNCTAD’s World Investment Forum and is also part of the Expert Team at the American Customer Satisfaction Index (ACSI). Dr. Hult was recognized in 2016 as the Academy of Marketing Science/CUTCO-Vector Distinguished Marketing Educator; he is an elected fellow of the Academy of International Business; and he ranks in the top 10 scholars in marketing per the prestigious World Ranking of Scientists. At Michigan State University, Dr. Hult was recognized with the Beal Outstanding Faculty Award in 2019 (MSU’s highest award “for outstanding total service to the University”), and he has also been recognized with the John Dunning AIB Service Award for outstanding service to AIB as the longest-serving executive director in AIB’s history (2004–2019) (the most prestigious service award given by the Academy of International Business).


Professor Hult regularly teaches doctoral seminars on multivariate statistics, structural equation modeling, and hierarchical linear modeling worldwide. He is a dual citizen of Sweden and the United States. More information about Professor Hult can be found at http://www.tomashult.com.

Christian M. Ringle is a chaired professor of management at the Hamburg University of Technology (Germany) and an adjunct professor at the University of Waikato (New Zealand). His research addresses management of organizations, human resource management, and methods development for business analytics and their application to business research. His contributions in these fields have been published in journals such as International Journal of Research in Marketing, Information Systems Research, Journal of the Academy of Marketing Science, MIS Quarterly, Organizational Research Methods, and The International Journal of Human Resource Management. Since 2018, he has been named a member of Clarivate Analytics’ Highly Cited Researchers List. In 2014, Professor Ringle co-founded SmartPLS (https://www.smartpls.com/), a software tool with a graphical user interface for the application of the partial least squares structural equation modeling (PLS-SEM) method. Besides supporting consultancies and international corporations, he regularly teaches doctoral seminars on business analytics and multivariate statistics, the PLS-SEM method, and the use of SmartPLS worldwide. More information about Professor Christian M. Ringle can be found at https://www.tuhh.de/hrmo/team/prof-dr-c-m-ringle.html.

Marko Sarstedt is a chaired professor of marketing at the Ludwig-Maximilians-University Munich (Germany) and an adjunct professor at Babeș-Bolyai University, Cluj (Romania). His main research interest is the advancement of research methods to enhance the understanding of consumer behavior. His research has been published in Nature Human Behavior, Journal of Marketing Research, Journal of the Academy of Marketing Science, Multivariate Behavioral Research, Organizational Research Methods, MIS Quarterly, and Psychometrika, among others. His research ranks among the most frequently cited in the social sciences. Professor Sarstedt has won numerous best paper and citation awards, including five Emerald Citations of Excellence awards and two AMS William R. Darden Awards.


According to the 2020 F.A.Z. ranking, he is among the most influential researchers in Germany, Austria, and Switzerland. Professor Sarstedt has been named a member of Clarivate Analytics’ Highly Cited Researchers List, which includes the “world’s most impactful scientific researchers.”

1
AN INTRODUCTION TO STRUCTURAL EQUATION MODELING

LEARNING OUTCOMES

1. Understand the meaning of structural equation modeling (SEM) and its relationship to multivariate data analysis.
2. Describe the basic considerations in applying multivariate data analysis.
3. Comprehend the basic concepts of partial least squares structural equation modeling (PLS-SEM).
4. Explain the differences between covariance-based structural equation modeling (CB-SEM) and PLS-SEM and when to use each.

CHAPTER PREVIEW

Social science researchers have been using statistical analysis tools for many years to extend their ability to develop, explore, and confirm research findings. Application of first-generation statistical methods, such as factor analysis and regression analysis, dominated the research landscape through the 1980s. But since the early 1990s, second-generation methods have expanded rapidly and, in some disciplines, represent almost 50% of the statistical tools applied in empirical research. In this chapter, we explain the fundamentals of second-generation statistical methods and establish a foundation that will enable you to understand and apply one of the emerging second-generation tools, referred to as partial least squares structural equation modeling (PLS-SEM).

WHAT IS STRUCTURAL EQUATION MODELING?

Statistical analysis has been an essential tool for social science researchers for more than a century. Applications of statistical methods have expanded dramatically with the advent of computer hardware and software, particularly in recent years with widespread access to many more methods due to user-friendly interfaces with technology-delivered knowledge. Researchers initially relied on univariate and bivariate analysis to understand data and relationships. To comprehend more complex relationships associated with current research directions in the social science disciplines, it is increasingly necessary to apply more sophisticated multivariate data analysis methods.

Multivariate analysis involves the application of statistical methods that simultaneously analyze multiple variables. The variables typically represent measurements associated with individuals, companies, events, activities, situations, and so forth. The measurements are often obtained from surveys or observations that are used to collect primary data, but they may also be obtained from databases consisting of secondary data. Exhibit 1.1 displays some of the major types of statistical methods associated with multivariate data analysis.

EXHIBIT 1.1  ■  Organization of Multivariate Methods

First-generation techniques
  Primarily exploratory: cluster analysis, exploratory factor analysis, multidimensional scaling
  Primarily confirmatory: analysis of variance, logistic regression, multiple regression, confirmatory factor analysis (CFA)

Second-generation techniques
  Primarily exploratory: partial least squares structural equation modeling (PLS-SEM)
  Primarily confirmatory: covariance-based structural equation modeling (CB-SEM)


The statistical methods often used by social scientists are typically called first-generation techniques (Fornell, 1982, 1987). These techniques, shown in the upper part of Exhibit 1.1, include regression-based approaches, such as multiple regression, logistic regression, and analysis of variance, but also techniques such as exploratory and confirmatory factor analysis, cluster analysis, and multidimensional scaling. When applied to a research question, these methods can be used to either confirm a priori established theories or identify data patterns and relationships. Specifically, they are confirmatory when testing the hypotheses of existing theories and concepts, and exploratory when they search for patterns in the data in case there is no or only little prior knowledge on how the variables are related.

It is important to note that the distinction between confirmatory and exploratory is not always as clear-cut as it seems. For example, when running a regression analysis, researchers usually select the dependent and independent variables based on established theories and concepts. The goal of the regression analysis is then to test these theories and concepts. However, the technique can also be used to explore whether additional independent variables prove valuable for extending the concept being tested. The findings typically focus first on which independent variables are statistically significant predictors of the single dependent variable (more confirmatory) and then on which independent variables are, relatively speaking, better predictors of the dependent variable (more exploratory).

In a similar fashion, when exploratory factor analysis is applied to a data set, the method searches for relationships between the variables in an effort to reduce a large number of variables to a smaller set of composite factors (i.e., linear combinations of variables). The final set of composite factors is a result of exploring relationships in the data and reporting the relationships that are found (if any). Nevertheless, while the technique is exploratory in nature (as the name already suggests), researchers often have theoretical knowledge that may, for example, guide their decision on how many composite factors to extract from the data (Sarstedt & Mooi, 2019; Chapter 8.3.3). In contrast, confirmatory factor analysis is specifically designed for testing and substantiating an a priori determined factor (or factors) and its assigned indicators.

First-generation techniques have been widely applied by social science researchers, and they have significantly shaped the way we see the world today. In particular, methods such as multiple regression, logistic regression, and analysis of variance have been used to empirically test relationships among variables. However, what is common to these techniques is that they share three limitations, namely (1) the postulation of a simple model structure, (2) the assumption that all variables can be considered observable, and (3) the conjecture that all variables are measured without error (Haenlein & Kaplan, 2004).

With regard to the first limitation, multiple regression analysis and its extensions postulate a simple model structure involving one layer of dependent and independent variables. Causal chains such as “A leads to B leads to C” or more complex nomological networks involving a great number of intervening variables can only be estimated piecewise rather than simultaneously, which can have severe consequences for the results’ quality (Sarstedt, Hair, Nitzl, Ringle, & Howard, 2020).

With regard to the second limitation, regression-type methods are restricted to processing observable variables, such as age or sales (in units or dollars). Theoretical concepts, which are “abstract, unobservable properties or attributes of a social unit or entity” (Bagozzi & Phillips, 1982, p. 465), can only be considered after prior stand-alone validation by means of, for example, a confirmatory factor analysis. The ex post inclusion of measures of theoretical concepts, however, comes with various limitations.

With regard to the third limitation, and related to the previous point, one has to bear in mind that each observation of the real world is accompanied by a certain measurement error, which can be systematic or random (Chapter 4). First-generation techniques are, strictly speaking, only applicable when there is neither systematic nor random error. This situation is, however, rarely encountered in reality, particularly when the aim is to estimate relationships among measures of theoretical concepts. Because the social sciences, like many other fields of scientific inquiry, routinely deal with theoretical concepts such as perceptions, attitudes, and intentions, these limitations of first-generation techniques are fundamental.

To overcome these limitations, researchers have increasingly been turning to second-generation techniques. These methods, referred to as structural equation modeling (SEM), enable researchers to simultaneously model and estimate complex relationships among multiple dependent and independent variables. The concepts under consideration are typically unobservable and measured indirectly by multiple indicator variables. In estimating the relationships, SEM accounts for measurement error in observed variables. As a result, the method obtains a more precise measurement of the theoretical concepts of interest (Cole & Preacher, 2014). We will discuss these aspects in the following sections and chapters in greater detail.

There are two types of SEM methods: covariance-based structural equation modeling (CB-SEM) and partial least squares structural equation modeling (PLS-SEM; also called PLS path modeling). CB-SEM is primarily used to confirm (or reject) theories (i.e., a set of systematic relationships between multiple variables that can be tested empirically). It does this by determining how well a proposed theoretical model can estimate the covariance matrix for a sample data set. In contrast, PLS has been introduced as a “causal-predictive” approach to SEM (Jöreskog & Wold, 1982, p. 270), which focuses on explaining the variance in the model’s dependent variables (Chin et al., 2020). We explain these differences in more detail later in the chapter.

PLS-SEM is evolving rapidly as a statistical modeling technique. Over the last decades, there have been numerous introductory articles on the method (e.g., Chin, 1998; Haenlein & Kaplan, 2004; Hair, Risher, Sarstedt, & Ringle, 2019; Nitzl & Chin, 2017; Rigdon, 2013; Roldán & Sánchez-Franco, 2012; Tenenhaus, Esposito Vinzi, Chatelin, & Lauro, 2005; Wold, 1985) as well as review articles examining how researchers across different disciplines have used the method (Exhibit 1.2). In light of the increasing maturation of the field, researchers have also started exploring the knowledge infrastructure of methodological research on PLS-SEM by analyzing the structures of authors, countries, and co-citation networks (Hwang, Sarstedt, Cheah, & Ringle, 2020; Khan et al., 2019).

EXHIBIT 1.2  ■  Review Articles on PLS-SEM Usage

Accounting: Lee, Petter, Fayard, & Robinson (2011); Nitzl (2016)
Construction management: Zeng, Liu, Gong, Hertogh, & König (2021)
Entrepreneurship: Manley, Hair, Williams, & McDowell (2020)
Family business: Sarstedt, Ringle, Smith, Reams, & Hair (2014)
Higher education: Ghasemy, Teeroovengadum, Becker, & Ringle (2020)
Hospitality and tourism: Ali, Rasoolimanesh, Sarstedt, Ringle, & Ryu (2018); Do Valle & Assaker (2016); Usakli & Kucukergin (2018)
Human resource management: Ringle, Sarstedt, Mitchell, & Gudergan (2020)
International business research: Richter, Sinkovics, Ringle, & Schlägel (2016)
Knowledge management: Cepeda-Carrión, Cegarra-Navarro, & Cillo (2019)
Management: Hair, Sarstedt, Pieper, & Ringle (2012)
Management information systems: Hair, Hollingsworth, Randolph, & Chong (2017); Ringle, Sarstedt, & Straub (2012)
Marketing: Hair, Sarstedt, Ringle, & Mena (2012)
Operations management: Bayonne, Marin-Garcia, & Alfalla-Luque (2020); Peng & Lai (2012)
Psychology: Willaby, Costa, Burns, MacCann, & Roberts (2015)
Software engineering: Russo & Stol (2021)
Supply chain management: Kaufmann & Gaeckler (2015)


Until the first edition of this book, published in 2014, there was no comprehensive textbook that explained the fundamental aspects of the method, particularly in a way that can be comprehended by the non-statistician. In recent years, a growing number of follow-up textbooks (e.g., Garson, 2016; Henseler, 2020; Ramayah, Cheah, Chuah, Ting, & Memon, 2016; Wong, 2019) and edited books on the method (e.g., Avkiran & Ringle, 2018; Esposito Vinzi, Chin, Henseler, & Wang, 2010; Latan & Noonan, 2017) have been published, which helped to further popularize PLS-SEM. This third edition of our book expands and clarifies the nature and role of PLS-SEM in social science research and hopefully makes researchers aware of a tool that will enable them to pursue research opportunities in many new and different ways.

CONSIDERATIONS IN USING STRUCTURAL EQUATION MODELING

Depending on the underlying research question and the empirical data available, researchers must select an appropriate multivariate analysis method. Regardless of whether a researcher is using first- or second-generation multivariate analysis methods, several considerations are necessary in deciding to use multivariate analysis, particularly SEM. Among the most important are the following five elements: (1) composite variables, (2) measurement, (3) measurement scales, (4) coding, and (5) data distributions.

Composite Variables

A composite variable (also referred to as a variate) is a linear combination of several variables that are chosen based on the research problem at hand (Hair, Black, Babin, & Anderson, 2019). The process for combining the variables involves calculating a set of weights, multiplying the weights (e.g., w1 and w2) times the associated data observations for the variables (e.g., x1 and x2), and summing them. The mathematical formula for this linear combination with five variables is shown as follows (note that the composite value can be calculated for any number of variables):

Composite value = w1 · x1 + w2 · x2 + . . . + w5 · x5,

where x stands for the individual variables and w represents the weights. All x variables (e.g., questions in a questionnaire) have responses from many respondents that can be arranged in a data matrix. Exhibit 1.3 shows such a data matrix, where i is an index that stands for the number of responses (i.e., cases). A composite value is calculated for each of the i respondents in the sample.


EXHIBIT 1.3  ■  Data Matrix

Case    x1     x2     ...    x5     Composite Value
1       x11    x21    ...    x51    v1
...     ...    ...    ...    ...    ...
i       x1i    x2i    ...    x5i    vi
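To make the computation concrete, here is a minimal sketch in Python that applies the composite-value formula to a small data matrix. The data values and weights are hypothetical, chosen purely for illustration; in PLS-SEM, such weights are estimated from the data rather than fixed in advance.

```python
import numpy as np

# Hypothetical data matrix: 4 cases (rows) by 5 indicator variables (columns),
# e.g., responses to five 7-point questionnaire items.
X = np.array([
    [5, 6, 4, 5, 7],
    [3, 2, 4, 3, 2],
    [6, 7, 6, 5, 6],
    [4, 4, 5, 4, 3],
], dtype=float)

# Illustrative weights w1, ..., w5 (assumed here; PLS-SEM estimates them).
w = np.array([0.30, 0.25, 0.15, 0.20, 0.10])

# Composite value for each case i: v_i = w1*x1i + w2*x2i + ... + w5*x5i
v = X @ w
print(v)  # one composite score per respondent
```

Each entry of v corresponds to one row in the Composite Value column of Exhibit 1.3.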

Measurement

Measurement is a fundamental concept in conducting social science research. When we think of measurement, the first thing that comes to mind is often a ruler, which could be used to measure someone’s height or the length of a piece of furniture. But there are many other examples of measurement in life. When you drive, you use a speedometer to measure the speed of your vehicle, a heat gauge to measure the temperature of the engine, and a gauge to determine how much fuel remains in your tank. If you are sick, you use a thermometer to measure your temperature, and when you go on a diet, you measure your weight on a bathroom scale.

Measurement is the process of assigning numbers to a variable based on a set of rules (Hair, Page, & Brunsveld, 2020). The rules are used to assign the numbers to the variable in a way that accurately represents the variable. With some variables, the rules are easy to follow, while with other variables, the rules are much more difficult to apply. For example, if the variable is gender, then it is easy to assign a 1 for females and a 0 for males. Similarly, if the variable is age or height, it is again easy to assign a number. But what if the variable is satisfaction or trust? Measurement in these situations is much more difficult because the phenomenon that is supposed to be measured is abstract, complex, and not directly observable. We therefore talk about the measurement of latent variables or constructs.

We cannot directly measure abstract concepts such as satisfaction or trust. However, we can measure indicators of what we have agreed to call satisfaction or trust, for example, in a brand, product, or company. Specifically, when concepts are difficult to measure, one approach is to measure them indirectly by using a set of directly observable and measurable indicators (also called items or manifest variables). Each indicator represents a single separate aspect of a larger abstract concept. For example, if the concept is restaurant satisfaction, then the several indicators that could be used to measure this might be the following:

1. The taste of the food was excellent.
2. The speed of service met my expectations.
3. The waitstaff was very knowledgeable about the menu items.
4. The background music in the restaurant was pleasant.
5. The meal was a good value compared with the price.

By combining several indicators to form a scale (or index; Chapter 2), we can indirectly measure the overall concept of restaurant satisfaction.

Usually, researchers use several items to form a multi-item scale, which indirectly measures a concept, as in the restaurant satisfaction example above. The several measures are combined to form a single composite score (i.e., the score of the variate). In some instances, the composite score is a simple summation of the several measures. In other instances, the scores of the individual measures are combined to form a composite score by using a linear weighting process. The logic of using several individual variables to measure an abstract concept such as restaurant satisfaction is that the measure will be more accurate. The anticipated improved accuracy is based on the assumption that using several items to measure a single concept is more likely to represent all the different aspects of that concept. This involves reducing measurement error, which is the difference between the true value of a variable and the value obtained by a measurement. There are many sources of measurement error, including poorly worded questions in a survey, misunderstanding of the scaling approach, and incorrect application of a statistical method. Indeed, all measurements used in multivariate analysis are likely to contain some measurement error. The objective, therefore, is to reduce the measurement error as much as possible.

Rather than using multiple items, researchers sometimes opt for the use of single-item constructs to measure concepts such as satisfaction or purchase intention. For example, we may use only “Overall, I’m satisfied with this restaurant” to measure restaurant satisfaction instead of all five items described above. While this is a good way to make the questionnaire shorter, it also reduces the quality of your measurement. We discuss the fundamentals of measurement and measurement evaluation in the following chapters.
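The claim that multi-item scales reduce measurement error can be illustrated with a small simulation. The following sketch is not from the book; it assumes a known "true" score and uniform random error on each item, purely for illustration.

```python
import random

random.seed(1)

TRUE_SCORE = 5.0  # hypothetical true satisfaction on a 7-point scale

def observe_item() -> float:
    # One observed item response = true value + random measurement error.
    return TRUE_SCORE + random.uniform(-1.0, 1.0)

single_item = observe_item()
multi_item = sum(observe_item() for _ in range(5)) / 5  # 5-item scale average

print(abs(single_item - TRUE_SCORE))  # error of a single-item measure
print(abs(multi_item - TRUE_SCORE))   # usually smaller: random errors partly cancel
```

Averaging several items lets the random errors partly cancel, which is why the multi-item score tends to lie closer to the true value, while a single-item measure offers no such cancellation.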

Measurement Scales

A measurement scale is a tool with a predetermined number of closed-ended responses that can be used to obtain an answer to a question. There are four types of measurement scales, each representing a different level of measurement—nominal, ordinal, interval, and ratio.

Nominal scales are the lowest level of scales because they are the most restrictive in terms of the type of analysis that can be carried out. A nominal scale assigns numbers that can be used to identify and classify objects (e.g., people, companies, products) and is also referred to as a categorical scale. For example, if a survey asked a respondent to identify his or her profession and the categories are doctor, lawyer, teacher, engineer, and so forth, the question has a nominal scale. Nominal scales can have two or more categories, but each category must be mutually exclusive, and all possible categories must be included. A number could be assigned to identify each category, and the numbers could be used to count the responses in each category, or the modal response or percentage in each category.

If we have a variable measured on an ordinal scale, we know that if the value of that variable increases or decreases, this gives meaningful information. For example, if we code customers’ use of a product as nonuser = 0, light user = 1, and heavy user = 2, we know that if the value of the use variable increases, the level of use also increases. Therefore, when an attribute or characteristic is measured on an ordinal scale, the values provide information about the order of our observations. However, we cannot assume that the differences in the order are equally spaced. That is, we do not know if the difference between “nonuser” and “light user” is the same as between “light user” and “heavy user,” even though the differences in the values (i.e., 0–1 and 1–2) are equal. Therefore, it is not appropriate to calculate arithmetic means or variances for ordinal data.

If an attribute or characteristic is measured with an interval scale, we have precise information on the rank order at which something is measured and, in addition, we can interpret the magnitude of the differences in values directly. For example, if the temperature is 80°F, we know that if it drops to 75°F, the difference is exactly 5°F. This difference of 5°F is the same as the increase from 80°F to 85°F. This exact “spacing” is called equidistance, and equidistant scales are necessary for certain analysis techniques, such as SEM. What the interval scale does not give us is an absolute zero point. If the temperature is 0°F, it may feel cold, but the temperature can drop further. The value of 0 therefore does not mean that there is no temperature at all (Sarstedt & Mooi, 2019; Chapter 3.6).

The value of interval scales is that almost any type of mathematical computation can be carried out, including the mean and standard deviation. Moreover, you can convert and extend interval scales to alternative interval scales. For example, instead of degrees Fahrenheit (°F), many countries use degrees Celsius (°C) to measure the temperature. While 0°C marks the freezing point, 100°C depicts the boiling point of water. You can convert temperature from Fahrenheit into Celsius by using the following equation:

Degrees Celsius (°C) = (degrees Fahrenheit (°F) − 32) · 5 / 9.

In a similar way, you can convert data (via rescaling) on a scale from 1 to 5 into data on a scale from 0 to 100:

(([data point on the scale from 1 to 5] − 1) / (5 − 1)) · 100.

A ratio scale provides the most information. If something is measured on a ratio scale, we know that a value of 0 means that a particular characteristic for a variable is not present. For example, if a customer buys no products (value = 0), then he or she really buys no products. Or, if we spend no money on advertising a new product (value = 0), we really spend no money. Therefore, the zero point or origin of the variable is equal to 0. The measurement of length, mass, and volume, as well as time elapsed, uses ratio scales. With ratio scales, all types of mathematical computations are possible.
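The two conversions above are simple linear transformations. As a quick sketch, they can be written as plain functions (the function names are ours, not the book's):

```python
def fahrenheit_to_celsius(deg_f: float) -> float:
    # Degrees Celsius = (degrees Fahrenheit - 32) * 5 / 9
    return (deg_f - 32.0) * 5.0 / 9.0

def rescale_1_to_5_onto_0_to_100(x: float) -> float:
    # ((data point on the 1-5 scale - 1) / (5 - 1)) * 100
    return (x - 1.0) / (5.0 - 1.0) * 100.0

print(fahrenheit_to_celsius(80.0))        # about 26.7 degrees Celsius
print(rescale_1_to_5_onto_0_to_100(3.0))  # 50.0, the midpoint of both scales
```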


Coding

The assignment of numbers to categories in a manner that facilitates measurement is referred to as coding. In survey research, data are often precoded. Precoding is assigning numbers ahead of time to answers (e.g., scale points) that are specified on a questionnaire. For example, a 10-point agree–disagree scale typically would assign the number 10 to the highest endpoint "agree" and a 1 to the lowest endpoint "disagree," and the points in between would be coded 2 to 9. Postcoding is assigning numbers to categories of responses after the data are collected. The responses might be to an open-ended question in a quantitative survey or to an interview response in a qualitative study.

Coding is very important in the application of multivariate analysis because it determines when and how various types of scales can be used. For example, variables measured with interval and ratio scales can always be used with multivariate analysis. However, when using ordinal scales such as Likert scales (which is common within an SEM context), researchers have to pay special attention to the coding to fulfill the requirement of equidistance. For example, when using a typical 7-point Likert scale with the categories (1) fully disagree, (2) disagree, (3) somewhat disagree, (4) neither agree nor disagree, (5) somewhat agree, (6) agree, and (7) fully agree, the inference is that the "distance" between categories 1 and 2 is the same as between categories 3 and 4. In contrast, the same type of Likert scale but using the categories (1) fully disagree, (2) disagree, (3) neither agree nor disagree, (4) somewhat agree, (5) agree, (6) strongly agree, and (7) fully agree is unlikely to be equidistant, as there are only two categories below the neutral category "neither agree nor disagree," whereas four categories score above the neutral category. This asymmetry would clearly bias any result in favor of a better outcome.

A suitable Likert scale, as in our first example above, will present symmetry of Likert items about a middle category, with clearly defined linguistic qualifiers for each category. In such symmetric scaling, equidistant attributes will typically be more clearly observed or, at least, inferred. When a Likert scale is perceived as symmetric and equidistant, it will behave more like an interval scale. So, while a Likert scale is ordinal, if it is well presented, then it is likely that the Likert scale can approximate an interval-level measurement, and the corresponding variables can be used in SEM.
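As a simple illustration of precoding, the following Python sketch (labels, codes, and responses are invented for illustration) assigns equidistant integer codes to the symmetric 7-point Likert scale described above:

```python
# Precoding: assign integer codes 1-7 to a symmetric 7-point Likert scale
likert_codes = {
    "fully disagree": 1,
    "disagree": 2,
    "somewhat disagree": 3,
    "neither agree nor disagree": 4,
    "somewhat agree": 5,
    "agree": 6,
    "fully agree": 7,
}

# Apply the coding scheme to a set of raw responses
responses = ["agree", "somewhat agree", "fully agree", "disagree"]
coded = [likert_codes[r] for r in responses]
print(coded)  # [6, 5, 7, 2]
```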

Data Distributions

When researchers collect quantitative data, the answers to the questions asked are reported as a distribution across the available (predefined) response categories. For example, if responses are requested using a 7-point agree–disagree scale, then a distribution of the answers in each of the possible response categories (1, 2, 3, . . . , 7) can be calculated and displayed in a table or chart.


Exhibit 1.4 shows an example of the frequencies of a corresponding variable x. As can be seen, most respondents indicated a 4 on the 7-point scale, followed by 3 and 5, and finally (barely visible), 1 and 7. Overall, the frequency count approximately follows a bell-shaped, symmetric curve around the mean value of 4. This bell-shaped curve is the normal distribution, which many statistical methods require for their analyses.

EXHIBIT 1.4  ■  Distribution of Responses

[Histogram of the frequencies of variable x across the scale points 1 to 7, with frequency on the vertical axis; Mean = 3.50, Std. Dev. = 0.748, N = 5,000. The bars approximate a bell-shaped, symmetric curve.]

While many different types of distributions exist (e.g., normal, binomial, Poisson), researchers working with SEM generally only need to distinguish normal from nonnormal distributions. Normal distributions are usually desirable, especially when working with CB-SEM. In contrast, PLS-SEM generally makes no assumptions about the data distributions. However, for reasons discussed in later chapters, it is worthwhile to consider the distribution when working with PLS-SEM. To assess whether the data follow a normal distribution, researchers can apply statistical tests such as the Kolmogorov–Smirnov test and Shapiro–Wilk test (Sarstedt & Mooi, 2019; Chapter 6.3.3.3). In addition, researchers can examine two measures of distributions—skewness and kurtosis (Chapter 2)—which allow assessing to what extent the data deviate from normality (Hair, Black, Babin, & Anderson, 2019).
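As an illustration, the following Python sketch runs the two tests named above on simulated data and computes skewness and kurtosis using SciPy (the data are simulated; in practice, x would hold an indicator's observed values):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
x = rng.normal(loc=4, scale=1, size=500)  # simulated indicator data

# Shapiro-Wilk test: H0 = the data are normally distributed
sw_stat, sw_p = stats.shapiro(x)

# Kolmogorov-Smirnov test against a normal distribution; note that
# estimating the parameters from the data makes the p-value approximate
ks_stat, ks_p = stats.kstest(x, "norm", args=(x.mean(), x.std(ddof=1)))

print(f"Shapiro-Wilk: W = {sw_stat:.3f}, p = {sw_p:.3f}")
print(f"Kolmogorov-Smirnov: D = {ks_stat:.3f}, p = {ks_p:.3f}")

# Skewness and excess kurtosis indicate how far the data deviate from normality
print(f"Skewness: {stats.skew(x):.3f}, Excess kurtosis: {stats.kurtosis(x):.3f}")
```

A significant test result (e.g., p < 0.05) indicates a deviation from normality, while skewness and kurtosis values close to 0 suggest an approximately normal distribution.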


PRINCIPLES OF STRUCTURAL EQUATION MODELING

Path Models With Latent Variables

Path models are diagrams used to visually display the hypotheses and variable relationships that are examined when SEM is applied (Hair, Page, & Brunsveld, 2020; Hair, Ringle, & Sarstedt, 2011). An example of a path model is shown in Exhibit 1.5.

EXHIBIT 1.5  ■  A Simple Path Model

[Path model diagram. Left: the measurement models/outer models of the exogenous latent variables Y1 (formative indicators x1–x3) and Y2 (formative indicators x4–x6). Right: the measurement models/outer models of the endogenous latent variables Y3 (reflective indicators x7–x9 with error terms e7–e9) and Y4 (single indicator x10). The structural model/inner model connects the constructs, with error terms z3 and z4 attached to Y3 and Y4.]

Constructs (i.e., variables that are not directly measured) are represented in path models as circles or ovals (Y1 to Y4). The indicators, also called items or manifest variables, are the directly measured variables that contain the raw data. They are represented in path models as rectangles (x1 to x10). Relationships between constructs as well as between constructs and their assigned indicators are shown as arrows. In PLS-SEM, the arrows are always single-headed, thus representing directional relationships. Single-headed arrows are considered predictive relationships and, with strong theoretical support, can be interpreted as causal relationships.


A PLS path model consists of two elements. First, there is a structural model (also called the inner model in the context of PLS-SEM) that links together the constructs (circles or ovals). The structural model also displays the relationships (paths) between the constructs. Second, a construct's measurement model (also referred to as the outer model in PLS-SEM) displays the relationships between the construct and its indicator variables (rectangles). In Exhibit 1.5, there are two types of measurement models: one for the exogenous latent variables (i.e., those constructs that explain other constructs in the model) and one for the endogenous latent variables (i.e., those constructs that are being explained in the model). Rather than referring to measurement models of exogenous and endogenous latent variables, researchers often refer to the measurement model of one specific latent variable. For example, x1 to x3 are the indicators used in the measurement model of Y1, while Y4 has only the x10 indicator in its measurement model.

The error terms (e.g., e7 or e8; Exhibit 1.5) are connected to the (endogenous) constructs and (reflectively) measured variables by single-headed arrows. Error terms represent the unexplained variance when path models are estimated (i.e., the difference between the model's in-sample prediction of a value and an observed value of a manifest or latent variable). In Exhibit 1.5, error terms e7 to e9 are on those indicators whose relationships point from the construct (Y3) to the indicator (i.e., reflectively measured indicators). In contrast, the formatively measured indicators x1 to x6, where the relationship goes from the indicator to the construct (Y1 and Y2), do not have error terms (Sarstedt, Hair, Ringle, Thiele, & Gudergan, 2016). Finally, for the single-item construct Y4, the direction of the relationship between the construct and the indicator is not relevant, as construct and item are equivalent. For the same reason, there is no error term connected to x10.

The structural model also contains error terms. In Exhibit 1.5, z3 and z4 are associated with the endogenous latent variables Y3 and Y4 (note that error terms on constructs and measured variables are labeled differently). In contrast, the exogenous latent variables (Y1 and Y2) that only explain other latent variables in the structural model do not have an error term, regardless of whether they are specified reflectively or formatively.
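To make these model elements concrete, the following sketch represents a path model like the one in Exhibit 1.5 as plain Python data structures. The dictionary layout and the specific set of structural paths are our illustration (one plausible reading of the diagram), not the format of any PLS-SEM software:

```python
# Measurement (outer) models: each construct and its assigned indicators
measurement_model = {
    "Y1": {"indicators": ["x1", "x2", "x3"], "mode": "formative"},
    "Y2": {"indicators": ["x4", "x5", "x6"], "mode": "formative"},
    "Y3": {"indicators": ["x7", "x8", "x9"], "mode": "reflective"},
    "Y4": {"indicators": ["x10"], "mode": "single-item"},
}

# Structural (inner) model: single-headed, directional paths between constructs
# (illustrative paths consistent with the description of Exhibit 1.5)
structural_model = [
    ("Y1", "Y3"),
    ("Y2", "Y3"),
    ("Y3", "Y4"),
]

# Endogenous constructs receive at least one path; exogenous constructs
# only send paths and therefore carry no structural error term
endogenous = {target for _, target in structural_model}
exogenous = {source for source, _ in structural_model} - endogenous
print(sorted(exogenous), sorted(endogenous))  # ['Y1', 'Y2'] ['Y3', 'Y4']
```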

Testing Theoretical Relationships

Path models are developed based on theory and are often used to test theoretical relationships. Theory is a set of systematically related hypotheses developed following the scientific method that can be used to explain and predict outcomes. Thus, hypotheses are individual conjectures, whereas theories are multiple hypotheses that are logically linked together and can be tested empirically. Two types of theory are required to develop path models: measurement theory and structural theory. Measurement theory specifies which indicators are used to measure a certain construct and how these are used. In contrast, structural theory specifies how the constructs are related to each other in the structural model.


Testing theory using PLS-SEM follows a two-step process (Hair, Black, Babin, & Anderson, 2019). We first test the measurement theory to confirm the reliability and validity of the measurement models. After the measurement models are confirmed, we move on to testing the structural theory. The logic is that we must first confirm the measurement theory before testing the structural theory, because structural theory cannot be confirmed if the measures are unreliable or invalid.

Measurement Theory

Measurement theory specifies how the latent variables (constructs) are measured. Generally, there are two different ways to measure unobservable variables. One approach is referred to as reflective measurement, and the other is formative measurement. Constructs Y1 and Y2 in Exhibit 1.5 are modeled based on a formative measurement model. Note that the directional arrows are pointing from the indicator variables (x1 to x3 for Y1 and x4 to x6 for Y2) to the construct, indicating a predictive (causal) relationship in that direction. In contrast, Y3 in the exhibit is modeled based on a reflective measurement model. With reflective indicators, the direction of the arrows is from the construct to the indicator variables, indicating the assumption that the construct causes the measurement (more precisely, the covariation) of the indicator variables. As indicated in Exhibit 1.5, reflective measures have an error term associated with each indicator, which is not the case with formative measures. The latter are assumed to be error free (Diamantopoulos, 2006). Finally, note that Y4 is measured using a single item rather than multi-item measures. Therefore, the relationship between construct and indicator is undirected. Deciding whether to measure the constructs reflectively vs. formatively and whether to use multiple items or a single-item measure are fundamental decisions when developing path models. We therefore explain these two approaches to modeling constructs as well as their variations in more detail in Chapter 2.

Structural Theory

Structural theory shows how the latent variables are related to each other (i.e., it shows the constructs and their path relationships in the structural model). The location and sequence of the constructs are either based on theory or the researcher's experience and accumulated knowledge, or both. When path models are developed, the sequence is from left to right. The variables on the left side of the path model are independent variables, and any variable on the right side is the dependent variable. Moreover, variables on the left are shown as sequentially preceding and predicting the variables on the right. However, variables in the middle of the path model (between the variables that serve only as independent or dependent variables, such as Y3) may serve as both independent and dependent variables in the structural model.


When latent variables serve only as independent variables, they are called exogenous latent variables (Y1 and Y2). When latent variables serve only as dependent variables (Y4) or as both independent and dependent variables (Y3), they are called endogenous latent variables. Any latent variable that has only single-headed arrows going out of it is an exogenous latent variable. In contrast, endogenous latent variables can have single-headed arrows going both into and out of them (Y3) or only going into them (Y4). Note that the exogenous latent variables Y1 and Y2 do not have error terms, since these constructs are the entities (independent variables) that explain the dependent variables in the path model.

PLS-SEM, CB-SEM, AND REGRESSIONS BASED ON SUM SCORES

There are two main approaches to estimating the relationships in a structural equation model (Hair, Black, Babin, & Anderson, 2019; Hair, Ringle, & Sarstedt, 2011). One is CB-SEM, and the other is PLS-SEM, the latter being the focus of this book. Each is appropriate for a different research context, and researchers need to understand the differences in order to apply the correct method (Marcoulides & Chin, 2013; Rigdon, Sarstedt, & Ringle, 2017). Finally, some researchers have argued for using regressions based on sum scores instead of some type of indicator weighting, as done by PLS-SEM. The sum scores approach offers practically no value compared to the PLS-SEM weighted approach. For this reason, in the following, we only briefly discuss sum scores and instead focus on the PLS-SEM and CB-SEM methods.

A crucial conceptual difference between PLS-SEM and CB-SEM relates to the way each method treats the latent variables included in the model. CB-SEM represents a common factor-based SEM method that considers the constructs as common factors that explain the covariation between their associated indicators. This approach is consistent with the measurement philosophy underlying reflective measurement, in which the indicators and their covariations are regarded as manifestations of the underlying construct. In principle, CB-SEM can also accommodate formative measurement models, even though the method follows a common factor model estimation approach (Diamantopoulos, Riefler, & Roth, 2008). To estimate this model type, however, researchers must follow rules that require specific constraints on the model to ensure model identification (Bollen & Davies, 2009; Diamantopoulos & Riefler, 2011), which means that the method can calculate estimates for all model parameters. As Hair, Sarstedt, Ringle, and Mena (2012, p. 420) note, "these constraints often contradict theoretical considerations, and the question arises whether model design should guide theory or vice versa."

PLS-SEM, on the other hand, assumes the concepts of interest can be measured as composites (Jöreskog & Wold, 1982), which is why the method is
regarded as a composite-based SEM method (Hwang et al., 2020). Model estimation in PLS-SEM involves combining the indicators based on a linear method to form composite variables (Chapter 3). The composite variables are assumed to be comprehensive representations of the constructs and, therefore, valid proxies of the conceptual variables being examined (e.g., Hair & Sarstedt, 2019). The composite-based approach is consistent with the measurement philosophy underlying formative measurement, but this does not imply that PLS-SEM is only capable of estimating formatively specified constructs. The reason is that the estimation perspective (i.e., forming composites to represent conceptual variables) should not be confused with the measurement theory perspective (i.e., specifying measurement models as reflective or formative). The way a method like PLS-SEM estimates the model parameters needs to be clearly distinguished from any measurement theoretical considerations on how to operationalize constructs (Sarstedt, Hair, Ringle, Thiele, & Gudergan, 2016). Researchers can include reflectively and formatively specified measurement models, which PLS-SEM estimates without any limitations.

In following a composite-based approach to SEM, PLS relaxes the strong assumption of CB-SEM that all of the covariation between the sets of indicators is explained by a common factor (Henseler et al., 2014; Rigdon, 2012; Rigdon et al., 2014). At the same time, using weighted composites of indicator variables facilitates accounting for measurement error, thus making PLS-SEM superior compared with multiple regression using sum scores. If multiple regression with sum scores is used, the researcher assumes an equal weighting of indicators, which means that each indicator contributes equally to forming the composite (Hair & Sarstedt, 2019; Henseler et al., 2014). Referring to our descriptions of composite variables at the very beginning of this chapter, this would imply that all indicator weights w are set to 1. As noted earlier, the resulting mathematical formula for a linear combination with five variables would be as follows: Composite value = 1 · x1 + 1 · x2 + . . . + 1 · x5. For example, if a respondent has the scores 4, 5, 4, 6, and 7 on the five variables, the corresponding composite value would be 26 (a short code sketch at the end of this section illustrates the difference between sum scores and weighted composites). While easy to apply, regressions using sum scores equalize any differences in the individual item weights. Such differences are, however, common in research reality, and ignoring them entails substantial biases in the parameter estimates (e.g., Hair, Hollingsworth, Randolph, & Chong, 2017). Furthermore, learning about individual item weights offers important insights, as the researcher learns about each item's importance for forming the composite in a certain context (i.e., its relationships with other composites in the structural model). When measuring customer satisfaction, for example, the researcher learns which aspects covered by the individual items are of particular importance for the shaping of satisfaction.

It is important to note that the composites produced by PLS-SEM are not assumed to be identical to the constructs, which they replace. They are explicitly
recognized as approximations (Rigdon, 2012). As a consequence, some scholars view CB-SEM as a more direct and precise method to empirically measure theoretical concepts (e.g., Rönkkö, McIntosh, & Antonakis, 2015), while PLS-SEM provides approximations. Other scholars contend, however, that such a view is quite shortsighted, as common factors derived in CB-SEM are also not necessarily equivalent to the theoretical concepts that are the focus of research (Rigdon, 2012; Rigdon, Sarstedt, & Ringle, 2017; Rossiter, 2011; Sarstedt, Hair, Ringle, Thiele, & Gudergan, 2016). Rigdon, Becker, and Sarstedt (2019a) show that common factor models can be subject to considerable degrees of metrological uncertainty. Metrological uncertainty refers to the dispersion of the measurement values that can be attributed to the object or concept being measured (JCGM/WG1, 2008). Numerous sources contribute to metrological uncertainty, such as definitional uncertainty or limitations related to the measurement scale design, which go well beyond the simple standard errors considered in CB-SEM analyses (Hair & Sarstedt, 2019). As such, uncertainty is a validity threat to measurement and has adverse consequences for the replicability of study findings (Rigdon, Sarstedt, & Becker, 2020). While uncertainty also applies to composite-based SEM, the way researchers treat models in CB-SEM analyses typically leads to a pronounced increase in uncertainty. More precisely, in an effort to improve model fit, researchers typically restrict the number of indicators per construct, which in turn increases uncertainty (Hair, Matthews, Matthews, & Sarstedt, 2017; Rigdon, Becker, & Sarstedt, 2019a). These issues do not necessarily imply that composite models are superior, but they cast considerable doubt on the assumption of some researchers that CB-SEM constitutes the gold standard when measuring unobservable concepts. In fact, researchers in various fields of science show increasing appreciation that common factors may not always be the right approach to measure concepts (e.g., Rhemtulla, van Bork, & Borsboom, 2020; Rigdon, 2016). Similarly, Rigdon, Becker, and Sarstedt (2019b) show that using sum scores can significantly increase the degree of metrological uncertainty, which casts additional doubt on this measurement practice.

Apart from differences in the philosophy of measurement, the differing treatment of latent variables and, more specifically, the availability of latent variable scores also has consequences for the methods' areas of application. Specifically, while it is possible to estimate latent variable scores within a CB-SEM framework, these estimated scores are not unique. That is, an infinite number of different sets of latent variable scores that will fit the model equally well are possible. A crucial consequence of this factor (score) indeterminacy is that the correlations between a common factor and any variable outside the factor model are themselves indeterminate (Guttman, 1955). That is, they may be high or low, depending on which set of factor scores one chooses. As a result, this limitation makes CB-SEM grossly unsuitable for prediction (e.g., Hair & Sarstedt, 2021a; Dijkstra, 2014). In contrast, a major advantage of PLS-SEM is that it always produces a single specific (i.e., determinate) composite score for each case, once the weights are established. These determinate scores are proxies of the concepts being
measured, just as factors are proxies for the conceptual variables in CB-SEM (Rigdon, Sarstedt, & Ringle, 2017; Sarstedt, Hair, Ringle, Thiele, & Gudergan, 2016). Using these proxies as input, PLS-SEM applies ordinary least squares regression with the objective of minimizing the error terms (i.e., the residual variance) of the endogenous constructs. In short, PLS-SEM estimates coefficients (i.e., path model relationships) with the goal of maximizing the R² values (i.e., the amount of explained variance) of the (target) endogenous constructs. This feature achieves the (in-sample) prediction objective of PLS-SEM, which is therefore the preferred method when the research objective is theory development and explanation of variance (prediction of the constructs). For this reason, PLS-SEM is regarded as a variance-based SEM approach. Specifically, the logic of the PLS-SEM approach is that all of the indicators' variance should be used to estimate the model relationships, with particular focus on prediction of the dependent variables (e.g., McDonald, 1996). In contrast, CB-SEM divides the total variance into three types—common, unique, and error variance—but utilizes only common variance (i.e., the variance shared with other indicators in the same measurement model) in the model estimation (Hair, Black, Babin, & Anderson, 2019). That is, CB-SEM only explains the covariation between the indicators (Jöreskog, 1973) and does not focus on predicting dependent variables (Hair, Matthews, Matthews, & Sarstedt, 2017).

Note that PLS-SEM is similar but not equivalent to PLS regression, another popular multivariate data analysis technique (Abdi, 2010; Wold, Sjöström, & Eriksson, 2001). PLS regression is a regression-based approach that explores the linear relationships between multiple independent variables and a single or multiple dependent variable(s). PLS regression differs from regular regression, however, because in developing the regression model, it derives composite factors from the multiple independent variables by means of principal component analysis. PLS-SEM, on the other hand, relies on prespecified networks of relationships between constructs as well as between constructs and their measures (see Mateos-Aparicio, 2011, for a more detailed comparison between PLS-SEM and PLS regression).
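To make the contrast between sum scores and weighted composites concrete, the following short Python sketch computes the sum score from the example above next to a weighted composite of the kind PLS-SEM estimates (the weights here are hypothetical; in practice they are estimated by the PLS-SEM algorithm):

```python
# Five indicator scores for one respondent (from the example above)
scores = [4, 5, 4, 6, 7]

# Sum scores: all indicator weights fixed at 1
sum_score = sum(1 * x for x in scores)
print(sum_score)  # 26

# Weighted composite: indicator weights differ across items
# (hypothetical weights for illustration)
weights = [0.30, 0.15, 0.20, 0.10, 0.25]
composite = sum(w * x for w, x in zip(weights, scores))
print(round(composite, 2))  # 5.1
```

The point of the sketch is not the numbers themselves but that differentiated weights let each item contribute according to its importance, whereas sum scores force all items to contribute equally.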

CONSIDERATIONS WHEN APPLYING PLS-SEM

Key Characteristics of the PLS-SEM Method

Several considerations are important when deciding whether or not to apply PLS-SEM. These considerations also have their roots in the method's characteristics. The statistical properties of the PLS-SEM algorithm have important features associated with the characteristics of the data and model used. Moreover, the properties of the PLS-SEM method affect the evaluation of the results. There are four critical issues relevant to the application of PLS-SEM (Hair, Ringle,
& Sarstedt, 2011; Hair, Risher, Sarstedt, & Ringle, 2019): (1) data characteristics, (2) model characteristics, (3) model estimation, and (4) model evaluation. Exhibit 1.6 summarizes the key characteristics of the PLS-SEM method. An initial overview of these issues is provided in this chapter, and a more detailed explanation is provided in later chapters of the book, particularly as they relate to the PLS-SEM algorithm and evaluation of results.

EXHIBIT 1.6  ■  Key Characteristics of PLS-SEM

Data Characteristics

Sample size
•• Negligible identification issues with small sample sizes
•• Achieves high levels of statistical power with small sample sizes
•• Larger sample sizes increase the precision (i.e., consistency) of PLS-SEM estimations

Distribution
•• No distributional assumptions; PLS-SEM is a nonparametric method
•• Influential outliers and collinearity may influence the results

Missing values
•• Highly robust as long as missing values are below a reasonable level (less than 5%)

Scale of measurement
•• Works with metric data and quasi-metric (ordinal) scaled variables
•• The standard PLS-SEM algorithm also accommodates binary coded variables, but additional considerations are required when they are used as control variables, moderators, and in the analysis of data from discrete choice experiments

Model Characteristics

Number of items in each construct's measurement model
•• Handles constructs measured with single- and multi-item measures

Relationships between constructs and their indicators
•• Easily incorporates reflective and formative measurement models

Model complexity
•• Handles complex models with many structural model relationships

Model setup
•• No causal loops (no circular relationships) are allowed in the structural model

Model Estimation

Objective
•• Aims at minimizing the amount of unexplained variance in the dependent measures (i.e., maximizes the R² values)

Efficiency
•• Converges after a few iterations (even in situations with complex models and/or large sets of data) to the optimum solution (i.e., the algorithm is very efficient)

Nature of constructs
•• Viewed as proxies of the latent concept under investigation, represented by composites

Construct scores
•• Estimated as linear combinations of their indicators (i.e., they are determinate)
•• Used for predictive purposes
•• Can be used as input for subsequent analyses
•• Not affected by data limitations and inadequacies

Parameter estimates
•• Structural model relationships are generally underestimated, and measurement model relationships are generally overestimated when solutions are obtained using data from common factor models
•• Unbiased and consistent when estimating data from composite models
•• High levels of statistical power compared to alternative methods such as CB-SEM

Model Evaluation

Evaluation of the overall model
•• The concept of fit—as defined in CB-SEM—does not apply to PLS-SEM. Efforts to introduce model fit measures have generally proven unsuccessful

Evaluation of the measurement models
•• Reflective measurement models are assessed on the grounds of indicator reliability, internal consistency reliability, convergent validity, and discriminant validity
•• Formative measurement models are assessed on the grounds of convergent validity, indicator collinearity, and the significance and relevance of indicator weights

Evaluation of the structural model
•• Collinearity among sets of predictor constructs
•• Significance and relevance of path coefficients
•• Criteria to assess the model's in-sample (i.e., explanatory) power and out-of-sample predictive power (PLSpredict)

Additional analyses
•• Methodological research has substantially extended the original PLS-SEM method by introducing advanced modeling, assessment, and analysis procedures. Some examples include:
   – Confirmatory tetrad analysis
   – Discrete choice modeling
   – Endogeneity assessment
   – Higher-order constructs
   – Latent class analysis
   – Measurement model invariance
   – Mediation analysis
   – Model selection
   – Moderating effects
   – Multigroup analysis
   – Necessary condition analysis
   – Nonlinear effects

Source: Adapted and extended from Hair, J. F., Ringle, C. M., & Sarstedt, M. (2011). PLS-SEM: Indeed a silver bullet. Journal of Marketing Theory and Practice, 19(2), 139–151. Copyright © 2011 by M. E. Sharpe, Inc. Reprinted with permission of the publisher (Taylor & Francis Ltd., http://www.tandfonline.com).


PLS-SEM works efficiently with small sample sizes and complex models (Cassel, Hackl, & Westlund, 1999; Chin, 2010). In addition, different from maximum likelihood–based CB-SEM, which requires normally distributed data, PLS-SEM makes no distributional assumptions (i.e., it is nonparametric). PLS-SEM can easily handle reflective and formative measurement models, as well as single-item constructs, with no identification problems. It can therefore be applied in a wide variety of research situations. When applying PLS-SEM, researchers also benefit from high efficiency in parameter estimation, which is manifested in the method's greater statistical power compared to CB-SEM. Greater statistical power means that PLS-SEM is more likely to render a specific relationship significant when it is in fact present in the population. The same holds for the comparison with regression based on sum scores, which lags behind PLS-SEM in terms of statistical power (Hair, Hult, Ringle, Sarstedt, & Thiele, 2017).

There are, however, several limitations of PLS-SEM. In its basic form, the technique cannot be applied when structural models contain causal loops or circular relationships between the latent variables (i.e., non-recursive models). Early extensions of the basic PLS-SEM algorithm that have not yet been implemented in standard PLS-SEM software packages, however, enable handling of circular relationships (Lohmöller, 1989). Furthermore, since PLS-SEM does not have an established global goodness-of-fit measure, its use for theory testing and confirmation is more limited in certain situations. Recent research has attempted to promote common goodness-of-fit measures within a PLS-SEM framework (Schuberth, Henseler, & Dijkstra, 2018), but with very limited success. The concept of model fit—as defined in CB-SEM—is not applicable to PLS-SEM because of the methods' differing functioning principles (see Chapter 6 for details).

Instead, PLS-SEM-based model estimation and assessment follows a causal–predictive paradigm, where the aim is to test the predictive power in the confinements of a model carefully developed on the grounds of theory and logic. The underlying causal–predictive logic follows what Gregor (2006) refers to as explaining and predicting (EP) theories. EP theories imply an understanding of the underlying causes and prediction as well as description of theoretical constructs and their relationships. According to Gregor (2006, p. 626), this type of theory "corresponds to commonly held views of theory in both the natural and social sciences." Numerous seminal theories and models such as Oliver's (1980) expectation–disconfirmation theory or the various technology acceptance models (e.g., Davis, 1989; Venkatesh, Morris, Davis, & Davis, 2003) follow an EP–theoretic approach in that they aim to explain and predict. PLS-SEM is perfectly suited to investigate models derived from EP theories as the method strikes a balance between machine learning methods, which are fully predictive in nature, and CB-SEM, which focuses on confirmation and model fit (Richter, Cepeda-Carrión, Roldán, & Ringle, 2016). Its causal–predictive nature makes PLS-SEM particularly appealing for research in fields that aim to derive recommendations for practice. For example, recommendations in managerial implications sections that populate business research journals always come in the form of predictive
statements ("our results suggest that managers should . . ."). Making such statements requires a prediction focus in model estimation and evaluation. PLS-SEM perfectly emphasizes this need as the method sheds light on the mechanisms (i.e., the structural model relationships) through which the predictions were generated (Hair, 2020; Hair & Sarstedt, 2019, 2021b).

In early writing, researchers noted that PLS estimation is "deliberately approximate" to factor-based SEM (Hui & Wold, 1982, p. 127), a characteristic that has come to be known as the PLS-SEM bias (e.g., Chin, Marcolin, & Newsted, 2003). A number of studies have used simulations to demonstrate the alleged PLS-SEM bias (e.g., Goodhue, Lewis, & Thompson, 2012; McDonald, 1996; Rönkkö & Evermann, 2013), which manifests itself in measurement model estimates that are higher, and structural model estimates that are lower, compared to the prespecified values. The studies conclude that parameter estimates will approach what has been labeled the "true" parameter values when both the number of indicators per construct and the sample size increase (Hui & Wold, 1982). However, all these simulation studies used CB-SEM as the benchmark against which the PLS-SEM estimates were evaluated, with the assumption that they should be the same. Because PLS-SEM is a composite-based approach, which uses the total variance to estimate parameters, biases can be expected in such an assessment (Lohmöller, 1989; Schlittgen, Sarstedt, & Ringle, 2020; Schneeweiß, 1991). Not surprisingly, the very same issues apply when composite models are used to estimate CB-SEM results. In fact, Sarstedt, Hair, Ringle, Thiele, and Gudergan (2016) show that the biases produced by CB-SEM are far more severe than those of PLS-SEM when applying the method to the wrong type of model (i.e., estimating composite models with CB-SEM vs. estimating common factor models with PLS-SEM). When acknowledging the different nature of the construct measures, most of the criticism voiced by critics of the PLS-SEM method (Rönkkö, McIntosh, Antonakis, & Edwards, 2016) is no longer an issue (Cook & Forzani, 2020).

Apart from these conceptual concerns, simulation studies show that the differences between PLS-SEM and CB-SEM estimates, when assuming the latter as a standard of comparison, are very small, provided that measurement models meet minimum recommended standards in terms of measurement quality (i.e., reliability and validity). Specifically, when the measurement models have four or more indicators and indicator loadings meet the common standards (≥0.70), there is practically no difference between the two methods in terms of parameter accuracy (e.g., Reinartz, Haenlein, & Henseler, 2009; Sarstedt, Hair, Ringle, Thiele, & Gudergan, 2016). Thus, the extensively discussed PLS-SEM bias is of no practical relevance for the vast majority of applications (e.g., Binz Astrachan, Patel, & Wanzenried, 2014).

Finally, methodological research has substantially extended the original PLS-SEM method by introducing advanced modeling, assessment, and analysis procedures. Examples include different types of robustness checks (Sarstedt, Ringle et al., 2020), discrete choice modeling (Hair, Ringle et al., 2018), necessary condition analysis (Richter, Schubring, Hauff, Ringle, & Sarstedt, 2020), out-of-sample prediction metrics (Hair, 2020), endogeneity assessment (Hult et al.,
2018), and higher-order constructs (Sarstedt, Hair, Cheah, Becker, & Ringle, 2019). Chapter 8 and Hair, Sarstedt, Ringle, and Gudergan (2018) offer an introduction to several of these advanced issues. In the following, we discuss aspects related to data characteristics (e.g., minimum sample size requirements) and model characteristics (e.g., model complexity).

Data Characteristics

Data characteristics, such as minimum sample size requirements, nonnormal data, and scales of measurement (i.e., the use of different scale types), are among the most often stated reasons for applying PLS-SEM across numerous disciplines (e.g., Ghasemy, Teeroovengadum, Becker, & Ringle, 2020; Hair, Sarstedt, Ringle, & Mena, 2012; Ringle, Sarstedt, Mitchell, & Gudergan, 2020). While some of the arguments are consistent with the method's capabilities, others are not. In the following sections, we discuss these and related data characteristics.

Minimum Sample Size Requirements

Small sample size is probably the most often misused argument for using PLS-SEM, with some researchers considering unacceptably low sample sizes (Goodhue et al., 2012; Marcoulides & Saunders, 2006). These researchers oftentimes believe there is some "magic" in the PLS-SEM approach that allows them to use a very small sample (e.g., less than 100) to obtain results representing the effects that exist in a population of several million elements or individuals. No multivariate analysis technique, including PLS-SEM, has this kind of "magic" capability (Petter, 2018). PLS-SEM can certainly be used with smaller samples, but the population's nature determines the situations in which small sample sizes are acceptable (Rigdon, 2016). For example, in business-to-business research, populations are often restricted in size. Assuming that other situational characteristics are equal, the more heterogeneous the population, the larger the sample size needed to achieve an acceptable sampling error (Cochran, 1977). If basic sampling theory guidelines are not considered (Sarstedt, Bengart, Shaltoni, & Lehmann, 2018), questionable results are produced.

In addition, when applying multivariate analysis techniques, the technical dimension of the sample size becomes relevant. Adhering to the minimum sample size guidelines ensures the results of a statistical method such as PLS-SEM have adequate statistical power. In this regard, an insufficient sample size may not reveal an effect that exists in the underlying population (which results in committing a type II error). Moreover, executing statistical analyses based on minimum sample size guidelines will ensure the results of the statistical method are robust and the model is generalizable to another sample from that same population. Thus, an insufficient sample size may lead to PLS-SEM results that differ from
those of another sample. In the following, we focus on the PLS-SEM method and its technical requirements regarding the minimum sample size.

The overall complexity of a structural model has little influence on the sample size requirements for PLS-SEM. The reason is that the PLS-SEM algorithm does not compute all relationships in the structural model at the same time. Instead, it uses ordinary least squares regressions to estimate the model's partial regression relationships. Two early studies systematically evaluated the performance of PLS-SEM with small sample sizes and concluded that the method performed well (e.g., Chin & Newsted, 1999; Hui & Wold, 1982). Subsequent simulation studies by, for example, Hair, Hult, Ringle, Sarstedt, and Thiele (2017) and Reinartz, Haenlein, and Henseler (2009) indicate that PLS-SEM is the method of choice when the sample size is small. Moreover, compared with its covariance-based counterpart, PLS-SEM has higher levels of statistical power in situations with complex model structures and smaller sample sizes. Similarly, Henseler et al. (2014) show that solutions can be obtained with PLS-SEM when other methods such as CB-SEM do not converge or provide inadmissible solutions. For instance, problems often are encountered when using CB-SEM on complex models, especially when the sample size is limited. Finally, CB-SEM suffers from identification and convergence issues when formative measures are involved (e.g., Diamantopoulos & Riefler, 2011).

Unfortunately, some researchers believe that sample size considerations do not play a role in the application of PLS-SEM. This idea has been fostered by the often-cited 10 times rule (Barclay, Higgins, & Thompson, 1995), which suggests the sample size should be equal to 10 times the number of independent variables in the most complex regression in the PLS path model (i.e., considering both measurement and structural models). This rule of thumb is equivalent to saying the minimum sample size should be 10 times the maximum number of arrowheads pointing at a latent variable anywhere in the PLS path model. While this rule offers a rough guideline, the minimum sample size requirement should consider the statistical power of the estimates. To assess statistical power, researchers can consider power tables (Cohen, 1992) or power analyses using programs such as G*Power (Faul, Erdfelder, Buchner, & Lang, 2009), which is available free of charge at http://www.gpower.hhu.de/. These approaches do not explicitly consider the entire model but use the most complex regression in the (formative) measurement models and structural model of a PLS path model as the point of reference for assessing the statistical power. In doing so, researchers typically aim at achieving a power level of 80%. However, the minimum sample size resulting from these calculations may still be too small (Kock & Hadaya, 2018).

Addressing these concerns, Kock and Hadaya (2018) proposed the inverse square root method, which considers the probability that the ratio of a path coefficient and its standard error will be greater than the critical value of a test statistic for a specific significance level. Therefore, the results for the technically required minimum sample size depend on only one path coefficient and do not depend on the size of the most complex regression in the model. Assuming a
common power level of 80% and significance levels of 1%, 5%, and 10%, the minimum sample size (nmin) is given by the following equations, where pmin is the value of the path coefficient with the minimum magnitude in the PLS path model that is expected to be statistically significant:

Significance level = 1%: nmin > (3.168 / |pmin|)²

Significance level = 5%: nmin > (2.486 / |pmin|)²

Significance level = 10%: nmin > (2.123 / |pmin|)²

For example, assuming a significance level of 5% and a minimum path coefficient of 0.2, the minimum sample size is given by

nmin > (2.486 / 0.2)² = 154.505

This result needs to be rounded to the next integer, so the minimum sample size is 155. The inverse square root method is rather conservative, in that it slightly overestimates the sample size required to render an effect significant at a given power level. Most importantly, the method stands out due to its ease of use, as it can be readily implemented.

Nevertheless, two considerations are important when using the inverse square root method. First, by using the smallest statistical path coefficient as the point of reference, the method can be misleading, as researchers will not expect marginal effects to be significant. For example, assuming a 5% significance level and a minimum path coefficient of 0.01 would require a sample size of 61,802! Hence, researchers should choose a higher path coefficient as input, either depending on whether the model produces overall weak or strong effects or depending on the smallest relevant (to be detected) effect. Second, by relying on model estimates, the inverse square root method follows a retrospective approach. Such an assessment can be used as a basis for additional data collection or adjustments in the model. If possible, however, researchers should follow a prospective approach by trying to derive the minimum expected effect size prior to data analysis. To do so, researchers can draw on prior research involving a comparable conceptual background or models with similar complexity or, preferably, the results of a pilot study, which tested the hypothesized model using a smaller sample of respondents from the same population. For example, if the pilot study produced a minimum path coefficient of 0.15, this value should be chosen as input for computing the required sample size for the main study.

In most cases, however, researchers have only limited information regarding the expected effect sizes, even if a pilot study has been conducted. Hence, it is reasonable to consider ranges of effect sizes rather than specific values to determine the sample size required for a specific study. Exhibit 1.7 shows the minimum sample size requirements for different significance levels and varying ranges of pmin. In deriving the minimum sample size, it is reasonable to consider the upper boundary of the effect range as reference, as the inverse square root method is rather conservative. For example, when assuming that the minimum path coefficient expected to be significant is between 0.11 and 0.20, one would need approximately 155 observations to render the corresponding effect significant at 5%. Similarly, if the minimum path coefficient expected to be significant is between 0.31 and 0.40, then the recommended sample size would be 39.

EXHIBIT 1.7  ■  Minimum Sample Sizes for Different Levels of Minimum Path Coefficients (pmin) and Significance Levels

                   Significance level
pmin           1%        5%       10%
0.05–0.1    1,004       619       451
0.11–0.2      251       155       113
0.21–0.3      112        69        51
0.31–0.4       63        39        29
0.41–0.5       41        25        19
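The inverse square root method is straightforward to implement in code. The following minimal Python sketch (the function name is ours) reproduces the equations above and, using the upper boundary of each effect range, the values in Exhibit 1.7:

```python
import math

# Critical-value constants from the inverse square root method
# at 80% statistical power (Kock & Hadaya, 2018)
Z_CONSTANTS = {0.01: 3.168, 0.05: 2.486, 0.10: 2.123}

def min_sample_size(p_min, alpha=0.05):
    """Minimum sample size needed for a path coefficient of magnitude
    p_min to be expected significant at the given significance level."""
    z = Z_CONSTANTS[alpha]
    return math.ceil((z / abs(p_min)) ** 2)  # round up to the next integer

print(min_sample_size(0.2, alpha=0.05))   # 155 (the worked example above)
print(min_sample_size(0.2, alpha=0.01))   # 251 (row 0.11-0.2, 1% column)
print(min_sample_size(0.4, alpha=0.05))   # 39  (row 0.31-0.4, 5% column)
```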

Missing Value Treatment

As with other statistical analyses, missing values should be dealt with when using PLS-SEM. For reasonable limits (i.e., less than 5% of values missing per indicator), missing value treatment options such as mean replacement, EM (expectation–maximization algorithm), and nearest neighbor (e.g., Hair, Black, Babin, & Anderson, 2019) generally result in only slightly different PLS-SEM estimates. Alternatively, researchers can opt for deleting all observations with missing values, which decreases variation in the data and may introduce biases when certain groups of observations have been deleted systematically.
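As a simple illustration of these options, the following pandas sketch (the data and column names are invented for illustration) contrasts mean replacement with deleting observations that have missing values:

```python
import numpy as np
import pandas as pd

# Toy indicator data with a few missing values
df = pd.DataFrame({
    "x1": [4, 5, np.nan, 6, 7],
    "x2": [3, np.nan, 4, 5, 4],
})

# Mean replacement: substitute each missing value with the indicator's mean
df_mean = df.fillna(df.mean())

# Deletion of observations with any missing value; this reduces variation
# and may bias results if values are not missing at random
df_deleted = df.dropna()

print(df_mean.round(2))
print(df_deleted)
```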


Nonnormal Data

The use of PLS-SEM has two other key advantages related to data characteristics (i.e., distribution and scales). In situations where it is difficult or impossible to meet the stricter requirements of more traditional multivariate techniques (e.g., normal data distribution), PLS-SEM is the preferred method. PLS-SEM's greater flexibility is described by the label "soft modeling," coined by Wold (1982), who developed the method. It should be noted, however, that "soft" is attributed only to the distributional assumptions and not to the concepts, models, or estimation techniques (Lohmöller, 1989). PLS-SEM's statistical properties provide very robust model estimations with data that have normal as well as extremely nonnormal distributional properties (Hair, Hult, Ringle, Sarstedt, & Thiele, 2017; Reinartz, Haenlein, & Henseler, 2009). It must be remembered, however, that influential outliers and collinearity do influence the ordinary least squares regressions in PLS-SEM, and researchers should evaluate the data and results for these issues (Hair, Black, Babin, & Anderson, 2019).

Scales of Measurement

The PLS-SEM algorithm generally requires variables to be measured on a metric scale (ratio or interval measurement) for the measurement model indicators. But the method also works well with ordinal scales with equidistant data points (i.e., quasi-metric scales; Sarstedt & Mooi, 2019; Chapter 3.6) and with binary coded data. The use of binary coded data is often a means of including categorical control variables or moderators in PLS-SEM models. In short, binary indicators can be included in PLS-SEM models but require special attention. For example, using PLS-SEM in discrete choice experiments where the aim is to predict a binary dependent variable requires specific designs and estimation routines (Hair, Ringle et al., 2019).

Secondary Data

Secondary data are data that have already been gathered, often for a different research purpose and some time ago (Sarstedt & Mooi, 2019; Chapter 3.2.1). Secondary data are increasingly available to explore real-world phenomena. Research based on secondary data typically focuses on a different objective than in a standard CB-SEM analysis, which is strictly confirmatory in nature. More precisely, secondary data are mainly used in exploratory research to propose causal relationships in situations that have little clearly defined theory (Hair, Risher, Sarstedt, & Ringle, 2019; Hair, Hollingsworth, Randolph, & Chong, 2017). Such settings require researchers to place greater emphasis on examining all possible relationships rather than achieving model fit (Nitzl, 2016). By its nature, this process creates large, complex models that cannot be analyzed with the CB-SEM method. In contrast, due to its less stringent requirements on the data, PLS-SEM offers the flexibility needed for the interplay between theory and data (Nitzl, 2016). Or, as Wold (1982, p. 29) notes, "soft modeling is primarily designed for research contexts that are simultaneously data-rich and theory-skeletal." Furthermore, the increasing popularity of secondary data analysis (e.g., by using data that stem from company databases, social media, customer tracking, national statistical bureaus, or publicly available survey data) shifts the research focus from strictly confirmatory to predictive and causal–predictive modeling. Such research settings are a perfect fit for the prediction-oriented PLS-SEM approach (also see Gefen, Rigdon, & Straub, 2011).

PLS-SEM also proves valuable for analyzing secondary data from a measurement theory perspective. First, unlike survey measures, which are usually crafted to confirm a well-developed theory, measures used in secondary data sources are typically not created and refined over time for confirmatory analyses. Thus, achieving model fit is very unlikely with secondary data measures in most research situations when using CB-SEM. Second, researchers who use secondary data do not have the opportunity to revise or refine the measurement model to achieve fit. Third, a major advantage of PLS-SEM when using secondary data is that it permits the unrestricted use of single-item and formative measures. This is extremely valuable for research involving secondary data, because many measures included in corporate databases are artifacts, such as financial ratios and other firm-fixed factors (Henseler, 2017b). Such artifacts typically are reported in the form of formative indices whose estimation dictates the use of PLS-SEM. Exhibit 1.8 summarizes key considerations related to data characteristics.

EXHIBIT 1.8  ■  Data Considerations When Applying PLS-SEM

•• The 10 times rule is not a reliable indication of sample size requirements in PLS-SEM. Statistical power analyses provide a more reliable minimum sample size estimate. Researchers can also draw on the inverse square root method as a more conservative way of assessing minimum sample size requirements.

•• When the construct measures meet recommended guidelines in terms of reliability and validity, results from CB-SEM and PLS-SEM are generally very similar.

•• PLS-SEM can handle extremely nonnormal data (e.g., data with high levels of skewness).

•• Due to its flexibility in handling different data and measurement types, PLS-SEM is the method of choice when analyzing secondary data.

•• Most missing value treatment procedures (e.g., mean replacement, pairwise deletion, EM, and nearest neighbor) can be used for reasonable levels of missing data (less than 5% missing per indicator) with limited effect on the analysis results.

•• PLS-SEM works with metric, quasi-metric, and categorical (i.e., dummy-coded) scaled data, although there are certain limitations. Processing of data from discrete choice experiments requires specific designs and estimation routines.


Model Characteristics

PLS-SEM is very flexible in its modeling properties. In its basic form, the PLS-SEM algorithm requires all models to be without circular relationships or loops of relationships between the latent variables in the structural model. Although causal loops are sometimes specified in business research, this characteristic does not limit the applicability of PLS-SEM, as Lohmöller's (1989) extensions of the basic PLS-SEM algorithm allow for handling such model types. Other model specification requirements that constrain the use of CB-SEM, such as distribution assumptions, are generally not relevant with PLS-SEM.

Measurement model difficulties are one of the major obstacles to obtaining a solution with CB-SEM. For instance, estimation of complex models with many latent variables and/or indicators is often impossible with CB-SEM. In contrast, PLS-SEM can be used in such situations since it is not constrained by identification and other technical issues. Consideration of reflective and formative measurement models is a key issue in the application of SEM (Bollen & Diamantopoulos, 2017). PLS-SEM can easily handle both formative and reflective measurement models and is considered the primary approach when the hypothesized model incorporates formative measures. CB-SEM can accommodate formative indicators, but to ensure model identification, they must follow distinct specification rules (Diamantopoulos & Riefler, 2011). In fact, the requirements often prevent running the analysis as originally planned. In contrast, PLS-SEM does not have such requirements and handles formative measurement models without any limitation. This also applies to model settings in which endogenous constructs are measured formatively. The applicability of CB-SEM to such model settings has been subject to considerable debate (Cadogan & Lee, 2013; Rigdon, 2014a), but due to PLS-SEM's multistage estimation process (Chapter 3), which separates measurement from structural model estimation, the inclusion of formatively measured endogenous constructs is not an issue in PLS-SEM (Rigdon et al., 2014). The only problematic issue is when a high level of collinearity exists between the indicator variables of a formative measurement model.

Different from CB-SEM, PLS-SEM facilitates easy specification of interaction terms to map moderation effects in a path model. This makes PLS-SEM the method of choice in simple moderation models and more complex conditional process models, which combine moderation and mediation effects (Sarstedt, Hair et al., 2020). Similarly, higher-order constructs, which allow specifying a construct simultaneously on different levels of abstraction (Sarstedt et al., 2019), can readily be implemented in PLS-SEM.

Finally, PLS-SEM is capable of estimating very complex models. For example, if theoretical or conceptual assumptions support large models and sufficient data are available (i.e., meeting minimum sample size requirements), PLS-SEM can handle models of almost any size, including those with dozens of constructs and hundreds of indicator variables. As noted by Wold (1985), PLS-SEM is virtually without competition when path models with latent variables are complex in their structural relationships (Chapter 3). Exhibit 1.9 summarizes rules of thumb for PLS-SEM model characteristics.

EXHIBIT 1.9  ■  Model Considerations When Choosing PLS-SEM

•• PLS-SEM offers much flexibility in handling different measurement model setups. For example, PLS-SEM can handle reflective and formative measurement models as well as single-item measures without additional requirements or constraints.

•• The method allows for the specification of advanced model elements such as interaction terms and higher-order constructs.

•• Model complexity is generally not an issue for PLS-SEM. As long as appropriate data meet minimum sample size requirements, the complexity of the structural model is virtually unrestricted.

GUIDELINES FOR CHOOSING BETWEEN PLS-SEM AND CB-SEM

To answer the question of when to use PLS-SEM versus CB-SEM, researchers should focus on the characteristics and objectives that distinguish the two methods (Hair, Sarstedt, Ringle, & Mena, 2012). Broadly speaking, with its strong focus on model fit and in light of its extensive data requirements, CB-SEM is particularly suitable for testing a theory in the confinement of a concise theoretical model. However, if the primary research objective is prediction and explanation of target constructs (Rigdon, 2012), PLS-SEM should be given preference (Hair, Hollingsworth, Randolph, & Chong, 2017; Hair, Sarstedt, & Ringle, 2019).

Summarizing the previous discussions and drawing on Hair, Risher, Sarstedt, and Ringle (2019), Exhibit 1.10 displays the rules of thumb that can be applied when deciding whether to use CB-SEM or PLS-SEM. As can be seen, PLS-SEM is not recommended as a universal alternative to CB-SEM. Both methods differ from a statistical point of view, are designed to achieve different objectives, and rely on different philosophies of measurement. Neither of the techniques is generally superior to the other, and neither of them is appropriate for all situations (Petter, 2018). In general, the strengths of PLS-SEM are CB-SEM's limitations and vice versa, although PLS-SEM is increasingly being applied for scale development and confirmation (Hair, Howard, & Nitzl, 2020). It is important that researchers understand the different applications each approach was developed for—and to use them accordingly. Researchers need to apply the SEM technique that best suits their research objective, data characteristics, and model setup (Roldán & Sánchez-Franco, 2012).

EXHIBIT 1.10  ■  Rules of Thumb for Choosing Between PLS-SEM and CB-SEM

Use PLS-SEM when
•• the analysis is concerned with testing a theoretical framework from a prediction perspective;
•• the structural model is complex and includes many constructs, indicators, and/or model relationships;
•• the research objective is to better understand increasing complexity by exploring theoretical extensions of established theories (exploratory research for theory development);
•• the path model includes one or more formatively measured constructs;
•• the research consists of financial ratios or similar types of artifacts;
•• the research is based on secondary data, which may lack a comprehensive substantiation on the grounds of measurement theory;
•• a small population restricts the sample size (e.g., business-to-business research), but PLS-SEM also works very well with large sample sizes;
•• distribution issues are a concern, such as lack of normality; or
•• the research requires latent variable scores for follow-up analyses.

Use CB-SEM when
•• the goal is theory testing and confirmation;
•• error terms require additional specification, such as the covariation;
•• the structural model has circular relationships; or
•• the research requires a global goodness-of-fit criterion.

Source: Adapted from Hair, J. F., Risher, J. J., Sarstedt, M., & Ringle, C. M. (2019). When to use and how to report the results of PLS-SEM. European Business Review, 31(1), 2–24. Copyright © 2019 by Emerald Publishing. Reprinted with permission of the publisher (Emerald Publishing; https://www.emeraldgrouppublishing.com).

ORGANIZATION OF REMAINING CHAPTERS

The remaining chapters provide more detailed information on PLS-SEM, including specific examples of how to use software to estimate simple and complex PLS path models. In doing so, the chapters follow a multistage procedure that should be used as a blueprint when conducting PLS-SEM analyses (Exhibit 1.11).


EXHIBIT 1.11  ■  A Systematic Procedure for Applying PLS-SEM

Stage 1: Specifying the Structural Model (Chapter 2)
Stage 2: Specifying the Measurement Models (Chapter 2)
Stage 3: Collecting and Examining the Data (Chapter 2)
Stage 4: PLS Path Model Estimation (Chapter 3)
Stage 5a: Assessing PLS-SEM Results of the Reflective Measurement Models (Chapter 4)
Stage 5b: Assessing PLS-SEM Results of the Formative Measurement Models (Chapter 5)
Stage 6: Assessing PLS-SEM Results of the Structural Model (Chapter 6)
Stage 7: Advanced PLS-SEM Analyses (Chapters 7 and 8)
Stage 8: Interpretation of Results and Drawing Conclusions (Chapters 6, 7, and 8)

Specifically, the process starts with the specification of structural and measurement models, followed by the examination of data (Chapter 2). Next, we discuss the PLS-SEM algorithm and provide an overview of important considerations when running the analyses (Chapter 3). Researchers then have to evaluate the results of the computation. To do so, researchers must know how to assess both reflective and formative measurement models (Chapters 4 and 5). When the data for the measures are considered reliable and valid (based on established criteria), researchers can then evaluate the structural


model (Chapter 6). Chapter 7 covers the handling of mediating and moderating effects, whose analysis has become standard in PLS-SEM research. On the basis of the results of Chapters 6 and 7, researchers interpret their findings and draw their final conclusions. Finally, Chapter 8 offers a brief overview of advanced techniques.

Summary

Understand the meaning of structural equation modeling (SEM) and its relationship to multivariate data analysis. SEM is a second-generation multivariate data analysis method, which facilitates analyzing the relationships among constructs, each measured by one or more indicator variables. The primary advantage of SEM is its ability to measure complex model relationships while accounting for measurement error inherent in the indicators. There are two types of SEM methods—CB-SEM and PLS-SEM. The two method types differ in the way they estimate the model parameters and their assumptions regarding the nature of measurement. Whereas CB-SEM considers the constructs as common factors, PLS-SEM considers the constructs as composites based on total variance, linearly formed by sets of indicator variables.



Describe the basic considerations in applying multivariate data analysis. Several considerations are necessary when applying multivariate analysis, including the following five: (1) composite variables, (2) measurement, (3) measurement scales, (4) coding, and (5) data distributions. A composite variable (also called a variate) is a linear combination of several indicators that are chosen based on the research problem at hand. Measurement is the process of assigning numbers to a variable based on a set of rules. Multivariate measurement involves using several variables to indirectly measure a concept to improve measurement accuracy. The anticipated improved accuracy is based on the assumption that using several variables (indicators) to measure a single concept is more likely to represent all the different aspects of the concept and thereby result in a more valid measurement of the concept. The ability to identify measurement error using multivariate measurement also helps researchers obtain more accurate measurements. Measurement error is the difference between the true value of a variable and the value obtained by a measurement. A measurement scale is a tool with a predetermined number of closed-ended responses that can be used to obtain an answer to a question. There are four types of measurement scales: nominal, ordinal, interval, and ratio. When researchers collect quantitative data using scales, the answers to the questions can be shown as a distribution across the available (predefined) response


categories. The type of distribution must always be considered when working with SEM.

Comprehend the basic concepts of partial least squares structural equation modeling (PLS-SEM). Path models are diagrams used to visually display the hypotheses and variable relationships that are examined when structural equation modeling is applied. Four basic elements must be understood when developing path models: (1) constructs, (2) measured variables, (3) relationships, and (4) error terms. Constructs (or latent variables) measure theoretical concepts that are not directly observable. They are represented in path models as circles or ovals. Measured variables are directly measured observations (raw data), generally referred to as either indicators or manifest variables, and are represented in path models as rectangles. Relationships represent hypotheses in path models and are shown as single-headed arrows, indicating a predictive-causal relationship between the constructs. These relationships are derived from structural theory and logic. Depending on their role in the model, constructs are either exogenous or endogenous. Error terms represent the unexplained variance when path models are estimated and are present for endogenous constructs and reflectively measured indicators. Exogenous constructs and formative indicators do not have error terms. Measurement theory specifies how the constructs (latent variables) are measured. Latent variables can be specified as either reflective or formative.



Explain the differences between covariance-based structural equation modeling (CB-SEM) and PLS-SEM, and when to use each. Compared to CB-SEM, PLS-SEM emphasizes prediction while simultaneously relaxing the demands regarding the data and specification of relationships. PLS-SEM aims at maximizing the endogenous latent variables’ explained variance by estimating partial model relationships in an iterative sequence of ordinary least squares regressions. In contrast, CB-SEM estimates model parameters such that the discrepancy between the estimated and sample covariance matrices is minimized. Instead of following a common factor model logic as CB-SEM does, PLS-SEM calculates composites of indicators that serve as proxies for the concepts under research. The method is not constrained by identification issues, even if the model becomes complex—a situation that typically restricts CB-SEM use—and does not require accounting for most distributional assumptions. Moreover, PLS-SEM can better handle formative measurement models and has advantages when sample sizes are relatively small and when analyzing secondary data. Researchers should consider the two SEM approaches as complementary and apply the SEM technique that best suits their research objective, data characteristics, and model setup.
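To make the composite logic summarized above more tangible, here is a minimal sketch of how a construct score arises as a standardized linear combination of its indicators. The indicator data and outer weights are made up purely for illustration; in PLS-SEM the outer weights are estimated iteratively (Chapter 3).

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up data: 200 respondents, three indicators of one construct.
X = rng.standard_normal((200, 3))
w = np.array([0.5, 0.3, 0.4])  # illustrative outer weights, not estimated

# Composite score = linear combination of the indicators, then standardized,
# mirroring how composite-based SEM forms proxies for the constructs.
raw = X @ w
scores = (raw - raw.mean()) / raw.std()

# Correlations between the composite and its indicators (loading-like values).
print(np.round([np.corrcoef(X[:, j], scores)[0, 1] for j in range(3)], 3))
```

The composite is fully determined by its indicators, which is the defining feature of composite-based SEM noted in the summary above.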


Review Questions

1. What is multivariate analysis?
2. Describe the difference between first- and second-generation multivariate methods.
3. What is structural equation modeling?
4. What is the key difference between the common factor model and the composite model?
5. What is the value of structural equation modeling in understanding relationships between variables?

Critical Thinking Questions

1. When would SEM methods be more advantageous than first-generation techniques, such as multivariate regression, in understanding relationships between variables?
2. What are the most important considerations in deciding whether to use CB-SEM or PLS-SEM?
3. Under what circumstances is PLS-SEM the preferred method over CB-SEM?
4. Why is an understanding of theory important when deciding whether to use PLS-SEM or CB-SEM?
5. Why should social science researchers consider using SEM instead of multiple regression?
6. Why is PLS-SEM's prediction focus a major advantage of the method?

Key Terms

10 times rule (25); Categorical scale (8); Coding (10); Common factor-based SEM (15); Composite-based SEM (16); Composite scores (8); Composite variable (6); Confirmatory (3); Constructs (7); Covariance-based structural equation modeling (CB-SEM) (4); Endogenous latent variables (13); Equidistance (9); Error terms (13); Exogenous latent variables (13); Explaining and predicting (EP) theories (22); Exploratory (3); Factor (score) indeterminacy (17); First-generation techniques (3); Formative measurement model (14); Indicators (7); Inner model (13); Interval scale (9); Inverse square root method (25); Items (7); Latent variables (7); Manifest variables (7); Measurement (7); Measurement error (8); Measurement model (13); Measurement scale (8); Measurement theory (13); Metric scale (28); Metrological uncertainty (17); Minimum sample size requirements (24); Missing value treatment (27); Multivariate analysis (2); Nominal scale (8); Ordinal scale (9); Outer model (13); Partial least squares structural equation modeling (PLS-SEM) (2); Path model (12); PLS path modeling (4); PLS regression (18); PLS-SEM bias (23); R² value (18); Ratio scale (9); Reflective measurement model (14); Secondary data (28); Second-generation techniques (4); Single-item constructs (8); Statistical power (22); Structural equation modeling (SEM) (4); Structural model (13); Structural theory (13); Sum scores (16); Theory (13); Variance-based SEM (18); Variate (6)

Suggested Readings

Chin, W. W. (1998). The partial least squares approach to structural equation modeling. In G. A. Marcoulides (Ed.), Modern methods for business research (pp. 295–358). Mahwah, NJ: Erlbaum.
Chin, W. W., Cheah, J.-H., Liu, Y., Ting, H., Lim, X.-J., & Cham, T. H. (2020). Demystifying the role of causal-predictive modeling using partial least squares structural equation modeling in information systems research. Industrial Management & Data Systems, 120(12), 2161–2209.
Cook, R. D., & Forzani, L. (2020). Fundamentals of path analysis in the social sciences. Working Paper. https://arxiv.org/pdf/2011.06436.pdf
Dijkstra, T. K. (2014). PLS' Janus face—response to Professor Rigdon's "Rethinking partial least squares modeling: In praise of simple methods." Long Range Planning, 47(3), 146–153.
Gefen, D., Rigdon, E. E., & Straub, D. W. (2011). Editor's comment: An update and extension to SEM guidelines for administrative and social science research. MIS Quarterly, 35(2), iii–xiv.
Haenlein, M., & Kaplan, A. M. (2004). A beginner's guide to partial least squares analysis. Understanding Statistics, 3(4), 283–297.
Hair, J. F. (2020). Next generation prediction metrics for composite-based PLS-SEM. Industrial Management & Data Systems, 121(1), 5–11.
Hair, J. F., & Sarstedt, M. (2019). Composites vs. factors: Implications for choosing the right SEM method. Project Management Journal, 50(6), 1–6.
Hair, J. F., & Sarstedt, M. (2021). Explanation plus prediction—The logical focus of project management research. Project Management Journal, forthcoming.
Hair, J. F., Sarstedt, M., & Ringle, C. M. (2019). Rethinking some of the rethinking of partial least squares. European Journal of Marketing, 53(4), 566–584.
Jöreskog, K. G., & Wold, H. (1982). The ML and PLS techniques for modeling with latent variables: Historical and comparative aspects. In K. G. Jöreskog & H. Wold (Eds.), Systems under indirect observation, Part I (pp. 263–270). Amsterdam: North-Holland.
Khan, G., Sarstedt, M., Shiau, W.-L., Hair, J. F., Ringle, C. M., & Fritze, M. (2019). Methodological research on partial least squares structural equation modeling (PLS-SEM): A social network analysis. Internet Research, 29(3), 407–429.
Kock, N., & Hadaya, P. (2018). Minimum sample size estimation in PLS-SEM: The inverse square root and gamma-exponential methods. Information Systems Journal, 28(1), 227–261.
Lohmöller, J. B. (1989). Latent variable path modeling with partial least squares. Heidelberg: Physica.
Marcoulides, G. A., & Chin, W. W. (2013). You write but others read: Common methodological misunderstandings in PLS and related methods. In H. Abdi, W. W. Chin, V. Esposito Vinzi, G. Russolillo, & L. Trinchera (Eds.), New perspectives in partial least squares and related methods (pp. 31–64). New York, NY: Springer.
Mateos-Aparicio, G. (2011). Partial least squares (PLS) methods: Origins, evolution, and application to social sciences. Communications in Statistics: Theory and Methods, 40(13), 2305–2317.
Petter, S. (2018). "Haters gonna hate": PLS and Information Systems Research. ACM SIGMIS Database: The DATABASE for Advances in Information Systems, 49(2), 10–13.
Rigdon, E. E. (2012). Rethinking partial least squares path modeling: In praise of simple methods. Long Range Planning, 45(5–6), 341–358.
Rigdon, E. E., Sarstedt, M., & Ringle, C. M. (2017). On comparing results from CB-SEM and PLS-SEM: Five perspectives and five recommendations. Marketing ZFP, 39(3), 4–16.
Sarstedt, M., Hair, J. F., Ringle, C. M., Thiele, K. O., & Gudergan, S. P. (2016). Estimation issues with PLS and CBSEM: Where the bias lies! Journal of Business Research, 69(10), 3998–4010.
Wold, H. (1985). Partial least squares. In S. Kotz & N. L. Johnson (Eds.), Encyclopedia of statistical sciences (pp. 581–591). New York, NY: Wiley.

Visit the companion site for this book at https://www.pls-sem.net/.


2  SPECIFYING THE PATH MODEL AND EXAMINING DATA

LEARNING OUTCOMES

1. Understand the basic concepts of structural model specification, including mediation, moderation, and the use of control variables.
2. Explain the differences between reflective and formative measurement models and specify the appropriate measurement model.
3. Comprehend that the selection of the mode of measurement model and the indicators must be based on theoretical reasoning before data collection.
4. Explain the difference between multi-item and single-item measures and assess when to use each measurement type.
5. Understand the nature of higher-order constructs.
6. Describe the data collection and examination considerations necessary to apply PLS-SEM.
7. Learn how to develop a PLS path model using the SmartPLS software.


CHAPTER PREVIEW

This chapter introduces the basic concepts of structural and measurement model specification when PLS-SEM is used. The concepts are associated with completing the first three stages in the application of PLS-SEM, as described in Chapter 1. To begin with, Stage 1 is specifying the structural model. Next, Stage 2 is selecting and specifying the measurement models. Stage 3 summarizes the major guidelines for data collection when the application of PLS-SEM is anticipated, as well as the need to examine your data after they have been collected to ensure the results from applying PLS-SEM are valid and reliable. An understanding of these three topics will prepare you for Stage 4, estimating the model, which is the focus of Chapter 3.

STAGE 1: SPECIFYING THE STRUCTURAL MODEL

In the initial stages of a research project that involves the application of SEM, an important first step is to prepare a diagram that illustrates the research hypotheses and visually displays the variable relationships that will be examined. This diagram is often referred to as a path model. Recall that a path model is a diagram that connects indicators and constructs based on theory and logic to visually display the hypotheses that will be tested (Chapter 1). Preparing a path model early in the research process enables researchers to organize their thoughts and visually consider the relationships between the variables of interest. Path models also are an efficient means of sharing ideas between researchers working on or reviewing a research project.

Path models are made up of two elements: (1) the structural model (also called the inner model in PLS-SEM), which describes the relationships between the latent variables, and (2) the measurement model (also called the outer model in PLS-SEM), which describes the relationships between the latent variables and their measures (i.e., their indicators). We discuss structural models first, which are developed in Stage 1. In the next section, we explain Stage 2, measurement models.

When a structural model is being developed, two primary issues need to be considered: the sequence of the constructs and the relationships between them. Both issues are critical to the concept of modeling because they represent the hypotheses and their relationship to the theory being tested.

The sequence of the constructs in a structural model is based on theory, logic, or practical experiences observed by the researcher. The sequence is displayed from left to right, with independent (predictor) constructs on the left and


dependent (outcome) variables on the right-hand side. That is, constructs on the left are assumed to precede and predict constructs on the right. Constructs that act only as independent variables are generally referred to as exogenous latent variables and are on the very left side of the structural model. Exogenous latent variables only have arrows that point out of them and never have arrows from other latent variables pointing into them. Constructs considered dependent in a structural model (i.e., those that have an arrow pointing into them from other latent variables) are called endogenous latent variables and are on the right side of the structural model. Constructs that operate as both independent and dependent variables in a structural model also are considered endogenous and appear in the middle of the diagram. The structural model in Exhibit 2.1 illustrates the three types of constructs and the relationships among them. The reputation construct on the far left is an exogenous (i.e., independent) latent variable. It is modeled as predicting the satisfaction construct. The satisfaction construct is an endogenous latent variable that has a dual relationship as both independent and dependent. It is a dependent construct because it is predicted by reputation. But it is also an independent construct because it predicts loyalty. The loyalty construct on the right end is an endogenous (i.e., dependent) latent variable predicted by satisfaction.

EXHIBIT 2.1  ■  Example of Path Model and Types of Constructs (Reputation → Satisfaction → Loyalty)

Determining the sequence of the constructs is seldom an easy task because contradictory theoretical perspectives can lead to different sequencing of latent variables. For example, some researchers assume that customer satisfaction precedes and predicts corporate reputation (e.g., Walsh, Mitchell, Jackson, & Beatty, 2009), while others argue that corporate reputation predicts customer satisfaction (Eberl, 2010; Sarstedt, Wilczynski, & Melewar, 2013). Theory and logic should always determine the sequence of constructs in a structural model, but when the literature is inconsistent or unclear, researchers must use their best judgment to determine the sequence. Acknowledging that there is not one unique model that characterizes a phenomenon well, researchers can also establish and empirically compare


theoretically justified alternative models (Burnham & Anderson, 2002). The models selected for comparison should be motivated by theory from relevant fields, in line with PLS-SEM's "causal predictive" nature (Jöreskog & Wold, 1982, p. 270). Because PLS path models focus on providing theoretical explanations, considering purely empirically motivated models would be akin to "snooping" and is not recommended for theoretical research that focuses on both explanation and prediction (Gregor, 2006). Establishing alternative models requires leveraging the existing literature to provide valid theoretical rationale for all the models being considered. In particular, you should be able to (1) describe the theoretical commonalities among the proposed alternative models (i.e., whether certain proposed effects are common across models), (2) contrast the models to highlight the differences in theoretical mechanisms being captured (such differences may manifest as additional/different paths or antecedents), and (3) explain why the commonalities and differences are important to consider in terms of the effect on the target variable for the population under study. We discuss model comparisons in the context of the structural model evaluation in Chapter 6. Sharma, Sarstedt, Shmueli, Kim, and Thiele (2019) introduce a five-step procedure for model comparison and inference in PLS-SEM. These authors also discuss possible misconceptions related to model comparisons.


Mediation

A mediating effect is created when a third variable or construct intervenes between two other related constructs (Memon, Cheah, Ramayah, Ting, & Chuah, 2018; Nitzl, Roldán, & Cepeda Carrión, 2016), as shown in Exhibit 2.2. To understand how mediating effects work, let's consider a path model in terms of direct and indirect effects. A direct effect is a relationship that links two constructs with a single arrow. An indirect effect is a relationship that involves a sequence of relationships with at least one intervening construct involved. Thus, an indirect effect is a sequence of two or more direct effects (compound path) that are represented visually by multiple arrows. This indirect effect is characterized as the mediating effect. In Exhibit 2.2, satisfaction is modeled as a possible mediator between reputation and loyalty.

EXHIBIT 2.2  ■  Example of a Mediating Effect (Reputation → Satisfaction → Loyalty, with an additional direct Reputation → Loyalty path)

From a theoretical perspective, the most common application of mediation is to “explain” why a relationship between an exogenous and endogenous construct exists. For example, a researcher may observe a relationship between two constructs but not be sure why the relationship exists or if the observed relationship is the only relationship between the two constructs. In such a situation, a researcher might posit an explanation of the relationship in terms of an intervening variable that operates by receiving the “inputs” from an exogenous construct and translating them into an “output,” in the form of an endogenous construct. The role of the mediator variable then is to reveal the mechanism through which the independent constructs impact the dependent construct. Consider the example in Exhibit 2.2, in which we want to examine the effect of corporate reputation on customer loyalty. On the basis of theory and


logic, we know that a relationship exists between reputation and loyalty, but we are unsure how the relationship actually works (Eberl & Schwaiger, 2005; Schwaiger, 2004). As researchers, we might want to explain how companies translate their reputation into higher loyalty among their customers. We may observe that sometimes a customer perceives a company as being highly reputable, but this perception does not translate into high levels of loyalty. In other situations, we observe that some customers with lower corporate reputation assessments are highly loyal. These observations are confusing and lead to the question as to whether there is some other process going on that translates corporate reputation into customer loyalty. In the diagram, the intervening process (mediating effect) is modeled via the construct satisfaction. If a respondent perceives a company to be highly reputable, this assessment may lead to higher satisfaction levels and ultimately to increased loyalty. In such a case, the relationship between reputation and loyalty may be explained by the reputation → loyalty sequence, or the reputation → satisfaction → loyalty sequence, or perhaps even by both sets of relationships (Exhibit 2.2). The reputation → loyalty sequence is an example of a direct relationship. In contrast, the reputation → satisfaction → loyalty sequence is an example of an indirect relationship. After empirically testing these relationships, the researcher would be able to explain how reputation is related to loyalty, as well as the role that satisfaction might play in mediating that relationship. Chapter 7 offers additional details on mediation and explains how to test mediating effects in PLS-SEM.
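Chapter 7 treats mediation in depth, but the arithmetic behind direct and indirect effects can be previewed here. The following minimal sketch uses simulated, standardized construct scores and ordinary least squares (not the PLS-SEM algorithm itself) to show how the indirect effect is the product of the two paths that pass through the mediator; all variable names and effect sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical standardized construct scores for illustration only.
n = 500
reputation = rng.standard_normal(n)
satisfaction = 0.6 * reputation + 0.8 * rng.standard_normal(n)
loyalty = 0.3 * reputation + 0.5 * satisfaction + 0.7 * rng.standard_normal(n)

def coefs(y, *xs):
    """OLS coefficients of y on the given predictors (intercept dropped)."""
    X = np.column_stack([np.ones_like(y), *xs])
    return np.linalg.lstsq(X, y, rcond=None)[0][1:]

a = coefs(satisfaction, reputation)[0]                 # reputation -> satisfaction
c_prime, b = coefs(loyalty, reputation, satisfaction)  # direct path, satisfaction -> loyalty

indirect = a * b            # mediated effect via satisfaction
total = c_prime + indirect  # total effect of reputation on loyalty
print(f"direct = {c_prime:.3f}, indirect = {indirect:.3f}, total = {total:.3f}")
```

In an actual PLS-SEM analysis, the significance of the indirect effect is assessed by bootstrapping rather than by a closed-form test (Chapter 7).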

Moderation

Moderation is another important statistical analysis concept. In statistical moderation, a third variable directly affects the relationship between the exogenous and endogenous latent variables but in a different way from mediation. Referred to as a moderator effect, this situation occurs when the moderator (a variable or construct) changes the strength or even the direction of a relationship between two constructs in the model (Becker, Sarstedt, & Ringle, 2018; Memon et al., 2019). The crucial distinction between moderation and mediation is that the moderator variable does not depend on the exogenous latent variable.

For example, income has been shown to significantly affect the strength of the relationship between customer satisfaction and customer loyalty (Homburg & Giering, 2001). In that context, income serves as a moderator variable on the satisfaction → loyalty relationship, as shown in Exhibit 2.3. Specifically, the strength of the relationship (as measured by the path coefficient) between satisfaction and loyalty has been shown to be weaker for people with high income than for people with low income. For higher-income individuals, there may be


little or no relationship between satisfaction and loyalty. But for lower-income individuals, there often is a strong relationship between the two variables. As such, moderation may be understood as a way to account for heterogeneity in the theoretical model. Heterogeneity means that different types of effects can be expected for different groups of respondents. That is, instead of assuming that the relationship between customer satisfaction and customer loyalty is the same for all respondents, we acknowledge that this effect is different for low- and high-income individuals.

In the example outlined in Exhibit 2.3, income may be measured on a continuous scale, for example, annual income measured in Euros or U.S. dollars. But a moderator variable can also be measured categorically, for example, high income is > 50 thousand Euros a year, and low income is ≤ 50 thousand Euros a year. If this is the case, the variable frequently serves as a grouping variable that divides the data into subsamples. The same theoretical model is then estimated for each of the distinct subsamples. Since researchers are usually interested in comparing the models and learning about significant differences between the subsamples, the model estimates for the subsamples are usually compared by means of multigroup analysis (Matthews, 2017). Specifically, multigroup analysis enables the researcher to test for differences between identical models estimated for different groups of respondents. The general objective is to see if there are statistically significant differences between the group-specific path coefficients. For example, we might be interested in evaluating whether the effects between reputation, satisfaction, and loyalty shown in Exhibit 2.2 are significantly different for males compared with females (Exhibit 2.4).

EXHIBIT 2.3  ■  Theoretical Model of a Continuous Moderating Effect (Income moderating the Satisfaction → Loyalty relationship)


In Chapter 7, we discuss in greater detail how to use categorical and continuous variables for the moderator analysis. Chapter 8 offers a brief overview of multigroup analysis.
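Before turning to Exhibit 2.4, a brief sketch may help clarify how an interaction term captures a continuous moderating effect. The data, names, and coefficients below are hypothetical, and the plain product-term regression shown here is only a stand-in for the moderation procedures covered in Chapter 7.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical standardized scores; income moderates satisfaction -> loyalty.
n = 500
satisfaction = rng.standard_normal(n)
income = rng.standard_normal(n)
loyalty = (0.5 * satisfaction + 0.1 * income
           - 0.2 * satisfaction * income + 0.7 * rng.standard_normal(n))

# The interaction term is the product of the two standardized scores.
interaction = satisfaction * income

X = np.column_stack([np.ones(n), satisfaction, income, interaction])
b_sat, b_inc, b_int = np.linalg.lstsq(X, loyalty, rcond=None)[0][1:]

# Simple slopes: the satisfaction -> loyalty effect at -1 SD and +1 SD of income.
print(f"slope at -1 SD income: {b_sat - b_int:.3f}")  # stronger relationship
print(f"slope at +1 SD income: {b_sat + b_int:.3f}")  # weaker relationship
```

A negative interaction coefficient reproduces the pattern described above: the higher the income, the weaker the satisfaction → loyalty relationship.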


EXHIBIT 2.4  ■  Example of a Multigroup Analysis (the Reputation → Satisfaction → Loyalty model estimated separately for females and males, testing whether the group-specific path coefficients differ significantly)

Control Variables

When specifying theoretical models to be tested, researchers sometimes include control variables. The business disciplines of accounting, finance, international business, and management often include control variables in their research. Control variables are designed to measure the influence of independent variables that are not part of the primary theoretical model being examined. The control variables are used as a constant and unchanging standard of comparison, but they are not the primary interest of the researcher. Including control variables in a statistical model is most important when the control variable is significantly correlated with both the dependent variable and one or more of the other independent variables in the model. Control variables have been included in multiple regression models for many years, but with the increasing popularity of PLS-SEM and the underlying characteristics it has in common with regression, researchers are beginning to explore the usefulness of control variables in PLS-SEM.


By adding control variables to the hypothesized structural model, researchers hope to account for other explanatory factors (independent variables) that potentially influence the dependent variables (or constructs). For example, when estimating the impact of customer satisfaction on stock returns, researchers also need to account for the influence of several firm characteristics such as R&D intensity, marketing investments, and firm size (Raithel, Sarstedt, Scharf, & Schwaiger, 2012). Failure to account for these characteristics could lead to an overestimation of the effect of customer satisfaction on stock returns, potentially triggering a type I error (i.e., false positive). From a statistical perspective, adding control variables to the model entails that the hypothesized effects are estimated at constant levels of the control variables. If the hypothesized relationships remain largely constant, researchers can rule out alternative explanations related to the control variables. As such, control variables help strengthen the causal inference of the effects. In addition, adding control variables improves the precision of the model estimates as they explain the statistical noise in the endogenous construct. This particularly holds when the control variables are only weakly correlated with the predictor constructs of the endogenous construct (Klarmann & Feurer, 2018).

To add control variables into a PLS path model, researchers need to establish a separate exogenous construct for each control variable to be considered and link each new construct to the endogenous latent variable under consideration. For example, suppose that a researcher wants to control for the impact of the respondent's age on loyalty when estimating the mediation model shown in Exhibit 2.2. To do so, the researcher needs to add a new construct into the model, measured with the single age item, and link this construct to the loyalty construct. By adding the age measure as a control, the effects of reputation on loyalty and customer satisfaction on loyalty will decrease, provided that age has an impact on the endogenous construct.

In some situations, researchers wish to control for the impact of categorical variables such as industry type. If the categorical variable has only two categories such as gender, one uses a binary (dummy) variable and includes it as a single-item construct in the PLS path model. In this case, zero becomes the reference category (e.g., female customers) and the relationship between the control variable and the endogenous construct shows the effect of switching from the reference category to the other category (e.g., male customers). When the variable has more than two categories, the categorical variable needs to be recoded into a series of binary (dummy) variables. Specifically, when the control variable has k categories, researchers need to create k-1 binary variables. The category that is left out is referred to as the reference category. To identify the reference category, all binary variables take the value zero. The values of the dummy variables other than zero (typically one) then denote the deviation from this reference category. When an observation falls into the reference category, which is typically the first category, all binary (dummy) variables are zero. When an observation falls into the second category, the first binary (dummy) variable is


one, all others are zero, and so on. The k-1 binary variables need to be included as measures of a single construct (Henseler, Hubona, & Ray, 2016) using a formative measurement model specification (see the next section for more details on formative measurement models). Exhibit 2.5 shows an example of a categorical control variable for five industries, whereby the first industry serves as the reference category, which is not included as a binary (dummy) variable. As a result, the control variable becomes a composite that is formed by all binary (dummy) coded category variables except the reference category. In the model shown in Exhibit 2.5, we control for the impact of Industry on the relationships of Reputation and Satisfaction on Loyalty. Industry has the following five categories: 1 = Computer software, 2 = Internet retail, 3 = Internet service providers, 4 = Personal computers, and 5 = Video streaming services. We use the first industry (i.e., Computer software) as the reference category. Hence, it is not included as a binary (dummy) indicator variable for the Industry control construct.

EXHIBIT 2.5  ■  Categorical Control Variable With Multiple Categories in PLS-SEM (the binary indicators Internet Retail, Internet Service Providers, Personal Computers, and Video Streaming Services form the Industry construct, which points to Loyalty alongside Reputation and Satisfaction)

Note: This exhibit does not show the measurement models and indicators of the constructs Reputation, Satisfaction, and Loyalty. The control variable Industry includes dummy-coded indicator variables of industries 2 to 5, where the first industry (i.e., Computer Software) represents the reference category.

Researchers are typically only interested in controlling for the impact of the control variables, rather than explicitly hypothesizing and testing their impact. As a consequence, the path coefficients quantifying the effect of the control variables on the endogenous construct and their significances are not interpreted. However, when including control variables, researchers need to offer compelling theoretical arguments as to why these variables are important rather than following a kitchen sink approach, which considers all potential control variables available (Spector & Brannick, 2011). Bernerth and Aguinis (2016) offer best-practice recommendations that can be followed to make decisions on the appropriateness of including a specific control variable within a particular theoretical framework, research domain, and empirical study.
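The k-1 dummy-coding logic described above is easy to reproduce when preparing the data, for instance before importing them into SmartPLS. A minimal sketch with hypothetical data follows; note that pandas drops the alphabetically first category when drop_first=True, which here happens to be Computer software, the intended reference category.

```python
import pandas as pd

# Hypothetical observations of the five-category industry control variable.
df = pd.DataFrame({"industry": [
    "Computer software", "Internet retail", "Internet service providers",
    "Personal computers", "Video streaming services", "Internet retail"]})

# k = 5 categories -> k - 1 = 4 binary (dummy) indicators; observations in the
# reference category ("Computer software") are coded zero on all four.
dummies = pd.get_dummies(df["industry"], prefix="ind", drop_first=True).astype(int)
print(dummies)
```

The four resulting columns would then enter the model as the formative indicators of the Industry control construct.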

STAGE 2: SPECIFYING THE MEASUREMENT MODELS

The structural model describes the relationships between latent variables (constructs). In contrast, the measurement models represent the relationships between constructs and their corresponding indicator variables (Sarstedt, Ringle, & Hair, 2017a). The basis for determining these relationships is measurement theory. A sound measurement theory is a necessary condition to obtain useful results from PLS-SEM. Hypothesis tests involving the structural relationships among constructs will be only as reliable and valid as the measurement models that explain how these constructs are measured.

Researchers typically have several established measurement approaches to choose from, each a slight variant from the others. In fact, almost all social science researchers today use established measurement approaches published in prior research studies or scale handbooks (e.g., Bearden, Netemeyer, & Haws, 2011; Bruner, 2019; Zarantonello & Pauwels-Delassus, 2015) that performed well (Ramirez, David, & Brusco, 2013). In some situations, however, the researcher is faced with the lack of an established measurement approach and must develop a new set of measures (or substantially modify an existing approach). A description of the general process for developing indicators to measure a construct can be long and detailed. Hair, Black, Babin, and Anderson (2019) describe the essentials of this process. Likewise, Diamantopoulos and Winklhofer (2001), DeVellis (2017), and MacKenzie, Podsakoff, and Podsakoff (2011) offer thorough explications of different approaches to measurement development. In each case, decisions regarding how the researcher selects the indicators to measure a particular construct provide a foundation for the remaining analysis.

The path model shown in Exhibit 2.6 is an excerpt of the path model we use as an example throughout the book. The model has two exogenous constructs—corporate social responsibility (CSOR) and attractiveness (ATTR)—and one endogenous construct, which is competence (COMP). Each of these constructs is measured by means of multiple indicators. For instance, the endogenous construct COMP has three measured indicator variables, comp_1, comp_2, and comp_3. Using a scale from 1 to 7 (fully disagree to fully agree), respondents had to evaluate the following statements: "[The company] is a top competitor in its market," "As far as I know, [the company] is recognized worldwide," and "I believe that [the company] performs at a premium level." The answers to these three inquiries represent the measures for this construct. The construct itself is measured indirectly by these three indicator variables and, for that reason, is referred to as a latent variable.

EXHIBIT 2.6  ■  Example of a Path Model With Three Constructs (the formatively measured constructs CSOR, with indicators csor_1 to csor_5, and ATTR, with indicators attr_1 to attr_3, predict the reflectively measured construct COMP, with indicators comp_1 to comp_3)

The other two constructs in the model, CSOR and ATTR, can be described in a similar manner. That is, the two exogenous constructs are measured by indicators that are each directly measured by responses to specific questions. Note that the relationship between the indicators and the corresponding construct is different for COMP compared with CSOR and ATTR. When you examine the COMP construct, the direction of the arrows goes from the construct to the indicators. This type of measurement model is referred to as reflective. When you examine the CSOR and ATTR constructs, the direction of the arrows is from the measured indicator variables to the constructs. This type of measurement model is called formative. As discussed in Chapter 1, an important characteristic of PLS-SEM is that the technique readily incorporates both reflective and formative measures. Likewise, PLS-SEM can easily be used when constructs are measured with only a single item (rather than multiple items). Both of these measurement issues are discussed in the following sections.

Reflective and Formative Measurement Models

When developing constructs, researchers must consider two broad types of measurement specification: reflective and formative measurement models. The reflective measurement model has a long tradition in the social sciences and is directly based on classical test theory. According to this theory, measures represent the effects (or manifestations) of an underlying construct. Therefore, causality is from the construct to its measures (COMP in Exhibit 2.6). Reflective


indicators (sometimes referred to as effect indicators in the psychometric literature) can be viewed as a representative sample of all the possible items available within the conceptual domain of the construct (Nunnally & Bernstein, 1994). Therefore, since a reflective measure dictates that all indicator items are “caused” by the same construct (i.e., they stem from the same domain), indicators associated with a particular construct should be highly correlated with each other. In addition, individual items should be interchangeable, and any single item can generally be left out without changing the meaning of the construct, as long as the construct has sufficient reliability. The fact that the relationship goes from the construct to its measures implies that if the evaluation of the latent trait changes (e.g., because of a change in the standard of comparison), all indicators will change simultaneously. A set of reflective measures is commonly called a scale. In contrast, formative measurement models are based on the assumption that the indicators form the construct by means of linear combinations. Therefore, researchers typically refer to this type of measurement model as being a formative index. An important characteristic of formative indicators is that they are not interchangeable, as is true with reflective indicators. Thus, each indicator for a formative construct captures a specific aspect of the construct’s domain. Taken jointly, the items ultimately determine the meaning of the construct, which implies that omitting an indicator potentially alters the nature of the construct. As a consequence, breadth of coverage of the construct domain is extremely important to ensure that the content of the focal construct is adequately captured (Diamantopoulos & Winklhofer, 2001). Researchers distinguish between two types of indicators in the context of formative measurement: composite and causal indicators. Composite indicators largely correspond to the above definition of formative measurement models in that they are combined in a linear way to form a variate (Chapter 1), which is also referred to as composite variable in the context of SEM (Bollen, 2011; Bollen & Bauldry, 2011). More precisely, the indicators fully form the composite variable (i.e., the composite variable’s R² value is 1.0). Composite indicators have often been used to measure artifacts, which can be understood as human-made concepts (Henseler, 2017b). Examples of such artifacts in marketing include the retail price index or the marketing mix (Hair, Sarstedt, & Ringle, 2019). However, composite indicators can also be used to measure attitudes, perceptions, and behavioral intentions (Sarstedt, Hair, Ringle, Thiele, & Gudergan, 2016; Rossiter, 2011; Rossiter, 2016), provided that the indicators have conceptual unity in accordance with a clear theoretical definition. The PLS-SEM algorithm relies solely on the concept of composite indicators because of the way the algorithm estimates formative measurement models (e.g., Diamantopoulos & Riefler, 2011). Causal indicators also form the latent variable but this type of measurement acknowledges that it is highly unlikely that any set of causal indicators can fully


capture every aspect of a latent phenomenon (Diamantopoulos & Winklhofer, 2001). Therefore, latent variables measured with causal indicators have an error term, which is assumed to capture all the other causes of the latent variable not included in the model (Diamantopoulos, 2006). The use of causal indicators is prevalent in CB-SEM, which—at least in principle—allows for explicitly defining the error term of a formatively measured latent variable. However, the nature and magnitude of this error term is questionable as its magnitude partly depends on other constructs embedded in the model and their measurement quality (Aguirre-Urreta, Rönkkö, & Marakas, 2016).

In a nutshell, the distinction between composite and causal indicators relates to a difference in measurement philosophy. Causal indicators assume that a certain concept can—at least in principle—be fully measured using a set of indicators and an error term. Composite indicators make no such assumption but view measurement explicitly as an approximation of a certain theoretical concept. The inclusion of an error term in causal indicator models appears appealing at first sight, but as its magnitude depends on the measurement quality of downstream constructs, the error term's value for judging the quality of the formative measurement model is ambiguous (Rigdon et al., 2014). In addition, by including an error term in a formative measurement model, CB-SEM treats the formative measurement as if it were a common factor model. PLS-SEM, on the other hand, estimates formative measurement models with composite indicators, which is fully on par with the composite-based approach underlying the PLS-SEM algorithm. That is, regardless of whether estimating reflective or formative measurement models, PLS-SEM uses linear combinations to form composites to measure the constructs in a path model (Chapter 3).

In light of the above, the distinction between causal and composite indicators in measurement appears rather artificial, with little consequence for method choice. For the sake of simplicity and in line with seminal research in the field (e.g., Fornell & Bookstein, 1982), we therefore refer to formative indicators when assuming composite indicators (as used in PLS-SEM) in the remainder of this book. Similarly, we refer to formative measurement models to describe measurement models comprising composite indicators. Henseler et al. (2014), Rigdon, Sarstedt, and Ringle (2017), and Sarstedt, Hair, Ringle, Thiele, and Gudergan (2016) provide further information on composite models as well as common factor models and their distinction.

Exhibit 2.7 illustrates the key difference between the reflective and formative measurement perspectives. The black circle illustrates the construct domain, which is the domain of content the construct is intended to measure. The gray circles represent the content domain that each indicator captures. Whereas the reflective measurement approach aims at maximizing the overlap between interchangeable indicators, the formative measurement approach tries to fully cover the domain of the latent concept under investigation (black circle) by the different formative indicators (gray circles), which should have small overlap.

EXHIBIT 2.7  ■  Conceptual Difference Between Reflective and Formative Measures (two panels, reflective measurement and formative measurement, each depicting the construct domain and the content domains captured by the indicators)

Note: The black circle represents the construct domain of interest and the gray-shaded circles the content domain captured by each indicator.

Unlike the reflective measurement approach whose objective is to maximize the overlap between interchangeable indicators, there are no specific expectations about patterns or the magnitude of intercorrelations between formative indicators (Diamantopoulos, Riefler, & Roth, 2008). Since there is no “common cause” for the items in the construct, there is not any requirement for the items to be correlated, and they may be completely independent. In fact, collinearity among formative indicators can present significant problems because the weights linking the formative indicators with the construct can become unstable and nonsignificant. Furthermore, formative indicators have no individual measurement error terms. That is, they are assumed to be error-free in a conventional sense. These characteristics have broad implications for the evaluation of formatively measured constructs, which rely on a totally different set of criteria compared with the evaluation of reflective indicators (Chapter 5). For example, analyzing the internal consistency reliability of a formatively measured construct could suggest that individual indicators need to be removed because of low inter-item correlations. However, such a step would decrease the content validity of the measurement approach (Diamantopoulos & Siguaw, 2006). Broadly speaking, researchers need to pay closer attention to the content validity of the measures by determining how well the indicators represent the domain (or at least its major aspects) of the latent concept under research (e.g., Bollen & Lennox, 1991).
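Collinearity among formative indicators is typically diagnosed with variance inflation factors (VIFs), a topic Chapter 5 returns to in detail. The following minimal sketch computes VIFs from scratch on simulated indicators (the first two are deliberately collinear); it is a plain-NumPy illustration, not the routine of any particular PLS-SEM package.

```python
import numpy as np

def vif(X):
    """VIF of each column of X: 1 / (1 - R²) from regressing it on the rest."""
    X = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize the indicators
    vifs = []
    for j in range(X.shape[1]):
        y = X[:, j]
        A = np.column_stack([np.ones(len(y)), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        r2 = 1 - np.sum((y - A @ coef) ** 2) / np.sum((y - y.mean()) ** 2)
        vifs.append(1.0 / (1.0 - r2))
    return np.array(vifs)

# Simulated formative indicators; x1 and x2 are deliberately collinear.
rng = np.random.default_rng(1)
x1 = rng.standard_normal(300)
x2 = 0.9 * x1 + 0.3 * rng.standard_normal(300)
x3 = rng.standard_normal(300)
print(np.round(vif(np.column_stack([x1, x2, x3])), 2))  # large VIFs for x1, x2
```

High VIF values signal that the corresponding indicator weights may become unstable and nonsignificant, exactly the problem described above.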


But when do we measure a construct reflectively or formatively? There is not a definite answer to this question since constructs are not inherently reflective or formative. Instead, the specification depends on the construct conceptualization and the objective of the study. Consider Exhibit 2.8, which shows how the construct "satisfaction with hotels" (Y1) can be operationalized in both ways (Albers, 2010). The left side of Exhibit 2.8 shows a reflective measurement model setup. This type of model setup is likely to be more appropriate when a researcher wants to test theories with respect to satisfaction. In many managerially oriented business studies, however, the aim is to identify the most important drivers of satisfaction that ultimately lead to customer loyalty. In this case, researchers should consider the different facets of satisfaction, such as satisfaction with the service or the personnel, as shown on the right side of Exhibit 2.8. In the latter case, a formative measurement model specification is more promising as it allows identifying distinct drivers of satisfaction and thus deriving more nuanced recommendations. This especially applies to situations where the corresponding constructs are exogenous. However, formative measurement models may also be used on endogenous constructs when measurement theory supports such a specification.

EXHIBIT 2.8  ■  Satisfaction as a Formatively and Reflectively Measured Construct

Reflective Measurement Model (arrows from Y1 to the indicators): "I appreciate this hotel," "I am looking forward to staying in this hotel," "I recommend this hotel to others."

Formative Measurement Model (arrows from the indicators to Y1): "The service is good," "The personnel is friendly," "The rooms are clean."

Adapted from source: Albers, S. (2010). PLS and success factor studies in marketing. In V. Esposito Vinzi, W. W. Chin, J. Henseler, & H. Wang (Eds.), Handbook of partial least squares: Concepts, methods and applications in marketing and related fields (pp. 409–425). Berlin: Springer.

Apart from the role a construct plays in the model and the recommendations the researcher wants to give based on the results, the specification of the content of the construct (i.e., the domain content the construct is intended to capture) primarily guides the measurement perspective. Still, the decision as to which measurement model is appropriate has been the subject of considerable debate in a variety of disciplines and is not fully resolved. In Exhibit 2.9, we present a set of guidelines that researchers can use in their decision of whether to measure a construct reflectively or formatively. Note that there are also empirical means to determine the measurement perspective. Gudergan, Ringle, Wende, and Will (2008) propose the confirmatory tetrad analysis in PLS-SEM (CTA-PLS), which allows testing the null hypothesis that the construct measures are reflective in nature.

EXHIBIT 2.9  ■  Guidelines for Choosing the Measurement Model Mode

Criterion | Decision | Reference
1. What is the causal priority between the indicator and the construct? | From the construct to the indicators: reflective; from the indicators to the construct: formative | Diamantopoulos & Winklhofer (2001)
2. Is the construct a trait explaining the indicators or rather a combination of the indicators? | If trait: reflective; if combination: formative | Fornell & Bookstein (1982)
3. Do the indicators represent consequences or causes of the construct? | If consequences: reflective; if causes: formative | Rossiter (2002)
4. Is it necessarily true that if the assessment of the trait changes, all items will change in a similar manner (assuming they are equally coded)? | If yes: reflective; if no: formative | Chin (1998)
5. Are the items mutually interchangeable? | If yes: reflective; if no: formative | Jarvis, MacKenzie, & Podsakoff (2003)



Single-Item Measures and Sum Scores

Rather than using multiple items to measure a construct, researchers sometimes choose to use a single item. PLS-SEM proves valuable in this respect as the method does not suffer from identification problems when using fewer than three items in a measurement model, as is the case with CB-SEM. Single items have practical advantages such as ease of application, brevity, and lower costs associated with their use. Unlike long and complicated scales, which often result in a lack of understanding and mental fatigue for respondents, single items promote higher response rates as the questions can be easily and quickly answered (Fuchs & Diamantopoulos, 2009; Sarstedt & Wilczynski, 2009).

However, single-item measures do not offer more for less. For instance, when partitioning the data into groups, researchers have fewer options since scores from only a single variable are available to partition the data. Similarly, when using imputation methods to deal with missing values, information is available from only a single measure instead of several. More importantly, from a psychometric perspective, single-item measures do not allow for the removal of measurement error (as is the case with multiple items), which generally decreases their reliability. Note that, contrary to commonly held beliefs, single-item reliability can be estimated (e.g., Cheah, Sarstedt, Ringle, Ramayah, & Ting, 2018; Loo, 2002; Wanous, Reichers, & Hudy, 1997); see Exhibit 5.3 in Chapter 5 for details. In addition, opting for single-item measures is, in most empirical settings, a risky decision when it comes to predictive validity considerations. Specifically, the set of circumstances that would favor the use of single-item over multi-item measures is very unlikely to be encountered in practice. According to the guidelines by Diamantopoulos, Sarstedt, Fuchs, Kaiser, and Wilczynski (2012), single-item measures should be considered only in situations when (1) small sample sizes are present (i.e., N < 50), (2) path coefficients (i.e., the coefficients linking constructs in the structural model) of 0.30 and lower are expected, (3) items of the originating multi-item scale are highly homogeneous (i.e., inter-item correlations > 0.80, Cronbach's alpha > 0.90), and (4) the items are semantically redundant (Exhibit 2.10). For further discussions on the efficacy of single-item measures, see Kamakura (2015).

Nevertheless, when setting up measurement models, this purely empirical perspective should be complemented with practical considerations. Some research situations call for, or even necessitate, the use of single items. Respondents frequently feel they are oversurveyed, which contributes to lower response rates.

EXHIBIT 2.10  ■  Guidelines for Single-Item Use

The decision tree proceeds through four questions; answering "no" to any of them points to a multi-item scale, and only answering "yes" to all four supports the use of a single item:

1. Small sample size used (N < 50)?
2. Weak effects expected (path coefficients < 0.30)?
3. Are items highly homogeneous (inter-item correlations > 0.80; Cronbach's alpha > 0.90)?
4. Are items semantically redundant?

Source: Diamantopoulos, A., Sarstedt, M., Fuchs, C., Kaiser, S., & Wilczynski, P. (2012). Guidelines for choosing between multi-item and single-item scales for construct measurement: A predictive validity perspective. Journal of the Academy of Marketing Science, 40(3), 434–449.
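The flowchart logic in Exhibit 2.10 amounts to a simple conjunction of the four conditions. The sketch below is our illustration (the function and argument names are hypothetical, not from the original text):

```python
def single_item_defensible(n_obs: int,
                           max_expected_path: float,
                           min_inter_item_corr: float,
                           cronbach_alpha: float,
                           semantically_redundant: bool) -> bool:
    """Mirror Exhibit 2.10: a single item is defensible only if ALL four
    conditions of Diamantopoulos et al. (2012) hold; otherwise use a
    multi-item scale."""
    return (n_obs < 50                       # small sample size
            and max_expected_path <= 0.30    # only weak effects expected
            and min_inter_item_corr > 0.80   # highly homogeneous items
            and cronbach_alpha > 0.90
            and semantically_redundant)

# A typical survey setting fails the very first condition:
print(single_item_defensible(344, 0.50, 0.65, 0.88, False))  # -> False
```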

The difficulty of obtaining large sample sizes in surveys, often due to a lack of willingness to take the time to complete questionnaires, leads to the necessity of reducing the length of construct measures where possible. Therefore, if the population being surveyed is small or only a limited sample size is available (e.g., due to budget constraints, difficulties in recruiting respondents,


or dyadic data), the use of single-item measures may be a pragmatic solution. Even if researchers accept the consequences of lower predictive validity and use single-item measures anyway, one fundamental question remains: What should this item be? Unfortunately, research points to severe difficulties when choosing a single item from a set of candidate items, regardless of whether this selection is based on statistical measures or expert judgment (Sarstedt, Diamantopoulos, Salzberger, & Baumgartner, 2016). Against this background, we clearly advise against the use of single items for construct measurement, unless indicated otherwise by Diamantopoulos et al.'s (2012) guidelines. Finally, it is important to note that the above issues apply to the measurement of unobservable phenomena, such as perceptions or attitudes. Single-item measures are clearly appropriate when used to measure observable characteristics such as sales, quotas, profits, and so on.

In a similar manner, and as indicated in Chapter 1, we recommend avoiding regressions based on sum scores, which some scholars have recently advocated. Similar to reflective and formative measurement models, sum scores use several indicators to measure a construct. However, instead of explicitly estimating each indicator's relationship with the construct, the sum scores approach uses the average value of the indicators to compute latent variable scores. Sum scores therefore represent a simplification of PLS-SEM in which all indicator weights in the measurement model are equal. This practice is problematic as it ignores the measurement error inherent in each indicator. In contrast, the individual weighting of the indicators in a PLS-SEM analysis accounts for measurement error, thereby increasing the reliability and validity of the model estimates (Yuan, Wen, & Tang, 2020). For example, Hair, Hult, Ringle, Sarstedt, and Thiele (2017) have shown that sum scores can produce substantial parameter biases and often lag behind PLS-SEM in terms of statistical power. Apart from these reliability- and validity-related concerns, the sum scores approach does not tell the researcher which indicator has a higher or lower relative importance. Since PLS-SEM provides this additional information, its use is clearly superior compared with sum scores.
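The difference between the two scoring approaches is easy to see in code. In this minimal sketch (our illustration; the weights are made-up values, whereas PLS-SEM would estimate them from the data), sum scores weight all indicators equally, while a weighted composite lets each indicator contribute differently:

```python
import numpy as np

rng = np.random.default_rng(42)
items = rng.integers(1, 8, size=(344, 3)).astype(float)  # three 7-point items

# Sum-score approach: every indicator receives the same weight.
sum_scores = items.mean(axis=1)

# PLS-SEM-style composite: indicators enter with differentiated weights
# (illustrative values only; PLS-SEM estimates these from the data).
weights = np.array([0.50, 0.35, 0.15])
weighted_scores = items @ weights / weights.sum()
```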

Higher-Order Constructs

Thus far, we have considered constructs that are measured on a single layer of abstraction. That is, we measured each construct with a set of indicators that are similar in terms of their concreteness. However, PLS-SEM also allows researchers to model a construct on multiple layers of abstraction simultaneously. Higher-order constructs, also referred to as higher-order models or hierarchical component models in the context of PLS-SEM (Lohmöller, 1989; Sarstedt, Hair, Cheah, Becker, & Ringle, 2019), allow specifying a single construct on a more abstract dimension and more concrete subdimensions at the same time (Cheah et al., 2019; Hair, Sarstedt, Ringle, & Gudergan, 2018, Chapter 2; Wetzels, Odekerken-Schröder, & van Oppen, 2009). For example, the construct satisfaction can be represented by a number of more concrete aspects, measured


by lower-order components that capture separate attributes of satisfaction. In the context of services, these might include satisfaction with the quality of the service, the service personnel, the price, or the servicescape. These lower-order components might form the more abstract higher-order component satisfaction, as shown in Exhibit 2.11.

EXHIBIT 2.11  ■  Example of a Higher-Order Construct

Four lower-order components (Price, Service Quality, Personnel, and Servicescape) form the higher-order component Satisfaction.

Instead of modeling the attributes of satisfaction as drivers of the respondent's overall satisfaction on a single construct layer, the higher-order construct summarizes the lower-order components into a single multidimensional construct. This modeling approach leads to more parsimony and reduces model complexity. Theoretically, this process can be extended to any number of layers, but researchers usually restrict their modeling approach to two layers of abstraction (i.e., one higher-order component and several lower-order components). Constructs with two layers of abstraction are also referred to as second-order constructs. Chapter 8 offers more details on higher-order constructs.

At this point, you should be able to create a path model. Exhibit 2.12 summarizes some key guidelines you should consider when preparing your path model. The next section continues with collecting the data needed to empirically test your PLS path model.


EXHIBIT 2.12  ■  Guidelines for Preparing Your PLS Path Model

Structural model
•• The constructs considered relevant to the study must be clearly identified and defined.
•• The structural model discussion states how the constructs are related to each other, that is, which constructs are dependent (endogenous) or independent (exogenous). If applicable, this also includes more complex relationships such as mediators or moderators and the inclusion of control variables.
•• If possible, the nature (positive or negative) of the relationships as well as the direction is hypothesized on the basis of theory, logic, previous research, or researcher judgment.
•• There is a clear explanation of why you expect these relationships to exist. The explanation cites theory, qualitative research, business practice, or some other credible source.
•• A theoretical model or framework is prepared to clearly illustrate the hypothesized relationships.

Measurement model
•• The measurement model discussion states whether constructs are conceptualized as regular or higher-order constructs.
•• The measurement specification (i.e., reflective vs. formative) has to be clearly stated and motivated. A construct's conceptualization and the aim of the study guide this decision.
•• Single-item measures should be used only if indicated by Diamantopoulos et al.'s (2012) guidelines.
•• Do not use regressions based on sum scores.

STAGE 3: DATA COLLECTION AND EXAMINATION

Application of PLS-SEM requires that quantitative data are available. Social science researchers typically use primary data, which have been collected for a specific research project, commonly using questionnaires (Sarstedt & Mooi, 2019; Chapter 3.2). However, researchers are increasingly turning their attention to secondary data, which are available from databases or come in the form of website tracking information; social media, geospatial, and sensor data; as well as other information obtained through scraping and similar data collection methods (Hulland, Baumgartner, & Smith, 2018).


When empirical data are collected using questionnaires, several data issues typically must be addressed after the data are collected. The primary issues that need to be examined include missing data, suspicious response patterns (straight lining or inconsistent answers), outliers, and data distribution. We briefly address each of these on the following pages. The reader is referred to more comprehensive discussions of these issues in Hair, Black, Babin, and Anderson (2019).

Missing Data

Researchers often have to deal with missing data. There are two levels at which missing data occur:

• Entire surveys are missing (survey non-response), and
• Respondents have not answered all the items (item non-response).

Survey non-response (also referred to as unit non-response) occurs when entire surveys are missing. Survey non-response is very common, as typically only 5–25% of distributed surveys are filled out. Item non-response occurs when respondents do not provide answers to certain questions, whether because they overlook them or refuse to answer. Item non-response is common, and 2–10% of questions usually remain unanswered. However, this number greatly depends on various factors, such as the subject matter, the length of the questionnaire, and the method of administration. Non-response can be much higher with respect to questions that people consider sensitive and varies from country to country. In some countries, for instance, reporting income is a sensitive issue (Sarstedt & Mooi, 2019; Chapter 3.9).

As a rule of thumb, when the amount of missing data for a specific respondent exceeds 15%, the observation should be removed from the data set. Similarly, we recommend excluding an indicator from the analysis if it has more than 15% missing values. Once observations with too many missing responses have been removed, the next step is to decide how to deal with the remaining missing values in the data set.

The software used in this book, SmartPLS 3 (Ringle, Wende, & Becker, 2015), offers three types of missing value treatment. In mean value replacement, the missing values of an indicator variable are replaced with the mean of the valid values of that indicator. While easy to implement, mean value replacement decreases the variability in the data and likely reduces the possibility of finding meaningful relationships. It should therefore be used only when the data exhibit extremely low levels of missing data. As a rule of thumb, we recommend using mean value replacement only when less than 5% of values are missing per indicator.

Alternatively, SmartPLS offers an option to remove all cases from the analysis that include missing values in any of the indicators used in the model (referred to as casewise deletion or listwise deletion). Grimm and Wagner (2020) show that PLS-SEM estimates are very stable when using casewise


deletion on data sets with up to 9% missing values. However, casewise deletion has two drawbacks. First, researchers need to ensure that they do not systematically delete a certain group of respondents. For example, market researchers frequently observe that wealthy respondents are more likely to refuse answering questions related to their income. Casewise deletion would systematically omit this group of respondents and therefore yield erroneous conclusions. Second, casewise deletion can dramatically diminish the number of observations in the data set. It is therefore crucial to carefully check the number of observations used in the final model estimation when this type of missing value treatment is used.

Instead of discarding all observations with missing values, pairwise deletion uses all observations with complete responses in the calculation of the model parameters. For example, assume we have a measurement model with three indicators (x1, x2, and x3). To estimate the model parameters, all valid values in x1, x2, and x3 are used in the computation. That is, if a respondent has a missing value in x3, the valid values in x1 and x2 are still used to calculate the model. Consequently, different calculations in the analysis may be based on different sample sizes, which can bias the results. Some researchers therefore call this approach "unwise deletion," and we also generally advise against its use. Exceptions are situations in which many observations have missing values (thus hindering the use of mean replacement and especially casewise deletion) and the aim of the analysis is to gain first insights into the model structure.

In addition, more complex procedures for handling missing values can be conducted before analyzing the data with SmartPLS. One of the best approaches to overcome missing data is to first determine the demographic profile of the respondent with missing data and then calculate the mean for the sample subgroup representing the identified demographic profile. For example, if the respondent with missing data is male, aged 25 to 34, with 14 years of education, then calculate the mean for that group on the questions with missing data. Next, determine whether the question with missing data is associated with a construct with multiple items. If yes, then calculate an average of the responses to all the items associated with the construct. The final step is to use the subgroup mean and the average of the construct indicator responses to decide what value to insert for the missing response. This approach minimizes the decrease in variability of responses and also enables the researcher to know specifically what is being done to overcome missing data problems.

Finally, research has brought forward a variety of methods that impute missing observations using information from the available data (Little & Rubin, 2002; Schafer & Graham, 2002). The choice of the best imputation method depends on several factors, including the number of missing values and the missing value pattern; see Sarstedt and Mooi (2019; Chapter 5.4) for an overview. However, since knowledge of their suitability specifically in a PLS-SEM context is scarce, we recommend drawing on the methods described above when treating missing values in PLS-SEM analyses.
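The rules of thumb discussed above translate into a few lines of code. The following pandas sketch is our illustration (not SmartPLS functionality); it applies the 15% removal rule and then chooses between mean replacement and casewise deletion using the 5% guideline:

```python
import pandas as pd

def treat_missing(df: pd.DataFrame,
                  drop_threshold: float = 0.15,
                  mean_threshold: float = 0.05) -> pd.DataFrame:
    """Apply this section's rules of thumb (a sketch, not SmartPLS itself)."""
    # Remove observations and indicators with more than 15% missing values.
    df = df.loc[df.isna().mean(axis=1) <= drop_threshold]
    df = df.loc[:, df.isna().mean(axis=0) <= drop_threshold]
    # Mean value replacement only if every indicator has <5% missing values;
    # otherwise fall back on casewise (listwise) deletion.
    if (df.isna().mean(axis=0) < mean_threshold).all():
        return df.fillna(df.mean())
    return df.dropna()

# Usage with -99 coding missing values, as in the book's data sets:
# df = pd.read_csv("data.csv", na_values=[-99])
# clean = treat_missing(df)
```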


Suspicious Response Patterns

Before analyzing their data, researchers should also examine response patterns. In doing so, they are looking for a pattern often described as straight lining. Straight lining occurs when a respondent marks the same response for a high proportion of the questions. For example, if a 7-point scale is used to obtain answers and the response pattern is all 4s (the middle response), then that respondent in most cases should be deleted from the data set. Similarly, if a respondent selects only 1s or only 7s, then that respondent should in most cases be removed. Other suspicious response patterns are diagonal lining and alternating extreme pole responses. A visual inspection of the responses or the analysis of descriptive statistics (e.g., mean, variance, and distribution of the answers per respondent) allows identifying suspicious response patterns.

Inconsistency in answers may also need to be addressed before analyzing the data. Many surveys start with one or more screening questions. The purpose of a screening question is to ensure that only individuals who meet the prescribed criteria complete the survey. For example, a survey of mobile phone users may screen for individuals who own an Apple iPhone. If a question later in the survey indicates that the individual actually uses an Android device, this respondent would need to be removed from the data set. Surveys often ask the same question with slight variations, especially when reflective indicators are used. If a respondent gives a very different answer to the same question asked in a slightly different way, this too raises a red flag and suggests the respondent was not reading the questions closely or simply was marking answers to complete and exit the survey as quickly as possible. Finally, researchers sometimes include specific questions to assess the attention of respondents. For example, in the middle of a series of questions, the researcher may instruct the respondent to check only a 1 on a 7-point scale for the next question. If any answer other than a 1 is given, it is an indication the respondent is not closely reading the questions.
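Straight lining is simple to flag programmatically. This short sketch (our illustration; the item names are hypothetical) marks respondents whose answers never vary across the inspected items:

```python
import pandas as pd

def flag_straight_lining(responses: pd.DataFrame) -> pd.Series:
    """Flag respondents who give the identical answer to every item.
    More refined checks could also target diagonal lining or alternating
    extreme-pole patterns."""
    return responses.nunique(axis=1) == 1

# Example: respondent 0 answers 4 to everything on a 7-point scale.
df = pd.DataFrame({"q1": [4, 2, 6], "q2": [4, 5, 1], "q3": [4, 3, 7]})
print(flag_straight_lining(df))  # 0: True, 1: False, 2: False
```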

Outliers

An outlier is an extreme response to a particular question, or extreme responses to all questions. Outliers must be interpreted in the context of the study, and this interpretation should be based on the type of information they provide. Outliers can result from data collection or entry errors (e.g., the researcher coded "77" instead of "7" on a 1 to 9 Likert scale). However, exceptionally high or low values can also be part of reality (e.g., an exceptionally high income). Finally, outliers can occur when combinations of variable values are particularly rare (e.g., spending 80% of annual income on holiday trips).

The first step in dealing with outliers is to identify them. Standard statistical software packages offer a multitude of univariate, bivariate, and multivariate graphs and statistics for identifying outliers. For example, when analyzing


box plots, one may characterize responses as extreme outliers when they lie more than three times the interquartile range below the first quartile or above the third quartile. Moreover, IBM SPSS Statistics has an option called Explore that develops box plots and stem-and-leaf plots to facilitate the identification of outliers by respondent number (Sarstedt & Mooi, 2019; Chapter 5.4).

Once the outliers are identified, the researcher must decide what to do. If there is an explanation for exceptionally high or low values, outliers are typically retained, because they represent an element of the population. However, their impact on the analysis results should be carefully evaluated. That is, one should run the analyses with and without the outliers to ensure that a very few (extreme) observations do not influence the results substantially. If the outliers are a result of data collection or entry errors, they are always deleted or corrected (e.g., the value of 55 on a 7-point scale). If there is no clear explanation for the exceptional values, outliers should be retained; see Sarstedt and Mooi (2019; Chapter 5.4) for more details about outliers.

Outliers can also represent a unique subgroup of the sample. There are two approaches to use in deciding whether a unique subgroup exists. First, a subgroup can be identified based on prior knowledge, for example, based on observable characteristics such as gender, age, or income. Using this information, the researcher partitions the data set into two or more groups and runs a multigroup analysis to disclose significant differences in the model parameters. The second approach to identifying unique subgroups is the application of latent class techniques. Latent class techniques allow researchers to identify and treat unobserved heterogeneity, which cannot be attributed to a specific observable characteristic or a combination of characteristics. Several latent class techniques have recently been proposed that generalize finite mixture modeling, iteratively reweighted least squares, hill-climbing approaches, and genetic algorithms to PLS-SEM (Sarstedt, Ringle, & Hair, 2017b). In Chapter 8, we discuss several of these techniques in greater detail.
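The box-plot rule described above can also be applied directly in code. The sketch below (our illustration with made-up income values) flags extreme outliers as observations more than three interquartile ranges below the first or above the third quartile:

```python
import pandas as pd

def extreme_outliers(x: pd.Series) -> pd.Series:
    """Flag values more than three times the interquartile range below the
    first quartile or above the third quartile."""
    q1, q3 = x.quantile(0.25), x.quantile(0.75)
    iqr = q3 - q1
    return (x < q1 - 3 * iqr) | (x > q3 + 3 * iqr)

income = pd.Series([38, 42, 45, 47, 51, 55, 60, 480])  # hypothetical values
print(income[extreme_outliers(income)])                # flags 480
```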

Data Distribution

PLS-SEM is a nonparametric statistical method. Different from CB-SEM, which draws on a maximum likelihood estimator that requires normally distributed data, PLS-SEM does not make any distributional assumptions (Hair, Ringle, & Sarstedt, 2011). Nevertheless, it is important to verify that the data are not too far from normal, as extremely nonnormal data prove problematic in the assessment of the parameters' significances. Specifically, extremely nonnormal data inflate standard errors obtained from bootstrapping (see Chapter 5 for more details) and thus trigger type II errors (i.e., false negatives). Normality tests such as the Shapiro–Wilk test compare the data to a normal distribution with the same mean and standard deviation as in the sample (Sarstedt & Mooi, 2019; Chapter 5). However, such a test only indicates whether the null hypothesis of normally distributed data should be rejected


or not. As the bootstrapping procedure performs fairly robustly when data are nonnormal, these tests provide only limited guidance when deciding whether the data are too far from being normally distributed. Instead, researchers should examine two measures of the distribution's shape: skewness and kurtosis.

Skewness assesses the extent to which a variable's distribution is symmetrical. If the distribution of responses for a variable stretches toward the right or left tail of the distribution, then the distribution is characterized as skewed. A negative skewness indicates a greater number of larger values, whereas a positive skewness indicates a greater number of smaller values. As a general guideline, a skewness value between −1 and +1 is considered excellent, and a value between −2 and +2 is generally considered acceptable. Values beyond −2 and +2 are indicative of substantial nonnormality.

Kurtosis is a measure of whether the distribution is too peaked (a very narrow distribution with most of the responses in the center). A positive kurtosis indicates a distribution more peaked than normal; a negative kurtosis indicates a shape flatter than normal. Analogous to skewness, the general guideline is that a kurtosis greater than +2 indicates a distribution that is too peaked, while a kurtosis of less than −2 indicates a distribution that is too flat. When both skewness and kurtosis are close to zero, the pattern of responses is considered a normal distribution (George & Mallery, 2019).

Serious effort, considerable amounts of time, and a high level of caution are required when collecting and analyzing the data needed for multivariate techniques. Always remember the garbage in, garbage out rule: all your analyses are meaningless if your data are inappropriate. Exhibit 2.13 summarizes some key guidelines you should consider when examining your data and preparing them for PLS-SEM. For more detail on examining your data, see Chapter 2 of Hair, Black, Babin, and Anderson (2019).
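Both statistics are available in standard libraries. The following sketch is our illustration (pandas' kurt() reports excess kurtosis, which matches the guideline that normal data are close to zero); it screens every indicator against the −2/+2 thresholds:

```python
import pandas as pd

def distribution_check(df: pd.DataFrame) -> pd.DataFrame:
    """Report skewness and excess kurtosis per indicator; absolute values
    beyond 2 flag substantial nonnormality per the guideline above."""
    out = pd.DataFrame({"skewness": df.skew(), "kurtosis": df.kurt()})
    out["acceptable"] = out["skewness"].abs().le(2) & out["kurtosis"].abs().le(2)
    return out

# Example with a handful of hypothetical 7-point responses:
print(distribution_check(pd.DataFrame({"comp_1": [6, 7, 6, 4, 5, 6]})))
```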

EXHIBIT 2.13  ■  Guidelines for Examining Data Used With PLS-SEM

•• Missing data must be identified. When missing data per observation (i.e., item non-response) or per indicator exceed 15%, the observation or indicator should be removed from the data set. Other missing data should be dealt with before running a PLS-SEM analysis. When less than 5% of values per indicator are missing, use mean replacement. Otherwise, use casewise deletion, but make sure that the deletion of observations does not occur systematically and that enough observations remain for the analysis. Generally, avoid using pairwise deletion. Also consider using more complex imputation procedures before importing the data into the PLS-SEM software.
•• Suspicious and inconsistent response patterns typically justify removing a response from the data set.


•• Outliers should be identified before running PLS-SEM. Subgroups that are substantial in size should be identified based on prior knowledge or by statistical means (e.g., using a latent class analysis).
•• Lack of normality in variable distributions can distort the results of multivariate analysis. This problem is much less severe with PLS-SEM, but researchers should still examine PLS-SEM results carefully when distributions deviate substantially from normal. Absolute skewness and kurtosis values greater than 2 are indicative of nonnormal data.

CASE STUDY ILLUSTRATION—SPECIFYING THE PLS-SEM MODEL

The most effective way to learn how to use a statistical method is to apply it to a set of data. Throughout this book, we use a single example that enables you to do that. We start the example with a simple model, and in Chapter 5, we expand that same model to a much broader, more complex model. For our initial model, we hypothesize a path model to estimate the relationships between corporate reputation, customer satisfaction, and customer loyalty. The example will provide insights on (1) how to develop the structural model representing the underlying concepts/theory, (2) the setup of measurement models for the latent variables, and (3) the structure of the empirical data used. Then, our focus shifts to setting up the SmartPLS 3 software (Ringle, Wende, & Becker, 2015) for PLS-SEM.

Application of Stage 1: Structural Model Specification

To specify the structural model, we must begin with some fundamental explications about theoretical models. The corporate reputation model by Eberl (2010) is the basis of our theory. The goal of the model is to explain the effects of corporate reputation on customer satisfaction (CUSA) and, ultimately, customer loyalty (CUSL). Corporate reputation represents a company's overall evaluation by its stakeholders (Helm, Eggert, & Garnefeld, 2010). It is measured using two dimensions. One dimension, the company's competence (COMP), represents cognitive evaluations of the company. The second dimension captures affective judgments, which determine the company's likeability (LIKE). This two-dimensional approach to measuring reputation was developed by Schwaiger (2004). It has been validated in different countries (e.g., Eberl, 2010; Zhang & Schwaiger, 2012) and applied in various research studies (e.g., Eberl & Schwaiger, 2005; Radomir & Moisescu, 2019; Radomir & Wilson, 2018;


Raithel & Schwaiger, 2015; Raithel, Wilczynski, Schloderer, & Schwaiger, 2010; Sarstedt & Schloderer, 2010; Schloderer, Sarstedt, & Ringle, 2014; Schwaiger, Raithel, & Schloderer, 2009; Yun, Kim, & Cheong, 2020). Research also shows that the approach performs favorably (in terms of convergent validity and predictive validity) compared with alternative reputation measures (Sarstedt, Wilczynski, & Melewar, 2013).

Building on a definition of corporate reputation as an attitude-related construct, Schwaiger (2004) further identified four antecedent dimensions of reputation (quality, performance, attractiveness, and corporate social responsibility), measured by a total of 21 formative indicators. These driver constructs of corporate reputation are components of the more complex example we will use in the book and will be added in Chapter 5. Likewise, we do not consider more complex model setups such as mediation or moderation effects yet. These aspects will be covered in the case studies in Chapter 7.

In summary, the simple corporate reputation model has two main theoretical components: (1) the target constructs of interest, namely CUSA and CUSL (endogenous constructs), and (2) the two corporate reputation dimensions COMP and LIKE (exogenous constructs), which represent key determinants of the target constructs. Exhibit 2.14 shows the constructs and their relationships, which represent the structural model for the PLS-SEM case study.

To propose a theory, researchers usually build on existing research knowledge. When PLS-SEM is applied, the structural model displays the theory with its key elements (i.e., constructs) and cause-effect relationships (i.e., paths).

EXHIBIT 2.14  ■  Example of a Theoretical Model (Simple Model)

The exogenous constructs COMP and LIKE each have paths to CUSA and CUSL; CUSA, in turn, has a path to CUSL.


Researchers typically develop hypotheses for the constructs and their path relationships in the structural model. For example, consider Hypothesis 1 (H1): Customer satisfaction has a positive effect on customer loyalty. PLS-SEM enables statistically testing the significance of the hypothesized relationship (Chapter 6). When conceptualizing the theoretical constructs and their hypothesized structural relationships for PLS-SEM, it is important to make sure the model has no circular relationships (i.e., causal loops). A circular relationship would occur if, for example, we reversed the relationship between COMP and CUSL, as this would yield the causal loop COMP → CUSA → CUSL → COMP.
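Checking a structural model for causal loops amounts to testing whether the directed graph of path relationships contains a cycle. The sketch below is our illustration (the model is encoded as hypothetical (source, target) pairs); it confirms that the simple model is acyclic and that reversing the COMP–CUSL relationship would create the loop described above:

```python
def has_causal_loop(paths: list[tuple[str, str]]) -> bool:
    """Return True if the directed path relationships contain a cycle."""
    graph: dict[str, list[str]] = {}
    for src, dst in paths:
        graph.setdefault(src, []).append(dst)

    def reachable(start: str, goal: str, seen: set[str]) -> bool:
        if start == goal:
            return True
        seen.add(start)
        return any(reachable(n, goal, seen)
                   for n in graph.get(start, []) if n not in seen)

    # A cycle exists if, for some path src -> dst, src is reachable from dst.
    return any(reachable(dst, src, set()) for src, dst in paths)

model = [("COMP", "CUSA"), ("COMP", "CUSL"), ("LIKE", "CUSA"),
         ("LIKE", "CUSL"), ("CUSA", "CUSL")]
print(has_causal_loop(model))                        # False
print(has_causal_loop(model + [("CUSL", "COMP")]))   # True: COMP→CUSA→CUSL→COMP
```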

Application of Stage 2: Measurement Model Specification

Since the constructs are not directly observed, we need to specify a measurement model for each construct. The specification of the measurement models (i.e., multi-item vs. single-item measures and reflective vs. formative measures) draws on prior research studies by Schwaiger (2004) and Eberl (2010). In our simple example of a PLS-SEM application, we have three constructs (COMP, CUSL, and LIKE) measured by multiple items (Exhibit 2.15). All three constructs have reflective measurement models, as indicated by the arrows pointing from the construct to the indicators.

EXHIBIT 2.15  ■  Types of Measurement Models in the Simple Model

COMP (comp_1, comp_2, comp_3), LIKE (like_1, like_2, like_3), and CUSL (cusl_1, cusl_2, cusl_3) are specified as reflective measurement models; CUSA is a single-item construct measured by cusa.


For example, COMP is measured by means of the three reflective items comp_1, comp_2, and comp_3, which relate to the following survey questions (Exhibit 2.16): "[The company] is a top competitor in its market," "As far as I know, [the company] is recognized worldwide," and "I believe that [the company] performs at a premium level." Respondents had to indicate the degree to which they (dis)agree with each of the statements on a 7-point scale from 1 = fully disagree to 7 = fully agree.

Different from COMP, CUSL, and LIKE, the customer satisfaction construct (CUSA) is operationalized by a single item (cusa) that is related to the following question in the survey: "If you consider your experiences with [company], how satisfied are you with [company]?" The single indicator is measured with a 7-point scale indicating the respondent's degree of satisfaction (1 = very dissatisfied; 7 = very satisfied). The single item has been used due to practical considerations in an effort to decrease the overall number of items in the questionnaire. As customer satisfaction items are usually highly homogeneous, the loss in predictive validity compared with a multi-item measure is not considered severe.

EXHIBIT 2.16  ■  Indicators for Reflective Measurement Model Constructs

Competence (COMP)
comp_1: [The company] is a top competitor in its market.
comp_2: As far as I know, [the company] is recognized worldwide.
comp_3: I believe that [the company] performs at a premium level.

Likeability (LIKE)
like_1: [The company] is a company that I can better identify with than other companies.
like_2: [The company] is a company that I would regret more not having if it no longer existed than I would other companies.
like_3: I regard [the company] as a likeable company.

Customer Loyalty (CUSL)
cusl_1: I would recommend [the company] to friends and relatives.
cusl_2: If I had to choose again, I would choose [the company] as my mobile phone services provider.
cusl_3: I will remain a customer of [the company] in the future.

Note: For data collection, the actual name of the company was inserted in the bracketed space that indicates company.


As cusa is the only item measuring customer satisfaction, construct and item are equivalent (as indicated by the fact that the relationship between a construct and its single-item measure is always one in PLS-SEM). Therefore, the choice of the measurement perspective (i.e., reflective vs. formative) is of no concern, and the relationship between construct and indicator is undirected.

Application of Stage 3: Data Collection and Examination

To estimate the PLS path model, data were collected using computer-assisted telephone interviews (Sarstedt & Mooi, 2019; Chapter 4) that asked about the respondents' perception of and their satisfaction with four major mobile network providers in Germany's mobile communications market. Respondents rated the questions on 7-point Likert scales, with higher scores denoting higher levels of agreement with a particular statement. In the case of cusa, higher scores denote higher levels of satisfaction. Satisfaction and loyalty were measured with respect to the respondents' own service providers.

The data set used in this book is a subset of the original set and has a sample size of 344 observations. The data have been collected using a quota sampling approach (Sarstedt, Bengart, Shaltoni, & Lehmann, 2018) by a professional market research company in the German market. The resulting sample is representative of the German population.

Exhibit 2.17 shows the data matrix for the model. The 10 columns represent a subset of all variables (i.e., specific questions in the survey as described in the previous section) that have been surveyed, and the 344 rows (i.e., cases) contain the answers of every respondent to these questions. For example, the first row contains the answers of Respondent 1, while the last row contains the answers of Respondent 344. Data in the first nine columns are for the indicators associated with the three constructs, and the tenth column includes the data for the single indicator of CUSA. The data set contains further variables that relate to, for example, the driver constructs of LIKE and COMP. We will cover these aspects in Chapter 5.

EXHIBIT 2.17  ■  Data Matrix for the Indicator Variables

Case  comp_1  comp_2  comp_3  like_1  like_2  like_3  cusl_1  cusl_2  cusl_3  cusa
1     6       7       6       6       6       6       7       7       7       7
2     4       5       6       5       5       5       7       7       5       6
...   ...     ...     ...     ...     ...     ...     ...     ...     ...     ...
344   6       5       6       6       7       5       7       7       7       7


If you are using a data set in which a respondent did not answer a specific question, you need to insert a number that does not otherwise appear in the responses to indicate the missing values. Researchers commonly use −99 to indicate missing values, but you can use any other value that does not normally occur in the data set. In the following, we also use −99 to indicate missing values. If, for example, the first data point of comp_1 were missing, the value −99 would be inserted as a placeholder instead of the value of 6 that you see in Exhibit 2.17. Missing value treatment procedures (e.g., mean replacement) could then be applied to these data (e.g., Hair, Black, Babin, & Anderson, 2019). Again, if the number of missing values per indicator in your data set is relatively small (i.e., less than 5% missing per indicator), we recommend mean value replacement instead of casewise deletion when running PLS-SEM. Furthermore, we need to ascertain that the number of missing values per observation and per indicator does not exceed 15%. If this were the case, the corresponding observation or indicator should be eliminated from the data set.

The data set used in the book's example has only very few missing values. More precisely, cusa has one missing value (0.29%), cusl_1 and cusl_3 have three missing values each (0.87%), and cusl_2 has four missing values (1.16%). Since the missing values per indicator are less than 5%, mean value replacement can be used. Furthermore, none of the observations and indicators has more than 15% missing values, so we can proceed with analyzing all 344 respondents.

To run outlier diagnostics, we compute a series of box plots using IBM SPSS Statistics; see Chapter 5 in Sarstedt and Mooi (2019) for details on how to run these analyses in IBM SPSS Statistics. The results indicate some influential observations but no outliers. Moreover, nonnormality of the data in terms of skewness and kurtosis is not an issue: the kurtosis and skewness values of all the indicators are within the −2 and +2 range.
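The missing value counts and the distribution checks reported here are easy to reproduce outside of SmartPLS. The sketch below is our illustration (the local file name mirrors the data set used in this chapter and is an assumption about how you saved it); it reads the data with −99 coded as missing and reports the relevant shares:

```python
import pandas as pd

# Read the case study data, treating -99 as missing (assumed local file name).
df = pd.read_csv("Corporate reputation data.csv", na_values=[-99])

missing = df.isna().sum()
print(missing[missing > 0])                    # e.g., cusa: 1, cusl_2: 4, ...
print((df.isna().mean() * 100).round(2))       # % missing per indicator (< 5%?)
print((df.isna().mean(axis=1) > 0.15).sum())   # observations over the 15% rule
print(df.skew().abs().le(2).all(),             # skewness within the +/-2 range?
      df.kurt().abs().le(2).all())             # excess kurtosis within +/-2?
```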

Path Model Creation Using the SmartPLS Software

The SmartPLS 3 software (Ringle, Wende, & Becker, 2015) is used to execute all the PLS-SEM analyses in this book. The discussion includes an overview of the software's functionalities. The student version of the software is available free of charge at https://www.smartpls.com. The student version offers practically all functionalities of the full version but is restricted to data sets with a maximum of 100 observations. However, as the data set used in this book has more than 100 observations (344 to be precise), you should use the professional version of SmartPLS, which is available as a 30-day trial version at https://www.smartpls.com. After the trial period, a license fee applies. Licenses are available for different periods of time (e.g., 1 month, 1 year, or 2 years) and can be


purchased through the SmartPLS website. The SmartPLS website includes a download area for the software, including the old SmartPLS 2 (Ringle, Wende, & Will, 2005) software, and many additional resources such as short explanations of PLS-SEM and software-related topics, a list of recommended literature, answers to frequently asked questions, tutorial videos for getting started using the software, and the SmartPLS forum, which allows you to discuss PLS-SEM topics with other users. Sarstedt and Cheah (2019) provide a comprehensive software review.

SmartPLS has a graphical user interface that enables the user to estimate the PLS path model. Exhibit 2.20 at the end of this section shows the graphical interface for the SmartPLS software, with the simple model already drawn. In the following paragraphs, we describe how to set up this model using the SmartPLS software. Before you draw your model, you need to have data that serve as the basis for running the model. The data we will use with the reputation model can be downloaded either as comma-separated value (.csv) or text (.txt) data sets in the download section of this book's webpage at https://www.pls-sem.net/. SmartPLS can use both data file formats (i.e., .csv or .txt). Follow the onscreen instructions to save one of these two files on your hard drive. Click on Save Target As . . . to save the data to a folder on your hard drive and then Close.

Now run the SmartPLS software by clicking on the desktop icon that is available after the software installation. Alternatively, go to the folder where you installed the SmartPLS software, click on the file that runs SmartPLS, and then on the Run tab to start the software. To create a new project after running SmartPLS, click on File → Create New Project. First type a name for the project into the Name box (e.g., PLS-SEM BOOK - Corporate Reputation Extended). After clicking OK, the new project is created and appears in the Project Explorer window that is in the upper left below the menu bar. All previously created SmartPLS projects also appear in this window.

Next, you need to assign a data set to the project, in our case, Corporate reputation data.csv (or whatever name you gave to the data you downloaded). To do so, click on the information button labeled Double-click to import data! below the project you just created, find and highlight your data file, and click Open. It is important to note that if you use your own data set for a project, the data must not include any string elements (e.g., respondents' comments to open-ended questions). For example, SmartPLS interprets single dots (such as those produced by IBM SPSS Statistics when an observation has a system-missing value) as string elements. In our example, the data set does not include any string elements, so this is not an issue. In the screen that follows, you can adjust the name of the data set. In this example, we use the original name (i.e., Corporate reputation data) and proceed by clicking OK. SmartPLS will open a new tab (Exhibit 2.18), which provides information on the data set and its format (data view).


EXHIBIT 2.18  ■  Data View in SmartPLS


At the bottom of the screen appears a list with all variables, their number of missing values, and basic descriptive statistics (e.g., mean, median, minimum and maximum values, standard deviation, excess kurtosis, and skewness). At the top right of the screen you can see the Sample Size as well as the number of indicators and missing values. At the top left of the screen, you can specify the Delimiter that separates the data values in your data set (i.e., comma, semicolon, tabulator, or space), the Value Quote Character (i.e., none, single quote, or double quote) in case the values use quotations (e.g., "7"), and the Number Format (i.e., United States with a dot as decimal separator or Europe with a comma as decimal separator).

Furthermore, you can specify the coding of missing values. Click on None next to Missing Value Marker. In the screen that follows, you need to specify missing values. Enter −99 in the field and click on OK. SmartPLS dynamically updates the descriptive statistics of the indicators that contain missing values and indicates the number of missing values next to Missing Values (Exhibit 2.18). You can specify only one specific value for all missing data in SmartPLS. Thus, you have to make sure that all missing values have the same coding (e.g., −99) in your original data set. That is, you need to code all missing values uniformly, regardless of their type (user-defined missing or system-missing) and the reason for being missing (e.g., the respondent refused to answer, the respondent did not know the answer, not applicable).

The additional tabs in the data view show the Indicator Correlations and the Raw File with the imported data. At this point, you can close the data view. Note that you can always reopen the data view by double-clicking on the data set (i.e., Corporate reputation data) in the Project Explorer.

Each project can have one or more path models and one or more data sets (i.e., .csv or .txt files). When setting up a new project, SmartPLS will automatically add a model with the same name as the project. You can also rename the model by right-clicking on it. In the menu that opens, click on Rename and type in the new name for the model. To distinguish our introductory model from the later ones, rename it to Simple Model and click on OK (Exhibit 2.19).

Next, double-click on Simple Model in the Project Explorer window and SmartPLS will open the graphical Modeling Window on the right, where you can create a path model. We start with a new project (as opposed to working with a saved project), so the Modeling Window is empty and you can start creating the path model shown in Exhibit 2.19. By clicking on Latent Variable in the menu bar, you can place a new construct into the Modeling Window. Each time you left-click in the Modeling Window, a new construct represented by a red circle will appear. Alternatively, go to the Edit menu and click Add Latent Variable(s). Now a new construct will appear each time you left-click in the Modeling Window. To leave this mode, click on Select in the menu bar.


EXHIBIT 2.19  ■  Initial Model


Once you have created all your constructs, you can left-click on any of the constructs to select, resize, or move it in the Modeling Window. To connect the latent variables with each other (i.e., to draw path arrows), left-click on Connect in the menu bar. Next, left-click on an exogenous (independent) construct and move the cursor over the target endogenous (dependent) construct. Now left-click on the endogenous construct, and a path relationship (directional arrow) will be inserted between the two constructs. Repeat the same process and connect all the constructs based on your theory. Alternatively, go to the Edit menu and click Add Connection(s).

The next step is to name the constructs. To do so, right-click on the construct to open a menu with different options and left-click on Rename. Type the name of your construct in the window of the Rename box (i.e., COMP) and then click OK. The name COMP will appear under the construct. Follow these steps to name all constructs. When you finish, your model will look like Exhibit 2.19.

Next, you need to assign indicators to each of the constructs. On the left side of the screen, there is an Indicators window that shows all the indicators in your data set, along with some basic descriptive statistics when you left-click on an indicator. Start with the COMP construct by dragging the first competence indicator comp_1 from the Indicators window and dropping it on the construct (i.e., left-click the mouse and hold the button down, then move the cursor until it is over the construct, then release). After assigning an indicator to a construct, it appears in the graphical Modeling Window as a yellow rectangle attached to the construct (as reflective). Assigning an indicator to a construct will also turn the color of the construct from red to blue. You can move the indicator around, but it will remain attached to the construct (unless you delete it). By right-clicking on the construct and choosing one of the options under Align (e.g., Indicators Top), you can align the indicator(s). You can also hide the indicators of a construct by selecting the corresponding option in the menu that appears when right-clicking on it. Moreover, you can access the align indicators option via the Modeling Toolbox on the right-hand side of the Modeling Window. Continue until you have assigned all the indicators to the constructs as shown in Exhibit 2.20. Make sure to save the model by going to File → Save.

Right-clicking on selected construct(s) in the graphical Modeling Window opens a menu with several options. Apart from renaming the constructs, you can, for example, invert the measurement model from reflective to formative measurement, and vice versa (Switch between Formative/Reflective), hide and show the indicators of the construct, and access more advanced options such as adding interaction and quadratic effects. Additionally, when double-clicking on a construct, a different menu opens that allows you to select an indicator weighting scheme per construct (i.e., Automatic, Mode A, Mode B, Sumscores, and Predefined) and, for formatively measured constructs, a value for the construct reliability between 0 and 1. To add a note to your Modeling Window, left-click on the Comment button in the menu bar.


EXHIBIT 2.20  ■  Simple Model With Names and Data Assigned


Clicking the right mouse button while the cursor is placed over other elements also opens a menu with additional functions. For example, if you place the cursor in the Project Explorer window and right-click on the project name, you can create a new model (Create New Path Model), create a new project (Create New Project), or import a new data set (Import Data File). Moreover, you can select the Copy, Paste, and Delete options for projects and models that appear in the Project Explorer window. For example, the Duplicate option is useful when you would like to modify a PLS path model but want to keep your initial model setup. Using the Copy option, you can copy and paste a PLS path model from one project to another. The Import Data File option allows you to add more data sets to an existing project (e.g., data from different years if available).

You can also export a project by selecting the Export Project option. Using this option, SmartPLS will export the entire project, including all models and data sets you may have included in it, in a .zip folder. You can also directly import this "ready-to-use" project by going to File → Import Project from Backup File. You can use this option to import the project that includes the PLS-SEM example on corporate reputation. The file name is Corporate Reputation.zip. This project is available for download in the download section at https://www.pls-sem.net/. Download this file and save it on your computer. Then, go to File → Import Project from Backup File. SmartPLS allows you to browse your computer and select the downloaded project Corporate Reputation.zip for import. After successful import, double-click on the model in this project, and the path model as shown in Exhibit 2.20 will appear in a new Modeling Window.

Summary

• Understand the basic concepts of structural model specification, including mediation, moderation, and the use of control variables. This chapter includes the first three stages in the application of PLS-SEM. Building on an established theory, prior research, and logic, the model specification starts with the structural model (Stage 1). Each element of the theory represents a construct in the structural model. Moreover, assumptions for the causal relationships between the elements must be considered. The relationships between the constructs are directed (i.e., the arrows linking the constructs go from one construct to the next), but they can also be more complex and contain mediating or moderating relationships. In addition, researchers frequently specify control variables in order to control for the impact of other characteristics or phenomena that are not part of the primary theoretical model being tested. The goal of the PLS-SEM analysis is to empirically test the theory, or a certain element thereof, in the form of the structural model.


• Explain the differences between reflective and formative measurement models and specify the appropriate measurement model. Stage 2 focuses on selecting a measurement model for each construct in the structural model to obtain reliable and valid measurements. Generally, there are two types of measurement models: reflective and formative. The reflective mode has arrows (relationships) pointing from the construct to the indicators in the measurement model. If the construct changes, it leads to a simultaneous change of all items in the measurement model. Thus, all indicators are highly correlated. In contrast, in formative measurement models, arrows point from the indicators in the measurement model to the constructs. Hence, all indicators together form the construct, and all major elements of the domain must be represented by the selected formative indicators. Since formative indicators represent independent sources of the construct's content, they do not necessarily need to be correlated (in fact, they shouldn't be highly correlated).

• Comprehend that the selection of the mode of measurement model and the indicators must be based on theoretical reasoning before data collection. A reflective specification would use different indicators than a formative specification of the same construct. Researchers typically use reflective constructs as target constructs of the PLS path model, while formative constructs may be particularly valuable as explanatory sources (independent variables) or drivers of these target constructs. During the data analysis phase, the measurement mode of the constructs can be empirically tested by using confirmatory tetrad analysis.

• Explain the difference between multi-item and single-item measures and assess when to use each measurement type. Rather than using multiple items to measure a construct, researchers sometimes choose to use a single item. Single items have practical advantages such as ease of application, brevity, and lower costs associated with their use. However, single-item measures do not offer more for less. From a psychometric perspective, single-item measures are less reliable and lag behind in terms of predictive validity. The latter aspect is particularly problematic in the context of PLS-SEM in light of the method's causal-predictive character. Furthermore, identifying an appropriate single item from a set of candidate items, regardless of whether this selection is based on statistical measures or expert judgment, proves very difficult. For these reasons, the use of single items should generally be avoided. The above issues are important considerations when measuring unobservable phenomena, such as perceptions or attitudes. But single-item measures are clearly appropriate when used to measure observable characteristics such as gender, sales, profits, and so on.

Chapter 2  ■  Specifying the Path Model and Examining Data  



Understand the nature of higher-order constructs. Higher-order constructs, also referred to as hierarchical component models, are used to specify a single construct on a more abstract dimension and more concrete subdimensions at the same time. Higher-order constructs have become increasingly popular in research since they offer a means of establishing more parsimonious path models. Researchers often specify and estimate higher-order constructs with two layers of abstraction, also referred to as second-order constructs.



Describe the data collection and examination considerations necessary to apply PLS-SEM. Stage 3 underlines the need to examine your data after they have been collected to ensure that the results from the methods application are valid and reliable. The primary issues that need to be examined include missing data, suspicious response patterns (straight lining or inconsistent answers), and outliers. Distributional assumptions are of less concern because of PLS-SEM’s nonparametric nature. However, as highly skewed data can cause issues in the estimation of significance levels, researchers should ensure that the data are not too far from normal. As a general rule of thumb, always remember the garbage in, garbage out rule. All your analyses are meaningless if your data are inappropriate.



Learn how to develop a PLS path model using the SmartPLS software. The first three stages of conducting a PLS-SEM analysis are explained by conducting a practical exercise. We discuss how to draw a PLS path model focusing on corporate reputation and its relationship with customer satisfaction and loyalty. We also explain several options that are available in the SmartPLS software. The outcome of the exercise is a PLS path model drawn using the SmartPLS software that is ready to be estimated.

Review Questions 1. What is a structural model? 2. What is a reflective measurement model? 3. What is a formative measurement model? 4. What is a single-item measure? 5. When do you consider data to be “too nonnormal” for a PLS-SEM analysis?

81

82   A Primer on Partial Least Squares Structural Equation Modeling (PLS-SEM)

Critical Thinking Questions 1. How can you decide whether to specify a construct reflectively or formatively? 2. Which research situations favor the use of reflective and formative measures? 3. Discuss the pros and cons of single-item measures. 4. Create your own example of a PLS path model (including the structural model with latent variables and the measurement models). 5. Why is it important to carefully analyze your data prior to analysis? What particular problems do you encounter when the data set has relatively large amounts of missing data per indicator (e.g., more than 5% of the data are missing per indicator)?

Key Terms Alternating extreme pole responses 64 Artifacts 52 Casewise deletion  62 Causal indicators  52 Causal links  43 Composite indicators  52 Confirmatory tetrad analysis in PLS-SEM (CTA-PLS)  56 Control variables  47 Diagonal lining  64 Direct effect  44 Effect indicators  52 Endogenous latent variables  42 Exogenous latent variables  42 Formative measurement  52 Heterogeneity 46 Hierarchical component models  59 Higher-order component  60 Higher-order constructs  59 Higher-order models  59 Index 52 Indirect effect  44 Inner model  41

Kurtosis 66 Listwise deletion  62 Lower-order components  60 Mean value replacement  62 Measurement model  41 Mediating effect  44 Missing value treatment  62 Model comparisons  43 Moderation 45 Moderator effect  45 Multigroup analysis  46 Outer model  41 Outlier 64 Pairwise deletion  63 Path model  41 Reflective measurement  51 Scale 52 Second-order constructs  60 Skewness 66 Straight lining  64 Structural model  41 Sum scores  59

Chapter 2  ■  Specifying the Path Model and Examining Data  

83

Suggested Readings Becker, J.-M., Ringle, C. M., & Sarstedt, M. (2018). Estimating moderating effects in PLS-SEM and PLSc-SEM: Interaction term generation*data treatment. Journal of Applied Structural Equation Modeling, 2(2), 1–21. Bollen, K. A. (2011). Evaluating effect, composite, and causal indicators in structural equation models. MIS Quarterly, 35(2), 359–372. Cheah, J.-H., Ting, H., Ramayah, T., Memon, M. A., Cham, T.-H., & Ciavolino, E. (2019). A comparison of five reflective–formative estimation approaches: Reconsideration and recommendations for tourism research. Quality & Quantity, 53(3), 1421–1458. Eberl, M. (2010). An application of PLS in multi-group analysis: The need for differentiated corporate-level marketing in the mobile communications industry. In V. Esposito Vinzi, W. W. Chin, J. Henseler, & H. Wang (Eds.), Handbook of partial least squares: Concepts, methods and applications in marketing and related fields (Springer Handbooks of Computational Statistics Series, Vol. II, pp. 487–514). Berlin: Springer. Gregor, S. (2006). The nature of theory in information systems. MIS Quarterly, 30(3), 611–642. Gudergan, S. P., Ringle, C. M., Wende, S., & Will, A. (2008). Confirmatory tetrad analysis in PLS path modeling. Journal of Business Research, 61(12), 1238–1249. Hair, J. F., Hult, G. T. M., Ringle, C. M., Sarstedt, M., & Thiele, K. O. (2017). Mirror, mirror on the wall: A comparative evaluation of composite-based structural equation modeling methods. Journal of the Academy of Marketing Science, 45(5), 616–632. Hair, J. F., Ringle, C. M., & Sarstedt, M. (2011). PLS-SEM: Indeed a silver bullet. Journal of Marketing Theory and Practice, 19(2), 139–151. Klarmann, M., & Feurer, S. (2018). Control variables in marketing research. Marketing ZFP – Journal of Research and Management, 40(2), 26–40. Lohmöller, J.-B. (1989). Latent variable path modeling with partial least squares. Heidelberg: Physica. Matthews, L. (2017). Applying multigroup analysis in PLS-SEM: A step-by-step process. In H. Latan & R. Noonan (Eds.), Partial least squares path modeling: Basic concepts, methodological issues and applications (pp. 219–243). Cham: Springer. Memon, M. A., Cheah, J.-H., Ramayah, T., Ting, H., & Chuah, F. (2018). Mediation analysis: Issues and recommendations. Journal of Applied Structural Equation Modeling, 2(1), i–ix. Memon, M. A., Cheah, J.-H., Ramayah, T., Ting, H., Chuah, F., & Cham, T. H. (2019). Moderation analysis: Issues and guidelines. Journal of Applied Structural Equation Modeling, 3(1), i–ix. (Continued)

84   A Primer on Partial Least Squares Structural Equation Modeling (PLS-SEM) (Continued) Nitzl, C., Roldán, J. L., & Cepeda Carrión, G. (2016). Mediation analyses in partial least squares structural equation modeling: Helping researchers to discuss more sophisticated models. Industrial Management & Data Systems, 116(9), 1849–1864. Sarstedt, M., Hair, J. F., Cheah, J.-H., Becker, J.-M., & Ringle, C. M. (2019). How to specify, estimate, and validate higher-order models. Australasian Marketing Journal, 27(3), 197–211. Sarstedt, M., Ringle, C. M., & Hair, J. F. (2017). Partial least squares structural equation modeling. In C. Homburg, M. Klarmann, & A. Vomberg (Eds.), Handbook of market research. Cham: Springer. Schwaiger, M. (2004). Components and parameters of corporate reputation: An empirical study. Schmalenbach Business Review, 56(1), 46–71. Sharma, P. N., Sarstedt, M., Shmueli, G., Kim, K. H., & Thiele, K. O. (2019). PLS-based model selection: The role of alternative explanations in Information Systems research. Journal of the Association for Information Systems, 40(4), 346–397. Wetzels, M., Odekerken-Schröder, G., & van Oppen, C. (2009). Using PLS path modeling for assessing hierarchical construct models: Guidelines and empirical illustration. MIS Quarterly, 33(1), 177–195. Yuan, K.-H., Wen, Y., & Tang, J. (2020). Regression analysis with latent variables by partial least squares and four other composite scores: Consistency, bias and correction. Structural Equation Modeling: A Multidisciplinary Journal, 27(3), 333–350.

Visit the companion site for this book at https://www.pls-sem.net/.

3 PATH MODEL ESTIMATION LEARNING OUTCOMES 1. Learn how the PLS-SEM algorithm functions. 2. Comprehend the options and parameter settings to run the algorithm. 3. Understand the statistical properties of the PLS-SEM method. 4. Explain how to interpret the results. 5. Apply the PLS-SEM algorithm using the SmartPLS software.

CHAPTER PREVIEW This chapter covers Stage 4 of the process on how to apply PLS-SEM. Specifically, we focus on the PLS-SEM algorithm and its statistical properties. A basic understanding of the principles that underlie PLS-SEM, as well as its strengths and weaknesses, is needed to correctly apply the method (e.g., to make decisions regarding software options). Building on these foundations, you will be able to choose the options and parameter settings required to run the PLS-SEM algorithm. After explaining how the PLS path model is estimated, we summarize how to interpret the initial results. These will be discussed in much greater detail in Chapters 4–6. This chapter closes with an application of the PLS-SEM algorithm to estimate results for the corporate reputation example using the SmartPLS software. 85

86   A Primer on Partial Least Squares Structural Equation Modeling (PLS-SEM)

STAGE 4: MODEL ESTIMATION AND THE PLS-SEM ALGORITHM How the Algorithm Works The PLS-SEM algorithm was originally developed by Wold (1975, 1982) and later extended, for instance, by Lohmöller (1989), Bentler and Huang (2014), and Dijkstra (2014). The algorithm estimates the path coefficients and other model parameters in a way that maximizes the explained variance of the dependent constructs in a series of partial regressions (i.e., it minimizes their unexplained variance). In doing so, PLS-SEM linearly combines the indicators of each construct’s measurement model to form composite variables (Chapter 1). The composite variables are assumed to be comprehensive representations of the constructs and, therefore, valid proxies of the conceptual variables being examined. For this reason, PLS is considered a composite-based SEM method (Hwang, Sarstedt, Cheah, & Ringle, 2020). In estimating the composite variable scores, the PLSSEM algorithm weights each indicator individually. The outer weights reflect each indicator’s importance for forming the composite. That is, indicators with a higher weight contribute more strongly to forming the composite. In case of reflective measurement models (Chapter 2), the weights are also indicative of each indicator’s degree of measurement error. Indicators with high degrees of measurement error have lower weights, thereby contributing less to forming the composite variable. To start our discussion of the PLS-SEM algorithm, you need to understand the data that are used to run the algorithm. Exhibit 3.1 shows a data matrix for the indicator variables (columns) and observations (rows) in the PLS path model illustrated in Exhibit 3.2. The measured indicator (x) variables (rectangles in the top portion of Exhibit 3.2) are shown in the row at the top of the matrix. The (Y ) constructs (circles in Exhibit 3.2) are shown at the right side of the matrix. For example, there are seven measured indicator variables in this PLSSEM example, and the variables are identified as x1 to x7. Three constructs identified as Y1, Y2, and Y3 are also shown. Note that the Y constructs are not measured variables. The measured x variables are used as raw data input to estimate the scores of Y1, Y2, and Y3 (referred to as construct scores or latent variable scores) EXHIBIT 3.1  ■  Data Matrix for a PLS-SEM Example Case ID

x1

x2

x3

x4

x5

x6

x7

1

x1,1

x2,1

x 3,1

x 4,1

x5,1

x6,1

x7,1

...

...

...

...

...

...

...

389

x 389,1

x 389,2

x 389,3

x 389,4

x 389,5

x 389,6

Y1

Y2

Y3

Y1,1

Y2,1

Y3,1

...

...

...

...

x 389,7

Y389,1

Y389,2

Y389,3

Chapter 3  ■  Path Model Estimation  

87

EXHIBIT 3.2  ■  Path Model and Data for a Hypothetical PLS-SEM Example

x1

w11

x2

w12

Y1

p13

l35 Y3

x3 x4

w23

l36

x6

l37

p23

Y2

x5

x7

w24

Measurement Models (Indicators x, latent variables Y, and relationships, i.e., w or l, between indicators and latent variables) Y1

Y2

Structural Model (Latent variables Y and relationships between latent variables p)

Y3

Y1

Y2

Y3

x1

w11

Y1

p13

x2

w12

Y2

p23

x3

w23

x4

w24

Y3

x5

I35

x6

I36

x7

I37

Source: Henseler, J., Ringle, C. M., & Sarstedt, M. (2012). Using partial least squares path modeling in international advertising research: Basic concepts and recent issues. In S. Okazaki (Ed.), Handbook of research in international advertising (pp. 252–276). Cheltenham, UK: Edward Elgar Publishing. http://www.elgaronline.com/

(e.g., for construct Y1, the scores are data points Y1,1 to Y389,1) as part of solving the PLS-SEM algorithm. A data matrix like the one in Exhibit 3.1 serves as input for indicators in our hypothetical PLS path model (Exhibit 3.2). The data for the indicators of the measurement model might be obtained from responses to survey questions (i.e., survey data) or from secondary data of a company database (e.g., the advertising budget, number of employees, profit). A case identification (ID) for

88   A Primer on Partial Least Squares Structural Equation Modeling (PLS-SEM)

the observations is listed in the first column of the data matrix. For example, if your sample includes 389 responses (sample size), then the numbers in this column would be 1 to 389 (assuming you use a number as case ID to identify each respondent/object). To determine the minimum sample size for PLS path model estimation in this example, we draw on Kock and Hadaya’s (2018) inverse square root method. When assuming a minimum path coefficient of 0.15 at a 5% significance level, the required sample size is 275 observations (see Exhibit 1.7 in Chapter 1). Since this hypothetical example uses a data set with 389 observations, we meet the minimum sample size requirement for the PLS path model estimation. The PLS-SEM algorithm estimates all unknown elements in the PLS path model. The upper portion of Exhibit 3.2 shows the PLS path model with three latent variables and seven measured indicator variables. The four indicator variables (x1, x 2, x3, and x4) for the two exogenous constructs (Y1 and Y2) are modeled as formative measures (i.e., relationships from the indicators to the latent variables). In contrast, the three indicator variables (x5, x6, and x7) for the endogenous construct (Y3) are modeled as reflective measures (i.e., relationships from the latent variable to the indicators). This kind of setup for the measurement models is just one example of possible measurement model alternatives. Researchers can select between a reflective and a formative measurement model for every construct. For example, Y1 could be modeled as formative while both Y2 and Y3 could be modeled as reflective (assuming theory supported this change and it was considered in designing the questionnaire). In Exhibit 3.2, the relationships between the measured indicator variables of the formative constructs Y1 and Y2 (i.e., outer weights) are labeled as w11, w12, w23, and w24 (the first number is for the construct and the second number is for the arrow; the w stands for weight). Similarly, the relationships between the measured indicator variables of the reflective construct Y3 (i.e., outer loadings) are labeled as l35, l36, and l37 (the l stands for loading). Outer weights and loadings are initially unknown and are estimated by the PLS-SEM algorithm. Similarly, the relationships between the latent variables (i.e., the path coefficients) in the structural model that are labeled as p (Exhibit 3.2) are also initially unknown and estimated as part of solving the PLS-SEM algorithm. The coefficient p13 represents the relationship from Y1 to Y3, and p23 represents the relationship from Y2 to Y3. In estimating the model parameters (i.e., outer weights, outer loadings, and path coefficients), the PLS-SEM algorithm follows two stages. The first stage involves estimating the construct scores (i.e., the Y columns on the right side of the matrix in Exhibit 3.1) using a procedure that iterates between estimating the measurement and structural model relationships. More specifically, the algorithm starts by linearly combining the indicators to form composites (i.e., the construct scores). These construct scores serve as basis for estimating the structural model relationships (i.e., the path coefficients), which are then used to recompute the construct scores. The recomputed construct scores allow for updating the outer weights in the measurement models. With these newly estimated outer weights,

Chapter 3  ■  Path Model Estimation  

89

the PLS-SEM algorithm returns updated construct scores, and the computations start from the beginning. This process continues until convergence, which is the case when the changes of the relationships between the constructs and their indicators become very small. Thus, the underlying consideration for convergence is relatively simple. If the outer weights do not change anymore, the construct values do not change either. The iterative computation of the structural and measurement model relationships follows a least squares approach, analogous to regression analysis. When estimating the structural model relationships and recomputing the construct scores using the path coefficients, the measurement model relationships are held constant and vice versa. Two considerations are important in the first stage of the PLS-SEM algorithm: 1. The estimation of outer weights used to compute the construct scores. 2. The estimation of the structural model relationships. The outer weights used in the linear combinations of the indicators can come in two forms: Mode A and Mode B. Mode A uses correlation weights between the construct and its indicators. More specifically, the outer weights are the correlation (or bivariate regression) between the construct and each of its indicators. In contrast, Mode B uses regression weights, where the construct is regressed on its indicators. Hence, the outer weights in Mode B are the coefficients of a multiple regression model. The decision whether to use Mode A or Mode B is strongly tied to the measurement model specification. For reflective measurement models, researchers typically use Mode A, which means that the l coefficients (i.e., outer loadings) are used as weights to compute the construct scores. The outer loadings are calculated by running a series of bivariate correlation analyses between the Y construct (e.g., Y3 in Exhibit 3.2.) and each of its indicator variables x (e.g., x5, x6, and x7 in Exhibit 3.2.). In contrast, for formative measurement models, researchers typically use Mode B. In this case, the weights are derived from regressing each (formative) construct on its indicator variables x (e.g., x1 and x 2 in Exhibit 3.2.). The resulting w coefficients are referred to as outer weights. The use of Mode A (i.e., correlations weights) for reflective measurement models and Mode B (i.e., regression weights) for formative measurement models represents the standard approach to estimate the relationships between the constructs and their indicators in PLS-SEM. In special situations, however, researchers may choose to use a different mode when estimating the relationships of reflective and formative measurement models (Rigdon, 2012). For instance, Mode A has advantages over Mode B because it usually results in less collinearity between items. Therefore, a researcher who wants to reduce collinearity issues when estimating the relationships of formative measurement models may use Mode A. Moreover, the selection of the construct scores’ estimation mode can affect the predictive capabilities of the PLS path model (Becker, Rai, & Rigdon, 2013;

90   A Primer on Partial Least Squares Structural Equation Modeling (PLS-SEM)

Everman & Tate, 2016). For example, Becker, Rai, and Rigdon (2013) show that when constructs are specified formatively, Mode A estimation yields better outof-sample prediction when the model estimation draws on more than 100 observations and when the endogenous construct’s R 2 value is 0.30 or higher. The second important consideration involves the estimation of the structural model relationships. To estimate these relationships, the PLS-SEM algorithm runs a series of partial regression models. Each regression specifies an endogenous construct in the PLS path model as the dependent variable (e.g., Y3 in Exhibit 3.2). This dependent variable’s direct predecessors (i.e., latent variables with a direct relationship leading to the target construct; here, Y1 and Y2) act as the independent variables in a regression model. The resulting (standardized) regression coefficients are referred to as path coefficients and indicate the relative importance of each predecessor construct for explaining the target construct. The approach of using the results of partial regression models for determining the relationships in the structural model is referred to as path weighting scheme. Alternative approaches include the centroid and the factor weighting schemes. The centroid weighting scheme uses a value of +1 or −1 for the relationship between two constructs depending on the sign of its correlation value. Alternatively, the factor weighting scheme uses the correlation between two constructs to determine their relationships. In most instances, the choice of the weighting scheme does not produce considerably different PLS-SEM results. But, in contrast to the centroid and factor weighting schemes, the path weighting scheme does take into account the direction of the structural model relationships. Chin (1998, p. 309) notes that the path weighting scheme “attempts to produce a component that can both ideally be predicted (as a predictand) and at the same time be a good predictor for subsequent dependent variables.” As a result, the path weighting scheme leads to slightly higher R² values in the endogenous latent variables compared to the other schemes and should therefore be preferred. After convergence of the PLS-SEM algorithm in its first stage, the second stage uses the construct scores to calculate the final set of model estimates. These include the outer weights and loadings, the structural model’s path coefficients, and the resulting R² value of the endogenous latent variables, among others. For a detailed description of the PLS-SEM algorithm’s stages and its alternating least squares algorithm, see, for example, Chin (1998), Hwang, Sarstedt, Cheah, and Ringle (2020), Lohmöller (1989), and Tenenhaus, Esposito Vinzi, Chatelin, and Lauro (2005). The illustration of the PLS-SEM algorithm underlines the important role of the nomological network for model estimation. When estimating the model, the PLS-SEM algorithm produces sets of construct scores whose nature depends on the model setup. Adjusting the model setup (e.g., by changing the structural model) will produce different sets of construct scores, which will also change the measurement model estimates. This is also why the measurement model results from a PLS-SEM analysis will differ from a stand-alone assessment of construct measures using principal component analysis (Sarstedt & Mooi, 2019; Chapter 8).

Chapter 3  ■  Path Model Estimation  

91

In the latter case, the model estimation does not draw on a larger net of related constructs, whereas PLS-SEM does. Note that this context-dependency of measurement model estimates sets PLS-SEM apart from CB-SEM, in which the measurement model validation is independent from the nomological network. A recent extension of the PLS-SEM approach is the weighted PLS-SEM (WPLS) algorithm (Becker & Ismail, 2016). This modified version of the original PLS-SEM algorithm enables the researcher to incorporate sampling weights. When estimating a PLS path model, researchers typically seek to draw inferences about the population of interest. An important requirement for such inferences is that the sample is representative of the population. Probability sampling methods such as simple random sampling, stratified sampling, or cluster sampling meet this requirement as every member of a population has an equal probability of being selected in the sample (Sarstedt & Mooi, 2019; Chapter 3). In this case, every observation in the sample would carry the same weight in the PLS-SEM analysis. However, in practice, population members are often not equally likely to be included in the sample, for example, because of the use of non-probability sampling methods such as quota sampling. When this situation is encountered, sampling (post-stratification) weights can be used to obtain unbiased estimates of the population effects (Sarstedt, Bengart, Shaltoni, & Lehmann, 2018). For example, if a population consists of an equal share of males and females but the sample comprises 60% males and 40% females, sampling weights ensure that females are weighted more strongly than males in the analysis. One approach for incorporating sample weighting into PLS-SEM builds on the data of the observed (or manifest) variables and a weighting variable that corrects for the unequal probability of selection and thereby ensures representativeness of the sample. The WPLS algorithm incorporates these weights using weighted correlations and weighted regression results to estimate the PLS path model. As a result, WPLS provides more accurate parameter estimates than the basic PLS-SEM algorithm when appropriate sampling weights are available. For more details on WPLS, guidelines, and applications, see Becker & Ismail (2016) and Cheah, Roldán, Ciavolino, Ting, and Ramayah (2020). To run the PLS-SEM algorithm, users can choose from a range of software programs. A popular early example of a PLS-SEM software program is PLSGraph (Chin, 2003), which is a graphical interface to Lohmöller’s (1987) LVPLS, the first program in the field. Compared with LVPLS, which required the user to enter commands via a text editor, PLS-Graph represents a significant improvement, especially in terms of user-friendliness. However, PLS-Graph and similar PLS-SEM software applications such as VisualPLS (Fu, 2006) have not been further developed in recent years—see Temme, Kreis, and Hildebrandt (2010) for an early PLS-SEM software review. With the increasing dissemination of PLS-SEM in a variety of disciplines, several other programs with user-friendly graphical interfaces were introduced to the market such as Adanco (Henseler, 2017a), SmartPLS (Ringle, Wende, & Will, 2005; Ringle, Wende, & Becker, 2015), WarpPLS (Kock, 2020), and XLSTAT’s PLSPM package. Among these

92   A Primer on Partial Least Squares Structural Equation Modeling (PLS-SEM)

alternatives, SmartPLS is the most widely applied, comprehensive, and advanced program available (Sarstedt & Cheah, 2019). Hence, this software serves as the basis for all case study examples in this book. Finally, users with experience in the statistical software environment R can also draw on packages such as csem (Rademaker et al., 2020), seminr (Ray, Danks, Velasquez Estrada, Uanhoro, & Bejar, 2020), and semPLS (Monecke & Leisch, 2012; Monecke & Leisch, 2013), which facilitate flexible analysis of PLS path models. The new R software edition of this book (Hair et al., 2022) illustrates all elements of the corporate reputation case study using the seminr package.

Statistical Properties PLS-SEM aims to maximize the amount of explained variance of the endogenous constructs embedded in a path model, which is grounded in welldeveloped causal explanations. Because of this, PLS is considered a “causalpredictive” approach to SEM (Jöreskog & Wold, 1982, p. 270), which makes the method particularly useful for studies on the sources of competitive advantage and success driver studies (Albers, 2010; Hair, Ringle, & Sarstedt, 2011). From a statistical perspective, PLS-SEM seeks to minimize the combination of bias and error variance, “occasionally sacrificing theoretical accuracy for improved empirical precision” (Shmueli, 2010, p. 293). In contrast, CB-SEM is strictly confirmatory in nature (Chapter 1). As such, the method may achieve a slightly more accurate representation of the underlying theory by focusing only on minimizing bias, but this comes at the expense of lower predictive power (Evermann & Tate, 2016). Unlike CB-SEM, PLS-SEM does not optimize a unique global scalar function. Some scholars have traditionally considered the lack of a global scalar function and the consequent lack of global goodness-of-fit measures drawbacks of PLS-SEM, but we do not take this position. When using PLS-SEM, it is important to recognize that the term fit has different meanings in the contexts of CBSEM and PLS-SEM (Hair, Hollingsworth, Randolph, & Chong, 2017; Rigdon, Sarstedt, & Ringle, 2017). Fit statistics for CB-SEM are derived from the discrepancy between the empirical and the model-implied (theoretical) covariance matrix, whereas PLS-SEM focuses on the discrepancy between the observed (in the case of manifest variables) or approximated (in the case of latent variables) values of the dependent variables and the values predicted by the model in question (Hair, Sarstedt, & Ringle, 2019; Henseler et al., 2014). While researchers have proposed various model fit measures for PLS-SEM (Schuberth, Henseler, & Dijkstra, 2018; Tenenhaus et al., 2005), their efficacy for identifying misspecified models is highly limited (see Exhibit 6.2 for a discussion of the measures and their limitations). As a consequence, to judge the model’s quality researchers using PLS-SEM rely on alternative measures that assess the model’s predictive capabilities (Shmueli, Ray, Velasquez Estrada, & Chatla, 2016; Shmueli et al., 2019), both in-sample and out-of-sample (Hair, 2020). These measures are

Chapter 3  ■  Path Model Estimation  

93

introduced in Chapters 4–6 on the evaluation of measurement models and the structural model. PLS-SEM is nonparametric in nature. This means the method does not make any assumptions regarding the distribution of the data or, more precisely, the residuals, as it is the case in regular regression analysis (Sarstedt & Mooi, 2019; Chapter 7). This property has important implications for testing the significance of the model coefficients (e.g., path coefficients) as the technique does not assume any specific distribution. Instead, the researcher has to derive a distribution from the data using bootstrapping, which is then relied on as basis for significance testing (Chapter 5). One of the most important features of PLS-SEM relates to the nature of the construct scores. CB-SEM estimates the model parameters without using any case values of the latent variable scores; that is, the scores are indeterminate (Guttman, 1955). In contrast, the PLS-SEM algorithm uses the indicators to compute composite variables, which contain the construct scores—for example, the construct score of Y3 in Exhibit 3.2 as a linear combination of the indicator variables x5, x6, and x7. The PLS-SEM algorithm treats these scores as perfect substitutes for the indicator variables and therefore uses all the variance from the indicators that can help to explain the endogenous constructs. This is because the PLS-SEM approach is based on the assumption that all the measured variance in the model’s indicator variables is potentially useful and should be included in estimating the construct scores. However, indicator variables always involve some degree of measurement error, which is therefore also present in the latent variable scores and is ultimately reflected in the model estimates. This error in the latent variable scores, despite small, does produce a slight bias in the model estimates. The result is the path model relationships are slightly underestimated, while the parameters for the measurement models typically are overestimated compared with CB-SEM. This characteristic has in the past been misinterpreted as the PLS-SEM bias. It should be noted, however, that in this context inconsistency does not imply the results of the PLS-SEM algorithm are actually biased (e.g., Marcoulides & Chin, 2013; Rigdon, Sarstedt, & Ringle, 2017; Sarstedt, Hair, Ringle, Thiele, & Gudergan, 2016). Instead, it means the structural and measurement model relationships in PLS-SEM are not the same as those of CB-SEM as a result of PLS-SEM’s different handling of latent variables (Chapter 1). Only when the number of observations and the number of indicators per latent variable increase to infinity will the PLS-SEM estimates be the same as in CB-SEM. This characteristic has commonly been described as consistency at large (Hui & Wold, 1982; Lohmöller, 1989). Infinity is a really large number and implies that this difference never fully disappears. However, simulation studies show that the difference between PLS-SEM and CB-SEM is very small when the measurement models meet minimum recommended quality standards (e.g., Henseler et al., 2014; Sarstedt, Hair, Ringle, Thiele, & Gudergan, 2016) and plays virtually no role in most empirical settings. Simulation studies also show that the differences in results between alternative

94   A Primer on Partial Least Squares Structural Equation Modeling (PLS-SEM)

composite model estimation methods such as GSCA, PLS-SEM, and sum score regression are relatively small (Cho, Hwang, Sarstedt, & Ringle, 2020; Hair, Hult, Ringle, Sarstedt, & Thiele, 2017). In contrast, simulation studies also demonstrate that CB-SEM results can become extremely inaccurate when estimating composite model data, while PLS-SEM produces accurate estimates, even when assuming a common factor model (Sarstedt, Hair, Ringle, Thiele, & Gudergan, 2016). The latter especially holds when the number of constructs and structural model relationships (i.e., the model complexity) is high and sample size is low, a situation in which the bias produced by CB-SEM is oftentimes quite substantial, particularly when distributional assumptions are violated. Recent research has brought forward the consistent PLS-SEM (PLSc-SEM) method (Bentler & Huang, 2014; Dijkstra, 2014), a variation of the original PLSSEM approach. Simulations studies (e.g., Dijkstra & Henseler, 2015a, 2015b) show that PLSc-SEM and CB-SEM produce highly similar results in a variety of model constellations. Hence, PLSc-SEM is capable of mimicking CB-SEM well. However, while maintaining some of the general PLS-SEM advantages, PLSc-SEM is subject to similar problems that have been noted for CB-SEM, such as inferior robustness and highly inaccurate results in certain configurations (Sarstedt, Hair, Ringle, Thiele, & Gudergan, 2016). More severely, the correction factor introduced by the PLSc-SEM algorithm makes it very difficult to assess a model’s out-of-sample predictive power. We discuss PLSc-SEM in greater detail in the final section of Chapter 8. Finally, model estimates produced by PLS-SEM generally exhibit higher levels of statistical power than CB-SEM (Reinartz, Haenlein, & Henseler, 2009; Sarstedt, Hair, Ringle, Thiele, & Gudergan, 2016), even when the data originate from a common factor model. Consequently, PLS-SEM is better at identifying population relationships and is therefore more suitable for exploratory research purposes—a feature that is further supported by the less restrictive requirements of PLS-SEM in terms of model setups, model complexity, and data characteristics (Chapter 1).

Algorithmic Options and Parameter Settings to Run the Algorithm To estimate a PLS path model, algorithmic options and parameter settings must be selected. The algorithmic options and parameter settings include the choice between Mode A and B for estimating the construct scores. PLS-SEM software applications like SmartPLS typically use Mode A for reflective measurement models and Mode B for formative measurement models as the default setting. This setting is usually used by the researcher and only in certain situations is the mode setting selected differently. Sum scores is another alternative for computing the construct scores, whose use should however be avoided (Chapter 2). Further algorithmic options and parameter settings include selecting the structural model weighting scheme, the data metric, initial values to start the

Chapter 3  ■  Path Model Estimation  

95

PLS-SEM algorithm, the stop criterion, and the maximum number of iterations. As indicated above, PLS-SEM allows the user to apply three structural model weighting schemes: (1) the centroid weighting scheme, (2) the factor weighting scheme, and (3) the path weighting scheme. While the results differ little across the different approaches, the path weighting scheme is the recommended approach (Chin, 1998). The PLS-SEM algorithm draws on standardized latent variable scores. Thus, PLS-SEM applications must use standardized data for the indicators (more specifically, z-standardization, where each indicator has a mean of 0 and the variance is 1) as input for running the algorithm. As a consequence, the partial regressions used in the course of the model estimations have a zero intercept. This raw data transformation is the recommended option (and automatically supported by available software packages such as SmartPLS) when starting the PLS-SEM algorithm. When running the PLS-SEM method, the software package standardizes both the raw data of the indicators and the latent variable scores. As a result, the algorithm calculates standardized coefficients approximately between −1 and +1 for every relationship in the structural model and the measurement models. For example, path coefficients close to +1 indicate a strong positive relationship (and vice versa for negative values). The closer the estimated coefficients are to 0, the weaker the relationships. Very low values close to zero often are not statistically significant. Checking for significance of relationships is part of evaluating and interpreting the results discussed in Chapters 4–6. When regressing Y3 on Y1 and Y2 in the structural model, the standardized regression coefficients p13 and p23 may have values of 0.6 and 0.2, indicating that Y1 has a higher relative importance in explaining Y3. The result for the standardized coefficient p13 is interpreted as follows: If the other construct Y2 is kept constant and Y1 is increased by one standard deviation unit, Y3 increases by 0.6 standard deviation units. The other standardized coefficient p23 is interpreted analogously. The relationships in the measurement model require initial values to start the PLS-SEM algorithm. For the first iteration, any linear combination of indicators can serve as values for the latent variable scores. In practice, equal weights are a good choice for the initialization of the PLS-SEM algorithm. Computer programs such as SmartPLS therefore set all weights to +1 in the PLS-SEM algorithm’s first iteration. This setting, however, requires that all indicators have the same orientation. For example, higher indicator values indicate a higher degree of agreement in case of Likert scales. If this does not apply to all indicators of a measurement model, for example because of the use of reverse-scaled indicators (Weijters & Baumgartner, 2012), researchers need to rescale the indicators before the PLS-SEM analysis (e.g., in spreadsheet applications such as Excel). If an indicator is rescaled (e.g., the highest value becomes the lowest and vice versa) to ensure the same orientation of all indicators in a measurement model, it is important to also change the label of the rescaled item accordingly (e.g., from a negative to a positive wording).

96   A Primer on Partial Least Squares Structural Equation Modeling (PLS-SEM)

The final parameter setting to select is the algorithm’s stopping criterion. The PLS-SEM algorithm is designed to run until the results stabilize. Stabilization is reached when the sum of changes in the outer weights between two iterations is sufficiently low, which means it drops below a predefined limit. A threshold value (i.e., stop criterion) of 1 · 10 –7 (i.e., 0.0000001) is recommended to ensure that the PLS-SEM algorithm converges at reasonably low levels of iterative changes in the latent variable scores. One must ensure, however, that the algorithm stops at the predefined stop criterion. Thus, a sufficiently high maximum number of iterations must be selected. Since the algorithm is very efficient (i.e., it converges after a relatively low number of iterations even with complex models), the selection of a maximum number of 300 iterations should ensure that convergence is obtained at the stop criterion of 1 · 10 –7. Prior research has shown that the PLS-SEM algorithm almost always converges (Hanafi, 2007; Hanafi, Dolce, & El Hadri, 2021). Only under very extreme and artificial conditions, which very seldom occur in practice, is it possible that the algorithm does not converge, which has, however, practically no implications for the results (Henseler, 2010). Exhibit 3.3 summarizes guidelines for initializing the PLS-SEM algorithm. EXHIBIT 3.3  ■  Rules of Thumb for Initializing the PLS-SEM Algorithm •• Use Mode A for reflective measurement models and Mode B for formative measurement models; do not determine the construct scores by using sum scores. •• Select the path weighting scheme as the weighting method. •• Use +1 as the initial value for all outer weights, provided that all indicators have the same orientation. Indicators with a different orientation should be rescaled and relabeled prior to the PLS-SEM analysis. •• Choose a stop criterion of 1 · 10–7 (i.e., 0.0000001). •• Select a value of at least 300 for the maximum number of iterations. •• If you are using survey data and a weighting variable is available, use it with the WPLS algorithm to ensure the results’ representativeness.

Results When the PLS-SEM algorithm converges, the final outer weights are used to compute the final latent variable scores. Then, these scores serve as input to run ordinary least squares regressions to determine final estimates for the path relationships in the structural model. PLS-SEM always provides the outer loadings and outer weights, regardless of the measurement model setup. With reflectively measured constructs, the outer loadings are single regression results with a particular indicator in the measurement model as a dependent variable

Chapter 3  ■  Path Model Estimation  

97

(e.g., x5 in Exhibit 3.2) and the construct as an independent variable (e.g., Y3 in Exhibit 3.2). Because of the data standardization, the regression result corresponds to the bivariate correlation between each indicator and its construct. In contrast, with formatively measured constructs, the outer weights are resulting beta coefficients of a multiple regression with the construct as a dependent variable (e.g., Y1 in Exhibit 3.2) and the indicators as independent variables (e.g., x1 and x 2 in Exhibit 3.2). The outer loadings and outer weights are computed for all measurement model constructs in the PLS path model. However, outer loadings are primarily associated with the results for the relationships in reflective measurement models, and outer weights are associated with the results for the relationships in formative measurement models. The estimations for the paths between the latent variables in the structural model are reported as standardized coefficients. In the partial regression models of the structural model, an endogenous latent variable (e.g., Y3 in Exhibit 3.2) serves as the dependent variable while its direct predecessors serve as independent variables (e.g., Y1 and Y2 in Exhibit 3.2). In addition to the coefficients from the estimation of the partial regression models in the structural model (one for each endogenous latent variable), the output includes the R² values of each endogenous latent variable in the structural model. The R² values are usually between 0 and +1 and represent the amount of explained variance in the construct. For example, an R² value of 0.70 for the construct Y3 in Exhibit 3.2 means that 70% of this construct’s variance is explained by the exogenous latent variables Y1 and Y2. The goal of the PLS-SEM algorithm is to maximize the R² values of the endogenous latent variables and thereby their explanation for a given model. Additional criteria must be evaluated to fully understand the results of the PLS-SEM algorithm. These additional criteria are explained in detail in Chapters 4–6.

CASE STUDY ILLUSTRATION—PLS PATH MODEL ESTIMATION (STAGE 4) To illustrate and explain PLS-SEM, we will use a single data set throughout the book and the SmartPLS 3 software (Ringle, Wende, & Becker, 2015). The data set is taken from prior research on the interplay of corporate reputation, customer satisfaction, and customer loyalty, as introduced in Chapter 2. The data set (i.e., the Corporate reputation data.csv file) and the ready-to-use SmartPLS project (i.e., the Corporate Reputation.zip file) for this case study are available at https://www.pls-sem.net/.

Model Estimation To estimate the corporate reputation model in SmartPLS, you need to create a new project, import the indicator data (i.e., Corporate reputation data.csv),

98   A Primer on Partial Least Squares Structural Equation Modeling (PLS-SEM)

and draw the model as explained in the case study of Chapter 2. Alternatively, you can import the SmartPLS project from a backup file (i.e., Corporate Reputation.zip). The procedures to import projects into the SmartPLS software and to open a model are explained in Chapter 2 (i.e., you see the PLS path model in the software). Before estimating the PLS path model, you may want to adjust the estimation mode of the outer weights used to compute the construct scores. The default setting for computing the construct scores is Mode A for reflective measurement models and Mode B for formative measurement models. However, by doubleclicking on a latent variable, a dialog box opens, which allows you to change this default setting. However, the default setting rarely needs to be adjusted. Next, you need to navigate to Calculate → PLS Algorithm, which you can find at the top of the SmartPLS screen. The menu that shows up provides several algorithms to select from. For estimating the PLS path model, the PLS Algorithm is the one to choose. Alternatively, you can left-click on the wheel symbol in the toolbar labeled Calculate. After selecting the PLS Algorithm function, the dialog box in Exhibit 3.4 appears. The PLS-SEM algorithm needs three basic parameter settings to run it. The Path Weighting Scheme is selected for estimation of the inner weights. The PLS-SEM algorithm stops when the maximum number of 300 iterations or the stop criterion of 1.0E-7 (i.e., 0.0000001) has been reached. For all measurement model relationships, SmartPLS uses a default value of 1.0 to initialize the PLS-SEM algorithm. Alternatively, one can configure a specific weight for each indicator in the model (left-click on the option Configure individual initial weights). In this example, we use the default settings as shown in Exhibit 3.4.

EXHIBIT 3.4  ■  PLS-SEM Algorithm Settings

Chapter 3  ■  Path Model Estimation  

99

EXHIBIT 3.5  ■  Missing Values Dialog Box

Next, click on the Missing Values tab at the top of the dialog box (Exhibit 3.5). Please note that this tab appears only when you have specified the coding of missing values (e.g., −99) in the Data View of the data set that you selected for the model estimation (Chapter 2). The Missing Values tab in Exhibit 3.5 shows the number of missing values in the selected data set and alternative missing value treatment options in a menu (e.g., Mean Replacement). None of the indicator variables in the simple model has more than 5% missing values (specifically, the maximum number of missing values [four missing values; 1.16%] is in cusl_2; Chapter 2). Thus, use the mean value replacement option by selecting the corresponding option. Finally, in the Weighting tab, you can specify a weighting variable (labeled Weighting Vector in SmartPLS), if available in your data set, which assigns each observation a different importance in the WPLS estimation based on some criterion. In the corporate reputation example, we do not specify a weighting variable, so proceed by clicking on Start Calculation at the bottom of the dialog box. Occasionally, the algorithm does not start, and a message appears indicating a singular data matrix problem. There are two potential reasons for this issue. First, an indicator is a constant (i.e., the indicator has the same value such as “1” for all responses) and thus has zero variance. Second, an indicator is entered twice or is a linear combination of another indicator (e.g., one indicator is a multiple of another such as sales in units and sales in thousands of units). Under these circumstances, PLS-SEM cannot estimate the model, and the researcher has to modify the model by excluding the problematic indicator(s).

Estimation Results After the estimation of the model, SmartPLS opens the results report per default (note that you can turn this option off by choosing an alternative option for After Calculation at the bottom of the PLS Algorithm dialog box). At the bottom of the results report, you can select several results tables divided into four categories (Final Results, Quality Criteria, Interim Results, and Base Data).

100   A Primer on Partial Least Squares Structural Equation Modeling (PLS-SEM)

For example, under Final Results, you can find the Outer Loadings and Outer Weights tables. It is important to note that the results for outer loadings and outer weights are provided by the software for all measurement models, regardless of whether they are reflective or formative. If you have reflective measurement models, you interpret the outer loadings results. In contrast, if you have formative measurement models, then you primarily interpret the outer weights results (the outer loadings also provide a means to evaluate formative measurement models; Chapter 5). Note the options Hide Zero Values (which is switched on as default in SmartPLS), Increase Decimals, and Decrease Decimals in the menu bar on top. Navigate to Edit Preferences in the SmartPLS menu if you would like to permanently change the number of decimals displayed in the results report (e.g., to 2 decimal places). The default report provides various other results menus of which the Stop Criterion Changes under Interim Results is of initial interest. Here we can see the number of iterations the PLS-SEM algorithm ran. We will discuss these and further menus of the results report in later chapters. Exhibit 3.6 shows the results report for the path coefficients in matrix format. The table reads from the row to the column. For example, the value 0.504 in the CUSA row and the CUSL column is the standardized path coefficient of the relationship from CUSA to CUSL. Clicking on the Path Coefficients tab right above the table opens a bar chart (Exhibit 3.7), which visualizes the path coefficients for each model relationship. Each bar’s height represents the strength of the relationship, which is displayed at the bottom of each bar. To get an initial overview of the results, you can switch from the results report view to the Modeling Window view by left-clicking the tab labeled Simple Model. Initially, SmartPLS provides three key results in the Modeling Window. These are

EXHIBIT 3.6  ■  Path Coefficients Report (Matrix Format)

Chapter 3  ■  Path Model Estimation  

101

EXHIBIT 3.7  ■  Path Coefficients Report (Bar Chart)

(1) the outer loadings and/or outer weights for the measurement models, (2) the path coefficients for the structural model relationships, and (3) the R² values of the endogenous constructs CUSA and CUSL (Exhibit 3.8). The structural model results enable us to determine, for example, that CUSA has the strongest effect on CUSL (0.504), followed by LIKE (0.342) and COMP (0.009). Moreover, the three constructs explain 56.2% of the variance of the endogenous construct CUSL (R² = 0.562), as indicated by the value in the circle. COMP and LIKE also jointly explain 29.5% of the variance of CUSA (R² = 0.295). In addition to examining the sizes of the path coefficients, we must also determine if they are statistically significant. Based on their sizes, it would appear that the relationships CUSA → CUSL and LIKE → CUSL are significant. But it seems very unlikely that the hypothesized path relationship COMP → CUSL (0.009) is significant. As a rule of thumb, for sample sizes of about 500 and higher, path coefficients with standardized values above 0.20 are usually significant, and those with values below 0.10 are usually not significant. Nevertheless, making definite statements about a path coefficient’s significance requires determining the coefficient estimates’ standard error, which is part of more detailed structural model results evaluations presented in Chapter 6. SmartPLS offers you further options to display estimation results in the Modeling Window using the Calculation Results box at the bottom left of the screen (Exhibit 3.9). Here, you can switch between different types of parameter estimates for the constructs, the inner (i.e., structural) model and the outer (i.e., measurement) models. For example, by clicking on the menu next to Constructs, you can switch between the statistics Average Variance Extracted (AVE), Composite

102

like_3

like_2

like_1

comp_3

comp_2

comp_1

0.879 0.870 0.843

0.858 0.798 0.818

LIKE

COMP

EXHIBIT 3.8  ■  PLS-SEM Results

0.424

cusa

0.162

1.000

0.342

CUSA

0.295

0.009

0.504 CUSL

0.562

0.833 0.917 0.843

cusl_3

cusl_2

cusl_1

Chapter 3  ■  Path Model Estimation  

103

EXHIBIT 3.9  ■  Calculation Results Box

Reliability, Cronbach’s Alpha, R Square, R Square Adjusted, and rho_A, which we will discuss in the following chapters. Depending on the selected setting, SmartPLS will show the corresponding statistic in the center of the constructs in the Modeling Window. Note that you can also left-click on the menu to select it and then use the upwards and downwards arrow keys on your keyboard to quickly switch between the alternative results options displayed for Constructs. Similarly, you can switch between different types of parameter estimates in the inner model or the outer models. Finally, by clicking on the menu next to Highlight Paths, you can highlight the estimated relationships in the inner model (i.e., structural model) and outer models (i.e., measurement models). Based on the size of their estimated coefficients, the paths appear as thicker or thinner lines in the SmartPLS Modeling Window. On the basis of the path coefficient estimates and their significance, you can determine whether the conceptual model along with its theoretical hypotheses are substantiated empirically. Moreover, by examining the relative sizes of the significant path relationships, it is possible to make statements about the relative importance of the exogenous latent variables in predicting an endogenous latent variable. In our simple example, CUSA and LIKE are both moderately strong predictors of CUSL, whereas COMP does not predict CUSL at all.

104   A Primer on Partial Least Squares Structural Equation Modeling (PLS-SEM)

Summary •

Learn how the PLS-SEM algorithm functions. The PLS-SEM algorithm uses the empirical data for the indicators and iteratively determines the construct scores, the path coefficients, indicator loadings and weights, and further statistics such as R2 values. After determining the scores for every construct, the algorithm estimates all remaining unknown relationships in the PLS path model. The algorithm first obtains the measurement model results, which are the relationships between the constructs and their indicator variables. Then, the algorithm calculates the path coefficients, which are the relationships between the constructs in the structural model, along with the R² values of endogenous constructs. All results are standardized, meaning that, for example, path coefficients can be compared with each other.



Comprehend the options and parameter settings to run the algorithm. To apply the PLS-SEM algorithm, researchers need to specify several parameter settings. The decisions include selecting the structural model weighting scheme, initial values to start the PLS-SEM algorithm, the stop criterion, and the maximum number of iterations. The path weighting scheme maximizes the R² values of the endogenous constructs, so that option should be selected. Finally, the initial values (e.g., +1) for the relationships in the measurement model, the stop criterion (a small number such as 1 · 10–7 ), and a sufficiently large maximum number of iterations (e.g., 300) should be selected. The PLSSEM algorithm runs until convergence is achieved or the maximum number of iterations has been reached. The resulting construct scores are then used to estimate all partial regression models in the structural model and the measurement models to obtain the final model estimates.



•	Understand the statistical properties of the PLS-SEM method. PLS-SEM is an ordinary least squares regression-based method, which implies that most of the statistical properties known from ordinary least squares regression also apply to PLS-SEM. The PLS-SEM algorithm aims to maximize the amount of explained variance of the endogenous constructs embedded in a path model that is grounded in well-developed causal explanations. The first key results of the PLS path model estimation are the construct scores. These scores are treated as perfect substitutes for the indicator variables in the measurement models and therefore use all the variance that can help explain the endogenous constructs. Moreover, they facilitate estimating all relationships in the PLS path model. The estimation of these relationships is, however, subject to what has been mistakenly referred to as the PLS-SEM bias, which means that measurement model results are usually overestimated while structural model results are usually underestimated compared with CB-SEM results. However, under conditions commonly encountered in research situations, this inconsistency is quite small. Moreover, the parameter estimation efficiency of PLS-SEM delivers high levels of statistical power compared with CB-SEM. Consequently, PLS-SEM better identifies population relationships and is better suited for exploratory research purposes—a feature that is further supported by the method's less restrictive requirements in terms of model setups, model complexity, and data characteristics.

•	Explain how to interpret the results. The PLS-SEM method estimates the standardized outer loadings, outer weights, and structural model path coefficients. The loadings and weights are computed for every measurement model in the PLS path model. When reflective measurement models are used, the researcher interprets the outer loadings, whereas outer weights are the primary criterion when formative measurement models are interpreted (note, however, that the loadings also play a role in formative measurement model assessment). For the structural model, the standardized coefficients of the relationships between the constructs are provided, as well as the R² values for the endogenous constructs. Further advanced PLS-SEM evaluation criteria used to assess the results are introduced in Chapters 4–6.



•	Apply the PLS-SEM algorithm using the SmartPLS software. The corporate reputation example and the empirical data available with this book enable you to apply the PLS-SEM algorithm using the SmartPLS software. Selected menu options guide the user in choosing the algorithmic options and parameter settings required for running the PLS-SEM algorithm. The SmartPLS results reports enable the user to check if the algorithm converged (i.e., the stop criterion was reached and not the maximum number of iterations) and to evaluate the initial results for the outer weights, outer loadings, structural model path coefficients, and R² values. Additional diagnostic measures for more advanced analyses are discussed in later chapters.
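To make the first learning outcome above more tangible, the following is a deliberately simplified Python/NumPy sketch of the basic estimation loop: Mode A (correlation) outer weights combined with the factor weighting scheme in the inner model. It is an illustration under simplifying assumptions, not SmartPLS's implementation; all names are ours, and the final loadings and path coefficients would be obtained from ordinary least squares regressions on the converged construct scores.

```python
import numpy as np

def pls_sem_sketch(X_blocks, adjacency, max_iter=300, tol=1e-7):
    """Simplified PLS-SEM estimation loop (Mode A, factor weighting scheme).

    X_blocks  : list of (n x k_j) indicator data matrices, one per construct
    adjacency : (J x J) symmetric 0/1 matrix of structural model links
    """
    # Step 0: standardize all indicators (mean 0, variance 1)
    blocks = [(X - X.mean(axis=0)) / X.std(axis=0, ddof=1) for X in X_blocks]
    # Initial outer weights of +1 for every indicator
    weights = [np.ones(X.shape[1]) for X in blocks]

    for iteration in range(1, max_iter + 1):
        # Outer estimation: construct scores as standardized weighted sums
        scores = []
        for X, w in zip(blocks, weights):
            s = X @ w
            scores.append(s / s.std(ddof=1))
        S = np.column_stack(scores)

        # Inner estimation (factor weighting scheme): inner proxies weight
        # each adjacent construct by its correlation
        inner_weights = np.corrcoef(S, rowvar=False) * adjacency
        Z = S @ inner_weights

        # Outer weight update (Mode A: correlations between the indicators
        # and the inner proxy of their construct)
        new_weights = []
        for j, X in enumerate(blocks):
            z = Z[:, j] / Z[:, j].std(ddof=1)
            new_weights.append(X.T @ z / (len(z) - 1))

        # Stop criterion: largest change in any outer weight
        change = max(np.abs(nw - w).max() for nw, w in zip(new_weights, weights))
        weights = new_weights
        if change < tol:
            break

    # Final loadings and path coefficients follow from OLS regressions
    # using the converged construct scores (omitted in this sketch)
    return S, weights, iteration
```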

Review Questions

1. Describe how the PLS-SEM algorithm functions.
2. Explain the parameter settings and algorithmic options that you would use (e.g., stop criterion, weighting scheme).
3. When do you use the weighted PLS (WPLS) algorithm?
4. What are the key results provided after convergence of the PLS-SEM algorithm?


Critical Thinking Questions

1. Discuss the assumptions underlying the so-called PLS-SEM bias.
2. Are CB-SEM results also subject to biases? Discuss this topic by specifically referring to factor- and composite-based data.
3. When do you use Mode A or Mode B for the estimation of construct scores in PLS-SEM?
4. Explain what factor determinacy means and why this feature makes the PLS-SEM method particularly useful for prediction.

Key Terms

Algorithmic options, Centroid weighting scheme, Composite-based SEM method, Consistency at large, Construct scores, Constructs, Convergence, Correlation weights, Data matrix, Endogenous constructs, Exogenous constructs, Factor weighting scheme, Formative measures, Indicators, Initial values, Latent variables (endogenous, exogenous), Latent variable scores, Maximum number of iterations, Measurement model, Mode A, Mode B, Model complexity, Outer loadings, Outer weights, Parameter settings, Path coefficients, Path weighting scheme, PLS-SEM algorithm, PLS-SEM bias, Prediction, R² value, Raw data, Reflective measure, Regression weights, Secondary data, Singular data matrix, Standardized data, Stop criterion, Structural model, Sum scores, Weighted PLS-SEM (WPLS), Weighting scheme

Suggested Readings

Becker, J.-M., & Ismail, I. R. (2016). Accounting for sampling weights in PLS path modeling: Simulations and empirical examples. European Management Journal, 34(6), 606–617.


Cheah, J.-H., Roldán, J. L., Ciavolino, E., Ting, H., & Ramayah, T. (2020). Sampling weight adjustments in partial least squares structural equation modeling: Guidelines and illustrations. Total Quality Management & Business Excellence, forthcoming.

Evermann, J., & Tate, M. (2016). Assessing the predictive performance of structural equation model estimators. Journal of Business Research, 69(10), 4565–4582.

Hair, J. F., Ringle, C. M., & Sarstedt, M. (2011). PLS-SEM: Indeed a silver bullet. Journal of Marketing Theory and Practice, 19(2), 139–151.

Hair, J. F., Sarstedt, M., & Ringle, C. M. (2019). Rethinking some of the rethinking of partial least squares. European Journal of Marketing, 53(4), 566–584.

Hanafi, M., Dolce, P., & El Hadri, Z. (2021). Generalized properties for Hanafi–Wold's procedure in partial least squares path modeling. Computational Statistics, 36, 603–614.

Henseler, J., Ringle, C. M., & Sarstedt, M. (2012). Using partial least squares path modeling in international advertising research: Basic concepts and recent issues. In S. Okazaki (Ed.), Handbook of research in international advertising (pp. 252–276). Cheltenham, UK: Edward Elgar.

Hwang, H., Sarstedt, M., Cheah, J. H., & Ringle, C. M. (2020). A concept analysis of methodological research on composite-based structural equation modeling: Bridging PLSPM and GSCA. Behaviormetrika, 47(1), 219–241.

Jöreskog, K. G., & Wold, H. (1982). The ML and PLS techniques for modeling with latent variables: Historical and comparative aspects. In K. G. Jöreskog & H. Wold (Eds.), Systems under indirect observation, Part I (pp. 263–270). Amsterdam: North-Holland.

Lohmöller, J. B. (1989). Latent variable path modeling with partial least squares. Heidelberg: Physica.

Marcoulides, G. A., & Chin, W. W. (2013). You write but others read: Common methodological misunderstandings in PLS and related methods. In H. Abdi, W. W. Chin, V. Esposito Vinzi, G. Russolillo, & L. Trinchera (Eds.), New perspectives in partial least squares and related methods (pp. 31–64). New York, NY: Springer.

Rigdon, E. E. (2012). Rethinking partial least squares path modeling: In praise of simple methods. Long Range Planning, 45(5–6), 341–358.

Rigdon, E. E., Becker, J.-M., & Sarstedt, M. (2019). Factor indeterminacy as metrological uncertainty: Implications for advancing psychological measurement. Multivariate Behavioral Research, 54(3), 429–443.

Rigdon, E. E., Sarstedt, M., & Ringle, C. M. (2017). On comparing results from CB-SEM and PLS-SEM: Five perspectives and five recommendations. Marketing ZFP, 39(3), 4–16.

Sarstedt, M., & Cheah, J. H. (2019). Partial least squares structural equation modeling using SmartPLS: A software review. Journal of Marketing Analytics, 7(3), 196–202.

Sarstedt, M., Hair, J. F., Ringle, C. M., Thiele, K. O., & Gudergan, S. P. (2016). Estimation issues with PLS and CBSEM: Where the bias lies! Journal of Business Research, 69(10), 3998–4010.

Tenenhaus, M., Esposito Vinzi, V., Chatelin, Y.-M., & Lauro, C. (2005). PLS path modeling. Computational Statistics & Data Analysis, 48(1), 159–205.

Wold, H. (1982). Soft modeling: The basic design and some extensions. In K. G. Jöreskog & H. Wold (Eds.), Systems under indirect observation, Part II (pp. 1–54). Amsterdam: North-Holland.

Visit the companion site for this book at https://www.pls-sem.net/.

4 ASSESSING PLS-SEM RESULTS—PART I
Evaluation of the Reflective Measurement Models

LEARNING OUTCOMES

1. Gain an overview of Stage 5 of the process for using PLS-SEM, which deals with the evaluation of measurement models.
2. Describe Stage 5a: evaluating reflectively measured constructs.
3. Use the SmartPLS software to assess reflectively measured constructs in the corporate reputation example.

CHAPTER PREVIEW

Having learned how to create and estimate a PLS path model, we now focus on understanding how to assess the quality of the results. Initially, we summarize the primary criteria that are used for PLS path model evaluation and their systematic usage. Then, we focus on the evaluation of reflective measurement models. The PLS path model of corporate reputation is a practical application enabling you to review the relevant measurement model evaluation criteria and the appropriate reporting of results. This discussion provides a foundation for the overview of formative measurement models in Chapter 5 and for the evaluation of structural model results, which is covered in Chapter 6.

OVERVIEW OF STAGE 5: EVALUATION OF MEASUREMENT MODELS

Model estimation delivers empirical measures of the relationships between the indicators and the constructs (measurement models) as well as between the constructs (structural model). The estimates enable us to evaluate the quality of the measures and assess whether the model provides satisfactory results in explaining and predicting the target constructs. The model evaluation follows a two-step process, as shown in Exhibit 4.1. The process involves separate assessments of the measurement models (Stage 5 of the procedure for using PLS-SEM) and the structural model (Stage 6). The PLS-SEM results assessment initially focuses on the measurement models (e.g., Chin, 2010; Roldán & Sánchez-Franco, 2012; Tenenhaus, Esposito Vinzi, Chatelin, & Lauro, 2005). Hair, Howard, and Nitzl (2020) summarize the process and evaluations at this stage under the term confirmatory composite analysis (CCA).

EXHIBIT 4.1  ■  Systematic Evaluation of PLS-SEM Results

Stage 5: Evaluation of the Measurement Models

Stage 5a: Reflective Measurement Models
•• Indicator reliability
•• Internal consistency (Cronbach's alpha, composite reliability ρC, reliability coefficient ρA)
•• Convergent validity (average variance extracted)
•• Discriminant validity (HTMT)

Stage 5b: Formative Measurement Models
•• Convergent validity
•• Collinearity between indicators
•• Significance and relevance of outer weights

Stage 6: Evaluation of the Structural Model
•• Collinearity (VIF)
•• Significance and relevance of the structural model relationships (path coefficients)
•• Explanatory power (coefficients of determination; R²)
•• Predictive power (PLSpredict procedure)
•• Model comparisons

Examination of PLS-SEM estimates enables the researcher to evaluate both the reliability and the validity of the construct measures (Exhibit 4.1). Specifically, measurement typically involves using several variables (i.e., multi-items) to measure a construct. An example is the customer loyalty (CUSL) construct described in the PLS-SEM corporate reputation model, which we discussed earlier. The logic of using multiple items as opposed to single items for construct measurement is that the measure will be more accurate. Accuracy increases because using several indicators to measure a single concept reduces the overall degree of measurement error inherent in the indicators. In addition, a multi-item measurement is more likely to represent the different aspects of the concept. This is particularly true for measuring complex concepts like trust, satisfaction, commitment, and so forth. Nevertheless, even when multiple items are used to measure concepts, the measurement is very likely to contain some degree of measurement error. There are many sources of measurement error in social sciences research, including poorly worded questions in a survey, misunderstanding of the scaling approach, and incorrect application of a statistical method, all of which lead to random and systematic errors. The objective is to reduce the measurement error as much as possible. Multi-item measures enable researchers to more precisely identify and reduce measurement error, thereby accounting for it in the research findings.

Measurement error is the difference between the true value of a variable and the value obtained by a measurement. Specifically, the measured value x_m equals the true value x_t plus a measurement error. The measurement error (e = ε_r + ε_s) can have a random source (random error ε_r), which threatens reliability, or a systematic source (systematic error ε_s), which threatens validity. This relationship can be expressed as follows:

$$x_m = x_t + \varepsilon_r + \varepsilon_s.$$

In Exhibit 4.2, we explain the difference between reliability and validity by comparing a set of targets. In this analogy, repeated measurements (e.g., of a customer's satisfaction with a specific product or service) are compared to arrows shot at a target. To measure each true score, we have five measurements (indicated by the black dots). The average value of the dots is indicated by a cross (×). Validity is indicated when the cross is close to the bull's-eye at the target center. The closer the average value (the cross in Exhibit 4.2) is to the true score, the lower the systematic error and the higher the validity. If several arrows are fired, reliability is reflected in the distances between the dots showing where the arrows hit the target. If random error is low, all the dots are close together (i.e., they do not vary much), and the measure is reliable, even though the dots are not necessarily near the bull's-eye. This corresponds to the upper left box, where we have a scenario in which the measure is reliable but not valid. In the upper right box, both reliability and validity are shown. In the lower left box, though, we have a situation in which the measure is neither reliable nor valid. That is, the repeated measurements (dots) are scattered quite widely, and the average value (cross) is not close to the bull's-eye. Even if the average value matched the true score (i.e., if the cross were in the bull's-eye), we would still not consider the measure valid. The reason is that an unreliable measure cannot be valid, because there is no way we can distinguish the systematic error from the random error (Sarstedt & Mooi, 2019; Chapter 3). If we repeated the measurement, say, five more times, the random error would likely shift the cross to a different position. Thus, reliability is a necessary condition for validity. This is also why the not reliable but valid scenario in the lower right box is not possible.

When evaluating the measurement models, we must distinguish between reflectively and formatively measured constructs (Chapter 2). The two approaches are based on different concepts and therefore require consideration of different evaluative measures. A reflective measurement model is assessed based on indicator reliability, internal consistency reliability, convergent validity, and discriminant validity.

EXHIBIT 4.2  ■  Comparing Reliability and Validity

[Figure: four targets arranged by Reliable vs. Not reliable (columns) and Valid vs. Not valid (rows); the not reliable but valid cell is marked "Not applicable."]

Source: Sarstedt, M., & Mooi, E. A. (2019). A concise guide to market research (3rd ed., p. 36). Berlin: Springer. With kind permission of Springer Science + Business Media.
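As a small numeric illustration of the error decomposition x_m = x_t + ε_r + ε_s discussed above, the following Python sketch (all values and variable names are invented for illustration) simulates five repeated measurements whose random error is small but whose systematic error is large, reproducing the "reliable but not valid" scenario from Exhibit 4.2.

```python
import numpy as np

rng = np.random.default_rng(42)
x_true = 5.0                     # true score x_t (the bull's-eye)
n_measurements = 5               # five repeated measurements (the arrows)

eps_random = rng.normal(0.0, 0.1, n_measurements)  # small random error -> reliable
eps_systematic = 1.5                                # constant offset -> not valid

x_measured = x_true + eps_random + eps_systematic   # x_m = x_t + e_r + e_s

print(np.round(x_measured, 2))      # tightly clustered dots: high reliability
print(round(x_measured.mean(), 2))  # the cross lands far from 5.0: low validity
```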


The reflective measurement model evaluation criteria cannot be universally applied to formative measurement models. With formative measures, the first step is to ensure content validity before collecting the data and estimating the PLS path model. After model estimation, different metrics are used to assess formative measures for convergent validity, collinearity among indicators, and the significance and relevance of indicator weights (Exhibit 4.1).

Researchers need to pay particular attention if the PLS path model includes a single-item construct (Chapter 2). For single-item constructs, the criteria for the assessment of reflective and formative measurement models are not applicable. In a single-item construct, the relationship between the construct and its (single) item is by definition 1, and the resulting construct scores are identical to the scores of the single item. To evaluate the reliability and validity of single-item measures, researchers must rely on proxies or different forms of validity assessment (Cheah et al., 2019; Chapter 3). For example, researchers can correlate the single-item measure with an established criterion variable. The resulting correlation is then compared to the correlation that results if the predictor construct is measured by a multi-item scale (e.g., Diamantopoulos, Sarstedt, Fuchs, Kaiser, & Wilczynski, 2012). In terms of reliability, researchers often assume that one cannot estimate the reliability of single-item measures based on such techniques as common factor analysis and the correction for attenuation formula (e.g., Sarstedt & Wilczynski, 2009). These procedures require that both the multi-item measure and the single-item measure are included in the same survey. Thus, these analyses are of primary interest when researchers want to assess in a pretest or pilot study whether, in the main study, a multi-item scale can be replaced with a single-item measure of the same construct. Still, research suggests the reliability and validity of single items are highly context specific, which renders their assessment in pretests or pilot studies problematic (Sarstedt, Diamantopoulos, Salzberger, & Baumgartner, 2016).

The structural model estimates are not examined until the reliability and validity of the constructs have been established. If the assessment of reflective (i.e., Stage 5a) and formative (i.e., Stage 5b; Chapter 5) measurement models provides evidence of the measures' quality, the structural model estimates are evaluated in Stage 6, as explained in Chapter 6. The structural model assessment involves checking for potential collinearity among constructs, testing the model's nomological validity, and examining its ability to explain and predict the variance in the dependent variables (i.e., assessing the model's explanatory and predictive power; Chin et al., 2020). Testing nomological validity involves assessing the size and significance of the path coefficients, whereas the model's explanatory and predictive power is assessed on the grounds of the coefficient of determination (R² value; Gefen, Rigdon, & Straub, 2011) and the results of the PLSpredict procedure (Shmueli, Ray, Velasquez Estrada, & Chatla, 2016). The f² effect size provides additional insights about the quality of the PLS path model estimation (Exhibit 4.1).


CB-SEM also relies on several of these same metrics. In addition, however, it provides goodness-of-fit measures based on the discrepancy between the empirical and the model-implied (theoretical) covariance matrix. Since PLS-SEM relies on variances instead of covariances to determine an optimum solution, covariance-based goodness-of-fit measures are not transferrable to the PLS-SEM context. Fit measures in PLS-SEM are generally variance based and focus on the discrepancy between the observed (in the case of manifest variables) or approximated (in the case of latent variables) values of the dependent variables and the values predicted by the model in question. Nevertheless, some researchers have proposed PLS-SEM–based model fit measures. These proposed fit measures are not effective for model fit assessment in most situations commonly encountered by social sciences researchers (see Chapter 6 for more details) and are not necessary for PLS-SEM applications.

The evaluation of models in a composite-based SEM approach like PLS is also referred to as confirmatory composite analysis (CCA). Analogous to confirmatory factor analysis, which is a set of analyses used to verify the factor structure of a set of observed variables, the objective of CCA is to verify the quality of a composite measurement of a theoretical concept of interest. As such, the CCA approach is not exclusively tied to PLS-SEM but can in principle be applied to all composite-based SEM methods, including generalized structured component analysis (GSCA; Hwang & Takane, 2004; also see Hwang, Sarstedt, Cheah, & Ringle, 2020) and regularized generalized canonical correlation analysis (GCCA; Tenenhaus & Tenenhaus, 2011). CCA differs from confirmatory factor analysis in that its statistical objective is to maximize the variance extracted from the exogenous variables and, in doing so, to facilitate prediction and confirmation of the endogenous constructs. That is, CCA enables researchers to validate measures within a nomological network. The method produces composite scores that are weighted sums of indicators and can be used in follow-up analyses. The resulting composites are correlated, however, as they would be in an oblique rotation within an exploratory factor analysis, and include variance that maximizes prediction of the endogenous constructs. As with all statistical procedures, researchers have different understandings of what constitutes a CCA. We outline these different viewpoints in Exhibit 4.3 (also see Crittenden, Astrachan, Sarstedt, Lourenco, & Hair, 2020, and Manley, Hair, Williams, & McDowell, 2020).

EXHIBIT 4.3  ■  Confirmatory Composite Analysis

Researchers have proposed different approaches for running a CCA, with sometimes harshly expressed viewpoints. Schuberth, Henseler, and Dijkstra's (2018) approach relies exclusively on tests of overall model fit and fit indices, similar to the ones typically used in confirmatory factor analyses. The authors note that the purpose of their approach is to test "whether an artifact is useful" in a model (Schuberth, Henseler, & Dijkstra, 2018, p. 12). But how does a test make an artifact useful? This notion is hard to defend from philosophy of science and measurement theory perspectives. In addition, to identify the model, "each composite must be connected to at least one composite or one variable not forming a composite" (Schuberth, Henseler, & Dijkstra, 2018, p. 5). This distinguishes CCA from confirmatory factor analysis, which enables a standalone assessment of constructs in a factor model framework. As a consequence, the validity of a composite—as assessed in CCA—depends on the nomological network in which it is embedded (Henseler & Schuberth, 2020). Changing the model setup will change the validity estimate of the composite (Chapter 3). The need to validate composites in a nomological network entails that the same composite might fit well in one model but not in another, which casts doubt on Schuberth, Henseler, and Dijkstra's (2018) notion that CCA is the counterpart of a confirmatory factor analysis for composite models. Hair, Howard, and Nitzl (2020) instead argue that a CCA process should follow the classical model evaluation procedure as documented in this book. That is, researchers should first assess the quality of the reflective measurement models (see the next sections) and the formative measurement models (Chapter 5). If these metrics meet the recommended guidelines, the next step is to assess the quality of the structural model (Chapter 6). Different from Schuberth, Henseler, and Dijkstra (2018), in Hair, Howard, and Nitzl's (2020) approach, model fit indices play no role, in light of conceptual concerns related to their applicability in a composite-based SEM context and their questionable performance (e.g., Hair, Sarstedt, & Ringle, 2019).

Assessment of PLS-SEM outcomes can be extended to more advanced analyses such as examining mediating or moderating effects (Sarstedt, Hair, Nitzl, Ringle, & Howard, 2020), which we discuss in Chapter 7. Similarly, advanced PLS-SEM topics involve conducting an importance–performance matrix analysis (Ringle & Sarstedt, 2016), analyzing necessity conditions (Richter, Schubring, Hauff, Ringle, & Sarstedt, 2020), specifying higher-order constructs (e.g., Becker, Klein, & Wetzels, 2012; Sarstedt, Hair, Cheah, Becker, & Ringle, 2019), assessing the measurement model's mode by using confirmatory tetrad analysis (CTA-PLS; Gudergan, Ringle, Wende, & Will, 2008), considering endogeneity (Hult et al., 2018), accounting for different forms of heterogeneity (e.g., Becker, Rai, Ringle, & Völckner, 2013; Matthews, 2017), applying a statistical test for predictive model comparison (Liengaard et al., 2021), using discrete choice experiment data in PLS-SEM (Hair, Ringle et al., 2019), and combining PLS-SEM with agent-based simulations (Schubring, Lorscheid, Meyer, & Ringle, 2016). In Chapter 8, we discuss several of these aspects in greater detail. In addition, Hair, Sarstedt, Ringle, and Gudergan (2018) offer detailed explanations of these and further advanced topics in PLS-SEM (see also Sarstedt et al., 2020). The objective of these additional analyses is to extend and further differentiate the findings from the basic PLS path model estimation. Some of these advanced analyses are necessary to obtain a complete understanding of PLS-SEM results (e.g., checking for the presence of unobserved heterogeneity and significantly different subgroups), while others are optional.

The primary rules of thumb on how to evaluate PLS-SEM results are shown in Exhibit 4.4. In the following sections, we provide an overview of the process for assessing reflective measurement models (Stage 5a). Chapter 5 addresses the evaluation of formative measurement models (Stage 5b), while Chapter 6 deals with structural model evaluation.

EXHIBIT 4.4  ■  Rules of Thumb for Evaluating PLS-SEM Results

•• Model assessment in PLS-SEM initially aims at evaluating the reliability and validity of the construct measures. Researchers may consider following the CCA step-by-step approach described by Hair, Howard, and Nitzl (2020).
•• Begin the evaluation process by assessing the quality of the reflective and formative measurement models (specific rules of thumb for reflective measurement models follow later in this chapter, and for formative measurement models in Chapter 5).
•• Standard model evaluation criteria are not applicable to single-item constructs.
•• If these measurement model criteria are met, the model evaluation continues by assessing whether the structural model provides satisfactory results in explaining and predicting the target constructs. Path estimates should be statistically significant and meaningful. Moreover, check the model's explanatory power (R²) and predictive power (PLSpredict procedure) with regard to its target constructs (Chapter 6 presents specific guidelines).
•• Advanced analyses that extend and differentiate initial PLS-SEM findings may be necessary to obtain an accurate picture of the results (Chapters 7 and 8).

STAGE 5A: ASSESSING RESULTS OF REFLECTIVE MEASUREMENT MODELS

Assessment of reflective measurement models includes evaluating the measures' reliability, both on the indicator level (indicator reliability) and on the construct level (internal consistency reliability). Validity assessment focuses on two validity types. The first is each measure's convergent validity, assessed using the average variance extracted (AVE). The second is discriminant validity, which compares all construct measures in the same model based on the heterotrait-monotrait (HTMT) ratio of correlations. In the following sections, we address each criterion for the evaluation of reflective measurement models.

Step 1: Indicator Reliability

The first step in reflective measurement model assessment involves examining the outer loadings of the indicators. High outer loadings on a construct indicate that the associated indicators have much in common, which is captured by the construct. The size of the outer loading is therefore also commonly called indicator reliability. At a minimum, the outer loadings of all indicators should be statistically significant. Because a significant outer loading could still be fairly weak, a common rule of thumb is that the standardized outer loadings should be 0.708 or higher. The rationale behind this rule can be understood in the context of the square of a standardized indicator's outer loading, referred to as the communality of an item. The square of a standardized indicator's outer loading represents how much of the variation in an item is explained by the construct and is described as the variance extracted from the item. An established rule of thumb is that a latent variable should explain a substantial part of each indicator's variance, usually at least 50%. The remaining portion represents the indicator's unexplained variance (measurement error). Explaining at least 50% of an indicator's variance implies that the variance shared between the construct and its indicator is larger than the measurement error. Hence, an indicator's standardized outer loading, as provided by the PLS-SEM results, should be 0.708 or above, since that number squared (0.708²) equals 0.50. Note that in most instances, 0.70 is considered close enough to 0.708 to be acceptable.

Researchers frequently obtain weaker outer loadings (< 0.70) in social science studies, especially when newly developed scales are used (Hulland, 1999). Rather than automatically eliminating indicators when their outer loading is below 0.70, researchers should carefully examine the effects of indicator removal on other reliability and validity measures. Generally, indicators with outer loadings between 0.40 and 0.70 should be considered for removal only when deleting the indicator leads to an increase in the internal consistency reliability or convergent validity (discussed in the next sections) above the suggested threshold value. Another consideration in the decision of whether to delete an indicator is the extent to which its removal affects content validity. Indicators with weaker outer loadings are sometimes retained on the basis of their contribution to content validity. Indicators with very low outer loadings (below 0.40) should, however, always be eliminated from the construct (Bagozzi, Yi, & Philipps, 1991; Hair, Ringle, & Sarstedt, 2011). Exhibit 4.5 illustrates the recommendations regarding indicator deletion based on outer loadings.

EXHIBIT 4.5  ■  Outer Loading Relevance Testing

•• Outer loading is < 0.40: Delete the reflective indicator.
•• Outer loading is ≥ 0.40 but < 0.70: Analyze the construct's internal consistency reliability and convergent validity.
   ◦ If the construct measures meet the recommended thresholds: Retain the reflective indicator.
   ◦ If the construct measures do not meet the recommended thresholds: Delete the reflective indicator, but consider its impact on content validity.
•• Outer loading is ≥ 0.70: Retain the reflective indicator.
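The decision rules in Exhibit 4.5 can be expressed compactly in code. The following is an illustrative sketch (the function name is ours); comp_2 and cusl_2 anticipate the case study later in this chapter, while weak_item is a made-up indicator.

```python
def loading_relevance(loading):
    """Classify a reflective indicator following the Exhibit 4.5 decision rules."""
    reliability = loading ** 2  # indicator reliability (communality of the item)
    if loading < 0.40:
        return reliability, "delete"
    if loading < 0.70:
        # Delete only if doing so lifts internal consistency reliability or
        # AVE above the thresholds; otherwise retain, weighing content validity.
        return reliability, "check internal consistency reliability and AVE"
    return reliability, "retain"

for name, l in [("comp_2", 0.798), ("cusl_2", 0.917), ("weak_item", 0.55)]:
    rel, decision = loading_relevance(l)
    print(f"{name}: loading = {l:.3f}, reliability = {rel:.3f} -> {decision}")
```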

Step 2: Internal Consistency Reliability

The second criterion to be evaluated is typically internal consistency reliability. The traditional criterion for internal consistency reliability is Cronbach's alpha, which provides an estimate of the reliability based on the intercorrelations of the observed indicator variables. This statistic is defined as follows:

$$\text{Cronbach's } \alpha = \frac{M}{M-1} \cdot \left(1 - \frac{\sum_{i=1}^{M} s_i^2}{s_t^2}\right).$$

In this formula, s_i² represents the variance of the indicator variable i of a specific construct, measured with M indicators (i = 1, . . . , M), and s_t² is the variance of the sum of all M indicators of that construct.

One weakness of Cronbach's alpha is that it assumes all indicators are equally reliable (i.e., all the indicators have equal outer loadings on the construct). Moreover, Cronbach's alpha is sensitive to the number of items in the scale and generally tends to underestimate the internal consistency reliability. Therefore, Cronbach's alpha can be used as a more conservative measure of internal consistency reliability. PLS-SEM prioritizes the indicators according to their individual reliabilities. Due to the limitations of Cronbach's alpha, it is therefore technically more appropriate to apply a different measure of internal consistency reliability, referred to as composite reliability (ρ_C). This measure of reliability takes into account the different outer loadings of the indicator variables and is calculated using the following formula:

$$\rho_C = \frac{\left(\sum_{i=1}^{M} l_i\right)^2}{\left(\sum_{i=1}^{M} l_i\right)^2 + \sum_{i=1}^{M} \text{var}(e_i)},$$

where l_i symbolizes the standardized outer loading of the indicator variable i of a specific construct measured with M indicators, e_i is the measurement error of indicator variable i, and var(e_i) denotes the variance of the measurement error, which is defined as 1 − l_i².

Cronbach's alpha and the composite reliability ρ_C vary between 0 and 1, with higher values indicating higher levels of reliability. Specifically, values of 0.60 to 0.70 are acceptable in exploratory research, while in more advanced stages of research, values between 0.70 and 0.90 can be regarded as satisfactory. Values above 0.90 (and definitely above 0.95) are not desirable because they are typically the result of semantically redundant items, which slightly rephrase the very same question. Since the presence of redundant items in a single construct has adverse consequences for the measures' content validity (e.g., Rossiter, 2002) and may boost error term correlations (Drolet & Morrison, 2001; Hayduk & Littvay, 2012), researchers are advised to minimize the number of redundant indicators. Finally, values below 0.60 indicate a lack of internal consistency reliability.

While Cronbach's alpha is conservative, the composite reliability metric may be too liberal. Indeed, the construct's true reliability is typically viewed as lying between these two extreme values. As an alternative, and building on Dijkstra (2010), subsequent research has proposed the exact (or consistent) reliability coefficient ρ_A (Dijkstra, 2014; Dijkstra & Henseler, 2015b), which is defined as

$$\rho_A = (\hat{w}'\hat{w})^2 \cdot \frac{\hat{w}'(S - \text{diag}(S))\hat{w}}{\hat{w}'(\hat{w}\hat{w}' - \text{diag}(\hat{w}\hat{w}'))\hat{w}},$$

where \hat{w} represents the vector of outer weight estimates, diag indicates the diagonal of the corresponding matrix, and S is the sample covariance matrix.


The ρA reliability metric usually lies between Cronbach’s alpha and the composite reliability, and is therefore considered a good compromise between these two measures (Hair, Risher, Sarstedt, & Ringle, 2019).
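The three reliability statistics translate directly into code. Below is an illustrative sketch (the function names and example loadings are ours) that computes Cronbach's alpha from raw indicator data, ρ_C from standardized outer loadings, and ρ_A from outer weights and the sample covariance matrix, following the formulas above.

```python
import numpy as np

def cronbachs_alpha(X):
    """Cronbach's alpha for an (n x M) matrix of indicator data."""
    M = X.shape[1]
    sum_item_vars = X.var(axis=0, ddof=1).sum()   # sum of the s_i^2
    total_var = X.sum(axis=1).var(ddof=1)         # s_t^2 of the summed scale
    return (M / (M - 1)) * (1 - sum_item_vars / total_var)

def composite_reliability(loadings):
    """rho_C from standardized outer loadings; var(e_i) = 1 - l_i^2."""
    l = np.asarray(loadings)
    squared_sum = l.sum() ** 2
    return squared_sum / (squared_sum + (1 - l ** 2).sum())

def rho_a(w, S):
    """rho_A from outer weight estimates w and the sample covariance matrix S."""
    w, S = np.asarray(w), np.asarray(S)
    off = lambda A: A - np.diag(np.diag(A))       # zero out the diagonal
    return (w @ w) ** 2 * (w @ off(S) @ w) / (w @ off(np.outer(w, w)) @ w)

print(composite_reliability([0.80, 0.85, 0.75]))  # ~0.843 for these loadings
```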

Step 3: Convergent Validity

Convergent validity is the extent to which a measure correlates positively with alternative measures of the same construct. Under the domain sampling model, the indicators of a reflective construct are treated as different (alternative) approaches to measuring the same construct. Therefore, the items that serve as indicators (measures) of a specific reflective construct should converge or share a high proportion of variance. A common measure for establishing convergent validity on the construct level is the average variance extracted (AVE). This criterion is defined as the grand mean value of the squared loadings of the indicators associated with the construct (i.e., the sum of the squared loadings divided by the number of indicators); the AVE is therefore equivalent to the communality of a construct. The AVE is calculated using the following formula:

$$\text{AVE} = \frac{\sum_{i=1}^{M} l_i^2}{M},$$

where l_i symbolizes the standardized outer loading of the indicator variable i of a specific construct measured with M indicators. Applying the same logic as for indicator reliability, an AVE value of 0.50 or higher indicates that, on average, the construct explains more than half of the variance of its indicators. Conversely, an AVE value of less than 0.50 indicates that, on average, more variance remains in the error of the items than is explained by the construct.

The AVE of each reflectively measured construct should be evaluated. In the example introduced in Chapter 2, an AVE estimate is needed only for the constructs COMP, CUSL, and LIKE, all of which are reflectively measured. For the single-item construct CUSA, the AVE is not an appropriate measure, since the indicator's outer loading is fixed at 1.
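As a quick illustration of the formula, the following sketch (with made-up loadings of 0.60, 0.70, and 0.90) yields an AVE just above the 0.50 threshold.

```python
import numpy as np

def ave(loadings):
    """Average variance extracted: mean of the squared standardized loadings."""
    l = np.asarray(loadings)
    return (l ** 2).mean()

print(ave([0.60, 0.70, 0.90]))   # 0.553 -> just above the 0.50 threshold
```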

Step 4: Discriminant Validity

Discriminant validity is the extent to which a construct is truly distinct from other constructs by empirical standards. Establishing discriminant validity thus implies that a construct is unique and captures phenomena not represented by other constructs in the model. Traditionally, researchers have relied on the Fornell-Larcker criterion to assess discriminant validity (Fornell & Larcker, 1981). It compares the square root of the AVE values with the latent variable correlations. Specifically, the square root of each construct's AVE should be greater than its highest correlation with any other construct. The logic of the Fornell-Larcker method is based on the idea that a construct shares more variance with its associated indicators than with any other construct.

Exhibit 4.6 illustrates this concept. In the example, the AVE values of the constructs Y1 and Y2 are 0.55 and 0.65, respectively. The AVE values are obtained by squaring each outer loading, summing the squared outer loadings, and then calculating the average value. For example, with respect to construct Y1, 0.60, 0.70, and 0.90 squared are 0.36, 0.49, and 0.81, respectively. The sum of these three numbers is 1.66, and the average value is therefore 0.55 (i.e., 1.66/3). The correlation between constructs Y1 and Y2 (as indicated by the double-headed arrow linking the two constructs) is 0.80. Squaring this correlation indicates that 64% (i.e., 0.80² = 0.64) of each construct's variation is explained by the other construct. Therefore, Y1 explains less variance in its indicator measures x1 to x3 than it shares with Y2, which implies that the two constructs (Y1 and Y2)—even though conceptually different—are not sufficiently distinct in empirical terms. Thus, in this example, discriminant validity is not established.

EXHIBIT 4.6  ■  Visual Representation of the Fornell-Larcker Criterion

[Path diagram: Y1 is measured by x1–x3 with outer loadings 0.60, 0.70, and 0.90 (AVE = 0.55); Y2 is measured by x4–x6 with outer loadings 0.70, 0.80, and 0.90 (AVE = 0.65). A double-headed arrow links Y1 and Y2 with corr. = 0.80 (corr.² = 0.64).]
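The worked example in Exhibit 4.6 can be checked in a few lines; the following sketch simply reproduces the numbers from the figure.

```python
import numpy as np

# Exhibit 4.6: sqrt(AVE) of each construct vs. the inter-construct correlation
ave_y1 = np.mean(np.square([0.60, 0.70, 0.90]))   # = 0.553
ave_y2 = np.mean(np.square([0.70, 0.80, 0.90]))   # = 0.647
corr_y1_y2 = 0.80

print(np.sqrt(ave_y1) > corr_y1_y2)   # False: 0.744 < 0.80 -> criterion violated
print(np.sqrt(ave_y2) > corr_y1_y2)   # True: 0.804 > 0.80 (only barely)
```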

Recent research casts serious doubts, however, on the efficacy of the Fornell-Larcker criterion, on both empirical and conceptual grounds (Franke & Sarstedt, 2019; Henseler, Ringle, & Sarstedt, 2015). For example, Henseler, Ringle, and Sarstedt (2015) show that the criterion performs very poorly when the indicator loadings of the constructs under consideration differ only slightly (e.g., all indicator loadings vary between 0.60 and 0.80)—as is usually the case in empirical applications of PLS-SEM. When indicator loadings vary more strongly, the Fornell-Larcker criterion's performance in detecting discriminant validity issues improves but is still rather poor overall. Hence, in empirical applications, the Fornell-Larcker criterion often fails to reliably identify discriminant validity problems (Radomir & Moisescu, 2019). The very same holds for the assessment of cross-loadings. According to this criterion, an indicator's outer loading on the associated construct should be greater than any of its cross-loadings (i.e., its correlations) on other constructs. Henseler, Ringle, and Sarstedt (2015) show that cross-loadings are not able to detect even severe violations of discriminant validity, rendering this criterion useless for applied research.

As a remedy, Henseler, Ringle, and Sarstedt (2015) propose the heterotrait-monotrait ratio (HTMT) of the correlations to accurately assess discriminant validity. In short, HTMT is the ratio of the between-trait correlations to the within-trait correlations. HTMT is the mean of all correlations of indicators across constructs measuring different constructs (i.e., the heterotrait-heteromethod correlations) relative to the (geometric) mean of the average correlations of indicators measuring the same construct (i.e., the monotrait-heteromethod correlations). For a formal definition of the HTMT statistic, see Henseler, Ringle, and Sarstedt (2015). Technically, the HTMT approach is an estimate of what the true correlation between two constructs would be if they were perfectly measured (i.e., if they were perfectly reliable). This true correlation is also referred to as the disattenuated correlation. A disattenuated correlation between two constructs close to 1 indicates a lack of discriminant validity.

Exhibit 4.7 illustrates the HTMT approach. The heterotrait-heteromethod correlations are all pairwise correlations between the variables x1, x2, and x3 on the one hand and x4, x5, and x6 on the other (the gray-shaded area in the correlation matrix in Exhibit 4.7). In the example, the average heterotrait-heteromethod correlation is 0.341. The average monotrait-heteromethod correlation of Y1 equals the mean of all pairwise correlations between x1, x2, and x3 (i.e., 0.712). Similarly, the mean of all pairwise correlations between x4, x5, and x6 (i.e., 0.409) defines the average monotrait-heteromethod correlation of Y2. The HTMT statistic for the relationship between Y1 and Y2 therefore equals

$$\text{HTMT}(Y_1, Y_2) = \frac{\text{mean}(R_{Y_1 Y_2})}{\sqrt{\text{mean}(R_{Y_1 Y_1}) \cdot \text{mean}(R_{Y_2 Y_2})}},$$

where R_{Y1Y2} is the matrix of correlations between the indicators of Y1 and those of Y2, and R_{Y1Y1} (R_{Y2Y2}) is the matrix of correlations among the indicators of Y1 (Y2) (Franke & Sarstedt, 2019). For the example above, the HTMT is computed as follows:

$$\text{HTMT}(Y_1, Y_2) = \frac{0.341}{\sqrt{0.712 \cdot 0.409}} = 0.632.$$


EXHIBIT 4.7  ■  Visual Representation of the HTMT Approach

[Path diagram: Y1 is measured by x1–x3; Y2 is measured by x4–x6. The corresponding indicator correlation matrix is shown below; the heterotrait-heteromethod block (x1–x3 with x4–x6) is shaded in the original figure.]

       x1     x2     x3     x4     x5     x6
x1    1
x2    0.770  1
x3    0.701  0.665  1
x4    0.426  0.339  0.393  1
x5    0.423  0.345  0.385  0.574  1
x6    0.274  0.235  0.250  0.318  0.335  1
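Using the correlation matrix from Exhibit 4.7, the HTMT computation can be replicated directly. The sketch below (the function name is ours) reproduces the value of 0.632 derived above.

```python
import numpy as np

# Indicator correlation matrix from Exhibit 4.7 (x1-x3 measure Y1, x4-x6 measure Y2)
R = np.array([
    [1.000, 0.770, 0.701, 0.426, 0.423, 0.274],
    [0.770, 1.000, 0.665, 0.339, 0.345, 0.235],
    [0.701, 0.665, 1.000, 0.393, 0.385, 0.250],
    [0.426, 0.339, 0.393, 1.000, 0.574, 0.318],
    [0.423, 0.345, 0.385, 0.574, 1.000, 0.335],
    [0.274, 0.235, 0.250, 0.318, 0.335, 1.000],
])

def htmt(R, idx1, idx2):
    """HTMT: mean heterotrait-heteromethod correlation relative to the geometric
    mean of the average monotrait-heteromethod correlations."""
    hetero = R[np.ix_(idx1, idx2)].mean()
    # Mean of the off-diagonal (upper-triangle) correlations within one block
    mono = lambda idx: R[np.ix_(idx, idx)][np.triu_indices(len(idx), k=1)].mean()
    return hetero / np.sqrt(mono(idx1) * mono(idx2))

print(round(htmt(R, [0, 1, 2], [3, 4, 5]), 3))   # 0.632, as computed above
```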

The exact threshold level of the HTMT is debatable; after all, "when is a correlation close to 1?" Based on prior research and their study results, Henseler, Ringle, and Sarstedt (2015) suggest a threshold value of 0.90 if the path model includes constructs that are conceptually very similar (e.g., affective satisfaction, cognitive satisfaction, and loyalty). In other words, an HTMT value above 0.90 suggests a lack of discriminant validity. When the constructs in the path model are conceptually more distinct, a lower and thus more conservative threshold value of 0.85 seems warranted (Henseler, Ringle, & Sarstedt, 2015).

Furthermore, the HTMT can serve as the basis of a statistical discriminant validity test (Franke & Sarstedt, 2019). However, as PLS-SEM does not rely on any distributional assumptions, standard parametric significance tests cannot be applied to test whether the HTMT statistic is significantly different from a certain threshold value. Instead, researchers have to rely on a procedure called bootstrapping to derive a distribution of the HTMT statistic (see Chapter 5 for more details on the bootstrapping procedure). In bootstrapping, subsamples are randomly drawn (with replacement) from the original set of data. Each subsample is then used to estimate the model. This process is repeated until a large number of random subsamples have been created, typically about 10,000. The estimated parameters from the subsamples (in this case, the HTMT statistic) are used to derive standard errors for the estimates. With this information, it is possible to derive a bootstrap confidence interval, that is, the range into which the HTMT population value will fall, assuming a certain level of confidence (e.g., 95%). We want to show that the HTMT value is statistically significantly lower than a given threshold value with a 5% probability of error. Hence, the analysis considers a 95% one-sided bootstrap confidence interval, which is identical to computing a 90% two-sided bootstrap confidence interval. For example, assuming a threshold value of 0.85, a 95% one-sided bootstrap confidence interval containing this value indicates a lack of discriminant validity. Conversely, if the upper bound of the HTMT value's 95% one-sided bootstrap confidence interval is below the 0.85 threshold value, the two constructs are empirically distinct. Since the application of cut-off values may conceal discriminant validity problems in applications of the HTMT statistic (Franke & Sarstedt, 2019), one should primarily rely on inferential testing using bootstrap confidence intervals.
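A minimal sketch of this percentile bootstrap test follows, assuming an (n × k) data matrix X whose columns idx1 and idx2 measure the two constructs, and reusing the htmt() helper from the previous sketch. It mimics the logic of the procedure; software such as SmartPLS re-estimates the full model in each subsample.

```python
import numpy as np

def htmt_upper_bound(X, idx1, idx2, n_boot=10_000, alpha=0.05, seed=1):
    """Upper bound of the (1 - alpha) one-sided percentile bootstrap CI for HTMT.

    Reuses htmt() from the previous sketch; X holds raw indicator data.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    stats = np.empty(n_boot)
    for b in range(n_boot):
        rows = rng.integers(0, n, size=n)          # resample cases with replacement
        R_b = np.corrcoef(X[rows], rowvar=False)   # recompute indicator correlations
        stats[b] = htmt(R_b, idx1, idx2)
    return np.quantile(stats, 1 - alpha)

# Discriminant validity is supported for a 0.85 threshold if, for example,
# htmt_upper_bound(X, [0, 1, 2], [3, 4, 5]) < 0.85
```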

Note that the HTMT may be adversely affected by strong negative correlations among the indicators in a measurement model. As a result, the analysis might produce negative HTMT values and inadmissible solutions due to the square root of a negative product when computing the geometric mean. Such a result is, however, extremely rare. As a remedy, researchers may replace the geometric mean for calculating the monotrait-heteromethod correlations with the grand mean. This resolves the problem of potentially inadmissible HTMT solutions. At the same time, however, using the grand mean can result in negative HTMT values and will tend to inflate the HTMT statistic. Hence, in case of substantial negative correlations, we suggest using the absolute indicator correlations as input for the HTMT computation, as presented above and in Henseler et al. (2015) and as implemented in PLS-SEM software applications such as SmartPLS, rather than using the grand mean.

What should researchers do if any of the criteria signal a lack of discriminant validity? There are different ways to handle discriminant validity problems (Exhibit 4.8). An initial approach retains the constructs that cause discriminant validity problems in the model and aims at (1) increasing the average monotrait-heteromethod correlations and/or (2) decreasing the average heterotrait-heteromethod correlations of the construct measures.

First, to increase a construct's average monotrait-heteromethod correlations, one can (a) eliminate items that have low correlations with other items measuring the same construct. Likewise, heterogeneous subdimensions in the construct's set of items could also deflate the average monotrait-heteromethod correlations. In this case, the researcher (b) splits the construct (e.g., quality) into homogeneous subconstructs (e.g., product quality and service quality) whose monotrait-heteromethod correlations are higher than those of the more general construct. These subconstructs then replace the more general construct in the model. When following this approach, however, the discriminant validity of the newly generated constructs with all the other constructs in the model needs to be reevaluated. Splitting up the general construct can also involve establishing a higher-order construct, if measurement theory supports this step (e.g., Sarstedt et al., 2019).


EXHIBIT 4.8  ■  Handling Discriminant Validity Problems

•• Assess discriminant validity using the HTMT criterion. If discriminant validity is established, continue with the analysis.
•• If discriminant validity is not established, (1) increase the average monotrait-heteromethod correlations and/or (2) decrease the average heterotrait-heteromethod correlations. If discriminant validity is then established, continue with the analysis.
•• If discriminant validity is still not established, try to establish it by merging the problematic constructs. If discriminant validity is then established, continue with the analysis.
•• If discriminant validity still cannot be established, discard the model.

Second, to decrease the average heterotrait-heteromethod correlations, one can (a) eliminate items that are strongly correlated with items of the opposing construct, or (b) reassign these indicators to the other construct, if theoretically plausible. It is important to note that the elimination of items purely on statistical grounds can have adverse consequences for the content validity of the constructs. Therefore, this step entails carefully examining the scale indicators (based on prior research results or on a pretest when newly developed measures are involved) to determine whether all the facets of the construct domain have been captured. At least two expert coders should conduct this judgment independently to ensure a high degree of objectivity.

If these two approaches fail, researchers may consider merging the constructs that cause the problems into a more general construct. Again, measurement theory must support this step. In this case, the more general construct replaces the problematic constructs in the model.

In Exhibit 4.9, we summarize the criteria used to assess the reliability and validity of reflective construct measures. If the criteria are not met, the researcher may decide to remove single indicators from a specific construct in an attempt to meet the criteria more closely. However, removing indicators should be carried out with care, since the elimination of one or more indicators may improve the reliability or discriminant validity but at the same time decrease the measurement's content validity.

EXHIBIT 4.9  ■  Rules of Thumb for Evaluating Reflective Measurement Models

•• Indicator reliability: The indicator's outer loadings should be higher than 0.70. Indicators with outer loadings between 0.40 and 0.70 should be considered for removal only if the deletion leads to an increase in composite reliability or AVE above the suggested threshold value. Indicators with loadings below 0.40 should be deleted, while considering the impact of indicator deletion on content validity.
•• Internal consistency reliability: Cronbach's alpha is the lower bound, and composite reliability ρC is the upper bound, for internal consistency reliability. The reliability coefficient ρA usually lies between these bounds and may serve as a good representation of a construct's internal consistency reliability. Reliability should generally be higher than 0.70. In exploratory research, values between 0.60 and 0.70 are considered acceptable. Reliability values higher than 0.95 are not desirable.
•• Convergent validity: The AVE value should be higher than 0.50.
•• Discriminant validity:
   ◦ Use the HTMT criterion to assess discriminant validity in PLS-SEM.
   ◦ Assume a threshold value of 0.90 for conceptually similar constructs and 0.85 for conceptually different constructs.
   ◦ Use bootstrap confidence intervals to assess whether the HTMT values differ from a specific threshold (e.g., 0.90) for all combinations of constructs.


CASE STUDY ILLUSTRATION—EVALUATION OF THE REFLECTIVE MEASUREMENT MODELS (STAGE 5A)

Running the PLS-SEM Algorithm

We continue working with our PLS-SEM example on corporate reputation. In Chapter 3, we explained how to estimate the PLS path model and how to obtain the results by opening the default report in the SmartPLS 3 software (Ringle, Wende, & Becker, 2015). Recall that to do so, you must first load the simple corporate reputation model and then run the model by clicking on the icon at the top right or by using the pull-down menu and going to Calculate → PLS Algorithm. After running the PLS Algorithm, the SmartPLS results report automatically opens; if not, go to the Calculation Results tab at the bottom left of the screen and click on Report.

Before analyzing the results, you need to quickly check whether the algorithm converged (i.e., the stop criterion of the algorithm was reached rather than the maximum number of iterations). To do so, go to Interim Results → Stop Criterion Changes in the results report. You will then see the table shown in Exhibit 4.10, which shows the number of iterations of the PLS-SEM algorithm. This number should be lower than the maximum number of iterations (e.g., 300) that you defined in the PLS-SEM algorithm parameter settings (Chapter 3). At the bottom left side of the table, you will see that the algorithm converged after iteration 5.

EXHIBIT 4.10  ■  Stop Criterion Table in SmartPLS

[Screenshot of the SmartPLS results report showing the stop criterion changes.]

If the PLS-SEM algorithm does not converge in fewer than 300 iterations (the default setting in the software), the algorithm could not find a stable solution. This kind of situation almost never occurs. But if it does occur, there are two possible causes of the problem: (1) the selected stop criterion is set at a very small level (e.g., 1.0E-25), so that small changes in the coefficients of the measurement models prevent the PLS-SEM algorithm from stopping, or (2) there are problems with the data, which need to be checked carefully. For example, data problems may occur if the sample size is too small or if an indicator has many identical values (i.e., the same data points, which results in insufficient variability).

When your PLS path model estimation converges, which it practically always does, you need to examine the following PLS-SEM calculation results tables from the results report for reflective measurement model assessment: Outer Loadings, Composite Reliability, Cronbach's Alpha, Average Variance Extracted (AVE), and Discriminant Validity. We examine other information in the report in Chapters 5 and 6, when we extend the simple path model by including formative measures and examine the structural model results.

Reflective Measurement Model Evaluation

The simple corporate reputation model has three latent variables with reflective measurement models (i.e., COMP, CUSL, and LIKE) as well as a single-item construct (CUSA). For the reflective measurement models, we need the estimates for the relationships between the reflective constructs and their indicators (i.e., the outer loadings). Exhibit 4.11 displays the results table for the outer loadings, which can be found under Final Results → Outer Loadings. By default, the outer loadings are also displayed in the Modeling Window after running the PLS-SEM algorithm.

EXHIBIT 4.11  ■  Outer Loadings

[Results table with the outer loadings of all indicators.]

All outer loadings of the reflective constructs COMP, CUSL, and LIKE are well above the threshold value of 0.708, which suggests sufficient levels of indicator reliability. The indicator comp_2 (outer loading: 0.798) has the smallest indicator reliability, with a value of 0.637 (0.798²), while the indicator cusl_2 (outer loading: 0.917) has the highest indicator reliability, with a value of 0.841 (0.917²).

Under Quality Criteria in the results report, left-click on the Construct Reliability and Validity tab to evaluate the reliability of the construct measures. Here, you have the option of displaying the reliability values using a bar chart or in a matrix format. Exhibit 4.12 shows the internal consistency reliability values in matrix format. With ρA values of 0.832 (COMP), 0.839 (CUSL), and 0.836 (LIKE), all three reflective constructs have high levels of internal consistency reliability. Clicking on the rho_A tab shows the bar chart of the constructs' reliability values (Exhibit 4.13). The horizontal blue line indicates the common minimum threshold level for reliability (i.e., 0.70). If a ρA value is above this threshold value, the corresponding bar is colored green; if a ρA value is lower than 0.70, the bar is colored red. As indicated above, all ρA values exceed the threshold. Note that the ρA value of the single-item variable CUSA is 1.00, but this cannot be interpreted as evidence that the construct measurement is perfectly reliable, and it should not be reported with other measures of reliability.

Moreover, going to Quality Criteria → Construct Reliability and Validity gives you the option to show the bar charts of Cronbach's alpha and composite reliability ρC values for all constructs. All bars in the chart appear in green, indicating that all construct measures are above the 0.70 threshold. The specific values of Cronbach's alpha (0.776 for COMP, 0.831 for CUSL, and 0.831 for LIKE) and the composite reliability ρC (0.865 for COMP, 0.899 for CUSL, and 0.899 for LIKE) can be accessed by left-clicking on the Matrix tab (Exhibit 4.12). Again, as CUSA is measured using a single item, interpreting this construct's Cronbach's alpha or composite reliability values is not meaningful.

Convergent validity assessment is based on the AVE values, which can be accessed by navigating to Quality Criteria → Construct Reliability and Validity in the results report. As with the internal consistency reliability measures, SmartPLS offers the option of displaying the results using bar charts (Exhibit 4.14) or in a matrix format. In this example, the AVE values of COMP (0.681), CUSL (0.748), and LIKE (0.747) are well above the required minimum level of 0.50. Thus, the measures of the three reflective constructs have high levels of convergent validity.

Finally, in the Discriminant Validity tab under Quality Criteria, SmartPLS offers several approaches to assess whether the construct measures empirically demonstrate discriminant validity. According to the Fornell-Larcker criterion, the square root of the AVE of each construct should be higher than the construct's highest correlation with any other construct in the model (this notion is identical to comparing the AVE with the squared correlations between the constructs). Exhibit 4.15 shows the results of the Fornell-Larcker criterion assessment, with the square roots of the reflective constructs' AVEs on the diagonal and the correlations between the constructs in the off-diagonal positions.


EXHIBIT 4.12  ■  Construct Reliability and Validity


EXHIBIT 4.13  ■  Reliability ρA (bar chart of the ρA values for COMP, CUSA, CUSL, and LIKE)

EXHIBIT 4.14  ■  Average Variance Extracted (AVE) (bar chart of the AVE values for COMP, CUSA, CUSL, and LIKE)

Finally, in the Discriminant Validity tab under Quality Criteria, SmartPLS offers several approaches to assess whether the construct measures empirically demonstrate discriminant validity. According to the Fornell-Larcker criterion, the square root of the AVE of each construct should be higher than the construct's highest correlation with any other construct in the model (this notion is identical to comparing the AVE with the squared correlations between the constructs). Exhibit 4.15 shows the results of the Fornell-Larcker criterion assessment with the square root of the reflective constructs' AVE on the diagonal and the correlations between the constructs in the off-diagonal position. For example, the reflective construct COMP has a value of 0.825 for the square root of its AVE, which needs to be compared with all correlation values in the column of COMP. Note that for CUSL, you need to consider the correlations in both the row and the column, and for LIKE only those in the row. Overall, the square roots of the AVEs for the reflective constructs COMP (0.825), CUSL (0.865), and LIKE (0.864) are all higher than the correlations of these constructs with other latent variables in the path model, thus indicating that all constructs are valid measures of unique concepts.


Note that while frequently used in past applied research, the Fornell-Larcker criterion does not reliably detect discriminant validity issues. Hence, any violation of the Fornell-Larcker criterion should be taken as strong evidence for a severe discriminant validity problem. The primary criterion for discriminant validity assessment is the heterotrait-monotrait ratio (HTMT), which can be accessed via the Discriminant Validity section of the results report. Exhibit 4.16 shows the HTMT values for all pairs of constructs in a matrix format. The next tab also shows these HTMT values in bar charts, using 0.85 as the relevant threshold level. As can be seen, all HTMT values are clearly lower than the more conservative threshold value of 0.85, even for COMP and LIKE as well as CUSA and CUSL, which are very similar from a conceptual viewpoint. Recall that the threshold value for conceptually similar constructs is 0.90.
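To make the HTMT statistic less of a black box, the following Python sketch computes it for one pair of constructs from an indicator correlation matrix. The simulated data and index lists are illustrative placeholders; SmartPLS reports the HTMT values directly.

```python
import numpy as np

def htmt(corr, idx_a, idx_b):
    """HTMT for two constructs: mean heterotrait-heteromethod correlation
    divided by the geometric mean of the constructs' average
    monotrait-heteromethod correlations."""
    hetero = np.mean([abs(corr[i, j]) for i in idx_a for j in idx_b])
    mono_a = np.mean([abs(corr[i, j]) for p, i in enumerate(idx_a)
                      for j in idx_a[p + 1:]])
    mono_b = np.mean([abs(corr[i, j]) for p, i in enumerate(idx_b)
                      for j in idx_b[p + 1:]])
    return hetero / np.sqrt(mono_a * mono_b)

# Illustrative data: indicators 0-2 measure construct A, 3-5 construct B.
rng = np.random.default_rng(1)
f1, f2 = rng.normal(size=(2, 344))
x_a = np.column_stack([0.8 * f1 + 0.6 * rng.normal(size=344) for _ in range(3)])
x_b = np.column_stack([0.8 * f2 + 0.6 * rng.normal(size=344) for _ in range(3)])
corr = np.corrcoef(np.column_stack([x_a, x_b]), rowvar=False)
print(round(htmt(corr, [0, 1, 2], [3, 4, 5]), 3))  # well below 0.85 here
```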

EXHIBIT 4.15  ■  Fornell-Larcker Criterion

EXHIBIT 4.16  ■  Heterotrait-Monotrait Ratio (HTMT)


In addition to examining the HTMT ratios, you should test whether the HTMT values are significantly different from the threshold value. Specifically, we assume a 0.85 threshold for all pairs of constructs except for COMP and LIKE as well as CUSA and CUSL, for which we assume a higher threshold (0.90) because of their conceptual similarity. This requires computing bootstrap confidence intervals obtained by running the bootstrapping option. To run the bootstrapping procedure, go back to the Modeling Window and left-click on Calculate → Bootstrapping in the pull-down menu. In the dialog box that opens, choose the bootstrapping options as displayed in Exhibit 4.17 (Chapter 5 includes a more detailed introduction to the bootstrapping procedure and the parameter settings). Make sure to select 10,000 Subsamples and the Complete Bootstrapping option, which includes the results for HTMT (unlike the Basic Bootstrapping option). Under Advanced Settings, click on Percentile Bootstrap, select One Tailed as Test Type, and choose a 0.05 Significance Level. The results are identical to selecting Two Tailed as Test Type and a 0.10 Significance Level. Finally, click on Start Calculation.

After running bootstrapping, open the results report. Go to Quality Criteria → Heterotrait-Monotrait Ratio (HTMT). The Confidence Intervals menu that opens up (Exhibit 4.18) shows the original HTMT values (column Original Sample (O)) for each combination of constructs in the model, along with the average HTMT values computed from the 10,000 bootstrap samples, as shown in column Sample Mean (M). Note that the results in Exhibit 4.18 will differ from your results and will change slightly each time you rerun the bootstrapping procedure. The reason is that bootstrapping builds on randomly drawn bootstrap samples, which will differ every time the procedure is run.

EXHIBIT 4.17  ■  Bootstrapping Options in SmartPLS

EXHIBIT 4.18  ■  Confidence Intervals for HTMT

The differences in the overall bootstrapping results are marginal, however, provided that a sufficiently large number of bootstrap samples has been drawn (e.g., 10,000). The columns labeled 5% and 95% show the lower and upper bounds of the 95% one-sided bootstrap confidence interval (or the 90% two-sided bootstrap confidence interval, respectively). The statistical test focuses on the right tail of the bootstrap distribution to show that an HTMT value is significantly lower than the corresponding threshold value (0.85 or 0.90) with a 5% probability of error. This is the case if the upper bound displayed in the 95% column is lower than 0.85 (or, in the case of COMP and LIKE as well as CUSA and CUSL, lower than 0.90). As can be seen in Exhibit 4.18, none of the confidence intervals includes the corresponding threshold value. More importantly, even assuming a more conservative threshold of 0.85 for all construct combinations (i.e., including CUSA and CUSL as well as COMP and LIKE), we find that all HTMT values are significantly lower than this value (i.e., the confidence intervals' upper bound is smaller than 0.85). For example, the lower and upper bounds of the HTMT's 95% confidence interval for CUSA and COMP are 0.368 and 0.565, respectively (again, your values will likely look slightly different because bootstrapping is a random process). Since the upper bound of 0.565 is lower than 0.85, the HTMT value of 0.465 for CUSA and COMP is significantly lower than the more conservative threshold value of 0.85. To summarize, the bootstrap confidence interval results of the HTMT criterion also clearly demonstrate the discriminant validity of the constructs.
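The percentile logic behind these intervals is straightforward to sketch in Python. The bootstrap values below are simulated stand-ins for the 10,000 HTMT estimates SmartPLS computes; only the upper bound of the one-sided 95% interval matters for the test:

```python
import numpy as np

rng = np.random.default_rng(42)
# Simulated stand-in for 10,000 bootstrap HTMT values of one construct pair
boot_htmt = rng.normal(loc=0.465, scale=0.05, size=10_000)

upper_95 = np.percentile(boot_htmt, 95)  # upper bound, one-sided 95% interval
print(f"95% upper bound: {upper_95:.3f} -> "
      f"significantly below 0.85: {upper_95 < 0.85}")
```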


Exhibit 4.19 summarizes the results of the reflective measurement model assessment. As can be seen, all model evaluation criteria have been met, providing support for the measures' reliability and validity.

EXHIBIT 4.19  ■  Results Summary for Reflective Measurement Models

Latent     Indicator   Loadings   Indicator     AVE       Cronbach's   Reliability   Composite        HTMT Significantly
Variable               (>0.70)    Reliability   (>0.50)   Alpha        ρA            Reliability ρC   Lower Than 0.85
                                  (>0.50)                 (0.60–0.90)  (0.60–0.90)   (0.60–0.90)      (0.90)?
COMP       comp_1      0.858      0.736         0.681     0.776        0.832         0.865            Yes
           comp_2      0.798      0.637
           comp_3      0.818      0.669
CUSL       cusl_1      0.833      0.694         0.748     0.831        0.839         0.899            Yes
           cusl_2      0.917      0.841
           cusl_3      0.843      0.711
LIKE       like_1      0.879      0.773         0.747     0.831        0.836         0.899            Yes
           like_2      0.870      0.757
           like_3      0.843      0.711


Summary

•  Gain an overview of Stage 5 of the process for using PLS-SEM, which deals with the evaluation of measurement models. PLS-SEM results are reviewed and evaluated using a systematic process that is in line with the confirmatory composite analysis (CCA) guidelines provided by Hair, Howard, and Nitzl (2020). The goal of PLS-SEM is maximizing the explained variance (i.e., the R² value) of the endogenous latent variables in the PLS path model. Evaluation of PLS-SEM results is a two-step approach (Stages 5 and 6) that starts with the assessment of the measurement models' quality (Stage 5). Each type of measurement model (i.e., reflective or formative) has specific evaluation criteria. With reflective measurement models, reliability and validity must be assessed (Stage 5a). In contrast, evaluation of formative measurement models (Stage 5b) involves testing the measures' collinearity, convergent validity, and the significance and relevance of the indicator weights. Satisfactory outcomes for the measurement model assessment are a prerequisite for evaluating the relationships in the structural model (Stage 6), which includes testing the significance of path coefficients and the model's explanatory power (R²) and predictive power (using the PLSpredict procedure). Depending on the specific model and the goal of the study, researchers may want to use additional advanced analyses such as mediation or moderation, which we discuss in Chapters 7 and 8.



•  Describe Stage 5a: Evaluating reflectively measured constructs. The goal of reflective measurement model assessment is to ensure the reliability and validity of the construct measures and therefore provide support for the suitability of their inclusion in the path model. The key criteria include indicator reliability, internal consistency reliability (Cronbach's alpha, reliability ρA, and composite reliability ρC), convergent validity, and discriminant validity. Convergent validity means the construct includes more than 50% of the indicator's variance. Discriminant validity means that every reflective construct must share more variance with its own indicators than with other constructs in the path model. Reflective constructs are appropriate for PLS-SEM analyses if they meet all these requirements.



•  Use the SmartPLS software to assess reflectively measured constructs in the corporate reputation example. The case study illustration uses the corporate reputation path model and the data set introduced in Chapter 2. The SmartPLS software provides all relevant results for the evaluation of the measurement models. Tables and figures for this example demonstrate how to correctly report and interpret the PLS-SEM results. This hands-on example not only summarizes the concepts introduced before but also provides additional insights for their practical application.


Review Questions

1. What is indicator reliability, and what is the minimum threshold value for this criterion?
2. What is internal consistency reliability, and what is the minimum threshold value for this criterion?
3. What is average variance extracted, and what is the minimum threshold value for this criterion?
4. Explain the idea behind discriminant validity and how it can be established.

Critical Thinking Questions

1. Characterize the confirmatory composite analysis (CCA).
2. Why are the criteria for reflective measurement model assessment not applicable to formative measures?
3. How do you evaluate single-item constructs? Why is internal consistency reliability a meaningless criterion when evaluating single-item constructs?
4. Should researchers rely purely on statistical evaluation criteria to select a final set of indicators to include in the path model? Discuss the trade-off between statistical analyses and content validity.

Key Terms

Average variance extracted (AVE) 120
Bootstrap confidence interval 123
Bootstrapping 123
Communality (construct) 120
Communality (item) 117
Composite reliability (ρC) 119
Confirmatory composite analysis (CCA) 111
Content validity 117
Convergent validity 120
Cronbach's alpha 118
Cross-loadings 122
Disattenuated correlation 122
Discriminant validity 130
Evaluation criteria 109
Formative measurement model 113
Fornell-Larcker criterion 121
Heterotrait-heteromethod correlations 122
Heterotrait-monotrait ratio (HTMT) 122
Internal consistency reliability 118
Monotrait-heteromethod correlations 122
Nomological validity 113
Reflective measurement model 112
Reliability 111
Reliability coefficient ρA 119
Validity 111

Suggested Readings

Chin, W. W. (2010). How to write up and report PLS analyses. In V. Esposito Vinzi, W. W. Chin, J. Henseler, & H. Wang (Eds.), Handbook of partial least squares: Concepts, methods and applications in marketing and related fields (Springer Handbooks of Computational Statistics Series, Vol. II, pp. 655–690). Berlin: Springer.

Chin, W. W., Cheah, J.-H., Liu, Y., Ting, H., Lim, X.-J., & Cham, T. H. (2020). Demystifying the role of causal-predictive modeling using partial least squares structural equation modeling in information systems research. Industrial Management & Data Systems, 120(12), 2161–2209.

Franke, G., & Sarstedt, M. (2019). Heuristics versus statistics in discriminant validity testing: A comparison of four procedures. Internet Research, 29(3), 430–447.

Gefen, D., Rigdon, E. E., & Straub, D. W. (2011). Editor's comment: An update and extension to SEM guidelines for administrative and social science research. MIS Quarterly, 35(2), iii–xiv.

Hair, J. F., Howard, M. C., & Nitzl, C. (2020). Assessing measurement model quality in PLS-SEM using confirmatory composite analysis. Journal of Business Research, 109, 101–110.

Hair, J. F., Ringle, C. M., & Sarstedt, M. (2011). PLS-SEM: Indeed a silver bullet. Journal of Marketing Theory and Practice, 19(2), 139–151.

Hair, J. F., Risher, J. J., Sarstedt, M., & Ringle, C. M. (2019). When to use and how to report the results of PLS-SEM. European Business Review, 31(1), 2–24.

Hair, J. F., Sarstedt, M., Ringle, C. M., & Gudergan, S. P. (2018). Advanced issues in partial least squares structural equation modeling (PLS-SEM). Thousand Oaks, CA: Sage.

Henseler, J., Ringle, C. M., & Sarstedt, M. (2015). A new criterion for assessing discriminant validity in variance-based structural equation modeling. Journal of the Academy of Marketing Science, 43(1), 115–135.

Manley, S. C., Hair, J. F., Williams, R. I., & McDowell, W. C. (2020). Essential new PLS-SEM analysis methods for your entrepreneurship analytical toolbox. International Entrepreneurship and Management Journal, forthcoming.


Roldán, J. L., & Sánchez-Franco, M. J. (2012). Variance-based structural equation modeling: Guidelines for using partial least squares in information systems research. In M. Mora, O. Gelman, A. L. Steenkamp, & M. Raisinghani (Eds.), Research methodologies, innovations and philosophies in software systems engineering and information systems (pp. 193–221). Hershey, PA: IGI Global.

Sarstedt, M., Ringle, C. M., Cheah, J.-H., Ting, H., Moisescu, O. I., & Radomir, L. (2020). Structural model robustness checks in PLS-SEM. Tourism Economics, 26(4), 531–554.

Sarstedt, M., & Mooi, E. A. (2019). A concise guide to market research: The process, data, and methods using IBM SPSS statistics (3rd ed.). Berlin: Springer.

Tenenhaus, M., Esposito Vinzi, V., Chatelin, Y. M., & Lauro, C. (2005). PLS path modeling. Computational Statistics & Data Analysis, 48(1), 159–205.

Visit the companion site for this book at https://www.pls-sem.net/.

5
ASSESSING PLS-SEM RESULTS—PART II
Evaluation of the Formative Measurement Models

LEARNING OUTCOMES

1. Explain the criteria used for the assessment of formative measurement models.
2. Understand the basic concepts of bootstrapping for significance testing in PLS-SEM and apply them.
3. Use the SmartPLS software to apply the formative measurement model assessment criteria and learn how to properly report the results of the practical example on corporate reputation.

CHAPTER PREVIEW

Having learned how to evaluate reflective measurement models in the previous chapter (Stage 5a of applying PLS-SEM), our attention now turns to the assessment of formative measurement models (Stage 5b of applying PLS-SEM). The internal consistency perspective that underlies reflective measurement model evaluation


cannot be applied to formative models since formative measures do not necessarily covary. Thus, any attempt to purify formative indicators based on correlation patterns can have negative consequences for a construct measure’s content validity. This notion especially holds for PLS-SEM, which assumes that the formative indicators (more precisely, composite indicators) fully capture the content domain of the construct under consideration. Therefore, instead of employing measures such as composite reliability or average variance extracted (AVE), researchers should rely on other criteria to assess the quality of formative measurement models. The chapter begins with an introduction to the criteria needed to evaluate formative measures. This includes a discussion of the bootstrapping routine that facilitates significance testing of PLS-SEM estimates, including formative indicator weights. These criteria are then applied to the corporate reputation model that is extended for this purpose. While the simple model contains only three reflectively measured constructs as well as one single-item construct, the extended model also includes four antecedent constructs of corporate reputation that are measured using formative indicators. This chapter concludes with the evaluation of measurement models. In Chapter 6, we move to the evaluation of the structural model (Stage 6 of applying PLS-SEM).

STAGE 5B: ASSESSING RESULTS OF FORMATIVE MEASUREMENT MODELS

Many researchers incorrectly use reflective measurement model evaluation criteria to assess the quality of formative measures in PLS-SEM, as revealed by the review of PLS-SEM studies in the strategic management and marketing disciplines (Hair, Sarstedt, Pieper, & Ringle, 2012; Hair, Sarstedt, Ringle, & Mena, 2012). However, the statistical evaluation criteria for reflective measurement scales cannot be directly transferred to formative measurement models, where indicators are likely to represent the construct's independent causes and thus do not necessarily correlate highly. Furthermore, formative indicators are assumed to be error-free (Bollen & Diamantopoulos, 2017; Diamantopoulos, 2006; Edwards & Bagozzi, 2000), which means that the internal consistency reliability concept is not appropriate. Assessing convergent validity and discriminant validity of formatively measured constructs using criteria similar to those associated with reflective measurement models is not meaningful (Chin, 1998). Instead, researchers should focus on establishing content validity before empirically evaluating formatively measured constructs. This step requires ensuring that the formative indicators capture all (or at least major) facets of the construct. In creating formative constructs, content validity issues are addressed by the content specification in which the researcher clearly specifies the domain of content the indicators are intended to measure. Researchers must include a comprehensive set of indicators that exhausts the a priori defined formative construct's domain. The set of


comprehensive indicators for formatively measured constructs should be identified using a rigorous qualitative approach. Failure to consider all major aspects of the construct (i.e., relevant indicators) entails an exclusion of important parts of the construct itself. In this context, experts’ assessment helps safeguard that proper sets of indicators have been used. In addition to specific reasons for operationalizing the construct as formative (Chapter 2; Albers, 2010; Petter, Straub, & Rai, 2007), researchers should conduct a thorough literature review and ensure a reasonable theoretical grounding when developing measures (Bollen & Diamantopoulos, 2017; Diamantopoulos & Winklhofer, 2001; Jarvis, MacKenzie, & Podsakoff, 2003). In this chapter, we examine the PLS-SEM results of formative measurement models following the procedure outlined in Exhibit 5.1. The first step involves assessing the formative measurement model’s convergent validity by correlating the formatively measured construct with a reflective (or single-item) measure of the same construct (Step 1). At the indicator level, the question arises as to whether each formative indicator indeed delivers a contribution to the formative index by representing the intended meaning. There are two situations in which researchers should critically examine whether a particular indicator should be included in the index: First, an indicator’s information could be redundant if it exhibits high correlations with other indicators of the same construct. This requires examining collinearity among the indicators (Step 2). Second, a formative indicator may not significantly contribute to the construct both relatively and absolutely. The latter aspects can be assessed by examining the (statistical) significance and relevance of the formative indicators (Step 3). Depending on the outcomes of Steps 2 and 3, you may need to revisit the previous steps, starting with establishing convergent validity of the revised set of formative indicators.

EXHIBIT 5.1  ■  Formative Measurement Model Assessment Procedure

Step 1: Assess convergent validity of formative measurement models
Step 2: Assess formative measurement models for collinearity issues
Step 3: Assess the significance and relevance of the formative indicators


Step 1: Assess Convergent Validity

Convergent validity is the extent to which a measure correlates positively with other (e.g., reflective) measures of the same construct using different indicators. Hence, when evaluating formative measurement models, we have to test whether the formatively measured construct is highly correlated with a reflective measure of the same construct. This type of analysis is also known as redundancy analysis (Chin, 1998). The term redundancy analysis stems from the information in the model being redundant in the sense that it is included in the formative construct and again in the reflective one. Specifically, one has to use the formatively measured construct as an exogenous latent variable predicting an endogenous latent variable operationalized through one or more reflective indicators (Exhibit 5.2). The strength of the path coefficient linking the two constructs is indicative of the validity of the designated set of formative indicators in tapping the construct of interest. Ideally, a magnitude of 0.80, but at a minimum 0.70 and above, is desired for the path between Y1 (formative) and Y1 (reflective), which translates into an R² value of 0.64—or at least 0.50. If the analysis exhibits a lack of convergent validity (i.e., the R² value of Y1 (reflective) < 0.50), then the formative indicators of the construct Y1 (formative) do not contribute at a sufficient degree to its intended content. The formative construct needs to be theoretically refined by exchanging or adding indicators. Note that to execute this approach, the reflective latent variable must be specified in the research design phase and included in data collection for the research. To identify suitable reflective measures of the construct, researchers can draw on scales from prior research, many of which are reviewed in scale handbooks (e.g., Bearden, Netemeyer, & Haws, 2011; Bruner, 2019; Zarantonello & Pauwels-Delassus, 2015). Including sets of reflective multi-item measures is not always desirable, however, since they increase the survey length. Long surveys are likely to result in respondent fatigue, decreased response rates, and an increased number of missing values. Furthermore, established reflective measurement instruments may not be available, and constructing a new scale is difficult and time-consuming.

EXHIBIT 5.2  ■  Redundancy Analysis for Convergent Validity Assessment (the formatively measured construct Y1 predicts a reflectively operationalized measure of the same construct)


An alternative is to use a global item that summarizes the essence of the construct the formative indicators purport to measure (Sarstedt, Hair, Ringle, Thiele, & Gudergan, 2016). Cheah, Sarstedt, Ringle, Ramayah, and Ting (2018) propose a three-step procedure to generate and validate a global single-item measure to be used in redundancy analyses. We briefly introduce this procedure in Exhibit 5.3.

EXHIBIT 5.3  ■  Guidelines for Generating and Validating a Global Single Item

Cheah et al. (2018) have proposed the following three-step procedure for generating and validating global single items to be used as criterion variables in a redundancy analysis. Their procedure requires empirical data to be collected as part of a pilot study to validate the single item.

Step 1: Item generation. To generate a suitable single item, researchers need to carefully choose a theoretical definition of the concept of interest and identify popular measurement scales that build on this definition. Taking the scale items as input, researchers then need to create a measure that taps the most relevant aspect of the concept and the scale items. The resulting item should be checked for face validity by a panel of experts and members representing the target population.

Step 2: Reliability assessment. To assess the single item's reliability, researchers can draw on the following formula:

rxx = rxy² / ryy,

where rxx is the reliability estimate of the single-item measure x, ryy is the reliability of the reflective multi-item measure of the same concept (e.g., as depicted by ρA; Chapter 4), and rxy is the correlation between the single-item and multi-item measure of the same concept. Hence, reliability assessment by means of this formula requires the simultaneous administration of a single-item and a multi-item measure of the same concept, for example, as part of a pilot study. The internal consistency reliability should be 0.7 or higher.

Step 3: Convergent and criterion validity assessment. Assessing the single item's convergent validity requires correlating the single item with the alternative reflective multi-item measure from Step 2. This correlation should be 0.7 or higher. To assess the item's criterion validity, researchers need to correlate it with a criterion measure with which it is supposed to be related. Defining a firm threshold for an acceptable degree of criterion validity is difficult, as this depends on the constructs under consideration. However, at a bare minimum, the correlation should be significant.
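Step 2's formula is easy to apply once the pilot-study correlations are available. A minimal Python sketch with made-up input values:

```python
def single_item_reliability(r_xy, r_yy):
    """Reliability r_xx of a single item x, given its correlation r_xy with
    a reflective multi-item measure of the same concept and that
    measure's reliability r_yy (e.g., rho_A)."""
    return r_xy ** 2 / r_yy

# Made-up pilot-study values: r_xy = 0.80, rho_A of the multi-item scale = 0.85
r_xx = single_item_reliability(0.80, 0.85)
print(f"r_xx = {r_xx:.3f}, meets 0.70 threshold: {r_xx >= 0.70}")
```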


For the PLS-SEM example on corporate reputation in Chapter 3, an additional statement, "Please assess the extent to which [the company] acts in socially conscious ways," was developed to be used in assessing convergent validity. The new item was measured on a scale of 1 (fully disagree) to 7 (fully agree). This question can be used as an endogenous single-item construct to validate the formative measurement of corporate social responsibility (CSOR). Later in this chapter, we explain how to access the full data set for this PLS-SEM example on corporate reputation as well as how to execute this procedure. Note that while the use of single items is generally not recommended, especially in the context of PLS-SEM (Chapter 2), their role in redundancy analyses is different because single items only serve as a proxy for the constructs under consideration. In other words, the objective is not to fully capture the content domain of the construct but only to consider its salient elements, which serve as a standard of comparison for the formative measurement approach of the construct.

Step 2: Assess Formative Measurement Models for Collinearity Issues

Unlike reflective indicators, which are essentially interchangeable, high correlations are not expected between items in formative measurement models. In fact, high correlations between two formative indicators, also referred to as collinearity, can prove problematic from a methodological and interpretational standpoint. Note that when more than two indicators are involved, this situation is called multicollinearity. To simplify our comments, we only refer to collinearity in the following discussion.

The most severe form of collinearity occurs if two (or more) formative indicators are entered in the same block of indicators with exactly the same information in them (i.e., they are perfectly correlated). This situation may occur because the same indicator is entered twice or because one indicator is a linear combination of another indicator (e.g., one indicator is a multiple of another indicator, such as sales in units and sales in thousands of units). Under these circumstances, PLS-SEM cannot estimate one of the two coefficients (technically, a singular data matrix occurs during the model estimation; Chapter 3). Collinearity problems may also appear in the structural model (Chapter 6) if, for example, redundant indicators are used as single items to measure two (or more) constructs. If this occurs, researchers need to eliminate the redundant indicators.

While perfect collinearity occurs rather seldom, high levels of collinearity are much more common. High levels of collinearity between formative indicators are a crucial issue because they have an impact on the estimation of weights and their statistical significance. More specifically, in practice, high levels of collinearity often affect the results of analyses in two respects. First, collinearity boosts the standard errors and thus reduces the ability to demonstrate the estimated weights


are significantly different from zero. This issue is especially problematic in PLS-SEM analyses based on smaller sample sizes, where standard errors are generally larger due to sampling error. Second, high collinearity can result in the weights being incorrectly estimated, as well as in their signs being reversed. The following example (Exhibit 5.4) illustrates sign reversal due to high correlations between two formative indicators.

EXHIBIT 5.4  ■  Correlation Matrix Demonstrating Collinearity

Formative measurement model (outer weights): x1 → Y1: 0.53; x2 → Y1: −0.17

Correlation matrix:

       Y1      x1      x2
Y1     1.00
x1     0.38    1.00
x2     0.14    0.68    1.00

On examining the correlation matrix in Exhibit 5.4, note that the indicators x1 and x2 are both positively correlated with the construct Y1 (0.38 and 0.14, respectively) but have a higher intercorrelation (0.68). Although both bivariate indicator–construct correlations are positive, when the final parameter estimates are computed in the last stage of the algorithm (Chapter 3), the outer weight of x1 is positive (0.53), whereas the outer weight of x2 is negative (−0.17). This demonstrates a situation where high collinearity unexpectedly reverses the sign of the weaker indicator (i.e., the indicator less correlated with the construct). When this happens, and the researcher has not assessed collinearity among the indicators, the result would be a false interpretation of the indicator relationships, suggesting misleading conclusions.

To assess the level of collinearity, researchers should compute the tolerance (TOL). The tolerance represents the amount of variance of one formative indicator not explained by the other indicators in the same block. For example, in a block of formative indicators, the tolerance for the first indicator x1 can be obtained in two steps:

1. Take the first formative indicator x1 and regress it on all remaining indicators in the same block. Calculate the proportion of variance of x1 associated with the other indicators (R²x1).


2. Compute the tolerance for indicator x1 (TOLx1) as 1 − R²x1. For example, if the other indicators explain 75% of the first indicator's variance (i.e., R²x1 = 0.75), the tolerance for x1 is 0.25 (TOLx1 = 1.00 − 0.75 = 0.25).

A related measure of collinearity is the variance inflation factor (VIF), defined as the reciprocal of the tolerance (i.e., VIFx1 = 1 / TOLx1). Therefore, a tolerance value of 0.25 for x1 translates into a VIF value of 1/0.25 = 4.00. The term VIF is derived from its square root (√VIF) being the degree to which the standard error has been increased due to the presence of collinearity. In the example above, a VIF value of 4.00 therefore implies that the standard error has been doubled (√4 = 2.00) due to collinearity. The TOL and VIF values are computed in this way for every indicator of each formative measurement model. Both collinearity statistics carry the same information, but the reporting of VIF values has become standard practice in scholarly research.

As a rule of thumb, VIF values of 5 or above indicate critical collinearity issues among the indicators of formatively measured constructs. However, collinearity issues can also occur at lower VIF values of 3 (Becker, Ringle, Sarstedt, & Völckner, 2015; Mason & Perreault, 1991). Ideally, the VIF values should be close to 3 or lower. Besides the VIF, researchers may also consider using the condition index (CI) to assess the presence of critical collinearity levels in formative measurement models (Götz, Liehr-Gobbers, & Krafft, 2010). However, the CI is more difficult to interpret and not yet included in any PLS-SEM software. Another alternative approach to evaluating collinearity is examining the bivariate correlation between two variables. While less common in practice, bivariate correlations higher than 0.60 have resulted in collinearity issues with formative indicators in PLS path models.

If the level of collinearity is very high, as indicated by a VIF value of 5 or higher, one should consider removing one of the corresponding indicators. However, this requires that the remaining indicators still sufficiently capture the construct's content from a theoretical perspective. Combining the collinear indicators into a single (new) composite indicator (i.e., an index)—for example, by using their average values, their weighted average value, or their factor scores—is another option for treating collinearity problems. The latter step is not without problems, however, because the individual effects of the indicators can become confounded, which can have adverse consequences for the content validity of the index. Alternatively, setting up formative–formative higher-order constructs (Becker, Klein, & Wetzels, 2012; Ringle, Sarstedt, & Straub, 2012; Sarstedt, Hair, Cheah, Becker, & Ringle, 2019) is a solution to address the collinearity problem (Chapter 8). In this case, researchers should try to assign highly correlated indicators to different lower-order components, which likely resolves collinearity issues in the measurement model by design. However, such a step needs to be supported by measurement theory.
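The two-step tolerance computation translates directly into code. The following Python sketch (using simulated indicator data; SmartPLS reports the VIF values automatically) regresses each indicator on the remaining indicators of its block and derives TOL and VIF:

```python
import numpy as np

def tol_and_vif(X):
    """Tolerance and VIF per indicator: regress each column of X on the
    remaining columns and compute TOL = 1 - R^2, VIF = 1 / TOL."""
    n, k = X.shape
    stats = []
    for j in range(k):
        design = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(design, X[:, j], rcond=None)
        resid = X[:, j] - design @ coef
        r2 = 1 - resid.var() / X[:, j].var()
        stats.append((1 - r2, 1 / (1 - r2)))
    return stats

rng = np.random.default_rng(3)
x1 = rng.normal(size=344)
x2 = 0.9 * x1 + 0.3 * rng.normal(size=344)   # highly collinear with x1
x3 = rng.normal(size=344)
for j, (tol, vif) in enumerate(tol_and_vif(np.column_stack([x1, x2, x3])), 1):
    print(f"x{j}: TOL = {tol:.2f}, VIF = {vif:.2f}")  # x1 and x2 exceed 5
```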

Exhibit 5.5 displays the process to assess collinearity in formative measurement models based on the VIF.

EXHIBIT 5.5  ■  Collinearity Assessment in Formative Measurement Models Using the VIF

1. Assess the level of collinearity in the formative measurement model.
2. If there are no critical levels of collinearity (i.e., VIF < 5): analyze the significance of the outer weights and interpret the formative indicators' absolute and relative contributions.
3. If there are critical levels of collinearity (i.e., VIF ≥ 5): treat the collinearity issues. If collinearity then falls below critical levels (e.g., VIF < 5), proceed as in step 2; if critical levels of collinearity remain (e.g., VIF ≥ 5), dismiss the formative measurement model.

Indicator weights in formative measurement models should be analyzed for their significance and relevance only if collinearity is not at a critical level. When this is not the case and collinearity issues cannot be treated, researchers should not use and interpret the results of the indicator weights in formative measurement models and may want to reconsider or dismiss the operationalization of the formative measurement model.

Step 3: Assess the Significance and Relevance of the Formative Indicators

Another important criterion for evaluating the contribution of a formative indicator, and thereby its relevance, is its outer weight. The outer weight is the


result of a multiple regression (Hair, Black, Babin, & Anderson, 2019) with the latent variable scores as the dependent variable and the formative indicators as the independent variables (see the PLS-SEM algorithm in Chapter 3). Since the construct itself is formed by its underlying formative indicators as a linear combination of the indicator scores and the outer weights, running such a multiple regression analysis yields an R² value of 1.0. That is, 100% of the construct is explained by the indicators. This characteristic distinguishes formative (i.e., composite) indicators from causal indicators commonly used in CB-SEM (Sarstedt, Hair, Ringle, Thiele, & Gudergan, 2016). In the latter case, the construct measured is not automatically explained in full by its (causal) indicators (Chapter 2).

The values of the outer weights are standardized and can therefore be compared with each other. They express each indicator's relative contribution to the construct, or its relative importance to forming the construct. The estimated values of outer weights in formative measurement models are frequently smaller than the outer loadings of reflective indicators. The key question that arises is whether formative indicators truly contribute to forming the construct. To answer this question, we must test if the outer weights in formative measurement models are significantly different from zero by means of the bootstrapping procedure. Note that bootstrapping also plays a crucial role in other elements of the PLS path model analysis, particularly in the evaluation of the structural model path coefficients (Chapter 6). We explain bootstrapping in more detail later in this chapter.

It is important to note that the values of the formative indicator weights are influenced by other relationships in the model (see the PLS-SEM algorithm in Chapter 3). Hence, the exogenous formative construct(s) can have different contents and meanings depending on the endogenous constructs used as outcomes. This is also known as interpretational confounding and represents a situation in which the empirically observed meaning between the construct and its measures differs from the theoretically imposed meaning (Kim, Shin, & Grover, 2010). Such outcomes are not desirable since they limit the generalizability of the results (Bagozzi, 2007). Therefore, comparing formatively measured constructs across several PLS path models with different setups (e.g., different endogenous latent variables) should be approached with caution.

With larger numbers of formative indicators used to measure a single construct, it becomes more likely that one or more indicators will have low or even nonsignificant outer weights. Unlike reflective measurement models, where the number of indicators has little bearing on the measurement results, formative measurement has an inherent limit to the number of indicators that can retain a statistically significant weight (Cenfetelli & Bassellier, 2009). Specifically, when indicators are assumed to be uncorrelated, the average outer weight is 1/√n, where n is the number of indicators. For example, with 2 (or 5 or 10) uncorrelated indicators, the maximum possible outer weight is 1/√2 = 0.707 (or 1/√5 = 0.447 or 1/√10 = 0.316). Similarly, just as the maximum possible outer weight declines with the number of indicators, the average value of outer weights significantly declines


with larger numbers of items. Thus, it becomes more likely that additional formative indicators will become nonsignificant. To deal with the potential impact of a large number of indicators, Cenfetelli and Bassellier (2009) propose grouping indicators into two or more distinct constructs. This approach, of course, requires the indicator groups to be conceptually aligned and that the grouping makes sense from a theoretical perspective. For example, the indicators of the performance construct, which we introduce as a driver construct of corporate reputation later in this chapter (see Exhibit 5.14), could be grouped into two sets, as shown in Exhibit 5.6. The indicator items “[the company] is a very well-managed company” (perf_1) and “[the company] has a clear vision about the future” (perf_5) could be used as formative indicators of a separate construct called management competence. Similarly, the indicators “[the company] is an economically stable company” (perf_2), “I assess the business risk for [the company] as modest compared to its competitors” (perf_3), and “I think that [the company] has growth potential” (perf_4) could be used as formative indicators of a second construct labeled economic performance. An alternative is to create a formative–formative hierarchical component model (Becker, Klein, & Wetzels, 2012; Sarstedt, Hair, Nitzl, Ringle, & Howard, 2020). The higher-order component itself (performance) is then formed by the formatively measured lower-order components management competence and economic performance (Exhibit 5.6). For a detailed discussion of higher-order constructs and

EXHIBIT 5.6  ■  Example of a Higher-Order Construct (the formatively measured lower-order components Management Competence (perf_1, perf_5) and Economic Performance (perf_2, perf_3, perf_4) form the higher-order component Performance)


their validation, see Hair, Sarstedt, Ringle, and Gudergan (2018, Chapter 2) and Sarstedt et al. (2020). Nonsignificant indicator weights should not automatically be interpreted as indicative of poor measurement model quality. Rather, researchers should also consider a formative indicator’s absolute contribution to (or absolute importance for) its construct. In short, the absolute contribution is the information an indicator provides without considering any other indicators that are associated with the formative construct. The absolute contribution is measured by the formative indicator’s outer loading, which is always provided along with the indicator weights. Different from the outer weights, the outer loadings are derived from bivariate regressions of each indicator on its corresponding construct (which in PLS-SEM is equivalent to the bivariate correlation between each indicator and the construct). When an indicator’s outer weight is nonsignificant but its outer loading is high (i.e., above 0.50), the indicator should be interpreted as absolutely important but not as relatively important. In this situation, the indicator would generally be retained. But when an indicator has a nonsignificant weight and the outer loading is below 0.50, researchers should decide whether to retain or delete the indicator by examining its theoretical relevance and potential content overlap with other indicators of the same construct. If the theory-driven conceptualization of the construct strongly supports retaining the indicator (e.g., by means of expert assessment), it should be kept in the formative measurement model. But if the conceptualization does not strongly support an indicator’s inclusion, the nonsignificant indicator should most likely be removed from further analysis. In contrast, if the outer loading is low (e.g., below 0.10) and nonsignificant, there is no empirical support for the indicator’s relevance in providing content to the formative index (Cenfetelli & Bassellier, 2009). Therefore, such an indicator should be removed from the formative measurement model. Eliminating formative indicators that do not meet threshold levels in terms of their contribution has, from an empirical perspective, almost no effect on the parameter estimates when reestimating the model. Nevertheless, formative indicators should never be discarded simply on the basis of statistical outcomes. Before removing an indicator from the formative measurement model, researchers need to evaluate its relevance from a content validity point of view. Again, omitting a formative indicator results in the omission of some of the construct’s content. Exhibit 5.7 summarizes the decision-making process for keeping or deleting formative indicators. In summary, the evaluation of formative measurement models requires establishing the measures’ convergent validity, assessing the indicators’ collinearity, and analyzing the indicators’ relative and absolute contributions, including their significance. Exhibit 5.8 summarizes the rules of thumb for evaluating formative measurement models.

EXHIBIT 5.7  ■  Decision-Making Process for Keeping or Deleting Formative Indicators

1. Test the outer weight's significance.
2. If the outer weight is significant: continue with the interpretation of the outer weight's absolute and relative size.
3. If the outer weight is not significant: analyze the formative indicator's outer loading.
   - If the outer loading is ≥ 0.5: keep the indicator even though it is not significant.
   - If the outer loading is < 0.5: test the significance of the formative indicator's outer loading.
     - If the outer loading is < 0.5 and not significant: delete the indicator.
     - If the outer loading is < 0.5 but significant: consider removal of the indicator.
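The decision rules in Exhibit 5.7 can be captured in a few lines of code. This sketch simply encodes the branching logic; the significance flags and the loading value would come from the bootstrapping results:

```python
def formative_indicator_decision(weight_significant, loading,
                                 loading_significant):
    """Keep-or-delete logic for a formative indicator (Exhibit 5.7)."""
    if weight_significant:
        return "keep; interpret the weight's absolute and relative size"
    if loading >= 0.5:
        return "keep, even though the weight is not significant"
    if loading_significant:
        return "consider removal (loading < 0.5 but significant)"
    return "delete (loading < 0.5 and not significant)"

print(formative_indicator_decision(False, 0.42, False))
```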

Bootstrapping Procedure

Concept

PLS-SEM does not assume the data are normally distributed. Lack of normality means that parametric significance tests used in regression analyses cannot be applied to test whether coefficients such as outer weights are significantly different from zero.


EXHIBIT 5.8  ■  Rules of Thumb for the Evaluation of Formative Measurement Indicators

•• Assess the formative construct's convergent validity by examining its correlation with an alternative measure of the construct, using reflective measures or a global single item (redundancy analysis). The correlation between the constructs should be 0.70 or higher.
•• Collinearity of indicators: Each indicator's VIF value should ideally be lower than 3 but certainly lower than 5. Otherwise, consider eliminating indicators, merging indicators into a single index, or creating higher-order constructs to treat collinearity problems.
•• Examine each indicator's outer weight (relative importance) and outer loading (absolute importance) and use bootstrapping to assess their significance.
•• When an indicator's weight is significant, there is empirical support to retain the indicator.
•• When an indicator's weight is not significant but the corresponding item loading is relatively high (i.e., ≥0.50), or statistically significant, the indicator should generally be retained.
•• If the outer weight is nonsignificant and the outer loading is relatively low (i.e., <0.50), consider deleting the indicator, unless its theoretical relevance and content validity strongly support its inclusion.

EXHIBIT 6.9  ■  Guidelines for Using PLSpredict (decision tree: compare the RMSE (or MAE) values from the PLS-SEM analysis with the LM values for each indicator of the key target construct; depending on whether all, the majority, the minority, or none of the PLS-SEM values are lower than the LM values, the model has high, medium, low, or no predictive power)

For those indicators with Q²predict > 0, researchers should next compare the RMSE

(or the MAE) values with the naïve LM benchmark. This comparison can have four outcomes:

1. If all indicators in the PLS-SEM analysis have lower RMSE (or MAE) values compared to the naïve LM benchmark, the model has high predictive power.
2. If the majority (or the same number) of indicators in the PLS-SEM analysis yields smaller prediction errors compared to the LM, this indicates a medium predictive power.
3. If a minority of the dependent construct's indicators produces lower PLS-SEM prediction errors compared to the naïve LM benchmark, this indicates the model has low predictive power.
4. If the PLS-SEM analysis (compared to the LM) yields lower prediction errors in terms of the RMSE (or the MAE) for none of the indicators, this indicates the model lacks predictive power.
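These four outcomes amount to counting, for the key target construct, how many indicators PLS-SEM predicts better than the LM benchmark. A minimal Python sketch with hypothetical RMSE values:

```python
def predictive_power(pls_rmse, lm_rmse):
    """Classify predictive power by counting indicators for which PLS-SEM
    yields a lower prediction error than the naive LM benchmark."""
    k = len(pls_rmse)
    wins = sum(p < l for p, l in zip(pls_rmse, lm_rmse))
    if wins == k:
        return "high predictive power"
    if wins >= k / 2:        # majority or the same number of indicators
        return "medium predictive power"
    if wins > 0:
        return "low predictive power"
    return "lacks predictive power"

# Hypothetical RMSE values for three indicators of the target construct
print(predictive_power([1.31, 1.42, 1.50], [1.33, 1.40, 1.55]))  # medium
```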

Treating Predictive Power Issues

If the PLSpredict procedure identifies one or more indicators with a low predictive power, researchers should carefully explore potential explanations. These include (1) data issues and (2) measurement model issues. In terms of data issues, predictions can be off due to the prediction error's bias and variance. For example, if an indicator has a very large variance (across observations), this means the prediction for observations far from the mean will suffer. Specifically, using single items to measure abstract concepts decreases the model's predictive power (Diamantopoulos, Sarstedt, Fuchs, Kaiser, & Wilczynski, 2012; Salzberger, Sarstedt, & Diamantopoulos, 2016; Sarstedt, Diamantopoulos, Salzberger, & Baumgartner, 2016). In the case of single-item measures of abstract concepts, respondents are asked to "automatically consider different aspects of the construct" (Fuchs & Diamantopoulos, 2009, p. 204). But asking respondents to do this is likely to increase the error variance, thus reducing a model's capability to predict observations in a holdout sample. Similarly, if the data have a U-shaped distribution and the prediction is the mean (i.e., the middle of the U), the predictions for most observations are wrong, although they may, on average, be correct. Reconsidering the outlier treatment (i.e., the criteria chosen for removing outliers) or transforming the problematic indicator (Chapter 2) may increase its predictive power.

Low predictive power could also be due to measurement model issues. For example, an indicator loading could be low but still sufficiently high for the construct to meet recommended reliability and convergent validity thresholds (Chapter 4). But if the same indicator lacks predictive power, researchers should carefully consider removing it, even if measurement theory supports its inclusion.


This conclusion (indicator removal) holds particularly in situations in which the analysis's primary objective is prediction, or when a negative Q²predict value points to a very low predictive power. However, researchers should not mechanically remove the corresponding indicators but rather carefully evaluate the effect of deleting the indicator on the construct's reliability, content validity, convergent validity, and discriminant validity (Chapter 4).

Step 5: Model Comparisons

Some research situations call for establishing and comparing alternative theoretical models. Alternative model configurations that serve as the basis for model comparisons typically emerge when considering theories in new contexts or when researchers build conceptual bridges across related streams of inquiry to provide a holistic understanding of the phenomenon of interest (Chapter 2). Such alternative models all focus on the same endogenous construct but differ in their structure, for example, with regard to the number of predictor constructs. Given such a set of models, researchers then try to identify the model that best approximates the data generation process underlying the phenomenon under study. To do so, researchers frequently select the model that yields the highest R² value. However, such a procedure inherently favors complex models with many exogenous constructs, including ones that may be only slightly related to the endogenous construct of interest. The reason is that adding additional (nonsignificant) constructs to explain an endogenous latent variable in the structural model always increases its R² value. However, researchers typically prefer parsimonious models that are good at explaining the data (i.e., with high R² values) but are also simple, because such models are more likely to generalize to other research settings.

To account for the trade-off between a model's ability to approximate the data generation process and its complexity, Sharma, Sarstedt, Shmueli, Kim, and Thiele (2019) and Sharma, Shmueli, Sarstedt, Danks, and Ray (2021) have proposed using model selection criteria well-known from the regression literature. In their basic form, model selection criteria rely on the method of penalized likelihood in which a term to penalize model complexity is added to the likelihood function. However, when the error distribution of the endogenous construct's scores is normal with a constant variance, the maximum likelihood–based formulas can be derived from the sum of squared errors as produced by the PLS-SEM algorithm (Chapter 3). Sharma et al. (2019) identified the Bayesian information criterion (BIC; Schwarz, 1978) and the Geweke and Meese (1981) criterion (GM) as particularly suitable for PLS-SEM–based model selection tasks. Sharma et al. (2021) find that the BIC and GM also achieve a sound trade-off between model fit and out-of-sample predictive power in the estimation of PLS path models. Their analyses show these criteria are useful substitutes for


selecting correctly specified models with low prediction error when researchers cannot create a holdout sample—for example, due to low sample size—to assess a model's out-of-sample predictive power. While BIC and GM exhibit practically the same performance in model selection tasks, BIC is considerably easier to compute. Hence, our illustrations will focus on this criterion. The BIC for a certain model i is defined as follows:

BICi = n · log(SSEi / n) + pk · log(n),

where SSEi is the sum of squared errors for the ith model in a set of alternative models, n is the sample size, and pk is the number of predictors of the construct of interest plus 1. Researchers would establish a set of alternative models, run the PLS-SEM algorithm on each model, and compute each model’s BIC value. The best model in the set is the one that minimizes the BIC value. In computing the BIC value of the models, researchers need to focus on one specific construct, which is typically the key target construct in the model. In addition, using model selection criteria requires that the models be compared using the same data set. One issue in the application of the BIC is that—in its simple form (i.e., raw values)—the criterion does not offer any insights regarding the relative weights of evidence in favor of models under consideration (Burnham & Anderson, 2002). More precisely, while the differences in BIC values are useful in ranking and selecting models, such differences can often be small in practice, leading to model selection uncertainty. To resolve this issue, researchers can use the BIC values to compute Akaike weights, which indicate a model’s relative likelihood, given the data and a set of competing models (Danks, Sharma, & Sarstedt, 2020). The BIC-based Akaike weight of a certain model i is defined as:

wi(BIC) = exp(−½ · Δi(BIC)) / Σ(k=1..K) exp(−½ · Δk(BIC)), with Δi(BIC) = BICi − BICmin.

To illustrate the use of the BIC in model comparisons, consider the three alternative models shown in Exhibit 6.10. Calculating the three models using the PLS-SEM algorithm produces BIC values of −328.138 for Model 1, −327.497 for Model 2, and −317.713 for Model 3. Based on the BIC, researchers should opt for Model 1 as this model produces the smallest value.


EXHIBIT 6.10  ■  Model Comparison (three alternative model configurations for the same target construct, involving the constructs Y1–Y4 and differing in their number of predictor constructs)

In order to compute the BIC-based Akaike weights for the three models, we first have to compute the differences between each model's BIC value and the smallest BIC value in the set. For Model 1, this computation results in Δ1(BIC) = −328.138 − (−328.138) = 0, for Model 2 in Δ2(BIC) = −327.497 − (−328.138) = 0.641, and for Model 3 in Δ3(BIC) = −317.713 − (−328.138) = 10.425. We can now compute the Akaike weights for the three models as follows:

w1(BIC) = exp(−½ · 0) / [exp(−½ · 0) + exp(−½ · 0.641) + exp(−½ · 10.425)] = 1 / (1 + 0.726 + 0.005) = 1 / 1.732 = 57.74%

w2(BIC) = exp(−½ · 0.641) / 1.732 = 0.726 / 1.732 = 41.92%

w3(BIC) = exp(−½ · 10.425) / 1.732 = 0.005 / 1.732 = 0.29%
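The BIC-based Akaike weights are straightforward to verify in code. The sketch below reproduces the hand calculation (small rounding differences aside, since the values above use three decimals):

```python
import numpy as np

def akaike_weights(bic_values):
    """Relative likelihood of each model, computed from its BIC value."""
    delta = np.asarray(bic_values) - np.min(bic_values)
    raw = np.exp(-0.5 * delta)
    return raw / raw.sum()

# BIC values for Models 1-3 from the example above
for model, w in enumerate(akaike_weights([-328.138, -327.497, -317.713]), 1):
    print(f"Model {model}: {w:.2%}")   # approx. 57.8%, 41.9%, and 0.3%
```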

The BIC-based Akaike weights firmly reject Model 3, with a relative likelihood of merely 0.29%. Considering the other weights, we see that they clearly favor Model 1, with a relative likelihood of 57.74%, with some support given to Model 2, with a relative likelihood of 41.92%. These results suggest that, among the three alternative models, Model 1 fits the data best.

A further development of the prediction-oriented model comparisons in PLS-SEM is the cross-validated predictive ability test (CVPAT; Liengaard et al., 2021). This approach offers a statistical test to decide whether an alternative model offers significantly higher out-of-sample predictive power than an established model. Such a statistical test is particularly advantageous when the differences in the information criteria for deciding between one model and another are relatively small. In addition, the test statistic of the CVPAT is suitable for prediction-oriented model comparison in the context of the development and validation of theories. As such, CVPAT offers researchers an important tool for selecting a model on which they can base, for example, strategic management and policy decisions. Future extensions of CVPAT will also support a test for the predictive power assessment of a single model (Hair, 2020). Exhibit 6.11 summarizes the key criteria for evaluating structural model results.

EXHIBIT 6.11  ■  Rules of Thumb for Structural Model Evaluation

• Examine each set of predictors in the structural model for collinearity. Each predictor's VIF value should be lower than 5 and preferably lower than 3 to avoid critical collinearity issues. Otherwise, consider eliminating constructs, merging predictors into a single construct, or creating higher-order constructs to treat collinearity problems.
• Use bootstrapping to assess the significance of path coefficients. The minimum number of bootstrap samples must be at least as large as the number of valid observations but should be 10,000. Instead of inspecting critical t values and p values for significance testing, we recommend reporting bootstrap confidence intervals since they provide additional information on the stability of path coefficient estimates. Use the percentile method for constructing confidence intervals. However, when the bootstrap distribution of the parameter estimate (e.g., a path coefficient) is highly skewed, as evidenced in skewness values beyond −2 and +2, the BCa method can be used.

Chapter 6  ■  Assessing PLS-SEM Results—Part III  

209

EXHIBIT 6.11  ■  (Continued) • PLS-SEM aims at maximizing the R 2 values of the endogenous latent variable(s) in the path model. R 2 values should always be considered in the context of the model and the area of application. Excessive R 2 values indicate that the model overfits the data. • The effect size f2 facilitates assessment of an exogenous construct’s contribution to a predictor latent variable’s R 2 value. f2 values of 0.02, 0.15, and 0.35 indicate a predictor construct’s small, medium, or large effect, respectively, on an endogenous construct. • Use PLSpredict to assess a model’s (out-of-sample) predictive power: {

Focus the analysis on a single key target construct.

{

Set k = 10, assuming each subgroup meets the minimum required sample size.

{

Use 10 repetitions.

{

{

2 Qpredict values ≤ 0 indicate the model does not outperform the most naïve benchmark (i.e., the indicator means from the analysis sample).

Compare the RMSE (or the MAE) value obtained by PLS-SEM with the value obtained by LM for each indicator. Check if the PLS-SEM analysis (compared to the LM) yields lower prediction errors in terms of RMSE (or MAE) for all (high predictive power), the majority (medium predictive power), the minority or the same number (low predictive power), or none of the indicators (no predictive power).

• If applicable, engage in model comparisons. Select the model that minimizes the value in BIC compared to the other models in the set. Use Akaike weights to compute each model’s relative likelihood. Consider using CVPAT to test if an alternative model has significantly higher predictive power than an established one.

CASE STUDY ILLUSTRATION— EVALUATION OF THE STRUCTURAL MODEL (STAGE 6) We continue with the extended corporate reputation model as introduced in Chapter 5. If you do not have the PLS path model readily available in SmartPLS 3 (Ringle, Wende, & Becker, 2015), download the file Corporate Reputation .zip from the https://www.pls-sem.net/ website and save it on your hard drive. Then, run the SmartPLS software and click on File → Import project from backup file in the menu. In the box that appears on the screen, locate and open the Corporate Reputation.zip file that you just downloaded. Thereafter, a new project appears with the name Corporate Reputation in the SmartPLS Project Explorer window on the left-hand side. This project contains several models (.splsm files)

210   A Primer on Partial Least Squares Structural Equation Modeling (PLS-SEM)

labeled Simple model, Extended model, Redundancy analysis ATTR, Redundancy analysis CSOR, and so forth, plus the data file Corporate reputation data.csv. Next, double-click on Extended model, and the extended PLS path model for the corporate reputation example opens. The assessment of the structural model builds on the results from the standard model estimation, the bootstrapping routine, and the PLSpredict procedure. After running the PLS-SEM algorithm using the same algorithm and missing values settings as in the previous chapters, SmartPLS shows the key results of the model estimation in the Modeling Window (Exhibit 6.12). Per default, we see the path coefficients as well as the R 2 values of the endogenous constructs (shown in the circles). For a more detailed assessment, we need to examine the SmartPLS results report. Following the structural model assessment procedure (Exhibit 6.1), we first need to check the structural model for collinearity issues by examining the VIF values of all sets of predictor constructs in the structural model. To do so, go to Quality Criteria → Collinearity Statistic (VIF) and click on the Inner VIF Values tab. The results table that opens (Exhibit 6.13) shows the VIF values of all combinations of endogenous constructs (represented by the columns) and corresponding exogenous (i.e., predictor) constructs (represented by the rows). Specifically, we assess the following sets of (predictor) constructs for collinearity: (1) ATTR, CSOR, PERF, and QUAL as predictors of COMP (and LIKE); (2) COMP and LIKE as predictors of CUSA; and (3) COMP, LIKE, and CUSA as predictors of CUSL. As can be seen in Exhibit 6.13, all VIF values are clearly below the threshold of 5 and with one exception (i.e., the QUAL predictor) below 3. Since QUAL’s VIF value is close to 3, we conclude that collinearity among the predictor constructs is not a critical issue in the structural model, and we can continue examining the results report. The second step of the structural model assessment procedure (Exhibit 6.1) involves assessing the significance and relevance of the structural model relationships. Starting with the relevance assessment, go to Final Results → Path Coefficients, where we find the path coefficients as shown in the Modeling Window (Exhibit 6.12). Looking at the relative importance of the driver constructs for the perceived competence (COMP), one finds that the customers’ perception of the company’s quality of products and services (QUAL) has the highest path coefficient, followed by the company’s performance (PERF). In contrast, the perceived attractiveness (ATTR) and degree to which the company acts in socially conscious ways (i.e., corporate social responsibility; CSOR) have very little bearing on COMP. These two drivers are, however, of increased importance for establishing a company’s likeability (LIKE). Moving on in the model, we also find that likeability is the primary driver for the customers’ satisfaction and loyalty, as illustrated by the increased path coefficients compared with those of COMP. More interesting, though, is the examination of total effects. Specifically, we can evaluate how strongly each of the four formative driver constructs (ATTR, CSOR, PERF, and QUAL) ultimately influences the key target variable CUSL

211

0.037

0.406

csor_2

csor_3

csor_5

0.416

0.080

0.306

csor_1

csor_4

0.199

0.340

0.194

0.177

perf_5

perf_4

perf_3

perf_2

0.468

0.202

qual_2

perf_1

qual_1

qual_4

attr_1

0.414

0.160

0.201

attr_2

ATTR

qual_6

attr_3

0.658

0.178

0.117

0.059

0.295

0.167

0.086

like_1

0.880

cusa

like_2

0.869

LIKE

0.558

like_3

0.844

0.344

CUSA

0.292

0.436

1.000

0.146

COMP

0.631

0.006

comp_3

0.824 0.821 0.844

comp_2

qual_8

comp_1

qual_7

0.380

0.430

0.398 0.229 0.190

qual_5

QUAL

0.106 −0.005

CSOR

PERF

0.041

qual_3

EXHIBIT 6.12  ■  Results in Modeling Window

0.505

CUSL

0.562

cusl_2 cusl_3

0.917

cusl_1

0.843

0.833

212   A Primer on Partial Least Squares Structural Equation Modeling (PLS-SEM) EXHIBIT 6.13  ■  VIF Values in the Structural Model

via the mediating constructs COMP, LIKE, and CUSA. Total effects are shown under Final Results → Total Effects in the results report. We read the table shown in Exhibit 6.14 column to row. That is, each column represents a target construct, whereas the rows represent predecessor constructs. For example, with regard to loyalty, we can see that among the four exogenous driver constructs, quality has the strongest total effect on loyalty (0.248), followed by corporate social responsibility (0.105), attractiveness (0.101), and performance (0.089). Therefore, it is advisable for companies to focus on marketing activities that positively influence the customers’ perception of the quality of their products and services. By also taking the construct’s indicator weights into consideration, we can even identify which specific element of quality needs to be addressed. Looking at the outer weights (Final Results → Outer Weights) reveals that qual_6 has the highest outer weight (0.398). This item relates to the survey question “[the company] is a reliable partner for customers.” Thus, marketing managers should try to enhance the customers’ perception of the reliability of their products and services by means of marketing activities. Analysis of structural model relationships shows that several path coefficients (e.g., COMP → CUSL) have rather low values. To assess whether these relationships are significant (Exhibit 6.1), we run the bootstrapping procedure. The extraction of bootstrapping results for the structural model estimates is analogous to the descriptions in the context of the formative measurement model assessment (Chapter 5). To run the bootstrapping procedure, go to Calculate → Bootstrapping in the SmartPLS menu or go to the Modeling Window and click on the Calculate icon, followed by Bootstrapping (note that you first may need to go back to

Chapter 6  ■  Assessing PLS-SEM Results—Part III  

213

EXHIBIT 6.14  ■  Total Effects

the Modeling Window before the Calculate icon appears). We retain all settings for missing value treatment and the PLS-SEM algorithm as in the initial model estimation and select the No Sign Changes option, 10,000 bootstrap samples and select the Complete Bootstrapping option. In the advanced settings, we choose Percentile Bootstrap, Two Tailed testing, and a significance level of 0.05. Next, we click the Start Calculation button. After running the procedure, SmartPLS shows the bootstrapping results for the measurement models and structural model in the Modeling Window. Using the Calculation Results box at the bottom left of the screen, you can choose whether SmartPLS should display t values or p values in the Modeling Window. Exhibit 6.15 shows p values for the structural model relationships as resulting from the bootstrapping procedure. Note that the results will differ from your results and will change again when rerunning bootstrapping as the procedure builds on randomly drawn samples. By going to the bootstrapping report, we get a more detailed overview of the results. The table under Final Results → Path Coefficients provides us with an overview of results, including standard errors, bootstrap mean values, t values, and p values. Clicking on the Confidence Intervals tab in the bootstrapping results report shows the confidence interval as derived from the percentile method (Exhibit 6.16; again, your results will look slightly different because bootstrapping is a random process). In addition, you can access the bias-corrected confidence interval to account for systematic difference between average sample estimates and the population value by clicking the corresponding tab.

214

0.000

csor_5

csor_4

0.000 0.290

0.580

csor_3

csor_2

csor_1

0.000

0.003

perf_5

perf_4

0.001 0.000

0.010

perf_3

perf_2

0.000

0.001

qual_2

perf_1

qual_1

qual_4

attr_1

0.000

qual_6

attr_2

0.002

ATTR attr_3

0.000

0.001

0.092

0.278

0.000

like_1

0.000

cusa

0.000

like_2

0.000

LIKE

0.000

0.000

like_3

0.000

0.000

CUSA

0.000

0.918

comp_3

0.000

0.030

COMP

0.000

comp_2

qual_8

comp_1

qual_7

0.008

0.114

0.000

0.000

0.005 0.000 0.000 0.002

qual_5

QUAL

0.092 0.934

CSOR

PERF

0.426

qual_3

EXHIBIT 6.15  ■  Bootstrap p Values in the Modeling Window

CUSL

cusl_2 cusl_3

0.000

cusl_1 0.000

0.000

215

EXHIBIT 6.16  ■  Bootstrapping Results

216

EXHIBIT 6.17  ■  Bootstrap Samples

Chapter 6  ■  Assessing PLS-SEM Results—Part III  

217

Finally, clicking on the Samples tab in the bootstrapping results report shows the results of each bootstrap run as shown in Exhibit 6.17. The displayed table includes the estimates of all the path coefficients for all 10,000 subsamples. These estimates are used to compute the bootstrapping mean values, standard errors, t values, and p values of all the path coefficients, shown in the Mean, STDEV, T-Values, and P-Values table of the SmartPLS bootstrapping results report. Exhibit 6.18 provides a summary of the path coefficient estimates, t values, p values, and confidence intervals. Again, the user usually reports either the t values (and their significance levels) or the p values or the confidence intervals. We find that all criteria come to the same outcome for the significance of path coefficients. Otherwise, we recommend relying on the bootstrap confidence intervals for significance testing (see Chapter 5 for details). Exhibit 6.18 shows all results only for illustrative purposes.

EXHIBIT 6.18  ■  S  ignificance Testing Results of the Structural Model Path Coefficients

Path t p Coefficients Values Values

a

95% Confidence Intervals

Significancea (p < 0.05)?

ATTR → COMP

0.086

1.579

0.114

[−0.019, 0.194]

No

ATTR → LIKE

0.167

2.666

0.008

[0.037, 0.284]

Yes

COMP → CUSA

0.146

2.173

0.030

[0.015, 0.275]

Yes

COMP → CUSL

0.006

0.103

0.918

[−0.101, 0.113]

No

CSOR → COMP

0.059

1.086

0.278

[−0.044, 0.169]

No

CSOR → LIKE

0.178

3.200

0.001

[0.067, 0.288]

Yes

CUSA → CUSL

0.505

11.960

0.000

[0.419, 0.583]

Yes

LIKE → CUSA

0.436

7.381

0.000

[0.316, 0.550]

Yes

LIKE → CUSL

0.344

6.207

0.000

[0.236, 0.454]

Yes

PERF → COMP

0.295

4.457

0.000

[0.168, 0.428]

Yes

PERF → LIKE

0.117

1.684

0.092

[−0.013, 0.259]

No

QUAL → COMP

0.430

6.400

0.000

[0.300, 0.562]

Yes

QUAL → LIKE

0.380

5.834

0.000

[0.260, 0.515]

Yes

We refer to the bootstrap confidence intervals for significance testing, as described in Chapter 5.

218   A Primer on Partial Least Squares Structural Equation Modeling (PLS-SEM)

Assuming a 5% significance level, we focus on the 95% bootstrap confidence interval obtained by the percentile approach and find that all relationships in the structural model are significant, except PERF → LIKE, ATTR → COMP, CSOR → COMP, and COMP → CUSL (Exhibit 6.18). These results suggest that companies should concentrate their marketing efforts on enhancing their likeability (by strengthening quality perceptions) rather than their competence to maximize customer loyalty. This is not surprising, considering that customers rated mobile network operators. As their services (provision of network capabilities) are intangible, affective judgments play a much more important role than cognitive judgments for establishing customer loyalty. Furthermore, we learn that ATTR and CSOR only influence LIKE, which is not surprising since these two driver constructs are also more affective in nature. To examine the bootstrapping results for the total effects, go to Final Results → Total Effects. Exhibit 6.19 summarizes the results for the total effects of the exogenous constructs ATTR, CSOR, PERF, and QUAL on the target constructs CUSA and CUSL taken from the bootstrapping results report. As can be seen, all total effects are significant at a 5% level.

EXHIBIT 6.19  ■  Significance Testing Results of the Total Effects

a

t Values

p Values

95% Confidence Intervals

Significancea (p < 0.05)?

0.085

2.817

0.005

[0.024, 0.144]

Yes

ATTR → CUSL

0.101

2.740

0.006

[0.026, 0.172]

Yes

CSOR → CUSA

0.086

3.142

0.002

[0.034, 0.143]

Yes

CSOR → CUSL

0.105

3.107

0.002

[0.042, 0.175]

Yes

PERF → CUSA

0.094

2.454

0.014

[0.021, 0.172]

Yes

PERF → CUSL

0.089

2.034

0.042

[0.005, 0.177]

Yes

QUAL → CUSA

0.228

6.189

0.000

[0.161, 0.306]

Yes

QUAL → CUSL

0.248

5.788

0.000

[0.171, 0.339]

Yes

Total Effect ATTR → CUSA

We refer to the bootstrap confidence intervals for significance testing, as described in Chapter 5.

Chapter 6  ■  Assessing PLS-SEM Results—Part III  

219

Next, we turn our attention to the assessment of the model’s explanatory power as called for in the third step of the structural model assessment procedure (Exhibit 6.1). To do so, we go back to the SmartPLS results report as produced after running the PLS-SEM algorithm (not bootstrapping). To start with, we examine the R² values of the endogenous latent variables, which are available under Quality Criteria → R Square (select the Matrix view). Following our rules of thumb, the R 2 values of COMP (0.631), CUSL (0.562), and LIKE (0.558) can be considered moderate, whereas the R² value of CUSA (0.292) is rather weak. To obtain the effect sizes f 2 for all structural model relationships, go to Quality Criteria → f Square (select the Matrix view). Exhibit 6.20 shows the f 2 values for all combinations of endogenous constructs (represented by the columns) and corresponding exogenous (i.e., predictor) constructs (represented by the rows). For example, LIKE has a medium effect size of 0.159 on CUSA and of 0.138 on CUSL. On the contrary, COMP has no effect on CUSA (0.018) or CUSL (0.000). Please note these results differ from a manual computation of the f 2 values by 2 2 and Rexcluded . This difusing the aforementioned equation with values for Rincluded ference results because SmartPLS uses the latent variable scores of the model that includes all latent variables and then internally excludes latent variables to obtain 2 . On the contrary, when manually computing the f 2 values by estimatthe Rexcluded ing the model with and without a latent variable, the model changes and, thus, so do the latent variable scores. Hence, the difference of the manually computed f 2 values results from the changes in the latent variable scores due to a model modification that is, however, incorrect.

EXHIBIT 6.20  ■  f 2 Effect Sizes

220   A Primer on Partial Least Squares Structural Equation Modeling (PLS-SEM)

The fourth step of the structural model assessment procedure (Exhibit 6.1) requires assessing the model’s predictive power using PLSpredict. To initiate the procedure, go to Calculate → PLS Predict in the SmartPLS menu or go to the Modeling Window and click on the Calculate icon, followed by PLS Predict. In the dialog box that opens, we retain the default settings. That is, we run PLSpredict with 10 folds and 10 repetitions. To initiate the analysis, click on Start Calculation. Exhibit 6.21 shows the PLSpredict results report. Note that because the partition of the data in folds relies on a random process, your results will look slightly different from those shown in the results report. We focus our analysis on the model’s key target construct CUSL and consider the RMSE as the default metric for interpreting the prediction error of the construct’s indi2 cators. In an initial step, we interpret the Q predict values. The analysis shows 2 values that all three indicators (i.e., cusl_1, cusl_2, and cusl_3) have Q predict larger than zero, suggesting that the PLS path model outperforms the most naïve benchmark. The following analyses require comparing the RMSE values produced by the PLS-SEM analysis with those produced by the naïve LM benchmark model. To access the LM benchmark model’s results, click on the LM tab in the results report. Exhibit 6.22 shows the corresponding results. Again, note that because PLSpredict relies on a random process, your results will look slightly different from those presented in the results report. Comparing the RMSE values, we find that the PLS-SEM analysis produces smaller prediction errors

EXHIBIT 6.21  ■  PLSpredict Results Report (PLS-SEM)

Chapter 6  ■  Assessing PLS-SEM Results—Part III  

221

EXHIBIT 6.22  ■  PLSpredict Results Report (LM Benchmark Model)

(i.e., smaller RMSE values) than the LM for all three CUSL indicators. Specifically, the analysis produces the following RMSE values (PLS-SEM vs. LM): • cusl_1: 1.300 vs. 1.312, • cusl_2: 1.524 vs. 1.536, and • cusl_3: 1.531 vs. 1.564. These results suggest the model has high predictive power as the PLSSEM analysis outperforms the naïve LM benchmark model for all CUSL indicators. Note that the absolute size of the differences in RMSE values is of secondary importance for two reasons. First, the size of the RMSE values largely depend on the measurement scale of the indicators. As the CUSL indicators are measured on 7-point Likert scales, the range of possible RMSE differences is quite limited. Second, the RMSE values generated by PLSpredict are highly stable. Hence, even marginal differences in the RMSE values are typically significant. In a final step, we compare different configurations of the reputation model. Drawing on Danks, Sharma, and Sarstedt (2020) and Sharma et al. (2021), we compare the original model that serves as the basis for our prior analyses (Model 1), with two more complex variants in which the four driver constructs also relate to CUSA (Model 2) and CUSL (Model 3). Exhibit 6.23 shows the three models under consideration.

222

ATTR

CSOR

PERF

QUAL

CUSA

ATTR

CSOR

PERF

QUAL

Model 2

LIKE

COMP

EXHIBIT 6.23  ■  Model Comparisons

CUSL

CUSA

ATTR

CSOR

PERF

QUAL

Model 1

LIKE

COMP

Model 3

LIKE

COMP

CUSL

CUSA

CUSL

Chapter 6  ■  Assessing PLS-SEM Results—Part III  

223

To compare the models, we estimate each model using the PLS-SEM algorithm and examine the BIC criterion, which can be accessed in the results report under Quality Criteria → Model Selection Criteria. EXHIBIT 6.24  ■  BIC Values

BIC Akaike weights (relative model likelihoods)

Model 1

Model 2

Model 3

−261.602

−261.603

−245.038

49.98%

50.01%

0.01%

Comparing the BIC values, we find that Model 2 yields the lowest BIC value, closely followed by Model 1 and, with a greater difference, Model 3. Considering the marginal difference in BIC values between Models 1 and 2, however, the evidence in favor of the more complex model, which relates all four driver constructs to CUSA is not very strong. To further explore the difference, we compute the BIC-based Akaike weights for the three models. To do so, we first compute each delta score, which gives 0.001 for Model 1, 0 for Model 2, and 16.565 for Model 3. We then compute exp(−0.5*delta) for each model, which gives 0.9995 for Model 1, 1 for Model 2, and 0.0003 for Model 3. Dividing each value by (0.9995 + 1 + 0.0003) gives the Akaike weights shown in Exhibit 6.24. Comparing the weights, we find a marginally higher relative likelihood for Model 1 over Model 2. However, considering that Model 1 is more parsimonious, we would opt for this model rather than the more complex variant Model 2. Furthermore, if you like use CVPAT to test the original corporate reputation model’s predictive power against model alternatives, the following GitHub repository provides a description of this example and its CVPAT’s R code: https://github.com/ECONshare/CVPAT.

Summary •

Understand the concept of model fit in a PLS-SEM context. The notion of model fit as known from CB-SEM is not transferrable to PLS-SEM as the method follows a different aim when estimating model parameters (i.e., maximizing the explained variance instead of minimizing the divergence between covariance matrices). Nevertheless, research has brought forward several PLS-SEM–based model fit measures such as SRMR, RMStheta, and the exact fit test, which, however, have proven ineffective in detecting model misspecifications in settings commonly encountered in applied research. Instead, structural model assessment in PLSSEM focuses on evaluating the model’s explanatory and predictive power.



Assess the path coefficients in the structural model. Path coefficients are assessed in terms of their significance and relevance. Testing for significance

(Continued)

224   A Primer on Partial Least Squares Structural Equation Modeling (PLS-SEM) (Continued) requires application of the bootstrapping routine using the percentile method. Results examination and reporting draws on the 95% bootstrap confidence intervals. Next, the relative sizes of path coefficients are compared, as well as the total effects and f² effect sizes. By interpreting these results, researchers can identify the key constructs with the highest relevance to explain the endogenous latent variable(s) in the structural model. •

Evaluate the model’s explanatory power. To assess a model’s explanatory power (also referred to as in-sample predictive power), researchers should rely on the coefficient of determination (R2). The R² values represent the amount of explained variance of the endogenous constructs in the structural model. A well-developed path model to explain certain key target constructs (e.g., customer satisfaction, customer loyalty, or technology acceptance) should deliver sufficiently high R² values. Generally, R2 values should always be interpreted in the context of the model. However, the exact interpretation of the R² value depends on the particular research context and model complexity. Excessive R2 values indicate that the model overfits the data.



Evaluate the model’s predictive power using the PLSpredict procedure. PLSpredict is a holdout sample-based procedure that facilitates assessing a model’s (out-of-sample) predictive power. Based on the concept of k-fold crossvalidation, PLSpredict divides the sample data into k subgroups (referred to as folds) of roughly the same size. The procedure then combines k-1 folds into a training sample that is used to estimate the model. The remaining fold serves as a holdout sample that is used to assess the model’s predictive power. Per default, PLSpredict should be run with k = 10 folds and r = 10 repetitions. To quantify the degree of prediction error, researchers should primarily draw on the RMSE metric or, in case of highly non-symmetrically distributed prediction errors, the MAE. To assess the model’s predictive power, researchers need 2 to first evaluate the Qpredict statistic, followed by a comparison of the RMSE (or MAE) values produced by the PLS-SEM analysis and the LM estimations.



Understand the concept of model comparisons and metrics used for selecting the best model. Some research situations call for the comparison of alternative models. To compare different model configurations and select the best model, the BIC criterion should be used. The model which yields the smallest BIC value is considered the best model in the set. BIC-based Akaike weights offer further evidence for the relative likelihood of a model compared to alternative models in the set. Consider using CVPAT to assess whether an alternative model offers significantly better out-of-sample predictive power than an established model.



Learn how to report and interpret structural model results. By extending the example on corporate reputation to include additional constructs, you can learn to systematically apply the structural model assessment criteria. The SmartPLS software provides all relevant results for the evaluation of

Chapter 6  ■  Assessing PLS-SEM Results—Part III  

225

the structural model. The tables and figures for this example described in the chapter demonstrate how to correctly interpret and report the PLSSEM results. The hands-on example not only summarizes the previously introduced concepts but also provides additional insights for practical applications of PLS-SEM.

Review Questions 1. What are the key criteria for assessing the results of the structural model? 2. Why do you assess the significance of path coefficients? 3. What is an appropriate level of the R² value? 4. How do you assess the predictive power of a model? 5. Which criteria facilitate selecting the best model among a set of alternative models?

Critical Thinking Questions 1. What problems do you encounter in evaluating the structural model results of PLS-SEM? How do you approach this task? 2. Why is the use of model fit measures such as SRMR and exact fit tests not advisable in a PLS-SEM context? 3. Why use the bootstrapping routine for significance testing? Explain the algorithm options and parameters you need to select. 4. Explain the f² effect size. 5. How does the PLSpredict procedure function and what are the consequences of increasing and decreasing the number of folds?

Key Terms Akaike weights  206 Bayesian information criterion (BIC) 205 Blindfolding 197

Coefficient of determination (R2) 195 Critical value  192 Exact fit test  189 Explained variance  195 (Continued)

226   A Primer on Partial Least Squares Structural Equation Modeling (PLS-SEM) (Continued) Explanatory power  194 f ² effect size  195 Geweke and Meese criterion (GM)  205 Goodness-of-fit index (GoF)  189 Holdout sample  197 Hypothesized relationships  192 In-sample predictive power  195 k-fold cross-validation  198 Linear regression model (LM) benchmark 201 MAE 200 Mean absolute error (MAE)  200 Model comparisons  205 Model overfit  195 Model parsimony  190 Omission distance D 197 One-tailed test  192 Out-of-sample predictive power  196 Parsimonious models  205 PLSpredict procedure  196

Prediction error  201 Prediction statistics  200 Predictive power  196 2 Qpredict  201 Q2 statistic 197 R2 value 195 Relevance of significant relationships 193 RMStheta 189 Root mean square residual covariance (RMStheta) 189 Root mean square error (RMSE) 200 Significance testing  217 Standard error  192 Standardized root mean square residual (SRMR)  189 Standardized values  192 Total effect  194 Training sample  197 Two-tailed test  192

Suggested Readings Chin, W. W. (2010). How to write up and report PLS analyses. In V. Esposito Vinzi, W. W. Chin, J. Henseler, & H. Wang (Eds.), Handbook of partial least squares: Concepts, methods and applications in marketing and related fields (Springer Handbooks of Computational Statistics Series, Vol. II, pp. 655–690). Berlin: Springer. Chin, W. W., Cheah, J.-H., Liu, Y., Ting, H., Lim, X.-J., & Cham, T. H. (2020). Demystifying the role of causal-predictive modeling using partial least squares structural equation modeling in information systems research. Industrial Management & Data Systems, 120(12), 2161–2209. Gefen, D., Rigdon, E. E., & Straub, D. W. (2011). Editor’s comment: An update and extension to SEM guidelines for administrative and social science research. MIS Quarterly, 35(2), iii–xiv. Hair, J. F. (2020). Next generation prediction metrics for composite-based PLS-SEM. Industrial Management & Data Systems, 121(1), 5–11.

Chapter 6  ■  Assessing PLS-SEM Results—Part III  

227

Hair, J. F., Binz Astrachan, C., Moisescu, O. I., Radomir, L., Sarstedt, M., Vaithilingam, S., & Ringle, C. M. (2020). Executing and interpreting applications of PLS-SEM: Updates for family business researchers. Journal of Family Business Strategy, forthcoming. Hair, J. F., Howard, M. C., & Nitzl, C. (2020). Assessing measurement model quality in PLS-SEM using confirmatory composite analysis. Journal of Business Research, 109, 101–110. Hair, J. F., Risher, J. J., Sarstedt, M., & Ringle, C. M. (2019). When to use and how to report the results of PLS-SEM. European Business Review, 31(1), 2–24. Hair, J. F., Sarstedt, M., & Ringle, C. M. (2019). Rethinking some of the rethinking of partial least squares. European Journal of Marketing, 53(4), 566–584. Liengaard, B., Sharma, P. N., Hult, G. T. M., Jensen, M. B., Sarstedt, M., Hair, J. F., & Ringle, C. M. (2021). Prediction: Coveted, yet forsaken? Introducing a cross-validated predictive ability test in partial least squares path modeling. Decision Sciences, 52(2), 362–392. Manley, S. C., Hair, J. F., Williams, R. I., & McDowell, W. C. (2020). Essential new PLSSEM analysis methods for your entrepreneurship analytical toolbox. International Entrepreneurship and Management Journal, forthcoming. Sarstedt, M., Ringle, C. M., Cheah, J.-H., Ting, H., Moisescu, O. I. & Radomir, L. (2020). Structural model robustness checks in PLS-SEM. Tourism Economics, 26(4), 531–554. Sharma, P. N., Shmueli, G., Sarstedt, M., Danks, N., & Ray, S. (2021). Predictionoriented model selection in partial least squares path modeling. Decision Sciences, 52(3), 567–706. Sharma, P. N., Sarstedt, M., Shmueli, G., Kim, K. H., & Thiele, K. O. (2019). PLS-based model selection: The role of alternative explanations in Information Systems research. Journal of the Association for Information Systems, 40(4), 346–397. Shmueli, G. (2010). To explain or to predict? Statistical Science, 25(3), 289–310. Shmueli, G., Ray, S., Velasquez Estrada, J. M., & Chatla, S. B. (2016). The elephant in the room: Evaluating the predictive performance of PLS models. Journal of Business Research, 69(10), 4552–4564. Shmueli, G., Sarstedt, M., Hair, J. F., Cheah, J.-H., Ting, H., & Ringle, C. M. (2019). Predictive model assessment in PLS-SEM: Guidelines for using PLSpredict. European Journal of Marketing, 53(11), 2322–2347. Tenenhaus, M., Esposito Vinzi, V., Chatelin, Y. M., & Lauro, C. (2005). PLS path modeling. Computational Statistics & Data Analysis, 48(1), 159–205.

7 MEDIATOR AND MODERATOR ANALYSIS LEARNING OUTCOMES 1. Understand the basic concepts of mediation in a PLS-SEM context. 2. Execute a mediation analysis using SmartPLS. 3. Comprehend the basic concepts of moderation in a PLS-SEM context. 4. Use the SmartPLS software to run a moderation analysis. 5. Get to know the principles of a moderated mediation.

CHAPTER PREVIEW Structural model relationships in PLS-SEM imply that exogenous constructs directly affect endogenous constructs without any systematic influences of other variables. In many instances, however, this assumption does not hold. Specifically, a third variable in the analysis can change our understanding of the nature of the direct relationships under consideration. The two most prominent examples of such extensions include mediation and moderation. Mediation occurs when a mediator construct intervenes between two other directly related constructs. More precisely, a change in the exogenous construct results in a change of the mediator construct, which, in turn, changes

228

Chapter 7  ■  Mediator and Moderator Analysis 

229

the endogenous construct. Analyzing the strength of the mediator construct’s relationships with the other constructs enables researchers to substantiate the mechanisms that underlie the cause–effect relationship between an exogenous construct and an endogenous construct. In the simplest form, the analysis considers only one mediator construct, but the path model can also include several mediator constructs. The other example of a third variable potentially changing a direct relationship in a structural model is moderation. When moderation is present, the strength or even the direction of a relationship between two constructs depends on a third variable (i.e., the moderator). In other words, the relationship between two constructs changes depending on the value of the moderator variable. As an example, the direct relationship between the customer satisfaction and customer loyalty constructs is not the same for all customers but differs depending on moderators such as income, age, and switching costs. For example, the strength of the relationship between customer satisfaction and customer loyalty may be substantially stronger for older customers than for younger customers. Therefore, moderation can (and should) be seen as a means to better understand and account for heterogeneity in the data. Mediation and moderation are similar in that they describe situations in which the relationship between two constructs depends on a third variable. There are fundamental differences, however, in terms of their theoretical foundation, modeling, and interpretation. In this chapter, we explain and differentiate between mediation and moderation, as well as illustrate their implementation using the corporate reputation PLS path model.

MEDIATION Introduction Mediation occurs when a third mediator construct intervenes between two other related constructs. More precisely, a change in the exogenous construct causes a change in the mediator construct, which, in turn, results in a change in the endogenous construct in the PLS path model. Thereby, a mediator construct governs the nature (i.e., the underlying mechanism or process) of the relationship between two constructs. Substantial a priori theoretical support is a key requirement to explore a meaningful mediating effect. When that support is present, mediation can be a useful statistical analysis, if carried out properly. Consider Exhibit 7.1 for an illustration of a mediating effect in terms of direct and indirect effects. Direct effects are the relationships linking two constructs with a single arrow. Indirect effects are those relationships that involve a sequence of relationships with at least one intervening construct involved. Thus, an indirect effect is a sequence of two or more direct effects and is represented visually

230   A Primer on Partial Least Squares Structural Equation Modeling (PLS-SEM) EXHIBIT 7.1  ■  General Mediation Model

Y2

p1

Y1

p2

p3

Y3

by multiple arrows. Exhibit 7.1 shows both a direct effect p3 between Y1 and Y3 and an indirect effect of Y1 on Y3 in the form of a Y1 → Y2 → Y3 sequence. The indirect effect of p1 ∙ p2 represents the mediating effect of the construct Y2 on the relationship between Y1 and Y3. To understand the conceptual basis of mediation, consider the following example of the relationship between seawater temperature and the number of incidents (e.g., swimmers needing to be rescued). We could, for example, hypothesize the following negative relationship (H1): The higher the seawater temperature (Y1), the lower the number of incidents (Y3), measured by swimmers that need to be rescued. The rationale behind this hypothesis is that the body temperature declines much faster in cold water and, thus, exhausts swimmers more quickly. Therefore, they are more likely to misjudge their chances of swimming out in the sea and returning safely. Hence, in accordance with H1, we assume that swimming in warmer water is less dangerous. The logic of this simple cause– effect relationship is shown in Exhibit 7.2.

EXHIBIT 7.2  ■  Simple Cause–Effect Relationship

Seawater Temperature (Y1)



Number of Incidents (Y3)

Chapter 7  ■  Mediator and Moderator Analysis 

231

Many coastal cities and lifeguard organizations have daily empirical data readily available on the seawater temperature and the number of swimming incidents over many years. When using these data and estimating the relationship (i.e., the correlation between the seawater temperature and the number of incidents), one generally obtains a statistically significant positive result for the hypothesized relationship, but sometimes the relationship is not significant. That is, higher water temperatures typically lead to a higher number of swimming incidents. We would, therefore, likely conclude that swimming in warmer water is more dangerous. But since we had theoretical reasoning suggesting a negative relationship, there must be something we do not understand about the model. This finding reminds us that simple data exploration may result in misleading and false conclusions. A priori theoretical assumptions and logical reasoning are key requirements when applying multivariate analysis techniques. When the results do not match the expectations, researchers may find an explanation based on (1) theoretical considerations, (2) specific data-related issues, or (3) technical peculiarities of the statistical method used. In this example, theoretical considerations enable us to adjust and further examine our previous considerations. To better understand the relationship between the seawater temperature (Y1) and the number of incidents (Y3), it is logical to include the number of swimmers at the selected shoreline (Y2) in the modeling considerations. More specifically, the higher the seawater temperature, the more swimmers there will be at the specific beach (H2). Moreover, the more swimmers there are in the water, the higher the likelihood of incidents occurring in terms of swimmers needing to be rescued (H3). Exhibit 7.3 illustrates this more complex indirect cause– effect relationship.

EXHIBIT 7.3  ■  Complex Cause–Effect Relationship

Number of Swimmers (Y2) +

Seawater Temperature (Y1)

+

Number of Incidents (Y3)

232   A Primer on Partial Least Squares Structural Equation Modeling (PLS-SEM)

Empirical data might substantiate the positive effects illustrated in Exhibit 7.3. When the complex cause–effect relationship is examined, however, it is still possible to conclude that swimming in warmer water is more dangerous than swimming in colder water since the two relationships both have a positive sign. We therefore combine the simple and the more complex cause–effect relationship models in the mediation model shown in Exhibit 7.4, which includes the mediator construct (Y2). In addition to H1, H2, and H3, we could propose the following hypothesis (H4): The direct relationship between the seawater temperature and number of incidents is mediated by the number of swimmers. If we use the available data to empirically estimate the model in Exhibit 7.4, we may obtain the estimated relationships with the expected signs. The counterintuitive positive relationship between the seawater temperature and the number of incidents becomes negative, as expected, when extending the model by the mediator construct number of swimmers. In this mediation model, the number of swimmers represents an appropriate mechanism to explain the relationship between the seawater temperature and the number of incidents. Hence, the positive indirect effect via the mediator construct reveals the “true” relationship between the seawater temperature and the number of incidents.

EXHIBIT 7.4  ■  Mediation Model

Number of Swimmers (Y2) +

Seawater Temperature (Y1)

+



Number of Incidents (Y3)

This example points out that mediation is a challenging field. An estimated cause–effect relationship may not be the “true” effect because a systematic influence—a certain phenomenon (i.e., a mediator)—has not been accounted for in the theoretical model. For instance, Cepeda-Carrión, Nitzl, and Roldán (2017) and Nitzl, Roldán, and Cepeda Carrión (2016) offer additional examples of mediation analyses in PLS-SEM. Many PLS path models include mediating effects, but in previous studies they have often not been explicitly hypothesized and tested (Hair, Sarstedt, Ringle,

Chapter 7  ■  Mediator and Moderator Analysis 

233

& Mena, 2012). Only when the possible mediation is theoretically substantiated and also empirically tested can the nature of the cause–effect relationship be fully and accurately understood. Again, theory is always the foundation of empirical analyses (Memon, Cheah, Ramayah, & Chuah, 2018). A systematic mediation analysis builds on the theoretically established model and hypothesized relationships, including the mediating effect. To begin with, it is important to estimate and evaluate the measurement models and structural model, including all the mediator constructs. After this step follows the characterization of the mediation analysis’ outcomes and testing of the mediating effects. We address these three steps in the following sections, before addressing multiple mediation analysis and providing a case study illustration of mediation analysis.

Measurement and Structural Model Evaluation in Mediation Analysis Evaluating a mediation model requires that all quality criteria of the measurement models as well as the structural model have been met, as discussed in Chapters 4–6. The analysis begins with assessment of the reflective and formative measurement models. For example, a lack of reliability in the case of reflective mediator constructs has a strong effect on the estimated relationships in the PLS path model (i.e., the indirect paths can become considerably smaller than expected). For this reason, it is important to ensure the reflectively measured mediator constructs exhibit a high level of reliability. After establishing reliable and valid measurement models for the mediator as well as the exogenous and endogenous constructs, the next step is to consider all structural model evaluation criteria. For instance, it is important to ensure that collinearity is not at a critical level, which, otherwise, may entail substantially biased path coefficients. If high levels of collinearity are present, the direct effect may become nonsignificant, suggesting nonmediation (see next section) even though, for example, a complementary (partial) mediation may be present. Likewise, high collinearity levels may result in unexpected sign changes (e.g., from positive to negative), making correct interpretation of different mediation types problematic. Moreover, a lack of the mediator construct’s discriminant validity with the exogenous or endogenous construct could lead to a strong and significant but substantially biased indirect effect, resulting in incorrect implications regarding the mediation. After the relevant assessment criteria for reflective and formative measurement models as well as the structural model have been met, the mediator analysis follows.

Types of Mediating Effects The question of how to test mediation has attracted considerable attention in methodological research over the past decades. Three decades ago, Baron and Kenny (1986) presented an approach to mediation analysis, which many

234   A Primer on Partial Least Squares Structural Equation Modeling (PLS-SEM)

researchers still routinely draw upon. More recent research, however, points to conceptual and methodological problems with Baron and Kenny’s (1986) approach (e.g., Hayes, 2018). Against this background, our description builds on Zhao, Lynch, and Chen (2010), who offer a synthesis of prior research on mediation analysis and corresponding guidelines for future research (also see Nitzl, Roldán, & Cepeda-Carrión, 2016; Cepeda-Carrión, Nitzl, & Roldán, 2017). The authors characterize several types of mediating effects, which they classify into two groups. The first group of effects indicates the absence of a mediating effect, referred to as nonmediation: • Direct-only nonmediation—the direct effect is significant but the indirect effect is not. • No-effect nonmediation—neither the direct nor the indirect effect are significant. In the case of a mediating effect being present, Zhao, Lynch, and Chen (2010) distinguish three types of mediation: • Complementary mediation—the indirect effect and the direct effect are both significant and point in the same direction. • Competitive mediation—the indirect effect and the direct effect are both significant but point in opposite directions. • Indirect-only mediation—the indirect effect is significant but the direct effect is not. That is, the mediation analysis may show that mediation does not exist at all (i.e., direct-only nonmediation and no-effect nonmediation) or, in case of a mediation, the mediator construct accounts either for some (i.e., complementary and competitive mediation) or for all of the observed relationships between two latent variables (i.e., indirect-only mediation). In that sense, Zhao, Lynch and Chen’s (2010) procedure closely corresponds to Baron and Kenny’s (1986) concept of partial mediation (i.e., complementary mediation), suppressor variable (i.e., competitive mediation), and full mediation (i.e., indirect-only mediation). Testing for the type of mediation in a PLS path model requires running a series of analyses, which Exhibit 7.5 illustrates based on the general mediation model shown in Exhibit 7.1. The first step addresses the significance of the indirect effect (p1 · p2) via the mediator construct (Y2). If the indirect effect is not significant (right-hand side of Exhibit 7.5), we conclude that Y2 does not function as a mediator in the tested relationship. While this result may seem disappointing at first sight, as it does not provide empirical support for a hypothesized mediating relationship, further analysis of the direct effect p3 can point to as yet undiscovered mediators (other than Y2). Specifically, if the direct effect is significant,

Chapter 7  ■  Mediator and Moderator Analysis 

235

we can conclude it is possible there is an omitted mediator, which potentially explains the relationship between Y1 and Y3 (direct-only nonmediation). If the direct effect is also nonsignificant (no-effect nonmediation), however, we have to conclude that our theoretical framework is flawed. In this case, we should go back to theory and reconsider the path model setup. Note that this situation can occur despite a significant total effect of Y1 on Y3. EXHIBIT 7.5  ■  Mediation Analysis Procedure

Yes

Yes

Yes

Is p1 · p2 · p3 positive?

Is p3 significant?

Is p1 · p2 significant?

No

No

Yes

Is p3 significant?

No

No

Complementary Competitive Indirect-only (partial mediation) (partial mediation) (full mediation)

Direct-only (no mediation)

No effect (no mediation)

We may, however, find general support for a hypothesized mediating relationship in our initial analysis based on a significant indirect effect (left-hand side of Exhibit 7.5). As before, our next interest is with the significance of the direct effect p3. If the direct effect is nonsignificant, we face the situation of indirectonly mediation. In this case, the entire effect of Y1 on Y3 is explained via the mediator construct. When both the direct and indirect effects are significant, we can distinguish between complementary and competitive mediation. Complementary mediation describes a situation in which the direct effect p3 and the indirect effect p1 · p2 point in the same direction (both positive). Hence, the product of the direct effect and the indirect effect (i.e., p1 · p2 · p3) is positive (Exhibit 7.5). While providing support for the hypothesized mediating relationship, complementary mediation also provides a cue that another mediator may have been omitted whose indirect path has the same direction as the direct effect. On the contrary, in competitive mediation—also referred to as inconsistent mediation (MacKinnon, Krull, & Lockwood, 2000)—the direct effect p3 and either indirect effect p1 or p2 have opposite signs. In other words, the product

236   A Primer on Partial Least Squares Structural Equation Modeling (PLS-SEM)

of the direct effect and the indirect effect p1 · p2 · p3 is negative (Exhibit 7.5). Competitive mediation provides support for the hypothesized mediating effect, but it also suggests that another mediator may be present whose indirect effect’s sign equals that of the direct effect. It is important to note that in competitive mediation, the mediating construct acts as a suppressor variable, which may substantially decrease the magnitude of the total effect of Y1 on Y3. Therefore, when competitive mediation occurs, researchers need to carefully analyze the theoretical substantiation of all effects involved. Our introductory example in Exhibit 7.4 on the relationship between seawater temperature and number of incidents as mediated by the number of swimmers constitutes a case of competitive mediation. The opposite signs of the direct and indirect effects can offset each other, so that the total effect of seawater temperature on the number of incidents is relatively small.

Testing Mediating Effects Prior testing of the significance of mediating effects relied on the Sobel (1982) test, which should no longer be used. The Sobel test compares the direct relationship between the independent variable and the dependent variable with the indirect relationship between the independent variable and dependent variable that includes the mediation construct (Helm, Eggert, & Garnefeld, 2010). The Sobel test assumes a normal distribution that is not consistent, however, with the nonparametric PLS-SEM method. Moreover, the parametric assumptions of the Sobel test usually do not hold for the indirect effect p1 · p2, since the multiplication of two normally distributed coefficients results in a nonnormal distribution of their product. Furthermore, the Sobel test requires unstandardized path coefficients as input for the test statistic and lacks statistical power, especially when applied to small sample sizes. For these reasons, research has dismissed the Sobel test for evaluating mediation analysis results (e.g., Preacher & Hayes, 2008a; Sattler, Völckner, Riediger, & Ringle, 2010), especially when the model includes latent variables, whose measurement error may have detrimental effects on the model estimates. Instead of using the Sobel test, researchers should bootstrap the sampling distribution of the indirect effect. This approach has also been proposed in a regression context (Preacher & Hayes, 2004, 2008a) and has been implemented in Hayes’s SPSS-based PROCESS macro (http://www.processmacro.org/). Bootstrapping makes no assumptions about the shape of the variables’ distribution or the sampling distribution of the statistics and can be applied to small sample sizes with more confidence. The approach is therefore perfectly suited for the PLS-SEM method and implemented in the SmartPLS software. In addition, bootstrapping the indirect effect yields higher levels of statistical power compared to the Sobel (1982) test. Note that researchers should not use the PROCESS routine proposed for regression models to analyze mediation effects in PLS-SEM (i.e., in a subsequent tandem analysis by using the latent variable

Chapter 7  ■  Mediator and Moderator Analysis 

237

scores obtained by PLS-SEM to run a regression model in PROCESS), since bootstrapping in PLS-SEM provides all relevant results with higher quality than PROCESS (Sarstedt, Hair, Nitzl, Ringle, & Howard, 2020). We discuss this aspect in greater detail in Exhibit 7.6. EXHIBIT 7.6  ■  Mediation Analysis Using PROCESS Versus PLS-SEM PROCESS is a macro available for SPSS and SAS that simplifies the estimation of mediating effects in an ordinary least squares regression framework. Rather than having to manually set up a specific model using syntax language, researchers using PROCESS can select from a broad range of models documented in Hayes (2018). For each model, researchers must set specific arguments to identify each variable’s role in the model (e.g., dependent, independent, mediator). PROCESS then uses bootstrapping to derive inferential statistics for the direct and indirect effects in the specified model. As the use of PROCESS has become standard in estimating mediating effects in regression-based models, some researchers have also used the macro to supplement their PLS-SEM analysis. However, as Sarstedt, Hair, Nitzl, Ringle, and Howard (2020) note, PROCESS is largely unsuitable for estimating complex cause–effect models with latent variables, because such analyses (1) are confined to estimating singular model structures in isolation, and (2) ignore the diluting effect of measurement error. With respect to the first limitation, PROCESS runs separate regressions of each dependent variable on its associated independent variables. These piecewise regressions (executed separately) therefore ignore other elements in the model, including potential antecedent or outcome variables of the variables included in the PROCESS mediation model. PLS-SEM, on the other hand, computes the construct scores, which serve as input for the structural model regressions in an iterative process that considers the entire model structure (Chapter 3). As a result, the iterative nature of the parameter estimations in PLS-SEM considers how the parameter estimations in the partial regressions impact each other. With regard to the second limitation, estimating mediation models that include measures of theoretical concepts using PROCESS first requires researchers to compute construct scores. Doing so typically involves computing the sum or averages of the indicators used to measure a construct. However, this practice is problematic as it ignores the attenuating effect of measurement error. Numerous studies have shown that failure to correct for measurement error produces a combination of under- and over-estimation in the estimates of the entire structural model (e.g., Cole & Preacher, 2014; Hair, Hult, Ringle, Sarstedt, & Thiele, 2017; Yuan, Wen, & Tang, 2020). In contrast, PLS-SEM permits the elimination of measurement error in the analyses. In fact, the need to account for measurement error when estimating relationships among latent—as opposed to observed—variables constitutes the primary advantage of this second-generation method over first-generation techniques like regression analysis, analysis of variance, and many others (Chapter 1). (Continued)

238   A Primer on Partial Least Squares Structural Equation Modeling (PLS-SEM) EXHIBIT 7.6  ■  (Continued) Of course, researchers may extract the construct scores from their PLS-SEM analysis and use these as input for PROCESS, but such a tandem analysis offers very little additional insights, particularly since modern PLS-SEM software programs such as SmartPLS (Ringle, Wende, & Becker, 2015) or R packages such as SEMinR (Ray, Danks, Velasquez Estrada, Uanhoro, & Bejar, 2020) include all relevant outputs such as bootstrap confidence intervals for interpreting complex mediating effects. In light of the above, there is no need for a tandem use of PLS-SEM and PROCESS. Instead, the analysis of mediation models involving latent variables should draw on PLS-SEM alone.

Multiple Mediation In the previous sections, we considered the case of a single mediator construct, which accounts for the relationship between an exogenous and an endogenous construct. Analyzing such a model setup is also referred to as single mediation analysis. More often than not, however, when evaluating structural models, exogenous constructs exert their influence through more than one mediator construct. This situation requires running a multiple mediation analysis for the hypothesized relationships via more than one mediator in PLS-SEM (CepedaCarrión, Nitzl, & Roldán, 2017; Nitzl, Roldán, & Cepeda-Carrión, 2016). For an example of multiple mediation with two mediators, consider Exhibit 7.7. In this model, p3 represents the direct effect between the exogenous construct and the endogenous construct. The specific indirect effect of Y1 on Y3 via mediator Y2 is quantified as p1 ∙ p2. For the second mediator Y4, the specific indirect effect is given by p4 ∙ p5. In addition, we can consider the specific indirect effect of Y1 on Y3 through both mediators, Y2 and then Y4, which is quantified as p1 ∙ p6 ∙ p5. The total indirect effect is the sum of the specific indirect effects (i.e., p1 ∙ p2 + p4 ∙ p5 + p1 ∙ p6 ∙ p5). Finally, the total effect of Y1 on Y3 is the sum of the direct effect and the total indirect effects (i.e., p3 + p1 ∙ p2 + p4 ∙ p5 + p1 ∙ p6 ∙ p5). To test a model such as the one shown in Exhibit 7.7, researchers may be tempted to run a set of separate mediation analyses, one for each proposed mediator Y2 and Y4. As Preacher and Hayes (2008a, 2008b) point out, however, this approach is problematic for at least two reasons. First, one cannot simply add up the indirect effects calculated in several single mediation analyses to derive the total indirect effect, as the mediators in a multiple mediation model typically will be correlated. As a result, the specific indirect effects, estimated using several single mediation analyses, will be biased and will not sum up to the total indirect effect through the multiple mediators. Second, hypothesis testing and confidence intervals calculated for specific indirect effects may not be accurate due to the omission of other, potentially important, mediators. By considering

EXHIBIT 7.7  ■  Multiple Mediation Model (path diagram: Y1 → Y2 = p1, Y2 → Y3 = p2, Y1 → Y4 = p4, Y4 → Y3 = p5, Y2 → Y4 = p6, plus the direct effect Y1 → Y3 = p3)
By considering all mediators at the same time in one model, we gain a more complete picture of the mechanisms through which an exogenous construct affects an endogenous one. Hence, we recommend including all relevant mediators in the model and, thus, analyzing their hypothesized effects simultaneously (Sarstedt, Hair, Nitzl, Ringle, & Howard, 2020).

In a multiple mediation model, researchers can evaluate different types of mediating effects. With reference to the multiple mediation model in Exhibit 7.7, an individual mediating effect considers the effect of Y1 on Y3 via one mediator (e.g., p1 ∙ p2 for Y2). A serial mediating effect considers the indirect effect of Y1 on Y3 via both mediators (i.e., p1 ∙ p6 ∙ p5), whereas a joint mediating effect considers the total indirect effect of Y1 on Y3 (i.e., p1 ∙ p2 + p4 ∙ p5 + p1 ∙ p6 ∙ p5).

The analysis of multiple mediation models also follows the procedure by Zhao, Lynch, and Chen (2010) shown in Exhibit 7.5. That is, researchers should test the significance of the indirect effects (i.e., each specific indirect effect and the total indirect effect) and the direct effect between the exogenous construct and the endogenous construct. The significance of the specific (single and serial) indirect effects, the total indirect effect, and the direct effect can be assessed directly using the SmartPLS bootstrapping outputs. As with the path coefficient significance test (Chapter 6), you should select 10,000 (or more) bootstrap subsamples and report the 95% percentile bootstrap confidence intervals. On this basis, the analysis and interpretation of a multiple mediation model follow the same procedure as a single mediation analysis.
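As a simple illustration of the arithmetic behind these effect types, the following Python sketch assembles the specific indirect, total indirect, and total effects for the model in Exhibit 7.7. The path coefficients are hypothetical values chosen for illustration, not estimates from any model in this book:

```python
# Hypothetical standardized path coefficients for the model in Exhibit 7.7
p1, p2, p3, p4, p5, p6 = 0.40, 0.35, 0.15, 0.30, 0.25, 0.20

via_y2 = p1 * p2        # specific indirect effect Y1 -> Y2 -> Y3
via_y4 = p4 * p5        # specific indirect effect Y1 -> Y4 -> Y3
serial = p1 * p6 * p5   # serial indirect effect Y1 -> Y2 -> Y4 -> Y3

total_indirect = via_y2 + via_y4 + serial
total_effect = p3 + total_indirect

print(f"Specific indirect effects: {via_y2:.3f}, {via_y4:.3f}, {serial:.3f}")
print(f"Total indirect effect:     {total_indirect:.3f}")
print(f"Total effect of Y1 on Y3:  {total_effect:.3f}")
```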


Nitzl, Roldán, and Cepeda-Carrión (2016) as well as Cepeda-Carrión, Nitzl, and Roldán (2017) provide additional insights and recommendations on multiple mediation analysis in PLS-SEM. Exhibit 7.8 summarizes the rules of thumb for running a mediation analysis in PLS-SEM.

EXHIBIT 7.8  ■  Rules of Thumb for Mediation Analyses in PLS-SEM

• Consider all standard model evaluation criteria in the assessment of mediation models, such as convergent validity, discriminant validity, reliability, multicollinearity, explanatory power, and predictive power.
• Running a mediation analysis requires assessing the significance of, first, the indirect effects and, second, the direct relationship. Based on these results, researchers distinguish between different types of mediation and nonmediation.
• Application of PLS-SEM to test all types of mediation provides more accurate results than does the PROCESS approach.
• To test mediating effects, use bootstrapping instead of the Sobel test.
• To test multiple mediation models, include all mediators simultaneously and distinguish between specific (single or serial) indirect effects and the total indirect effect, which you compare with the direct effect.

Case Study Illustration—Mediation

To illustrate the estimation of mediating effects, let us again consider the extended corporate reputation model in the SmartPLS software (Ringle, Wende, & Becker, 2015). If you do not have the model readily available, please go back to the case study in the previous chapter and import the SmartPLS project file Corporate Reputation.zip by going to File → Import Project from Backup File in the SmartPLS menu. Next, open the model by double-clicking on Extended Model. The model shown in Exhibit 7.9 will appear in the Modeling Window.

In the following discussion, we will further explore the relationship between the two dimensions of corporate reputation (i.e., COMP and LIKE) and the target construct customer loyalty (CUSL). According to Festinger's (1957) theory of cognitive dissonance, customers who perceive that a company has a favorable reputation are likely to show higher levels of satisfaction in an effort to avoid cognitive dissonance. At the same time, previous research has demonstrated that customer satisfaction is the primary driver of customer loyalty. Therefore, we expect that customer satisfaction mediates the relationships between likeability and customer loyalty as well as between competence and customer loyalty. To analyze these two mediating relationships in the corporate reputation model in more detail, we apply the procedure shown in Exhibit 7.5.

EXHIBIT 7.9  ■  Extended Model in SmartPLS (PLS path model with the exogenous constructs QUAL, CSOR, PERF, and ATTR; the corporate reputation dimensions COMP and LIKE; and the constructs CUSA and CUSL, together with their indicators)


To begin the mediation analysis, we test the significance of the indirect effects. The indirect effect from COMP via CUSA to CUSL is the product of the path coefficients from COMP to CUSA and from CUSA to CUSL (mediation path 1). Similarly, the indirect effect from LIKE via CUSA to CUSL is the product of the path coefficients from LIKE to CUSA and from CUSA to CUSL (mediation path 2). To test the significance of these products of path coefficients, we run the bootstrap routine. To do so, go to Calculate → Bootstrapping in the SmartPLS menu or click on the Calculate icon above the Modeling Window, followed by Bootstrapping (note that you may first need to go back to the Modeling Window before the Calculate icon appears). We retain all settings for the PLS-SEM algorithm and the missing value treatment as before and select 10,000 bootstrap samples and the Basic Bootstrapping option. This selection provides the relevant results for our analysis but differs from the Complete Bootstrapping option in SmartPLS in that Basic Bootstrapping includes fewer results and, thus, is faster. For the mediation analysis, we need the bootstrapping results for the Indirect Effects and for the Path Coefficients (i.e., the direct effects), both of which are included in the Basic Bootstrapping results. In the advanced settings, we choose Percentile Bootstrap, Two Tailed testing, and a significance level of 0.05 for this example. Next, we click Start Calculation.

After running the procedure, open the SmartPLS bootstrapping report. The table under Final Results → Specific Indirect Effects provides an overview of the results, including standard errors, bootstrap mean values, t values, and p values. Clicking on the Confidence Intervals tab in the bootstrapping results report shows the confidence interval as derived from the percentile method. Similarly, the table under Final Results → Path Coefficients offers the corresponding results for the direct effects, which we need in the further analysis. Exhibit 7.10 summarizes the bootstrapping results for the relationships between COMP and CUSL as well as LIKE and CUSL. Alternatively, if you are interested in the results for the indirect effects of a serial or a joint mediation, open the bootstrapping report Total Indirect Effects.

We find that both indirect effects are significant, since neither of the 95% confidence intervals includes zero (see Chapter 5 for how to use confidence intervals for hypothesis testing). When reporting the confidence intervals, it is not necessary to also report the t values and p values.

EXHIBIT 7.10  ■  Significance Analysis of the Direct and Indirect Effects

               Direct    95% Confidence      Significance    Indirect       95% Confidence      Significance
               Effect    Interval of the     (p < 0.05)?     Effect (via    Interval of the     (p < 0.05)?
                         Direct Effect                       CUSA)          Indirect Effect
COMP → CUSL    0.006     [−0.101, 0.113]     No              0.074          [0.007, 0.143]      Yes
LIKE → CUSL    0.344     [0.236, 0.454]      Yes             0.220          [0.153, 0.290]      Yes
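For readers who want to see the mechanics behind this significance test, here is a minimal Python sketch of the percentile method using synthetic bootstrap estimates. The values are illustrative stand-ins, not the case study's actual bootstrapping output:

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic bootstrap estimates of an indirect effect (in practice, these
# come from the 10,000 bootstrap samples computed by the software)
boot_indirect = rng.normal(loc=0.074, scale=0.035, size=10_000)

# 95% percentile confidence interval: the 2.5th and 97.5th percentiles
lower, upper = np.percentile(boot_indirect, [2.5, 97.5])

# The effect is deemed significant if the interval excludes zero
significant = not (lower <= 0.0 <= upper)
print(f"95% CI: [{lower:.3f}, {upper:.3f}], significant: {significant}")
```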

The next step of the mediation analysis focuses on the significance of the direct effects from COMP to CUSL and from LIKE to CUSL. As determined in Chapter 6 and shown in Exhibit 7.10, the relationship from COMP to CUSL is very weak (0.006) and statistically nonsignificant. Following the mediation analysis procedure in Exhibit 7.5, we conclude that CUSA fully mediates the COMP to CUSL relationship. In contrast, LIKE exerts a pronounced (0.344) and significant (p < 0.05) effect on CUSL. We therefore conclude that CUSA partially mediates this relationship, since both the direct and the indirect effects are significant and meaningful (Exhibit 7.5). To further substantiate the type of partial mediation, we next compute the product of the direct effect and the indirect effect. Since the direct and indirect effects are both positive, the sign of their product is also positive (i.e., 0.344 · 0.220 = 0.076). Consequently, CUSA represents complementary mediation of the relationship from LIKE to CUSL.

Our findings provide empirical support for the mediating role of customer satisfaction in the reputation model. More specifically, customer satisfaction represents a mechanism that underlies the relationship between competence and customer loyalty: Competence leads to customer satisfaction, and customer satisfaction in turn leads to customer loyalty. For the relationship between likeability and customer loyalty, customer satisfaction serves as a complementary mediator: Higher levels of likeability increase customer loyalty directly but also increase customer satisfaction, which, in turn, leads to customer loyalty. Hence, some of likeability's effect on loyalty is explained by satisfaction.

MODERATION

Introduction

Moderation describes a situation in which the relationship between two constructs is not constant but depends on the values of a third variable, referred to as a moderator variable. The moderator variable (or construct) changes the strength, or even the direction, of a relationship between two constructs in the model. For example, prior research has shown that the relationship between customer satisfaction and customer loyalty differs as a function of the customers' income (or age; e.g., Homburg & Giering, 2001). More precisely, income has a pronounced negative effect on the satisfaction to loyalty relationship—the higher the income, the weaker the relationship between satisfaction and loyalty. In other words, income serves as a moderator variable that accounts for heterogeneity. Thus, the satisfaction to loyalty relationship is not the same for all customers but differs depending on the income level. In this respect, moderation can (and should) be seen as a means to account for heterogeneity in the data.

Moderating relationships are hypothesized a priori by the researcher based on theory and specifically tested. The testing of the moderating relationship depends


on whether the researcher hypothesizes that one specific model relationship or that all model relationships depend on the values of the moderator variable. In the prior example, we hypothesized that only the satisfaction to loyalty relationship is significantly influenced by income. These considerations also apply to the corporate reputation model and its relationship between CUSA and CUSL. In such a setting, we would examine if and how the respondents' income influences this relationship. Exhibit 7.11 shows the theoretical model of such a moderating relationship, which focuses only on the path from satisfaction to loyalty in the corporate reputation model.

EXHIBIT 7.11  ■  Moderation Model Example (Theoretical Model): Income moderates the relationship between Customer Satisfaction and Customer Loyalty

Alternatively, we could also hypothesize that several relationships in the corporate reputation model depend on some customer characteristic, such as gender. For example, we could hypothesize that the effect of corporate reputation (i.e., likeability and competence) on satisfaction and loyalty is different for females compared with males. Gender would then serve as a grouping variable that divides the data into two subsamples, as illustrated in Exhibit 7.12. The same model is then estimated separately for each subsample. Since researchers are usually interested in comparing the models and learning about significant differences between the subsamples, the model estimates for the subsamples are usually compared by means of a multigroup analysis (see Chapter 4 in Hair, Sarstedt, Ringle, & Gudergan, 2018). Specifically, multigroup analysis enables the researcher to test for differences between the path coefficients of identical models estimated for different groups of respondents (e.g., females vs. males). The general objective is to see if there are statistically significant differences between individual group models.

EXHIBIT 7.12  ■  Multigroup Analysis (PLS path model estimates for male versus female customers in a model with COMP, LIKE, CUSA, and CUSL; for example, the LIKE → CUSL path is 0.10 for male and 0.34 for female customers)

For example, with regard to the models in Exhibit 7.12, the multigroup analysis would enable testing whether the 0.34 relationship between LIKE and CUSL for female customers is significantly higher than the corresponding relationship for male customers (0.10).

In this section, our focus is on the modeling and interpretation of an interaction effect that occurs when a moderator variable is assumed to influence one specific relationship. In Chapter 8, we provide a brief introduction to the basic concepts of multigroup analysis. More advanced topics are included in our book Advanced Issues in Partial Least Squares Structural Equation Modeling (Hair, Sarstedt, Ringle, & Gudergan, 2018).

Types of Moderator Variables

Moderators can be present in structural models in different forms. They can represent observable traits, such as gender, age, or income. But they can also represent unobservable traits, such as risk tolerance, brand attitude and engagement, or ad liking. Moderators can be measured with a single item or multiple items and using reflective or formative indicators. The most important differentiation, however, relates to the moderator's measurement scale, which involves distinguishing between categorical (typically dichotomous) and continuous moderators.

In the case study on corporate reputation in the mobile phone industry, we could, for example, use the service-type variable (contract vs. prepaid) as a categorical moderator variable. These categorical variables are usually binary (dummy) coded with values of zero and one, whereby the zero represents the reference category. Note that a categorical moderator does not necessarily have


to represent only two groups. For example, in the case of three groups (e.g., short-term contract, long-term contract, and prepaid), we could split up the moderator into two binary (dummy) variables for short-term contract and long-term contract, which are simultaneously included in the model. Both binary (dummy) coded variables would take the value zero for the reference category, which is prepaid in this example. The other two categories would be indicated by the value of one in each corresponding binary (dummy) coded variable (see the coding sketch after this paragraph). Similar to ordinary least squares regression, categorical moderators can be included in a PLS path model to assess their effect on a specific relationship. For example, in the case study on corporate reputation, we could evaluate whether the customers' gender has a significant bearing on the satisfaction–loyalty link.

In most cases, however, researchers use a categorical moderator variable to split the data set into two or more groups and estimate the models separately for each group of data. Running a multigroup analysis enables identification of model relationships that differ significantly between the groups (Chapter 8). This approach offers a more complete picture of the moderator's influence on the analysis results, as the focus shifts from examining its impact on one specific model relationship to examining its impact on all model relationships. Such an analysis, however, must be grounded in theory.

In many situations, researchers have a continuous moderator variable they believe can affect the strength of one specific relationship between two latent variables. Returning to our case study on corporate reputation, we could, for example, hypothesize that the relationship between satisfaction and loyalty is influenced by the customers' income. More precisely, we could hypothesize that the relationship between customer satisfaction and customer loyalty is weaker for high-income customers and stronger for low-income customers. Such a moderating effect would indicate that the satisfaction to loyalty relationship changes, depending on the level of income. If this moderating effect is not present, we would assume that the strength of the relationship between satisfaction and loyalty is constant.

Continuous moderators are typically measured with multiple items but can, in principle, also be measured using only a single item. When the moderator variable represents some abstract unobservable trait (as opposed to some observable phenomenon such as income), however, we clearly advise against the use of single items for construct measurement. Single items are significantly less effective than multi-item scales in terms of construct predictive validity (Diamantopoulos, Sarstedt, Fuchs, Kaiser, & Wilczynski, 2012; Sarstedt, Diamantopoulos, Salzberger, & Baumgartner, 2016), which can be particularly problematic when using such variables as continuous moderators. The reason is that moderation is usually associated with rather limited effect sizes (Aguinis, Beaty, Boik, & Pierce, 2005), so that any lack of predictive power will make it harder to identify significant relationships. This characteristic amplifies the limitations of single-item measurement in the context of moderation.
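The dummy coding described above is straightforward to reproduce. The following Python sketch uses a hypothetical service_type column to create the two binary variables, with prepaid as the reference category:

```python
import pandas as pd

# Hypothetical service-type variable with three categories
df = pd.DataFrame({"service_type": ["prepaid", "short-term contract",
                                    "long-term contract", "prepaid"]})

# Two dummy variables; both equal zero for the reference category (prepaid)
df["short_term"] = (df["service_type"] == "short-term contract").astype(int)
df["long_term"] = (df["service_type"] == "long-term contract").astype(int)
print(df)
```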


Modeling Moderating Effects

To gain an understanding of how moderating effects are modeled, consider the path model shown in Exhibit 7.13. This model illustrates our previous example from Exhibit 7.11, in which income serves as a moderator variable (M) influencing the relationship between customer satisfaction (Y1) and customer loyalty (Y2). Different from the theoretical model, which describes the assumed relationships between the three concepts on conceptual grounds, Exhibit 7.13 shows how the theoretical model is represented in the statistical analysis. The moderating effect (p3) is represented by an arrow pointing at the effect p1 linking Y1 and Y2. Furthermore, when including the moderating effect in a PLS path model, there is also a direct relationship (p2) from the moderator to the endogenous construct. This additional path is important (and a frequent source of mistakes), as it controls for the direct impact of the moderator on the endogenous construct. If the path p2 were omitted, the effect of M on the relationship between Y1 and Y2 (i.e., p3) would be inflated.

As can be seen, moderation is similar to mediation in that a third variable (i.e., a mediator or moderator variable) affects the strength of a relationship between two latent variables. The crucial distinction between both concepts is that the moderator variable does not depend on the exogenous construct. In contrast, with mediation there is a direct effect between the exogenous construct and the mediator construct (Memon et al., 2019).

EXHIBIT 7.13  ■  Moderation Model Example (Statistical Model): M influences the Y1 → Y2 relationship (p1) via the moderating effect p3 and has a direct path p2 to Y2

The path model in Exhibit 7.13 can also be expressed by the following equation:

Y2 = (p1 + p3 · M) · Y1 + p2 · M

As can be seen, the influence of Y1 on Y2 depends not only on the strength of the simple effect p1 but also on the product of p3 and M. To understand how a


moderator variable can be integrated in the model, we need to rewrite the equation as follows:

Y2 = p1 · Y1 + p2 · M + p3 · (Y1 · M)

This equation shows that including a moderator effect requires the specification of the effect of the exogenous construct (i.e., p1 · Y1), the effect of the moderator variable (i.e., p2 · M), and the product term p3 · (Y1 · M), which is also called the interaction term. As a result, the coefficient p3 expresses how the effect p1 changes when the moderator variable M is increased or decreased by one standard deviation unit. Exhibit 7.14 illustrates the concept of an interaction term. As can be seen, the model includes the interaction term as an additional latent variable covering the product of the exogenous construct Y1 and the moderator M. Because of this interaction term, researchers often refer to interaction effects when modeling moderator variables.

EXHIBIT 7.14  ■  Interaction Term in Moderation: the interaction term Y1 · M enters the model as an additional latent variable with a path p3 to Y2, alongside Y1 (p1) and M (p2)

So far, we have looked at a two-way interaction because the moderator interacts with one other variable, the exogenous construct Y1. It is also possible, however, to analyze a multiple moderator model. In such a model, the researcher needs to include multiple interaction terms that map the interplay between each moderator and the exogenous construct as well as among the moderators themselves. Such a setup is also referred to as a cascaded moderator analysis. The most common form of a cascaded moderator analysis is a three-way interaction. For example, we could imagine that the moderating effect of income is not constant but is itself influenced by some other variable such as age (N), which would serve as a second moderator variable in the model. In this case, the model would be written as:

Y2 = p1 · Y1 + p2 · M + p3 · N + p4 · (Y1 · M) + p5 · (Y1 · N) + p6 · (M · N) + p7 · (Y1 · M · N)
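To see what these equations imply, consider a minimal Python sketch of the two-way case; the coefficients are hypothetical. The effect of Y1 on Y2 shifts by p3 for every standard deviation change in M:

```python
# Hypothetical standardized coefficients for the two-way interaction model
p1, p2, p3 = 0.50, 0.10, -0.25

def effect_of_y1_on_y2(m: float) -> float:
    """Conditional effect of Y1 on Y2 at moderator level m (in SD units)."""
    return p1 + p3 * m

for m in (-1.0, 0.0, 1.0):
    print(f"M = {m:+.0f} SD: effect of Y1 on Y2 = {effect_of_y1_on_y2(m):.2f}")
```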


Creating the Interaction Term

In the previous section, we introduced the concept of an interaction term to facilitate the inclusion of a moderator variable in the PLS path model. But one fundamental question remains: How should the interaction term be operationalized? Research has proposed several approaches for creating the interaction term (e.g., Becker, Ringle, & Sarstedt, 2018; Henseler & Chin, 2010; Rigdon, Ringle, & Sarstedt, 2010). On the following pages, we discuss three prominent approaches: (1) the product indicator approach, (2) the orthogonalizing approach, and (3) the two-stage approach.

Product Indicator Approach

The product indicator approach is a standard approach for creating the interaction term in regression-based analyses and is applicable to PLS-SEM. As we will describe later, however, its use is not universally recommended in PLS-SEM. The product indicator approach involves multiplying each indicator of the exogenous construct with each indicator of the moderator variable (Chin, Marcolin, & Newsted, 2003). These so-called product indicators become the indicators of the interaction term. Exhibit 7.15 illustrates the interaction term when both Y1 and M are measured by means of two indicators. Thus, the interaction term has four product indicators. The product indicator approach requires the indicators of the exogenous construct and the moderator variable to be reused in the measurement model of the interaction term.

EXHIBIT 7.15  ■  Product Indicator Approach (the interaction term Y1 · M is measured by the four product indicators x1 · m1, x1 · m2, x2 · m1, and x2 · m2, formed from the indicators x1 and x2 of Y1 and m1 and m2 of M)


This procedure, however, inevitably introduces collinearity in the path model. To reduce collinearity problems, researchers typically standardize the indicators of the moderator prior to creating the interaction term. Standardization converts each variable to a mean of 0 and a standard deviation of 1, which is done by subtracting the variable's mean from each observation and dividing the result by the variable's standard deviation (Sarstedt & Mooi, 2019; Chapter 5). This step reduces the collinearity that is introduced by indicator reuse. Furthermore, standardizing the indicators facilitates the interpretation of the moderating effect, a characteristic that we will discuss in greater detail later in this chapter.
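A minimal Python sketch of the standardization step and the construction of the four product indicators, using randomly generated, hypothetical indicator data:

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical indicator data: x1, x2 measure Y1; m1, m2 measure M
X = rng.normal(loc=3.0, scale=1.2, size=(300, 2))   # x1, x2
M = rng.normal(loc=2.5, scale=0.9, size=(300, 2))   # m1, m2

def standardize(a):
    """Convert each column to mean 0 and standard deviation 1."""
    return (a - a.mean(axis=0)) / a.std(axis=0)

Xs, Ms = standardize(X), standardize(M)

# Every indicator of Y1 multiplied with every indicator of M
products = np.column_stack([Xs[:, i] * Ms[:, j]
                            for i in range(2) for j in range(2)])
print(products.shape)  # (300, 4): x1*m1, x1*m2, x2*m1, x2*m2
```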

Orthogonalizing Approach

The orthogonalizing approach is an extension of the product indicator approach. Little, Bovaird, and Widaman (2006) developed the approach to address two issues that result from the standardization of variables as implemented in the product indicator approach. First, while indicator standardization reduces the level of collinearity in the PLS path model, it does not fully eliminate it. Despite the standardization, collinearity in the path model may still be substantial, yielding inflated standard errors or biased path coefficient estimates (Chapter 6). Second, when the variables are standardized, one cannot readily compare the direct effect between Y1 and Y2 when no interaction term is included (i.e., the main effect) with the effect between Y1 and Y2 when the interaction term is included (i.e., the simple effect). We will further clarify the distinction between main effect and simple effect when discussing the interpretation of results.

The orthogonalizing approach builds on the product indicator approach and requires creating all product indicators of the interaction term. For the model in Exhibit 7.16, this would mean creating four product indicators: x1 · m1, x1 · m2, x2 · m1, and x2 · m2. The next step is to regress each product indicator on all indicators of the exogenous construct and the moderator variable. For the example in Exhibit 7.16, we would need to establish and estimate the following four regression models:

x1 · m1 = b1,11 · x1 + b2,11 · x2 + b3,11 · m1 + b4,11 · m2 + e11
x1 · m2 = b1,12 · x1 + b2,12 · x2 + b3,12 · m1 + b4,12 · m2 + e12
x2 · m1 = b1,21 · x1 + b2,21 · x2 + b3,21 · m1 + b4,21 · m2 + e21
x2 · m2 = b1,22 · x1 + b2,22 · x2 + b3,22 · m1 + b4,22 · m2 + e22

In each regression model, a product indicator (e.g., x1 · m1) represents the dependent variable, while all indicators of the exogenous construct (here, x1 and x2) and the moderator (here, m1 and m2) act as independent variables. When looking at the results, we are not interested in the regression coefficients b. Rather, the residual term e is the outcome of interest.


EXHIBIT 7.16  ■  The Orthogonalizing Approach (the interaction term Y1 · M is measured by the standardized residuals e11, e12, e21, and e22 from the four product indicator regressions)

The orthogonalizing approach uses the standardized residuals e as indicators of the interaction term, as shown in Exhibit 7.16. This analysis ensures that the indicators of the interaction term do not share any variance with the indicators of the exogenous construct and the moderator. In other words, the interaction term is orthogonal to the other two constructs, precluding any collinearity issues among the constructs involved. Another consequence of orthogonality is that the path coefficient estimates in the model without the interaction term are identical to those in the model with the interaction term. This characteristic greatly facilitates the interpretation of the moderating effect's strength compared with the product indicator approach. However, because of its reliance on product indicators, the orthogonalizing approach is only applicable when the exogenous construct and the moderator variable are measured reflectively.
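The residual extraction can be sketched in a few lines of Python. The data here are hypothetical standardized indicators, and only one product indicator is shown; the other three follow the same pattern:

```python
import numpy as np

rng = np.random.default_rng(7)
Z = rng.normal(size=(300, 4))     # hypothetical standardized x1, x2, m1, m2

prod = Z[:, 0] * Z[:, 2]          # the product indicator x1 * m1

# Regress the product indicator on x1, x2, m1, m2; the residuals
# (not the coefficients b) are the outcome of interest
b, *_ = np.linalg.lstsq(Z, prod, rcond=None)
resid = prod - Z @ b

# The standardized residuals serve as one indicator of the interaction term
e11 = (resid - resid.mean()) / resid.std()
print(np.corrcoef(e11, Z[:, 0])[0, 1])  # ~0: orthogonal by construction
```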

Two-Stage Approach

Chin, Marcolin, and Newsted (2003) proposed the two-stage approach as a means to run a moderation analysis. The general applicability of the two-stage approach has its roots in its explicit exploitation of PLS-SEM's advantage to estimate latent variable scores (Becker, Ringle, & Sarstedt, 2018; Henseler & Chin, 2010; Rigdon, Ringle, & Sarstedt, 2010). The two stages are as follows:

Stage 1: The main effects model (i.e., without the interaction term) is estimated to obtain the scores of the latent variables. These are saved for further analysis in the second stage.


Stage 2: The latent variable scores of the exogenous construct and moderator variable from Stage 1 are multiplied to create a single-item measure of the interaction term. All other latent variables are represented by means of single items of their latent variable scores from Stage 1.

Exhibit 7.17 illustrates the two-stage approach for our previous model, but with two formative indicators used in Stage 1 to measure the moderator variable. The main effects model in Stage 1 is run to obtain the latent variable scores for Y1, Y2, and M (i.e., LVS(Y1), LVS(Y2), and LVS(M)). The latent variable scores of Y1 and M are then multiplied to form the single item used to measure the interaction term Y1 · M in Stage 2. The latent variables Y1, Y2, and M are each measured with a single item of the latent variable scores from Stage 1. It is important to note that the limitations identified when using single items do not apply in this case, since the single item represents the latent variable scores as obtained from a multi-item measurement in Stage 1.

EXHIBIT 7.17  ■  Two-Stage Approach (Stage 1: the main effects model with multi-item measures of Y1, M, and Y2; Stage 2: Y1, M, and Y2 measured by the single items LVS(Y1), LVS(M), and LVS(Y2), and the interaction term Y1 · M measured by the product LVS(Y1) · LVS(M))
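A minimal Python sketch of Stage 2, assuming the latent variable scores have already been obtained in Stage 1 (randomly generated here as stand-ins for exported scores):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 300

# Stand-ins for the Stage 1 latent variable scores (standardized)
lvs_y1 = rng.normal(size=n)   # LVS(Y1)
lvs_m = rng.normal(size=n)    # LVS(M)

# Stage 2: the product of the scores is the single item measuring Y1 * M;
# it is standardized in line with the recommendation discussed below
interaction = lvs_y1 * lvs_m
interaction = (interaction - interaction.mean()) / interaction.std()
print(interaction[:5])
```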


Guidelines for Creating the Interaction Term

Which approach should be preferred for creating the interaction term? To answer this question, it is important to note that the product indicator approach and the orthogonalizing approach are only applicable to reflective measurement models. The indicator multiplication builds on the assumption that the indicators of the exogenous construct and the moderator each stem from a certain construct domain and are in principle interchangeable. Therefore, both approaches are not applicable when the exogenous construct or the moderator is measured formatively. Since formative indicators do not necessarily have to correspond to a predefined theoretical concept (e.g., when used to measure artifacts; Chapter 2), multiplying them with another set of indicators will confound the interaction term.

As an alternative, the two-stage approach is generally applicable, regardless of whether the exogenous construct and the moderator construct have reflective or formative measurement models. We recommend the use of the two-stage approach in most instances. Besides its general applicability, simulation studies support its advantageous properties. More specifically, Henseler and Chin (2010) ran an extensive simulation study comparing the approaches in terms of their statistical power, point estimation accuracy, and prediction accuracy. Becker, Ringle, and Sarstedt (2018) replicated and extended this simulation study. Their simulation study particularly supports the two-stage approach, showing that it outperforms the other approaches in terms of parameter recovery. In addition, Becker, Ringle, and Sarstedt (2018) examined the impact of different data treatment options on the approach's performance. The results show that parameter recovery works best when standardizing the indicator data and the interaction term rather than working with unstandardized or mean-centered data. Against this background, we recommend that researchers apply the two-stage approach with standardized data when conducting moderator analyses.

Model Evaluation

Measurement and structural model evaluation criteria, as discussed in Chapters 4–6, also apply to moderator models. When assessing reflective measurement models, the moderator variable must meet all relevant criteria in terms of internal consistency reliability, convergent validity, and discriminant validity. Similarly, all formative measurement model criteria universally apply to the moderator variable. For the interaction term, however, there is no such requirement. The product indicator approach and the orthogonalizing approach require multiplying the indicators from two conceptual domains (Chapter 2) to form the interaction term. This does not imply that the product of the indicators stems from one specific conceptual domain. Rather, the interaction term's measurement model should be viewed as an auxiliary measurement that incorporates the interrelationships between the moderator and the exogenous construct in the


path model. This characteristic, however, renders any measurement model assessment of the interaction term meaningless. Furthermore, reusing the indicators of the moderator variable and the exogenous construct introduces high construct correlations by design, violating common discriminant validity standards. Similarly, established measurement model evaluation standards do not apply when using the two-stage approach, since the interaction term is measured with a single item. Therefore, the interaction term does not have to be assessed in the measurement model evaluation step.

Finally, it is also important to consider the standard criteria for structural model assessment. In the context of moderation, particular attention should be paid to the f² effect size of the interaction effect. As explained in Exhibit 6.5 in Chapter 6, this criterion enables an assessment of the change in the R² value when an exogenous construct is omitted from the model. In the case of the interaction effect, the f² effect size indicates how much the moderation contributes to the explanation of the endogenous construct. Recall that the effect size can be calculated as

f² = (R²_included − R²_excluded) / (1 − R²_included),

where R²_included and R²_excluded are the R² values of the endogenous construct when the interaction term of the moderator model is included in or excluded from the PLS path model. In this way, you can assess the relevance of the moderating effect. General guidelines for assessing f² suggest that values of 0.02, 0.15, and 0.35 represent small, medium, and large effect sizes, respectively (Cohen, 1988). However, Aguinis, Beaty, Boik, and Pierce (2005) have shown that the average effect size in tests of moderation is only 0.009. Against this background, Kenny (2018) proposes that 0.005, 0.01, and 0.025 constitute more realistic standards for small, medium, and large effect sizes, respectively, but also points out that even these values are optimistic given Aguinis, Beaty, Boik, and Pierce's (2005) review.
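As a quick worked example of this formula (the R² values are hypothetical, not output from the case study):

```python
# Hypothetical R² values of the endogenous construct with and
# without the interaction term included in the model
r2_included, r2_excluded = 0.562, 0.556

f_squared = (r2_included - r2_excluded) / (1 - r2_included)
print(f"f² = {f_squared:.3f}")  # ~0.014: medium by Kenny's (2018) standards
```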

Results Interpretation

When interpreting the results of a moderation analysis, the primary interest is in the significance of the interaction term. If the interaction term's effect on the endogenous construct is significant, we conclude that the moderator M has a significant moderating effect on the relationship between Y1 and Y2. The bootstrapping procedure, as explained in Chapters 5 and 6, facilitates this assessment. In case of a significant moderation, the next step is to determine the strength of the moderating effect.

In a model without moderation (i.e., without the moderator variable M), where there is only an arrow linking Y1 and Y2, the effect p1 is referred to as a direct effect or main effect. In the case of the product indicator and the two-stage approach,


such main effects are, however, different from the corresponding relationship in a moderator model, as shown in Exhibit 7.15 (product indicator approach) and Exhibit 7.17 (two-stage approach). Here, in contrast, p1 is referred to as a simple effect, expressing the effect of Y1 on Y2 when the moderator variable M has a value of zero. If the level of the moderator variable is increased (or decreased) by one standard deviation unit, the simple effect p1 is expected to change by the size of p3. For example, if the simple effect p1 equals 0.30 and the moderating effect p3 has a value of −0.10, one would expect the relationship between Y1 and Y2 to decrease to a value of 0.30 + (−0.10) = 0.20 if (ceteris paribus) the value of the moderator variable M increases by one standard deviation unit (Henseler & Fassott, 2010).

In many model setups, however, zero is not a number on the scale of M or, as in our example, is not a sensible value for the moderator. If this is the case, the interpretation of the simple effect becomes problematic. This is another reason why we need to standardize the indicators of the moderator as described earlier. The standardization shifts the reference point from an income of zero to the average income and thus facilitates the interpretation of the effects. Furthermore, as indicated before, the standardization reduces collinearity among the interaction term, the moderator, and the exogenous construct resulting from the reuse of indicators in the case of the product indicator approach.

As the nature of the effect between Y1 and Y2 (i.e., p1) differs between models with and without the moderator when using the product indicator or two-stage approach, we need to include an important note of caution. If one is interested in testing the significance of the main effect p1 between Y1 and Y2, the PLS-SEM analysis should initially be executed without the moderator. The evaluation and interpretation of results should follow the procedures outlined in Chapter 6. The moderator analysis then follows as a complementary analysis for the specific moderating relationship. This issue is important because the direct effect becomes a simple effect in the moderator model, which differs in its estimated value, meaning, and interpretation. The simple effect represents the relationship between an exogenous and an endogenous construct when the moderator variable's value is equal to its mean value (provided standardization has been applied). Hence, interpreting the simple effect results of a moderator model as if it were a direct effect (e.g., for testing the hypothesis of a significant relationship p1 between Y1 and Y2) may involve incorrect and misleading conclusions (Henseler & Fassott, 2010).

The above considerations are relevant only when using the product indicator or two-stage approach for creating the interaction term. The situation is different when using the orthogonalizing approach. As a consequence of the orthogonality of the interaction term, the estimates of the simple effect in a model with an interaction term are very similar to the parameter estimates of the direct effect in a model without interaction. That is, the differentiation between a direct effect or main effect and a simple effect is not relevant when the


orthogonalizing approach has been used. Hypotheses relating to the direct effect and the moderating effect can then be assessed in one model.

Beyond understanding these aspects of moderator analysis, the interpretation of moderation results is often quite challenging. For this reason, graphical illustrations of the results support their understanding and the drawing of conclusions. A common way to illustrate the results of a moderation analysis is by means of a slope plot. Webpages such as those by Jeremy Dawson (http://www.jeremydawson.co.uk/slopes.htm) or Kristopher Preacher (http://quantpsy.org/interact/mlr2.htm) provide online tools for the corresponding computations and simple slope plots.

In our example of a two-way interaction (Exhibit 7.14), suppose that the relationship between Y1 and Y2 has a value of 0.50, the relationship between M and Y2 has a value of 0.10, and the interaction term (Y1 · M) has a 0.25 relationship with Y2. Exhibit 7.18 shows the slope plot for such a setting, where the x-axis represents the exogenous construct (Y1) and the y-axis the endogenous construct (Y2).

EXHIBIT 7.18  ■  Slope Plot (x-axis: Exogenous Construct (Y1), from low to high; y-axis: Endogenous Construct (Y2); two lines show the Y1–Y2 relationship for high and low levels of the moderator M)

The two lines in Exhibit 7.18 represent the relationship between Y1 and Y2 for low and high levels of the moderator construct M. Usually, a low level of M is one standard deviation unit below its average (the straight line in Exhibit 7.18), while a high level of M is one standard deviation unit above its average (the dashed line in Exhibit 7.18). Because of the positive moderating effect, as expressed in the

Chapter 7  ■  Mediator and Moderator Analysis 

257

0.25 relationship between the interaction term and the endogenous construct, the high moderator line’s slope is steeper. That is, the relationship between Y1 and Y2 becomes stronger with high levels of M. For low levels of M, the slope is much flatter, as shown in Exhibit 7.18. Hence, with low levels of the moderator construct M, the relationship between Y1 and Y2 becomes weaker.
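A plot of this kind can be reproduced directly from the coefficients. The following Python sketch uses the hypothetical values from the example above (0.50, 0.10, and 0.25) and matplotlib for the drawing:

```python
import numpy as np
import matplotlib.pyplot as plt

p1, p2, p3 = 0.50, 0.10, 0.25   # Y1 -> Y2, M -> Y2, (Y1 * M) -> Y2
y1 = np.linspace(-1, 1, 100)    # standardized exogenous construct

for m, style, label in ((1.0, "--", "High Moderator (M)"),
                        (-1.0, "-", "Low Moderator (M)")):
    # Conditional regression line at moderator level m (in SD units)
    y2 = (p1 + p3 * m) * y1 + p2 * m
    plt.plot(y1, y2, style, label=label)

plt.xlabel("Exogenous Construct (Y1)")
plt.ylabel("Endogenous Construct (Y2)")
plt.legend()
plt.show()
```

The high moderator line has slope 0.75 (= 0.50 + 0.25) and the low moderator line has slope 0.25 (= 0.50 − 0.25), matching the steeper and flatter slopes described above.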

Moderated Mediation and Mediated Moderation

With the advent of mediation and moderation analyses, there have been occasional discussions of how to combine these two analytic strategies in so-called conditional process models (Hayes & Rockwood, 2020). These analyses are framed in terms of a moderated mediation or a mediated moderation, depending on the way the researcher defines the influence of the moderator or mediator construct.

Moderated mediation occurs when a moderator variable interacts with a mediator construct such that the value of the indirect effect changes depending on the value of the moderator variable. Such a situation is also referred to as a conditional indirect effect, because the value of the indirect effect is conditional on the value of the moderator variable (Hayes, 2015). In other words, if the mechanism linking an exogenous construct to an endogenous construct through a mediator is a function of another variable, then the mediation can be said to be moderated by that variable.

Consider Exhibit 7.19 for an example of a moderated mediation. In this model, the relationship between the exogenous construct Y1 and the mediator construct Y2 is assumed to be moderated by M. Following the standard approach for testing moderation in a path model, we would establish an interaction term of the indicators measuring the moderator variable M and the exogenous construct Y1, as shown in Exhibit 7.20. The resulting conditional indirect effect is expressed as (p1 + p5 · M) · p2 = p1 · p2 + p2 · p5 · M, where M refers to the latent variable scores of the moderator variable.

EXHIBIT 7.19  ■  Moderated Mediation (Theoretical Model): M moderates the relationship between Y1 and the mediator Y2, which in turn affects Y3

EXHIBIT 7.20  ■  Interaction Term in a Moderated Mediation Model (the interaction term Y1 · M has a path p5 to the mediator Y2; the remaining paths are Y1 → Y2 = p1, Y2 → Y3 = p2, Y1 → Y3 = p3, and M → Y2 = p4)

To claim that mediation is moderated, researchers traditionally note that there needs to be evidence of a significant moderation of the path linking Y1 to Y2 (e.g., Muller, Judd, & Yzerbyt, 2005; Preacher, Rucker, & Hayes, 2007), which is equivalent to observing a significant path p5 in Exhibit 7.20. More recently, however, Hayes (2015) has stressed that a nonsignificant moderation does not necessarily imply that the indirect effect of Y1 on Y3 is not moderated by M. The author points out that researchers have to consider the moderator's impact on the indirect effect as a whole rather than on one element of the mediating effect in isolation (in this case, the path p1). To formally test such an impact, Hayes (2015) proposes the index of moderated mediation, which quantifies the effect of the moderator M on the indirect effect of Y1 on Y3 through the mediator Y2. For the conditional indirect effect shown in Exhibit 7.20 (i.e., p1 · p2 + p2 · p5 · M), the index of moderated mediation is equal to the product term of M:

w = p2 · p5

If the index w is significant, we can conclude that the indirect effect of Y1 on Y3 through Y2 depends on M. To test whether this is the case, we need to run bootstrapping and compute w for each bootstrap sample (e.g., by copying the SmartPLS results table of the path coefficients per bootstrap sample to Microsoft Excel). The standard error of w across all bootstrap samples allows us to compute the t value (i.e., by dividing w by its bootstrap standard error) and the corresponding p value. Alternatively, you can sort the w values per bootstrap sample in ascending order and, thereby, determine the 95% confidence interval in accordance with the percentile bootstrap approach (Chapter 5).
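A minimal Python sketch of this bootstrap test of the index w, with synthetic per-sample path coefficients standing in for the exported bootstrapping table (here, the t value is approximated using the mean of the bootstrap distribution in place of the original-sample estimate):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-ins for the exported per-sample path coefficients
p2_boot = rng.normal(0.40, 0.06, 10_000)   # Y2 -> Y3 per bootstrap sample
p5_boot = rng.normal(-0.12, 0.05, 10_000)  # (Y1 * M) -> Y2 per bootstrap sample

w = p2_boot * p5_boot   # index of moderated mediation, per bootstrap sample

# t value: index estimate divided by its bootstrap standard error
t_value = w.mean() / w.std(ddof=1)

# 95% percentile confidence interval from the ordered w values
lower, upper = np.percentile(w, [2.5, 97.5])
print(f"w = {w.mean():.3f}, t = {t_value:.2f}, 95% CI [{lower:.3f}, {upper:.3f}]")
```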


Other types of moderated mediation can occur, such as when the relationship between the mediator Y2 and the endogenous construct Y3 is moderated by M. In this alternative, the conditional indirect effect is p1 · (p2 + p5 · M) = p1 · p2 + p1 · p5 · M, and the index of moderated mediation is given by w = p1 · p5. Similarly, the moderator M can influence both elements of the indirect relationship simultaneously (i.e., p1 and p2 in Exhibit 7.20) as well as the simple effect p3 between Y1 and Y3. Furthermore, moderation can also occur in multiple mediation models, affecting one or more model relationships; see Hayes (2015) and Hayes and Rockwood (2020) for further details and Ng, Lim, Cheah, Ho, and Tee (2020) for an application of a moderated mediation model.

The second way of combining mediation and moderation is by means of a mediated moderation. Again, consider the moderator model in Exhibit 7.13. In the case of a mediated moderation, the moderating effect p3 from the interaction term Y1 · M to the dependent variable Y2 is mediated by another construct. Hence, the mediator construct intervenes with the moderating effect in that a change in the interaction term results in a change of the mediator construct, which, in turn, translates into a variation of the dependent variable in the moderator model. In other words, the mediator construct governs the nature (i.e., the underlying mechanism or process) of the moderating effect.

With respect to these considerations, Hayes (2018) advises against the explicit testing of mediated moderation, since the corresponding analysis provides no additional insights into the path model effects. More specifically, the interaction term in a mediated moderation model has no substantive grounding in the measurement or manipulation process. Therefore, quantification of the interaction term has an unclear meaning and no substantive value for interpretation. Furthermore, it is usually very challenging and oftentimes impossible to establish theoretically reasonable underpinnings of mediated moderation models that researchers can test empirically. For these reasons, we follow Hayes's (2018) call to disregard the concept of mediated moderation. Instead, researchers should focus on moderated mediation.

Exhibit 7.21 summarizes the rules of thumb for running a moderation analysis in PLS-SEM.

EXHIBIT 7.21  ■  Rules of Thumb for Moderation Analysis in PLS-SEM

• For the creation of the interaction term, the two-stage approach with standardized data and a standardized interaction term should generally be preferred.
• The moderator variable must be assessed for reliability and validity following the standard evaluation procedures for reflective and formative measures. However, this does not hold for the interaction term, which relies on an auxiliary measurement model generated by reusing indicators of the exogenous construct and the moderator variable.

• A significant interaction term offers evidence for the existence of a moderating effect.
• To assess the strength of the moderating effect, interpret the f² effect size. Effect sizes of 0.005, 0.010, and 0.025 indicate small, medium, and large effects, respectively.
• In the results interpretation and testing of hypotheses, differentiate between the direct effect (or main effect), on the one hand, and the simple effect, on the other. The direct effect expresses the relationship between two constructs when no moderator is included. In contrast, the simple effect expresses the relationship between two constructs when moderated by a third variable whose value is at its average (provided the data are standardized).
• PLS-SEM is superior to the PROCESS approach for assessing conditional process models that combine mediation and moderation analysis (Sarstedt, Hair, Nitzl, Ringle, & Howard, 2020).
• When testing a moderated mediation model, use Hayes's (2015) index of moderated mediation.
• Do not use mediated moderation models (Hayes, 2018).

CASE STUDY ILLUSTRATION—MODERATION

To illustrate the estimation of moderating effects, let's consider the extended corporate reputation model again, as shown in Exhibit 7.9 earlier in this chapter. In the following discussion, we focus on the relationship between customer satisfaction and customer loyalty. Specifically, we introduce switching costs as a moderator variable that can be assumed to negatively influence the relationship between satisfaction and loyalty: The higher the perceived switching costs, the weaker the relationship between these two constructs. We use an extended form of Jones, Mothersbaugh, and Beatty's (2000) scale and measure switching costs reflectively using four indicators (switch_1 to switch_4; Exhibit 7.22), each measured on a 5-point Likert scale (1 = fully disagree, 5 = fully agree).

EXHIBIT 7.22  ■  Indicators for Measuring Switching Costs

switch_1   It takes me a great deal of time to switch to another company.
switch_2   It costs me too much to switch to another company.
switch_3   It takes a lot of effort to get used to a new company with its specific “rules” and practices.
switch_4   In general, it would be a hassle switching to another company.


We first need to extend the original model by including the moderator variable. To do so, enter a new construct in the model (see Chapter 2 for detailed explanations), rename it SC (i.e., switching costs), and draw a path relationship from the newly added moderator variable to the CUSL construct. Next, we need to assign the indicators switch_1, switch_2, switch_3, and switch_4 to the SC construct. Exhibit 7.23 shows the resulting main effects model (i.e., the extended model plus SC linked to CUSL) in the SmartPLS Modeling Window. Please note that for a moderator variable measured with only one item, like income or age, you would use a single-item construct (i.e., a construct with only one indicator variable that serves as a moderator; Chapters 2 and 4).

In the next step, we need to create the interaction term. The SmartPLS software (Ringle, Wende, & Becker, 2015) offers an option to automatically include an interaction term based on the product indicator, orthogonalizing, or two-stage approach. In this case study, our primary concern is with disclosing the significance of a moderating effect (which is usually the case in PLS-SEM applications), making the two-stage approach the method of choice.

EXHIBIT 7.23  ■  Main Effects Model (the extended corporate reputation model with the switching costs construct SC, measured by switch_1 to switch_4, linked to CUSL)


Furthermore, the two-stage approach is the most versatile approach, as it also works when the exogenous construct or the moderator is measured formatively. To include an interaction term, right-click on the target construct CUSL and choose the option Add Moderating Effect (Exhibit 7.24). In the screen that follows, specify SC as the Moderator Variable and CUSA as the Independent Variable, and choose the Two-Stage option as well as Standardized and Automatic under Advanced Settings (Exhibit 7.25). When you click on OK, SmartPLS will include the interaction term labeled Moderating Effect 1 in the Modeling Window. If you like, you can rename the interaction term (e.g., to CUSA * SC) by left-clicking the construct and then right-clicking it to select the Rename option from the menu that appears.

EXHIBIT 7.24  ■  Add Moderating Effect Menu Option


EXHIBIT 7.25  ■  Interaction Effect Dialog Box in SmartPLS

EXHIBIT 7.26  ■  Moderator Analysis Results in SmartPLS (path model estimates including the moderator SC and the interaction term CUSA * SC, which has an effect of −0.071 on CUSL; the simple effect of CUSA on CUSL is 0.467)

The different color of the construct also indicates that it is an interaction term. Again, right-click on the interaction term and choose the menu option Show Indicators of Selected Constructs. The indicator CUSA * SC generated in the second stage of the two-stage approach will then appear in the Modeling Window. We can now proceed with the analysis by running the PLS-SEM algorithm (using the path weighting scheme and mean value replacement for missing values) as described in the earlier chapters. Exhibit 7.26 shows the results in the Modeling Window.

The evaluation of the moderator variable's measurement model shows that the construct measures are reliable and valid. All indicator loadings are above


0.70, and the convergent validity assessment yields an AVE of 0.705, providing support for the convergent validity of the switching costs moderator (SC). The composite reliability ρA has a value of 0.858, indicating internal consistency reliability. In terms of discriminant validity, SC exhibits increased HTMT values only with COMP (0.850) and LIKE (0.802). A further analysis of these HTMT values uses bootstrapping (percentile bootstrap, 10,000 subsamples, no sign changes, complete bootstrapping, one-tailed testing with a 0.05 significance level, and standard settings for the PLS-SEM algorithm and missing value treatment). As discussed in Chapter 4, we assume the more conservative HTMT threshold of 0.85 for all relevant construct combinations except COMP and LIKE as well as CUSA and CUSL. For these pairs of constructs, we assume the higher 0.90 threshold because of their conceptual similarity. For the additional SC construct, we also consider conceptual similarity with LIKE, COMP, and CUSA and apply the HTMT's higher threshold of 0.90. The results show that the HTMT values are significantly lower (p < 0.05) than the critical cutoff values of 0.85 and 0.90 (even though the upper bound of the 95% bootstrap confidence interval for the HTMT of COMP and SC is only slightly below 0.90). Hence, we conclude that inclusion of the SC moderator in the model does not entail discriminant validity problems.

Due to the inclusion of additional constructs in the path model (i.e., SC and the interaction term), the measurement properties of all other constructs in the path model will change (even though the changes will likely be marginal). Reanalyzing all measurement models provides support for the measures' reliability and validity. Note that the measurement model results shown in the Modeling Window stem from Stage 1 of the two-stage approach. The structural model results, however, stem from Stage 2 of the two-stage approach, in which all constructs are measured with single items.

Our next concern is with the size of the moderating effect. As can be seen in Exhibit 7.26, the interaction term has a negative effect on CUSL (−0.071), whereas the simple effect of CUSA on CUSL is 0.467. Jointly, these results suggest that the relationship between CUSA and CUSL is 0.467 for an average level of switching costs. For higher levels of switching costs (e.g., SC increased by one standard deviation unit), the relationship between CUSA and CUSL decreases by the size of the interaction term (i.e., 0.467 − 0.071 = 0.396). Conversely, for lower levels of switching costs (e.g., SC decreased by one standard deviation unit), the relationship between CUSA and CUSL becomes 0.467 + 0.071 = 0.538. To better comprehend the results of the moderator analysis, go to Final Results → Simple Slope Analysis. The simple slope plot that follows visualizes the two-way interaction effect (Exhibit 7.27).

The three lines shown in Exhibit 7.27 represent the relationship between CUSA (x-axis) and CUSL (y-axis). The middle line represents the relationship for an average level of the moderator variable SC. The other two lines represent the relationship between CUSA and CUSL for higher (i.e., mean value of SC plus one

266   A Primer on Partial Least Squares Structural Equation Modeling (PLS-SEM) EXHIBIT 7.27  ■  Simple Slope Plot in SmartPLS

standard deviation unit) and lower (i.e., mean value of SC minus one standard deviation unit) levels of the moderator variable SC. As we can see, the relationship between CUSA and CUSL is positive for all three lines as indicated by their positive slope. Hence, higher levels of customer satisfaction go hand in hand with higher levels of customer loyalty. In addition, we can analyze the moderating effect’s slope in greater detail. The upper line, which represents a high level of the moderator construct SC, has a flatter slope while the lower line, which represents a low level of the moderator construct SC, has a steeper slope. This makes sense since the interaction effect is negative. As a rule of thumb and an approximation, the slope of the high level of the moderator construct SC is the simple effect (i.e., 0.467) plus the interaction effect (−0.071), while the slope of the low level of the moderator construct SC is the simple effect (i.e., 0.467) minus the interaction effect (−0.071). Hence, the simple slope plot supports our previous discussion of the negative interaction term: Higher SC levels entail a weaker relationship between CUSA and CUSL, while lower levels of SC lead to a stronger relationship between CUSA and CUSL. Next, we assess whether the interaction term is significant. For this purpose, we run the Percentile Bootstrap procedure with 10,000 Subsamples, Two Tailed testing, and the standard settings for the PLS-SEM algorithm and the missing value treatment. The analysis yields a p value of 0.024 for the path linking the interaction term and CUSL. Similarly, the 95% percentile bootstrap confidence interval of the interaction term’s effect is [−0.132, −0.009]. As the confidence interval does not include zero, we conclude that the interaction effect is significant, providing support for the existence of a moderating effect. Again, note that these results will slightly differ from yours due to the random nature of the bootstrapping process. Overall, these results provide clear support that SC exerts a

Chapter 7  ■  Mediator and Moderator Analysis 

267

significant and negative effect on the relationship between CUSA and CUSL. The higher the switching costs, the weaker the relationship between customer satisfaction and customer loyalty. For the completeness of the results representation, the final step addresses the moderator’s f ² effect size. Recall that Kenny (2018) defines interaction term effect sizes of 0.005, 0.01, and 0.025 as small, medium, and large, respectively. By going to Quality Criteria → f Square in the SmartPLS algorithm results report, we learn that the f ² effect size of the interaction term (i.e., CUSA * SC ) has a value of 0.014 and, thus, a medium effect.

Summary

•	Understand the basic concepts of mediation in a PLS-SEM context. Mediation occurs when a third variable, referred to as a mediator construct, intervenes between two other related constructs. More precisely, a change in the exogenous construct results in a change in the mediator construct, which, in turn, affects the endogenous construct in the model. Analyzing the strength of the mediator construct's relationships with the other constructs enables the researcher to better understand the mechanisms that underlie the relationship between an exogenous construct and an endogenous construct. In the simplest form, the path model analysis considers only one mediator construct, but the model can involve multiple mediator constructs that can be analyzed simultaneously.

•	Execute a mediation analysis using SmartPLS. Mediating effects must be theoretically postulated a priori. The analysis then focuses on testing such hypothesized relationships empirically. Researchers distinguish between five types of mediation and nonmediation: direct-only nonmediation, no-effect nonmediation, complementary mediation, competitive mediation, and indirect-only mediation. Testing for the type of mediation requires a series of analyses to assess whether the indirect effect and the direct effect are significant.

•	Comprehend the basic concepts of moderation in a PLS-SEM context. Moderation occurs when the strength or even the direction of a relationship between two constructs depends on a third variable. In other words, the nature of the relationship differs depending on the values of the third variable. Thus, the relationship in our example is not the same for all customers but differs depending on the moderating variable, which could be, for example, customer income, age, gender, and so forth. As such, moderation can (and should) be seen as a means to account for heterogeneity in the data.

•	Use the SmartPLS software to run a moderation analysis. Modeling moderator variables in PLS-SEM requires researchers to include an interaction term that accounts for the interrelation between the exogenous construct and the moderator variable. The product indicator approach, the orthogonalizing approach, and the two-stage approach are three popular approaches to model the interaction term. The product indicator and orthogonalizing approaches are restricted to setups where the exogenous construct and moderator variable are both measured reflectively. The two-stage approach can be used when reflective and formative measures are involved. The two-stage approach with standardized data and a standardized interaction term should generally be preferred.

•	Get to know the principles of a moderated mediation. Moderated mediation occurs when a moderator variable interacts with a mediator construct, such that the value of the indirect effect changes depending on the value of the moderator variable. To analyze the moderator's impact on the indirect effect as a whole rather than on one element of the mediating effect in isolation, researchers should apply Hayes's (2015) index of moderated mediation. This index comes in different forms, depending on the type of moderated mediation considered.

Review Questions

1.	What does mediation mean?
2.	What are the necessary conditions for substantiating an indirect-only mediation?
3.	What is the difference between complementary and competitive mediation?
4.	What is the interaction term in a moderator analysis, and what does its value mean?
5.	What is the most versatile approach for creating an interaction term with regard to the measurement models of the exogenous construct and the moderator variable?
6.	What are conditional process models?
7.	What is a slope plot?

Critical Thinking Questions

1.	Give an example of a mediation model and establish the relevant hypotheses.
2.	How would you conduct an analysis with multiple mediators?
3.	Why is PLS-SEM superior to PROCESS when analyzing path models with latent variables?
4.	Why is it necessary to draw a direct relationship between the moderator and the endogenous construct?
5.	Explain what the path coefficients in a moderation model mean.
6.	Explain the similarities and differences between mediation and moderation.
7.	Why is a mediated moderation problematic from a conceptual point of view?

Key Terms

Cascaded moderator analysis  248
Categorical moderator variable  245
Competitive mediation  234
Complementary mediation  234
Conditional indirect effect  257
Conditional process models  257
Continuous moderator variable  246
Direct-only nonmediation  234
Full mediation  234
Heterogeneity  229
Inconsistent mediation  235
Index of moderated mediation  258
Indirect-only mediation  234
Individual mediating effect  239
Interaction effect  245
Interaction term  248
Joint mediating effect  239
Main effect  250
Mediated moderation  259
Mediating effect  229
Mediation  228
Mediation model  232
Mediator construct  228
Moderated mediation  257
Moderating effect  247
Moderation  229
Moderator variable  243
Multiple mediation analysis  238
Multiple moderator model  248
No-effect nonmediation  234
Orthogonalizing approach  250
Partial mediation  234
Product indicator approach  249
Product indicators  249
Serial mediating effect  239
Simple effect  247
Single mediation analysis  238
Slope plot  256
Sobel test  236
Specific indirect effect  238
Suppressor variable  234
Three-way interaction  248
Total indirect effect  238
Two-stage approach (moderation analysis)  251
Two-way interaction  248

Suggested Readings

Baron, R. M., & Kenny, D. A. (1986). The moderator-mediator variable distinction in social psychological research: Conceptual, strategic, and statistical considerations. Journal of Personality and Social Psychology, 51(6), 1173–1182.

Becker, J.-M., Ringle, C. M., & Sarstedt, M. (2018). Estimating moderating effects in PLS-SEM and PLSc-SEM: Interaction term generation*data treatment. Journal of Applied Structural Equation Modeling, 2(2), 1–21.

Cepeda-Carrión, G., Nitzl, C., & Roldán, J. L. (2017). Mediation analyses in partial least squares structural equation modeling: Guidelines and empirical examples. In H. Latan & R. Noonan (Eds.), Partial least squares path modeling: Basic concepts, methodological issues and applications (pp. 173–195). Cham: Springer.

Hayes, A. F. (2015). An index and test of linear moderated mediation. Multivariate Behavioral Research, 50(1), 1–22.

Hayes, A. F. (2018). Introduction to mediation, moderation, and conditional process analysis: A regression-based approach (2nd ed.). New York, NY: Guilford.

Hayes, A. F., & Rockwood, N. J. (2020). Conditional process analysis: Concepts, computation, and advances in the modeling of the contingencies of mechanisms. American Behavioral Scientist, 64(1), 19–54.

Henseler, J., & Fassott, G. (2010). Testing moderating effects in PLS path models: An illustration of available procedures. In V. Esposito Vinzi, W. W. Chin, J. Henseler, & H. Wang (Eds.), Handbook of partial least squares: Concepts, methods and applications in marketing and related fields (Springer Handbooks of Computational Statistics Series, Vol. II, pp. 713–735). Berlin: Springer.

Memon, M. A., Cheah, J.-H., Ramayah, T., Ting, H., & Chuah, F. (2018). Mediation analysis: Issues and recommendations. Journal of Applied Structural Equation Modeling, 2(1), i–ix.

Memon, M. A., Cheah, J.-H., Ramayah, T., Ting, H., Chuah, F., & Cham, T. H. (2019). Moderation analysis: Issues and guidelines. Journal of Applied Structural Equation Modeling, 3(1), i–ix.

Nitzl, C., Roldán, J. L., & Cepeda-Carrión, G. (2016). Mediation analyses in partial least squares structural equation modeling: Helping researchers discuss more sophisticated models. Industrial Management & Data Systems, 116(9), 1849–1864.

Preacher, K. J., & Hayes, A. F. (2008). Asymptotic and resampling strategies for assessing and comparing indirect effects in simple and multiple mediator models. Behavior Research Methods, 40(3), 879–891.

Preacher, K. J., Rucker, D. D., & Hayes, A. F. (2007). Assessing moderated mediation hypotheses: Theory, methods, and prescriptions. Multivariate Behavioral Research, 42(1), 185–227.

Rigdon, E. E., Ringle, C. M., & Sarstedt, M. (2010). Structural modeling of heterogeneous data with partial least squares. In N. K. Malhotra (Ed.), Review of marketing research (pp. 255–296). Armonk, NY: Sharpe.

Sarstedt, M., Hair, J. F., Nitzl, C., Ringle, C. M., & Howard, M. C. (2020). Beyond a tandem analysis of SEM and PROCESS: Use PLS-SEM for mediation analyses! International Journal of Market Research, 62(3), 288–299.

Zhao, X., Lynch, J. G., & Chen, Q. (2010). Reconsidering Baron and Kenny: Myths and truths about mediation analysis. Journal of Consumer Research, 37(2), 197–206.

8 OUTLOOK ON ADVANCED METHODS

LEARNING OUTCOMES

1.	Comprehend the usefulness of a PLS-SEM importance–performance map analysis (IPMA).
2.	Learn how to analyze necessity statements in a PLS-SEM context.
3.	Understand higher-order constructs and how to apply this concept in PLS-SEM.
4.	Evaluate the mode of a measurement model with confirmatory tetrad analysis in PLS-SEM (CTA-PLS).
5.	Grasp the concept of endogeneity and its treatment when applying PLS-SEM.
6.	Understand multigroup analysis in PLS-SEM.
7.	Learn techniques to identify and treat unobserved heterogeneity.
8.	Understand measurement model invariance and its assessment in PLS-SEM.
9.	Become familiar with consistent PLS-SEM (PLSc-SEM).

CHAPTER PREVIEW

This primer focuses on PLS-SEM's foundations. With the knowledge gained from Chapters 1–7, researchers have the understanding needed to use more advanced


techniques that complement the basic PLS-SEM analyses (also see Table 1 in Ghasemy, Teeroovengadum, Becker, & Ringle, 2021). Moreover, while Chapter 7 introduced the broadly applied mediator and moderator analysis techniques, this chapter provides a brief overview of some other useful and less frequently used advanced methods.

The first topic, the importance–performance map analysis (IPMA), represents a particularly valuable tool to extend the results presentation of standard PLS-SEM estimations by contrasting the total effects of the latent variables on a specific target variable with their latent variable scores. The graphical representation of outcomes enables researchers to easily identify critical areas of attention and action.

The second topic addresses recent methodological research that has extended the PLS-SEM method beyond its sufficiency logic, in which an antecedent construct may be sufficient to produce a certain outcome but may not be necessary. Specifically, researchers have jointly used PLS-SEM and necessary condition analysis to derive a nonlinear ceiling function, which facilitates disclosing the minimum level of an antecedent construct required to obtain a certain level of the endogenous construct.

The third topic introduces higher-order constructs that measure concepts on different levels of abstraction in a PLS path model. From a conceptual perspective, using higher-order constructs is often more appropriate than relying on standard one-dimensional constructs. Their application typically facilitates reducing the number of structural model relationships, making the PLS path model more parsimonious and easier to grasp.

The fourth topic covered is confirmatory tetrad analysis—a useful tool to empirically substantiate the mode of a latent variable's measurement model (i.e., formative or reflective). The application of confirmatory tetrad analysis enables researchers to avoid incorrect measurement model specifications.

The fifth topic that has raised considerable research interest is endogeneity, which occurs when a predictor construct is correlated with the error term of the dependent construct to which it is related. Endogeneity is of particular concern for explanatory analyses as it may entail biased parameter estimates and trigger type I and type II errors. Researchers have proposed various approaches for identifying and treating endogeneity, which can be generalized to a PLS-SEM context.

The sixth topic summarizes ways to deal with heterogeneity in the data. We first discuss multigroup analysis, which enables testing for significant differences among path coefficients, typically between two groups. Moreover, we deal with unobserved heterogeneity, which, if neglected, is a threat to the validity of PLS-SEM results. We also introduce standard as well as more recently proposed latent class techniques and make recommendations regarding their use.

Comparisons of PLS-SEM results across different groups are only reasonable if measurement invariance is confirmed. For this purpose, the measurement invariance of composites procedure provides a useful tool in PLS-SEM. Finally, we provide an overview of consistent PLS-SEM, which applies a correction for attenuation to PLS path coefficients. When applied, PLS path models with reflectively measured latent variables produce results that are the same as those of CB-SEM, while retaining some of the well-known advantages of PLS-SEM. We discuss most topics in


greater detail in our book Advanced Issues in Partial Least Squares Structural Equation Modeling (Hair, Sarstedt, Ringle, & Gudergan, 2018). Also, the article on structural robustness checks in PLS-SEM by Sarstedt, Ringle et al. (2020) further highlights some of the topics discussed in this chapter.

IMPORTANCE–PERFORMANCE MAP ANALYSIS

The importance–performance map analysis (IPMA)—also referred to as importance–performance matrix analysis and impact–performance map analysis—extends the standard PLS-SEM results reporting of path coefficient estimates by adding a dimension to the analysis that considers the average values of the latent variable scores (e.g., Kristensen, Martensen, & Grønholdt, 2000; Slack, 1994; Völckner, Sattler, Hennig-Thurau, & Ringle, 2010). More precisely, the IPMA contrasts structural model total effects on a specific target construct (i.e., a specific endogenous latent variable in the PLS path model such as Y4 in Exhibit 8.1) with the average latent variable scores of this construct's predecessors (e.g., Y1, Y2, and Y3 in Exhibit 8.1). The total effects represent the predecessor constructs' importance in shaping the target construct (Y4), while their average latent variable scores represent their performance. The goal is to identify predecessors that have a relatively high importance for the target construct (i.e., those that have a strong total effect) but also a relatively low performance (i.e., low average latent variable scores). The aspects underlying these constructs represent potential areas of improvement that merit high attention. Here, we do not explain all of the technical details of the IPMA but refer the interested reader to the comprehensive explications in Ringle and Sarstedt (2016).

EXHIBIT 8.1  ■  IPMA Model

[The exhibit shows a path model with four constructs and their performance values in parentheses: Y1 (56), Y2 (76), Y3 (82), and Y4 (69). The direct path coefficients are Y1 → Y2 = 0.50, Y1 → Y3 = 0.25, Y1 → Y4 = 0.50, Y2 → Y3 = 0.25, Y2 → Y4 = 0.50, and Y3 → Y4 = 0.25.]


An IPMA relies on total effects and the rescaled latent variable scores, both in an unstandardized form. Rescaling the latent variable scores is important to facilitate the comparison of latent variables measured on different scale levels. For example, rescaling is relevant if the indicators of one construct use an interval scale with values from 1 to 5 (e.g., Y1) while the indicators of another construct (e.g., Y2) use an interval scale with values from 1 to 7. Rescaling adjusts each latent variable score so that it can take on values between 0 and 100 (e.g., Höck, Ringle, & Sarstedt, 2010; Kristensen, Martensen, & Grønholdt, 2000). The mean values of these scores indicate the construct's performance, with 0 representing the lowest and 100 representing the highest performance. Since most researchers are familiar with interpreting percentage values, this kind of performance scale is easy to understand. In our example, Y1 has a performance of 56, Y2 of 76, Y3 of 82, and Y4 of 69. Hence, constructs Y2 and Y3 show a relatively high performance, while Y4 and Y1 have a medium and low (relative) performance, respectively.

Next, we need to determine each predecessor construct's importance in terms of its total effect on the target construct. Recall that the total effect of a relationship between two constructs is the sum of the direct and indirect effects in the structural model. For example, to determine the total effect of Y1 on Y4 (Exhibit 8.1), we have to consider the direct effect between these two constructs (0.50) and the following three indirect effects via Y2 and Y3:

Y1 → Y2 → Y4 = 0.50 ∙ 0.50 = 0.25,
Y1 → Y2 → Y3 → Y4 = 0.50 ∙ 0.25 ∙ 0.25 = 0.03125, and
Y1 → Y3 → Y4 = 0.25 ∙ 0.25 = 0.0625.

Adding up the individual indirect effects yields the total indirect effect of Y1 on Y4, which is approximately 0.34. Therefore, the total effect of Y1 on Y4 is 0.84 (0.50 + 0.34). This total effect expresses Y1’s importance in predicting the target construct Y4. Exhibit 8.2 summarizes the direct, indirect, and total effects of the constructs Y1, Y2, and Y3 on the target construct Y4 as shown in Exhibit 8.1. Note that in an IPMA, the direct, indirect, and total effects (just like the latent variable scores) come in an unstandardized form and can take on values much greater than 1. The use of unstandardized total effects allows us to interpret the IPMA in the following way: A one-unit increase of the predecessor’s performance increases the performance of the target construct by the size of the predecessor’s unstandardized total effect, if everything else remains equal (ceteris paribus). In the final step, we combine the importance and performance data, which we summarize in Exhibit 8.3. Using the IPMA data allows us to create an importance-performance map as shown in Exhibit 8.4. The x-axis represents the (unstandardized) total effects of Y1, Y2, and Y3 on the target construct Y4 (i.e., their importance). The y-axis depicts the average rescaled (and unstandardized) latent variable scores of Y1, Y2, and Y3 (i.e., their performance).
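Because the example model is recursive, all total effects can be computed at once from the matrix of direct path coefficients, as the following Python sketch illustrates (the matrix layout and variable names are ours; SmartPLS reports total effects directly):

    import numpy as np

    # Unstandardized direct path coefficients from Exhibit 8.1;
    # rows = predecessor, columns = successor, order: Y1, Y2, Y3, Y4.
    B = np.array([
        [0.00, 0.50, 0.25, 0.50],   # Y1 -> Y2, Y3, Y4
        [0.00, 0.00, 0.25, 0.50],   # Y2 -> Y3, Y4
        [0.00, 0.00, 0.00, 0.25],   # Y3 -> Y4
        [0.00, 0.00, 0.00, 0.00],
    ])
    I = np.eye(4)
    # For a recursive (acyclic) model, the total effects are the sum of all
    # powers of B, which equals (I - B)^-1 - I.
    total = np.linalg.inv(I - B) - I
    indirect = total - B
    print(total[0, 3])     # 0.84375 -> total effect of Y1 on Y4 (reported as 0.84)
    print(indirect[0, 3])  # 0.34375 -> total indirect effect of Y1 on Y4
    print(total[1, 3])     # 0.5625  -> total effect of Y2 on Y4 (reported as 0.56)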


EXHIBIT 8.2  ■  Direct, Indirect, and Total Effects in the IPMA

Predecessor Construct    Direct Effect on Y4    Indirect Effect on Y4    Total Effect on Y4
Y1                       0.50                   0.34                     0.84
Y2                       0.50                   0.06                     0.56
Y3                       0.25                   –                        0.25

EXHIBIT 8.3  ■  Summary of the IPMA Data

Construct    Importance    Performance
Y1           0.84          56
Y2           0.56          76
Y3           0.25          82

EXHIBIT 8.4  ■  Importance–Performance Map for the Target Construct Y4

[The exhibit plots importance (x-axis: the unstandardized total effect on Y4, ranging from 0.0 to 1.0) against performance (y-axis: the rescaled latent variable scores, ranging from 50 to 100). Y1 lies at (0.84, 56), Y2 at (0.56, 76), and Y3 at (0.25, 82).]

As can be seen in Exhibit 8.4, constructs in the lower right area of the importance–performance map have a high importance for the target construct but show a low performance. Hence, there is a particularly high potential for


improving the performance of the constructs positioned in this area. Constructs with lower importance, relative to the other constructs in the importance–performance map, have a lower priority for performance improvements. In fact, investing in the performance improvement of a construct that has a very small importance for the target construct would not be logical, since it would have little impact in changing (improving) the target construct. In our example, Y1 is particularly important for explaining the target construct Y4. In a ceteris paribus situation, a one-unit increase in the performance of Y1 increases the performance of Y4 by the value of the total effect, which is 0.84. At the same time, the performance of Y1 is relatively low, so there is substantial room for improvement. Consequently, in the PLS path model example, construct Y1 is the most relevant for managerial actions.

The IPMA is not limited to the construct level. We can also conduct an IPMA on the indicator level to identify relevant and even more specific areas of improvement. More precisely, we can interpret the unstandardized outer weights as the relative importance of an indicator compared with the other indicators in the measurement model, no matter whether the measurement model is reflective or formative. In our example, such an analysis would be particularly useful for the indicators of the Y1 construct because of its strong total effect on Y4.

Applications of the IPMA need to meet two requirements. First, all the indicator coding must have the same direction: A low value represents a negative outcome and a high value a positive outcome. Otherwise, we cannot conclude that higher latent variable values represent a better performance. If this is not the case, the indicator coding needs to be changed by reversing the scale (e.g., on a 5-point scale, 1 becomes 5 and 5 becomes 1, 2 becomes 4 and 4 becomes 2, and 3 remains unchanged); a simple sketch of this step follows below. Second, no matter whether the measurement model is formative or reflective, the outer weights must not be negative. If the outer weights are positive, the performance values will be on a scale of 0 to 100. However, if outer weights are significantly negative, the performance values will not be in this specific range but, for example, between −5 and 95. Negative weights might be a result of indicator collinearity. In this case, the researcher may carefully consider removing that indicator (Chapter 5).

Finally, the IPMA can be combined with other analyses. For example, Rigdon, Ringle, Sarstedt, and Gudergan (2011) use the IPMA to better compare the group-specific outcomes of a PLS-SEM–based multigroup analysis. In their book on advanced issues of PLS-SEM, Hair, Sarstedt, Ringle, and Gudergan (2018; Chapter 3) present a detailed explanation of the IPMA and a case study illustration on the corporate reputation model example using the SmartPLS 3 software (Ringle, Wende, & Will, 2015); also see Ringle and Sarstedt (2016).
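Both requirements translate into simple transformations of the raw data and latent variable scores. A minimal Python sketch, assuming interval scales with known minimum and maximum values:

    import numpy as np

    def reverse_code(x, scale_min=1, scale_max=5):
        # Reverse a scale so that high values denote a positive outcome
        # (e.g., on a 5-point scale, 1 becomes 5 and 5 becomes 1).
        return scale_max + scale_min - x

    def rescale_0_100(scores, scale_min, scale_max):
        # Linearly rescale (unstandardized) latent variable scores to the
        # 0-100 performance metric used in the IPMA.
        return (scores - scale_min) / (scale_max - scale_min) * 100

    x = np.array([1, 2, 3, 4, 5])
    print(reverse_code(x))                                  # [5 4 3 2 1]
    print(rescale_0_100(np.array([1.0, 3.0, 5.0]), 1, 5))   # [  0.  50. 100.]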

NECESSARY CONDITION ANALYSIS

In our interpretation of the corporate reputation example, we inherently followed a sufficiency logic by interpreting the model relationships using expressions such


as “competence increases customer satisfaction” or “a higher perceived quality leads to higher competence.” According to a sufficiency logic, a determinant (e.g., competence) may be sufficient to produce the outcome (e.g., customer satisfaction), but it may not be necessary. The absence of competence could be compensated by other determinants, for example, a higher likeability of the company. By contrast, necessity logic implies that an outcome—or a certain level of an outcome—can only be achieved if the necessary cause is in place or is at a certain level. To express necessity, researchers use expressions such as “a certain degree of likeability is needed for customer satisfaction” or “likeability is a precondition for customer satisfaction” (Dul, 2020a). Accordingly, the necessary condition—being a constraint, a bottleneck, or a critical factor—must be satisfied to achieve a certain outcome. Other factors cannot compensate in a situation where a necessary condition is not satisfied.

To implement a necessity perspective, Richter, Schubring, Hauff, Ringle, and Sarstedt (2020) proposed using the construct scores from a PLS-SEM analysis as input for a necessary condition analysis (NCA; Dul, 2016). Different from PLS-SEM, which establishes a linear function to express the impact of a set of antecedent constructs on an endogenous construct, the NCA determines a nonlinear ceiling line in a scatterplot of the construct scores of an antecedent and an endogenous construct. The ceiling line facilitates identifying a necessary condition by disclosing the minimum level of an antecedent construct (e.g., competence) required to obtain a certain level of the endogenous construct (e.g., customer satisfaction). Researchers document the results in a bottleneck table whose columns represent the conditions that need to be satisfied in order to realize a specific outcome. The NCA also indicates the relative number of observations below the ceiling line (referred to as the ceiling accuracy) as well as whether an antecedent construct represents a statistically necessary condition (referred to as the necessity effect size). The NCA currently cannot be run in standard PLS-SEM software but requires the additional use of the R software package (Dul, 2020b). Richter et al. (2020) discuss the tandem use of PLS-SEM and NCA in detail and derive guidelines for running the analysis.
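The ceiling-line logic can be illustrated with a compact sketch. The following Python code implements a CE-FDH-style step-function ceiling on simulated construct scores and reads a bottleneck value off it; it is a didactic approximation, not the NCA package in R, which additionally reports the ceiling accuracy and the necessity effect size with significance tests:

    import numpy as np

    def ce_fdh_ceiling(x, y):
        # Step-function ceiling: for each observed x level, the maximum y
        # observed at or below that level. The empty zone above the line
        # indicates that low x constrains the attainable outcome.
        order = np.argsort(x)
        xs, ys = x[order], y[order]
        return xs, np.maximum.accumulate(ys)

    def bottleneck(xs, ceiling, y_target):
        # Minimum level of the antecedent required to reach y_target,
        # read off the ceiling line (NaN if the target is never reached).
        feasible = xs[ceiling >= y_target]
        return feasible.min() if feasible.size else np.nan

    rng = np.random.default_rng(1)
    x = rng.uniform(0, 100, 300)
    y = np.minimum(rng.uniform(0, 100, 300), 40 + 0.6 * x)  # x constrains y
    xs, ceil = ce_fdh_ceiling(x, y)
    # Minimum x needed for an outcome of 80 (about 67 under the simulated
    # constraint y <= 40 + 0.6x).
    print(bottleneck(xs, ceil, 80.0))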

HIGHER-ORDER CONSTRUCTS

The previous chapters dealt with first-order models, which consider a single layer of constructs. In some instances, however, the constructs that researchers wish to examine are quite complex and can also be operationalized at higher levels of abstraction. Establishing such a higher-order construct or higher-order model, also referred to as a hierarchical component model in the context of PLS-SEM (Lohmöller, 1989), most often involves testing higher-order structures that contain two layers of constructs (Sarstedt, Hair, Cheah, Becker, & Ringle, 2019). Let us consider the example of the customer satisfaction construct, which can consist of numerous more concrete constructs that capture separate attributes


of satisfaction (Chapter 2). In the context of services, these might include satisfaction with the quality of the service, the service personnel, the speed of service, or the servicescape. It is then possible to define satisfaction at two levels of abstraction. These concrete components at the first level of abstraction (i.e., first-order) form the more abstract higher-order (i.e., second-order) satisfaction component.

There are three main reasons to include higher-order constructs in a PLS path model. First, by establishing higher-order constructs, researchers can reduce the number of relationships in the structural model, making the PLS path model more parsimonious and easier to grasp. Second, higher-order constructs also help to overcome the bandwidth-fidelity dilemma (Cronbach & Gleser, 1965, p. 100), according to which there is a tradeoff “between variety of information (bandwidth) and thoroughness of testing to obtain more certain information (fidelity).” Third, higher-order constructs provide a means for reducing collinearity among formative indicators by offering a vehicle to rearrange the indicators or constructs across different concrete subdimensions of the more abstract construct.

Higher-order constructs have two elements: the higher-order component, which captures the more abstract entity, and the lower-order components, which capture the subdimensions of the higher-order entity. Each higher-order construct type can be characterized by different relationships between (1) the measurement models of the lower-order components, and (2) the higher-order component and its lower-order components. For example, the reflective–reflective higher-order construct type has reflectively measured lower-order components, and the relationships go from the higher-order component to the lower-order components. Conversely, the reflective–formative higher-order construct type draws on reflectively measured lower-order components, which jointly form the higher-order component. The other two higher-order model types have formatively measured lower-order components, which reflect (formative–reflective higher-order construct) or form (formative–formative higher-order construct) the higher-order component. Exhibit 8.5 illustrates these four higher-order construct types.

Researchers typically restrict their modeling approach to two layers of abstraction (i.e., one higher-order component and one layer of lower-order components), but this process can theoretically be extended to any number of layers (Patel, Manley, Hair, Ferrell, & Pieper, 2016). Correspondingly, researchers refer to a second-order construct (e.g., Kocyigit & Ringle, 2011) or third-order construct (e.g., Wetzels, Odekerken-Schroder, & van Oppen, 2009), depending on the number of layers of abstraction. An important characteristic of higher-order constructs is that the relationships between the higher- and lower-order components are considered as the higher-order model's measurement model. That is, even though these relationships appear as structural model relationships since higher- and lower-order components are represented by constructs in the PLS path model, they have to be interpreted as measurement model relationships. In reflective–reflective and


EXHIBIT 8.5  ■  Types of Higher-Order Constructs

[The exhibit depicts the four types in separate panels: reflective-reflective, reflective-formative, formative-reflective, and formative-formative. Each panel shows a higher-order component (HOC) and its three lower-order components (LOC1, LOC2, LOC3), which are measured by the indicators x1 to x9.]

Source: Ringle, C. M., Sarstedt, M., and Straub, D. W. (2012). A critical look at the use of PLS-SEM in MIS Quarterly. MIS Quarterly, 36, iii–xiv; permission conveyed through Copyright Clearance Center, Inc. Note: LOC = lower-order component; HOC = higher-order component.

formative–reflective type higher-order constructs, the relationships therefore correspond to indicator loadings, while in reflective–formative and formative–formative higher-order constructs, the relationships correspond to indicator weights. This characteristic introduces a problem in the specification of higher-order constructs in PLS path models. When the lower-order components are conceptually considered as indicators of the higher-order component, how can the higher-order component be statistically identified? In order for the PLS-SEM algorithm to run, recall that each construct in a PLS path model needs to have indicators. To resolve this issue, researchers have proposed different approaches for specifying higher-order constructs. The standard approach is to assign all the indicators from the lower-order components to the higher-order component. This repeated indicators approach is also shown in Exhibit 8.5, where each higher-order component reuses the indicators x1 to x9 of its underlying lower-order components.


Even though the repeated indicators approach is easy to apply in PLS-SEM, its use becomes problematic when a reflective–formative or formative–formative higher-order construct serves as an endogenous construct in a path model (i.e., an arrow goes from another construct to the higher-order component). In this situation, almost all of the variance associated with the higher-order component is explained by its lower-order components, yielding an R² value of (close to) 1.0. As a result, the path coefficient estimates of an exogenous construct explaining the higher-order construct will be close to zero and most certainly nonsignificant (Ringle, Sarstedt, & Straub, 2012). To resolve this issue, Becker, Klein, and Wetzels (2012) have proposed the extended repeated indicators approach, which requires establishing additional relationships from the exogenous construct to all the lower-order components. Instead of interpreting its direct effect, researchers need to interpret its total effect on the higher-order component via the higher-order construct's lower-order components. As an alternative, researchers have proposed the two-stage approach, which uses the construct scores estimated in the first stage as indicators of the higher-order construct in the second stage. Depending on whether the first-stage estimation only considers the lower-order components or the entire higher-order construct, identified via the repeated indicators approach, researchers distinguish between the disjoint two-stage approach and the embedded two-stage approach. Sarstedt, Hair, Cheah, Becker, and Ringle (2019) provide a discussion and illustration of the different approaches for specifying higher-order constructs.

A fundamental challenge in the use of higher-order constructs is the evaluation of their measurement models. Apart from assessing the measurement quality of the lower-order components, researchers also need to consider the higher-order construct as a whole. As the relationships between the higher- and lower-order components are considered as indicator loadings or weights, researchers need to manually calculate the corresponding statistics to assess the higher-order construct's measurement quality. For example, in the case of reflective–reflective and formative–reflective models, this requires using the indicator loadings as input to manually calculate the average variance extracted and the reliability coefficient ρA. For these types of higher-order models, researchers also need to establish discriminant validity by computing the HTMT value of the higher-order construct. In this assessment, the heterotrait-heteromethod correlations correspond to the cross-loadings of the indicators of a construct on the higher-order construct's lower-order components. Conversely, the monotrait-heteromethod correlations correspond to the (construct) correlations of the lower-order components. Sarstedt, Hair, Cheah, Becker, and Ringle (2019) offer a detailed discussion and illustration of how to validate higher-order constructs in PLS-SEM. Hair, Binz Astrachan et al. (2020) discuss discriminant validity assessment in higher-order constructs in greater detail; also see Chapter 2 in Hair, Sarstedt, Ringle, and Gudergan (2018). At the https://www.pls-sem.net/ website


of this book, you can download examples of Microsoft Excel spreadsheet files for the manual calculation of evaluation criteria for higher-order constructs and the HTMT criterion.
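For the manual part of this assessment, the loading-based statistics are straightforward to compute. The Python sketch below calculates the AVE from a set of hypothetical standardized loadings and, as a stand-in reliability measure, Jöreskog's composite reliability ρc; the exact ρA coefficient additionally requires the indicator weights and the empirical indicator correlation matrix:

    import numpy as np

    def ave(loadings):
        # Average variance extracted: mean of the squared standardized loadings.
        l = np.asarray(loadings)
        return np.mean(l**2)

    def composite_reliability(loadings):
        # Joereskog's rho_c from standardized loadings; a stand-in shown here
        # because rho_A cannot be computed from the loadings alone.
        l = np.asarray(loadings)
        return l.sum()**2 / (l.sum()**2 + np.sum(1 - l**2))

    # Hypothetical loadings of three lower-order components on a higher-order
    # component in a reflective-reflective model.
    loadings = [0.82, 0.79, 0.86]
    print(round(ave(loadings), 3))                    # ~0.679
    print(round(composite_reliability(loadings), 3))  # ~0.864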

CONFIRMATORY TETRAD ANALYSIS

Measurement model misspecification is a threat to the validity of SEM results (Jarvis, MacKenzie, & Podsakoff, 2003). For example, modeling latent variables reflectively when the conceptualization of the measurement model, and thus the item wordings, should be a formative specification can result in biased results. The reason is that formative indicators are not necessarily correlated and are often not highly correlated. In addition, formative indicators produce lower outer loadings when represented in a reflective measurement model. Since indicators with low outer loadings are usually candidates for removal from reflective measurement models, such a misspecification can result in the deletion of indicators that are conceptually essential for the construct.
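CTA-PLS builds on so-called tetrads, that is, differences of products of pairs of indicator covariances (Bollen & Ting, 1993). If a reflective (common factor) model holds, every tetrad formed from four of a construct's indicators is expected to vanish (i.e., equal zero), whereas significantly nonvanishing tetrads speak against the reflective specification. A minimal Python sketch with simulated common factor data (the simulation setup is ours):

    import numpy as np

    def tetrads(S):
        # The three tetrads of four indicators. If one common factor generates
        # the indicators (s_ij = l_i * l_j), each tetrad is zero in the
        # population; note that the three tetrads sum to zero by construction.
        t1234 = S[0, 1] * S[2, 3] - S[0, 2] * S[1, 3]
        t1342 = S[0, 2] * S[1, 3] - S[0, 3] * S[1, 2]
        t1423 = S[0, 3] * S[1, 2] - S[0, 1] * S[2, 3]
        return t1234, t1342, t1423

    # Simulate four reflective indicators of one common factor.
    rng = np.random.default_rng(0)
    factor = rng.normal(size=5_000)
    loadings = np.array([0.80, 0.70, 0.75, 0.85])
    X = np.outer(factor, loadings) + rng.normal(scale=0.5, size=(5_000, 4))
    print([round(t, 3) for t in tetrads(np.cov(X, rowvar=False))])  # all near 0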


EXHIBIT 8.9  ■  Heterogeneity in PLS Path Models

[The exhibit shows a structural model in which Y1 and Y2 each predict Y3. On the full set of data, p13 = 0.30 and p23 = 0.30. Splitting the data reveals two groups with markedly different estimates: in group 1 (50% of the data), p13(1) = 0.50 and p23(1) = 0.25; in group 2 (50% of the data), p13(2) = 0.10 and p23(2) = 0.35. The aggregate-level estimates thus mask the substantial group differences.]


Research has proposed several approaches for comparing two groups of data, which can be classified into the parametric approach and several nonparametric approaches (e.g., Matthews, 2017; Sarstedt, Henseler, & Ringle, 2011). The parametric approach (Keil et al., 2000) was the first approach proposed and has been widely adopted because of its simple implementation. This approach is a modified version of a standard two independent samples t test, which relies on standard errors derived from bootstrapping. As with the standard t test, the parametric approach has two versions (Sarstedt & Mooi, 2019; Chapter 6), depending on whether population variances can be assumed to be equal (homoskedastic) or unequal (heteroskedastic). Prior research suggests the parametric approach is rather liberal, as it rejects the null hypothesis of no difference slightly too often compared to an assumed significance level (Klesel, Schuberth, Niehaves, & Henseler, 2020; also see Sarstedt, Henseler, & Ringle, 2011). Furthermore, from a conceptual perspective, the parametric approach has limitations since it relies on distributional assumptions, which are inconsistent with the nonparametric nature of PLS-SEM.

One example of a nonparametric alternative to multigroup analysis is the permutation test (Chin & Dibbern, 2010). This approach randomly exchanges (i.e., permutes) observations between the groups and re-estimates the model for each permutation. Computing the differences between the group-specific path coefficients per permutation facilitates testing whether the path coefficients also differ in the population. Henseler, Ringle, and Sinkovics (2009) proposed another nonparametric multigroup analysis approach that builds on bootstrapping results. Their PLS-MGA approach compares each bootstrap estimate of one group with all other bootstrap estimates of the same parameter in the other group. By counting the number of occurrences where the bootstrap estimate of the first group is larger than those of the second group, the approach derives a probability value for a one-tailed test. PLS-MGA involves a large number of comparisons of bootstrap estimates (e.g., in a case of 10,000 bootstrap samples, there are 100,000,000 comparisons for each parameter) and reliably tests for group differences. At the same time, the test is geared toward one-sided hypothesis testing. The SmartPLS software supports you in performing the parametric approach, the permutation test, and the PLS-MGA approach when conducting a multigroup analysis (e.g., Ringle, Sarstedt, & Zimmermann, 2011).

Klesel, Schuberth, Niehaves, and Henseler's (2020) simulation study shows that the permutation test and the PLS-MGA perform very similarly in terms of type I error rates. In addition, they reliably detect large group differences of 0.2 or higher at sample sizes of 600, while the identification of smaller differences requires even larger sample sizes. As the permutation test is slightly more powerful in detecting group differences compared to the PLS-MGA, and also maintains the type I error rate when comparing two groups, we recommend using the permutation test. A potential concern in the application of the permutation test relates to the handling of highly unequal group-specific sample sizes (Matthews, 2017; Sarstedt, Henseler, & Ringle, 2011). However, recent research shows that


highly unequal group sizes (i.e., one-third of observations in one group, two-thirds in the other; Klesel, Schuberth, Niehaves, & Henseler, 2020) have a very limited impact on the permutation test's performance, particularly at sample sizes commonly considered in applied research. Nevertheless, if feasible, researchers should attempt to obtain similar group-specific sample sizes, as the permutation test performs best under this condition. Finally, Klesel, Schuberth, Henseler, and Niehaves (2019) have proposed an overall model test to evaluate whether the complete model is different across two (or more) groups. The test applies the permutation procedure to compare the average (squared Euclidean or geodesic) distance of the model-implied indicator correlation matrix across the groups. Klesel, Schuberth, Niehaves, and Henseler's (2020) follow-up study provides support for the efficacy of the test when the objective is to compare the overall model across groups—a situation that is, however, seldom encountered in practical applications.
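The permutation logic is simple to sketch. In the following Python code, the function names are ours, and the "model estimation" is deliberately reduced to the standardized coefficient of a single relationship computed on construct scores; an actual PLS-SEM multigroup analysis re-estimates the complete path model in every permutation:

    import numpy as np

    def path_coef(scores_x, scores_y):
        # Stand-in for a full PLS-SEM estimation: the standardized coefficient
        # (i.e., correlation) of one structural relationship.
        x = (scores_x - scores_x.mean()) / scores_x.std()
        y = (scores_y - scores_y.mean()) / scores_y.std()
        return (x * y).mean()

    def permutation_test(x, y, groups, n_perm=5_000, seed=0):
        rng = np.random.default_rng(seed)
        g = groups.astype(bool)
        observed = path_coef(x[g], y[g]) - path_coef(x[~g], y[~g])
        null = np.empty(n_perm)
        for i in range(n_perm):
            perm = rng.permutation(g)  # randomly exchange observations
            null[i] = path_coef(x[perm], y[perm]) - path_coef(x[~perm], y[~perm])
        # Two-tailed p value: share of permuted differences at least as extreme.
        return observed, (np.abs(null) >= abs(observed)).mean()

    # Usage: diff, p = permutation_test(x_scores, y_scores, group_labels)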

Uncovering Unobserved Heterogeneity

Researchers routinely use observable characteristics to partition data into groups and estimate separate models, thereby accounting for observed heterogeneity. However, the sources of heterogeneity in the data are seldom fully known a priori. The analysis on the aggregate data level masks group-specific effects. Consequently, situations arise in which differences related to unobserved heterogeneity prevent the derivation of accurate results. Failure to consider heterogeneity can be a severe threat to the validity of PLS-SEM results (Becker, Rai, Ringle, & Völckner, 2013). If unobserved heterogeneity does not affect the results, researchers can analyze the data on the aggregate level and generalize the PLS-SEM results across groups. Hence, identifying and—if necessary—treating unobserved heterogeneity is of crucial importance when using PLS-SEM. As a result, researchers have long called for the routine application of techniques that facilitate such analyses (Hair, Ringle, & Sarstedt, 2011; Hair, Ringle, & Sarstedt, 2013; Hair, Sarstedt, Ringle, & Mena, 2012).

Standard cluster analysis methods such as k-means clustering (Sarstedt & Mooi, 2019; Chapter 9) only focus on the indicator data when forming groups of data. They cannot, however, account for the latent variables and their structural model relationships. Moreover, identifying different groups of data for the indicators or construct scores does not necessarily entail uncovering significantly different path model relationships. Hence, well-known clustering methods that only focus on the indicators and latent variable scores usually fail to identify groups in PLS-SEM (Fordellone & Vichi, 2020; Sarstedt & Ringle, 2010). For this reason, research has proposed a wide array of latent class techniques (frequently referred to as response-based segmentation techniques), which generalize, for example, finite mixture, genetic algorithm, or hill-climbing approaches to PLS-SEM (Hair, Sarstedt, Matthews, & Ringle, 2016;


Sarstedt, 2008; Sarstedt, Ringle, & Hair, 2017b). Exhibit 8.10 shows the most important latent class techniques for PLS-SEM.

The first and most widely applied latent class approach is finite mixture partial least squares (FIMIX-PLS; Hahn, Johnson, Herrmann, & Huber, 2002). Drawing on the mixture regression concept, FIMIX-PLS simultaneously estimates the path coefficients and each observation's probability of group membership for a predefined number of groups (Ringle, Wende, & Will, 2010; Sarstedt, Ringle, & Gudergan, 2016). Simulation studies show that FIMIX-PLS reliably reveals the existence of heterogeneity in PLS path models and correctly indicates the appropriate number of segments to retain from the data (Sarstedt, Becker, Ringle, & Schwaiger, 2011). At the same time, however, FIMIX-PLS is clearly limited in terms of correctly identifying the underlying segment structure that the group-specific path coefficients define (Ringle, Sarstedt, Schlittgen, & Taylor, 2013; Ringle, Sarstedt, & Schlittgen, 2014), especially when the path model includes formative measures (Becker, Rai, Ringle, & Völckner, 2013). SmartPLS supports an easy application of FIMIX-PLS to PLS path models created in the software (e.g., Ringle, Sarstedt, & Mooi, 2010; Wilden & Gudergan, 2015).

Addressing these limitations, research has proposed a range of other alternatives that assign observations to groups based on some distance criterion. Squillacciotti (2010) introduced the PLS typological path modeling (PLS-TPM) procedure, which Esposito Vinzi, Trinchera, Squillacciotti, and Tenenhaus (2008) advanced by presenting the response-based procedure for detecting unit segments in PLS path modeling (REBUS-PLS). REBUS-PLS gradually reallocates observations from one segment to the other with the goal of minimizing the residuals. In doing so, REBUS-PLS also takes the measurement models into account but is restricted to reflectively measured constructs. Furthermore, PLS-TPM and REBUS-PLS reassign many observations per iteration and conduct a random walk without systematically advancing toward the goal criterion (Ringle, Sarstedt, & Schlittgen, 2014; Ringle, Sarstedt, Schlittgen, & Taylor, 2013). Becker, Rai, Ringle, and Völckner (2013) recognized these issues and presented the prediction-oriented segmentation in PLS-SEM (PLS-POS) approach, which is applicable to all kinds of PLS path models, regardless of whether the latent variables are measured reflectively or formatively. Their simulation study shows that PLS-POS performs well for segmentation purposes and provides favorable outcomes compared with alternative segmentation techniques. Researchers can apply PLS-POS by using this method's implementation in the SmartPLS software.

Genetic algorithm segmentation in PLS-SEM (PLS-GAS; Ringle, Sarstedt, & Schlittgen, 2014; Ringle, Sarstedt, Schlittgen, & Taylor, 2013) is another versatile approach to uncover and treat heterogeneity in measurement and structural models. This approach consists of two stages. The first stage uses a genetic algorithm with the objective of finding the partition that minimizes the endogenous latent variables' unexplained variance. The advantage of implementing a

EXHIBIT 8.10  ■  Latent Class Techniques

[The exhibit groups the latent class techniques in PLS-SEM by their underlying concept: finite mixture (FIMIX-PLS); PLS typological regression approaches (PLS-TPM, REBUS-PLS); distance-based approaches (PLS-POS, PLS-GAS); k-means (PLS-SEM-KM); and robust regression (PLS-IRRS).]


genetic algorithm is that it has the capability to escape local optimum solutions and thereby covers a wide area of the potential search space before delivering a final best solution. In the second stage, a deterministic hill-climbing approach provides an even better solution. PLS-GAS returns excellent results that usually outperform the outcomes of alternative segmentation methods, but it has the downside that it is computationally demanding. For the latter reason, Schlittgen, Ringle, Sarstedt, and Becker (2016) introduced the iterative reweighted regressions segmentation (PLS-IRRS) method. PLS-IRRS builds on Schlittgen's (2011) clusterwise robust regression, which determines weights to reduce the influence of observations with extreme values and mitigate the influence of outliers in the data set. In the adaptation of this concept for PLS-SEM–based segmentation, outliers are not treated as such but as their own segment. When robust regression identifies a group of similar outliers, they may therefore become a data group of their own and represent a segment-specific PLS-SEM solution. At the same time, PLS-IRRS attenuates the impact of inhomogeneous observations in the computation of segment-specific PLS-SEM solutions. Like PLS-POS and PLS-GAS, PLS-IRRS is generally applicable to all kinds of PLS path models. Moreover, the method returns excellent results in terms of parameter recovery and predictive power, which fully match those of PLS-GAS (Schlittgen, Ringle, Sarstedt, & Becker, 2016). The key advantage of PLS-IRRS is its speed. In comparison with PLS-GAS, PLS-IRRS is more than 5,000 times faster while providing highly similar results.

Because of their performance documented in various simulation studies and their existing implementation in SmartPLS, Sarstedt, Ringle, and Hair (2017b) suggest using a combination of FIMIX-PLS and PLS-POS. To start with, researchers should apply FIMIX-PLS, which provides segment retention criteria to determine the number of segments. Sarstedt, Becker, Ringle, and Schwaiger (2011) researched the performance of these segment retention criteria in depth, providing recommendations regarding their use. The FIMIX-PLS solution (i.e., the group assignment based on the probabilities of membership) then serves as a starting solution for running PLS-POS thereafter. PLS-POS improves the FIMIX-PLS solution and facilitates considering heterogeneity in the formative measurement models of latent variables. See Sarstedt, Ringle, and Hair (2017b) for detailed guidelines on how to jointly apply FIMIX-PLS and PLS-POS in latent class analyses. Matthews, Sarstedt, Hair, and Ringle (2016) present a case study illustrating the use of FIMIX-PLS, drawing on the corporate reputation model and data set of this book.

The latent class approaches discussed above identify homogeneous segments in terms of their structural or measurement model relations. However, these differences in relations do not necessarily translate into significant mean differences in the latent variable scores. Addressing this concern, Fordellone and Vichi (2020) proposed the partial least squares k-means (PLS-SEM-KM) method, which facilitates identifying groups of data that maximize score differences while at the same time accounting for structural and measurement model heterogeneity. Their


method draws on reduced k-means clustering (De Soete & Carroll, 1994) using Euclidean distances to iteratively partition the observations into a predefined number of clusters, which are homogeneous in terms of their measurement and structural model relations.
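To make the mixture regression concept underlying FIMIX-PLS more concrete, the following Python sketch implements a minimal EM algorithm for a K-segment mixture of linear regressions. It is a didactic simplification under our own setup: FIMIX-PLS applies this logic to the structural model equations of the latent variable scores and additionally provides segment retention criteria:

    import numpy as np

    def norm_pdf(r, sigma):
        # Normal density of residuals r with standard deviation sigma.
        return np.exp(-0.5 * (r / sigma)**2) / (sigma * np.sqrt(2 * np.pi))

    def mixture_regression_em(X, y, K=2, n_iter=100, seed=0):
        rng = np.random.default_rng(seed)
        n = X.shape[0]
        Z = np.column_stack([np.ones(n), X])        # add intercept
        resp = rng.dirichlet(np.ones(K), size=n)    # initial responsibilities
        for _ in range(n_iter):
            betas, sigmas, shares = [], [], []
            for k in range(K):                      # M-step per segment
                w = resp[:, k]
                beta = np.linalg.solve(Z.T @ (w[:, None] * Z), Z.T @ (w * y))
                r = y - Z @ beta
                sigmas.append(np.sqrt((w * r**2).sum() / w.sum()))
                betas.append(beta)
                shares.append(w.mean())
            dens = np.column_stack([                # E-step: update memberships
                shares[k] * norm_pdf(y - Z @ betas[k], sigmas[k])
                for k in range(K)
            ])
            resp = dens / (dens.sum(axis=1, keepdims=True) + 1e-12)
        return betas, shares, resp

    # Usage: betas, shares, resp = mixture_regression_em(X_scores, y_scores, K=2)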

MEASUREMENT MODEL INVARIANCE

A primary concern in multigroup analyses is ensuring measurement invariance, also referred to as measurement equivalence. By establishing measurement invariance, researchers can be confident that group differences in model estimates do not result from the distinctive content and/or meanings of the latent variables across groups. For example, variations in the structural relationships between latent variables could stem from different meanings the groups' respondents attribute to the phenomena being measured, rather than from true differences in the structural relationships. Hult et al. (2008, p. 1028) describe these concerns and conclude that “failure to establish data equivalence is a potential source of measurement error” (i.e., discrepancies between what is intended to be measured and what is actually measured). When measurement invariance is not established, it can reduce the power of statistical tests, influence the precision of estimators, and provide misleading results. In short, when measurement invariance is not assessed and demonstrated, any conclusions about model relationships are questionable. Hence, multigroup comparisons require establishing measurement invariance to ensure the validity of outcomes and conclusions.

Researchers have suggested a variety of methods to assess measurement invariance for covariance-based SEM. Multigroup confirmatory factor analysis based on the guidelines of Steenkamp and Baumgartner (1998) and Vandenberg and Lance (2000) is by far the most common approach to invariance assessment. However, the well-established measurement invariance techniques used to assess CB-SEM's common factor models cannot be readily transferred to PLS-SEM's composite models. For this reason, Henseler, Ringle, and Sarstedt (2016) developed the measurement invariance of composite models (MICOM) procedure, which involves three steps: (1) configural invariance (i.e., equal parameterization and way of estimation), (2) compositional invariance (i.e., similar composite scores), and (3) equality of composite mean values and variances. The three steps are hierarchically interrelated, as displayed in Exhibit 8.11. The SmartPLS software supports you in performing the steps of the MICOM procedure.

Step 1 addresses the establishment of configural invariance to ensure that a composite has been specified equally for all the groups and emerges as a unidimensional entity in the same nomological net across all the groups. An initial qualitative assessment of the composites' specification across all the groups must ensure the use of (1) identical indicators per measurement model, and (2) identical data treatment and algorithm settings. Configural invariance is a precondition for compositional invariance (Step 2), which focuses on analyzing whether a composite is formed equally across the groups.


EXHIBIT 8.11  ■  MICOM Procedure

Step 1: Configural invariance? If no, there is no measurement invariance: the composite does not exist in all groups, and the multigroup analysis is not meaningful. If yes, proceed to Step 2.

Step 2: Compositional invariance? If no, there is no measurement invariance: the composite is formed differently across groups, and the multigroup analysis is not meaningful. If yes, partial measurement invariance is established: the standardized coefficients of the structural model can be compared across groups. Proceed to Step 3.

Step 3: Equal mean values and variances? If yes, full measurement invariance is established: the data of the different groups can be pooled, which permits moderator analyses.

When the indicator weights

are estimated for each group, it is essential to ensure that—despite possible differences in the weights—the scores of a composite are the same. The MICOM procedure applies a statistical test to ensure that the composite scores do not significantly differ across groups. The measurement invariance assessment should only continue with Step 3, the equality assessment of the composites’ mean values and variances, if the previous step’s results support measurement invariance. In the case of equal composite mean values and variances, one can run the analyses on the pooled data level. Even though pooling the data is advantageous from a statistical power perspective, researchers must account for potential structural heterogeneity by including interaction effects that serve as moderators (Chapter 7). In summary, running a multigroup analysis requires establishing configural (Step 1) and compositional (Step 2) invariance. If these two steps do not support measurement invariance, the results and differences of the multigroup analysis are invalid. However, if configural and compositional invariance are established, partial measurement invariance is confirmed, which permits comparing the path coefficient estimates across the groups. In addition, if partial measurement invariance is confirmed and the composites have equal mean values and variances across the groups, full measurement invariance is confirmed, which supports the pooled data analysis. Henseler, Ringle, and Sarstedt (2016) provide full details on the MICOM procedure, including simulation study results and an empirical application; also see Hair, Sarstedt, Ringle, and Gudergan (2018; Chapter 4).
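Step 2 of the MICOM procedure can be sketched compactly. In the Python code below, the statistic c is the correlation between the composite scores formed with the two groups' weights, and its permutation distribution provides the reference for the test; the correlation-weight estimator is a simple stand-in for the PLS-SEM algorithm, and all names are ours:

    import numpy as np

    def proxy_weights(X):
        # Stand-in for group-specific PLS weights: correlations of the
        # standardized indicators with their unit-weighted sum, normalized.
        Xs = (X - X.mean(0)) / X.std(0)
        proxy = Xs.sum(axis=1)
        w = np.array([np.corrcoef(Xs[:, j], proxy)[0, 1]
                      for j in range(X.shape[1])])
        return w / np.linalg.norm(w)

    def compositional_invariance(X, groups, n_perm=2_000, seed=0):
        rng = np.random.default_rng(seed)
        g = groups.astype(bool)
        Xs = (X - X.mean(0)) / X.std(0)   # pooled standardized indicators

        def c_stat(mask):
            w1, w2 = proxy_weights(X[mask]), proxy_weights(X[~mask])
            return np.corrcoef(Xs @ w1, Xs @ w2)[0, 1]

        c = c_stat(g)
        null = np.array([c_stat(rng.permutation(g)) for _ in range(n_perm)])
        # Compositional invariance is supported if c is not significantly
        # smaller than one, i.e., c is at or above the 5% permutation quantile.
        return c, np.quantile(null, 0.05)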


CONSISTENT PLS-SEM

In Chapter 1, we compared CB-SEM and PLS-SEM and noted that structural model relationships in PLS-SEM are generally slightly lower, and measurement model relationships somewhat higher, compared with CB-SEM. Researchers will never obtain exactly the same results when using PLS-SEM as with CB-SEM and should not expect to. The statistical objective and the measurement philosophy of the two SEM methods differ, and thus the results will always differ. Recent research has tried to merge these two SEM techniques to maintain PLS-SEM's flexibility in terms of distributional assumptions and handling complex models, while obtaining results that are similar to those of CB-SEM. Two approaches have been proposed in recent research: the consistent and efficient PLSe2 method (Bentler & Huang, 2014) and consistent PLS-SEM (PLSc-SEM; Dijkstra, 2014; Dijkstra & Henseler, 2015a). We summarize the latter approach since it has been analyzed in simulation studies (e.g., Dijkstra & Henseler, 2015b) and can be implemented in SmartPLS. Note that we are not implying that these approaches are better than regular PLS-SEM. Rather, they are designed specifically for the very limited situations in which the research objective is to mimic CB-SEM results, which is very seldom the objective when applying PLS-SEM.

PLSc-SEM's objective is to correct the correlation $r_{Y_1,Y_2}$ between two latent variables $Y_1$ and $Y_2$ for measurement error. More precisely, PLSc-SEM corrects the original estimate to obtain the disattenuated (i.e., consistent) correlation $r^{c}_{Y_1,Y_2}$ by dividing the latent variable correlation $r_{Y_1,Y_2}$ by the geometric mean of the latent variables' reliabilities, measured using the reliability coefficient $\rho_A$ (Chapter 4). PLSc-SEM follows a four-step approach. In Step 1, the basic PLS-SEM algorithm is run. These results are then used in Step 2 to calculate the reliability $\rho_A$ of all reflectively measured latent variables in the PLS path model (Dijkstra, 2014; Dijkstra & Henseler, 2015b). For formatively measured constructs and single-item constructs, $\rho_A$ is set to 1. In Step 3, the consistent reliabilities of all latent variables from Step 2 are used to correct the inconsistent correlation matrix of the latent variables obtained in Step 1. More precisely, one obtains the consistent correlation between two constructs by dividing their correlation from Step 1 by the geometric mean (i.e., the square root of the product) of their reliabilities $\rho_A$. This correction applies to all correlations of reflectively measured constructs. The correlation of two formative and/or single-item constructs remains unchanged. The correction for attenuation only applies when at least one reflectively measured latent variable with a consistent reliability $\rho_A$ smaller than 1 is involved in the correlation between two constructs in the PLS path model. In Step 4, the consistent correlation matrix of the latent variables allows re-estimating all model relationships, yielding consistent path coefficients, corresponding R² values, and outer loadings.


Note that significance testing in PLSc-SEM requires running an adjusted bootstrapping routine, which has also been implemented in SmartPLS.

In practical applications, PLSc-SEM results can be substantially influenced by low reliability levels of the constructs. As a result, the standardized PLSc-SEM path coefficients can become very high (in some situations considerably larger than 1). Moreover, in more complex PLS path models, collinearity among the latent variables has a strong negative impact on the PLSc-SEM results. In some instances, the structural model relationships become very small. Finally, bootstrapping can produce extreme outcomes, which result in high standard errors in certain relationships, increasing the type II error rate.

In light of these limitations, the question arises as to when researchers should use PLSc-SEM. The PLSc-SEM approach is appropriate if researchers assume the data are obtained from a common factor model (Bollen, 2011; Bollen & Bauldry, 2011). In that case, the objective is to mimic CB-SEM results by assuming the construct can be adequately represented by the common variance of its indicators. Simulation studies for such models reveal that CB-SEM and PLSc-SEM return almost identical estimated coefficients (Dijkstra & Henseler, 2015b). While CB-SEM and PLSc-SEM have approximately the same accuracy of estimated parameters and statistical power, PLSc-SEM retains most of PLS-SEM's advantageous features (Chapter 2). Among others, PLSc-SEM does not rely on distributional assumptions, can handle complex models, is less affected by incorrect specifications in (subparts of) the model, and will not encounter convergence problems. At the same time, however, the correction for attenuation in Step 3 changes the path coefficients, which have been derived from the original PLS-SEM estimation and which aim at maximizing the endogenous constructs' explained variance. As a consequence, any assessment of the model's predictive power (Chapter 6) using the modified PLSc-SEM construct scores is inconsistent with the original PLS-SEM estimation. Considering that research has emphasized the causal–predictive nature as an integral part of PLS-SEM and the key distinguishing feature from CB-SEM, this limitation is highly problematic. In addition, Sarstedt, Hair, Ringle, Thiele, and Gudergan (2016) show that the bias produced by PLSc-SEM is considerably higher than that produced by CB-SEM when erroneously using the method on data that stem from a composite model population. In light of these limitations, Hair, Sarstedt, and Ringle (2019, p. 567) conclude that PLSc-SEM "adds very little to existing knowledge of SEM" and that researchers should revert to the widely recognized and accepted CB-SEM approach when estimating common factor models. Nevertheless, PLSc-SEM is an alternative to standard CB-SEM estimation when attempting to estimate underidentified models or when convergence problems occur. The same limitations apply to Bentler and Huang's (2014) PLSe2 method, which builds on the PLSc-SEM results but applies a generalized least squares covariance structure estimation to the modified correlation matrix. As such, the method does not unite the advantages of PLS-SEM and CB-SEM as suggested by some researchers (Ghasemy, Jamil, & Gaskin, 2021).
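As a worked illustration of the Step 3 correction, the following Python sketch disattenuates a small latent variable correlation matrix. The matrix and the reliability values are invented for illustration and do not stem from the book's examples.

```python
import numpy as np

def disattenuate(R, rho_a):
    """Step 3 of PLSc-SEM (sketch): divide each construct correlation by the
    geometric mean of the two constructs' reliability coefficients rho_A.
    Formative and single-item constructs carry rho_A = 1, so correlations
    involving only such constructs remain unchanged."""
    rho = np.asarray(rho_a, dtype=float)
    denom = np.sqrt(np.outer(rho, rho))   # geometric means of reliability pairs
    Rc = np.asarray(R, dtype=float) / denom
    np.fill_diagonal(Rc, 1.0)             # keep the unit diagonal
    return Rc

# Hypothetical latent variable correlations from a standard PLS-SEM run
R = np.array([[1.00, 0.48, 0.35],
              [0.48, 1.00, 0.52],
              [0.35, 0.52, 1.00]])
rho_a = [0.85, 0.80, 1.00]  # third construct: single item, so rho_A = 1
print(disattenuate(R, rho_a))
```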


Summary

•	Comprehend the usefulness of a PLS-SEM importance–performance map analysis (IPMA). The IPMA extends the standard PLS-SEM results reporting of path coefficient estimates by adding a dimension to the analysis that considers the average values of the latent variable scores. The analysis contrasts the total effects of latent variables on a certain target variable (importance) with their rescaled average latent variable scores (performance). The graphical representation of the outcomes enables researchers to easily identify critical areas of (managerial) attention and action (i.e., constructs with high importance but low performance). The IPMA can also be applied at the indicator data level.



•	Learn how to analyze necessity statements in a PLS-SEM context. PLS-SEM analyses inherently follow a sufficiency logic, according to which a determinant may be sufficient to produce the outcome but need not be necessary. Complementing this perspective, the necessary condition analysis facilitates analyzing whether an outcome, or a certain level of an outcome, can only be achieved if the necessary cause is in place or is at a certain level. Researchers can use the construct scores from a PLS-SEM analysis as input for a necessary condition analysis to document the condition(s) that must be satisfied to achieve an outcome.



•	Understand higher-order constructs and how to apply this concept in PLS-SEM. Higher-order constructs, also referred to as hierarchical component models, are used to establish a more general higher-order component that represents or summarizes the information of several lower-order components. Higher-order constructs have become increasingly popular in research since they offer a means of establishing more parsimonious path models. Four major types of higher-order constructs are used to identify the measurement models considered to operationalize the lower-order components and represent the relationships between the higher- and the lower-order components: reflective–reflective, reflective–formative, formative–reflective, and formative–formative. Generally, in reflective–reflective and formative–reflective higher-order constructs, the higher-order component represents a more general construct that simultaneously explains all the underlying lower-order components (i.e., similar to reflective measurement models; Chapters 2 and 4). Conversely, in reflective–formative and formative–formative higher-order constructs, the higher-order component is formed by the lower-order components. Researchers have proposed different approaches for specifying and estimating higher-order constructs in PLS-SEM. Specifying a reflective–formative or formative–formative higher-order construct as an endogenous construct in a PLS path model requires additional scrutiny, as the standard repeated indicators approach is not applicable in this case. Instead, researchers should draw on the extended repeated indicators approach or the two-stage approach to specify these types of higher-order constructs. In evaluating the measurement model of higher-order constructs, researchers need to treat the relationships between the higher-order component and its lower-order components as elements of the measurement model.

•	Evaluate the mode of a measurement model with confirmatory tetrad analysis in PLS-SEM (CTA-PLS). CTA-PLS is a useful tool to empirically evaluate the mode of a latent variable's measurement model (i.e., formative or reflective). The test requires at least four indicators per measurement model. In the case of reflective measurement, all model-implied nonredundant vanishing tetrads have a residual value that is not significantly different from zero. However, if at least one of the model-implied nonredundant vanishing tetrads is significantly different from zero, one should consider rejecting the reflective measurement model and, instead, assume a formative specification. The CTA-PLS test enables researchers to empirically evaluate measurement models, thus providing guidance that may help researchers avoid measurement model misspecification. When evaluating the mode of a measurement model, you must always include theoretical, conceptual, and practical considerations along with the empirical evidence provided by CTA-PLS.



•	Grasp the concept of endogeneity and its treatment when applying PLS-SEM. Endogeneity occurs when an antecedent construct is correlated with its endogenous construct's error term. If the objective of model estimation is purely predictive, endogeneity is not an issue. In explanatory models with causal relationships, however, the exact estimation of the coefficients is desirable. Because PLS-SEM follows a causal–predictive paradigm, researchers should generally ensure that endogeneity does not substantially impact their results. To check for endogeneity, researchers should draw on the Gaussian copula approach. The main advantage of this approach is that it does not require researchers to identify suitable control or instrumental variables but allows for the direct handling of endogeneity issues.



•	Understand multigroup analysis in PLS-SEM. Multigroup analysis allows testing whether differences between group-specific path coefficients are statistically significant. Research has proposed several approaches to multigroup analysis, which can be classified into the parametric approach and several nonparametric approaches. Simulation studies provide support for the superiority of the nonparametric permutation test, which should therefore be preferred. Even though the approach is robust against highly unequal group sizes, researchers should aim to realize similar group-specific sample sizes, as the permutation test performs best under this condition.



•	Learn techniques to identify and treat unobserved heterogeneity. Unobserved heterogeneity represents a serious threat to the validity of PLS-SEM results. Research has proposed various approaches to identify and treat heterogeneity that generalize, for example, mixture regression, genetic algorithm, or hill-climbing approaches to PLS-SEM. While FIMIX-PLS constitutes the most widely used approach in the field, more recently proposed methods such as PLS-GAS, PLS-IRRS, and PLS-POS are more versatile and have shown superior performance. In light of the current state of implementation, researchers should use FIMIX-PLS to identify the number of segments to retain from the data and to obtain a starting partition for a subsequent PLS-POS analysis. Both methods have been implemented in SmartPLS.

•	Understand measurement model invariance and its assessment in PLS-SEM. Group comparisons are valid only if measurement invariance has been established. Thereby, researchers ensure that group differences in model estimates do not result from the distinctive content and/or meanings of the latent variables across groups. When measurement invariance is not demonstrated, any conclusions about model relationships are questionable. The measurement invariance of composite models (MICOM) procedure represents a useful tool in PLS-SEM. The procedure comprises three steps that test different aspects of measurement invariance: (1) configural invariance (i.e., equal parameterization and way of estimation), (2) compositional invariance (i.e., similar composite scores), and (3) equality of composite mean values and variances.



•	Become familiar with consistent partial least squares (PLSc-SEM). PLS-SEM allows for reliably estimating composite models. However, when the data stem from a common factor model, PLS-SEM results are biased. To reliably estimate common factor models, researchers can draw on the PLSc-SEM method, which corrects the construct correlations and other model estimates for attenuation. PLSc-SEM therefore allows mimicking CB-SEM results. At the same time, however, the correction of the original PLS-SEM estimates is inconsistent with the method's causal–predictive nature and entails considerable biases when the data stem from a composite model population. In light of these concerns, PLSc-SEM use should be limited to situations where CB-SEM is not functional (e.g., when convergence problems emerge).

Review Questions

1. What kind of practical implications can you draw from IPMA results?
2. What is the purpose of the NCA?
3. What is a higher-order construct? Visualize each of the four types of higher-order constructs introduced in this chapter.
4. What is the purpose of the CTA-PLS?
5. What is the difference between observed and unobserved heterogeneity? Why is the consideration of heterogeneity so important when analyzing PLS path models?
6. How can FIMIX-PLS ensure the validity of your results?
7. Why would you run a multigroup analysis in PLS-SEM?

Critical Thinking Questions

1. Explain the key steps for conducting an IPMA. How do you interpret the results?
2. In what way does the necessity perspective of an NCA complement the standard PLS-SEM analysis results?
3. Why is the repeated indicators approach not applicable in a reflective–formative or formative–formative type higher-order construct in situations where another construct serves as an antecedent of the higher-order construct?
4. Provide practical examples of the four major types of higher-order constructs.
5. In which situations is the consideration of endogeneity particularly important?
6. Critically comment on the following statement: "Measurement invariance is not an issue in PLS-SEM because of the method's focus on prediction and exploration."
7. Explain how and when PLSc-SEM may complement and extend PLS-SEM.

Key Terms

Bandwidth-fidelity dilemma
Cluster analysis
Clustering
Common factor model
Compositional invariance
Configural invariance
Confirmatory tetrad analysis in PLS-SEM (CTA-PLS)
Consistent PLS-SEM (PLSc-SEM)
Disjoint two-stage approach
Embedded two-stage approach
Endogeneity
Equality of composite mean values and variances
Extended repeated indicators approach
Finite mixture partial least squares (FIMIX-PLS)
Formative–formative higher-order construct
Formative–reflective higher-order construct
Full measurement invariance
Gaussian copula approach
Genetic algorithm segmentation in PLS-SEM (PLS-GAS)
Hierarchical component model
Higher-order component
Higher-order construct
Higher-order model
Importance
Importance–performance map
Importance–performance map analysis (IPMA)
Iterative reweighted regressions segmentation (PLS-IRRS)
Latent class techniques
Lower-order components
Measurement equivalence
Measurement invariance
Measurement invariance of composite models (MICOM) procedure
Measurement model misspecification
Model-implied nonredundant vanishing tetrads
Necessary condition analysis (NCA)
Observed heterogeneity
Parametric approach
Partial least squares k-means (PLS-SEM-KM)
Partial measurement invariance
Performance
Permutation test
PLS typological path modeling (PLS-TPM)
PLSe2
PLS-MGA
Prediction-oriented segmentation in PLS-SEM (PLS-POS)
Reflective–formative higher-order construct
Reflective–reflective higher-order construct
Reliability coefficient ρA
Repeated indicators approach
Rescaling
Response-based procedure for detecting unit segments in PLS path modeling (REBUS-PLS)
Response-based segmentation techniques
Second-order construct
Tetrad
Two-stage approach (higher-order constructs)
Unobserved heterogeneity
Vanishing tetrads

Suggested Readings

Becker, J.-M., Klein, K., & Wetzels, M. (2012). Hierarchical latent variable models in PLS-SEM: Guidelines for using reflective-formative type models. Long Range Planning, 45(5–6), 359–394.
Becker, J.-M., Rai, A., Ringle, C. M., & Völckner, F. (2013). Discovering unobserved heterogeneity in structural equation models to avert validity threats. MIS Quarterly, 37(3), 665–694.


Bollen, K. A., & Ting, K.-F. (2000). A tetrad test for causal indicators. Psychological Methods, 5(1), 3–22.
Cheah, J.-H., Ting, H., Ramayah, T., Memon, M. A., Cham, T.-H., & Ciavolino, E. (2019). A comparison of five reflective–formative estimation approaches: Reconsideration and recommendations for tourism research. Quality & Quantity, 53(3), 1421–1458.
Chin, W. W., & Dibbern, J. (2010). A permutation-based procedure for multi-group PLS analysis: Results of tests of differences on simulated data and a cross cultural analysis of the sourcing of information system services between Germany and the USA. In V. Esposito Vinzi, W. W. Chin, J. Henseler, & H. Wang (Eds.), Handbook of partial least squares: Concepts, methods and applications in marketing and related fields (Springer Handbooks of Computational Statistics Series, Vol. II, pp. 171–193). Berlin: Springer.
Dijkstra, T. K., & Henseler, J. (2015). Consistent partial least squares path modeling. MIS Quarterly, 39(2), 297–316.
Gudergan, S. P., Ringle, C. M., Wende, S., & Will, A. (2008). Confirmatory tetrad analysis in PLS path modeling. Journal of Business Research, 61(12), 1238–1249.
Hahn, C., Johnson, M. D., Herrmann, A., & Huber, F. (2002). Capturing customer heterogeneity using a finite mixture PLS approach. Schmalenbach Business Review, 54(3), 243–269.
Hair, J. F., Binz Astrachan, C., Moisescu, O. I., Radomir, L., Sarstedt, M., Vaithilingam, S., & Ringle, C. M. (2020). Executing and interpreting applications of PLS-SEM: Updates for family business researchers. Journal of Family Business Strategy, forthcoming.
Hair, J. F., Sarstedt, M., Matthews, L., & Ringle, C. M. (2016). Identifying and treating unobserved heterogeneity with FIMIX-PLS: Part I – Method. European Business Review, 28(1), 63–76.
Hair, J. F., Sarstedt, M., & Ringle, C. M. (2019). Rethinking some of the rethinking of partial least squares. European Journal of Marketing, 53(4), 566–584.
Hair, J. F., Sarstedt, M., Ringle, C. M., & Gudergan, S. P. (2018). Advanced issues in partial least squares structural equation modeling (PLS-SEM). Thousand Oaks, CA: Sage.
Henseler, J., Ringle, C. M., & Sarstedt, M. (2016). Testing measurement invariance of composites using partial least squares. International Marketing Review, 33(3), 405–431.
Hult, G. T. M., Hair, J. F., Proksch, D., Ringle, C. M., Sarstedt, M., & Pinkwart, A. (2018). Addressing endogeneity in marketing applications of partial least squares structural equation modeling. Journal of International Marketing, 26(3), 1–21.
Matthews, L. (2017). Applying multigroup analysis in PLS-SEM: A step-by-step process. In H. Latan & R. Noonan (Eds.), Partial least squares structural equation modeling: Basic concepts, methodological issues and applications (pp. 219–243). Cham: Springer.
Matthews, L., Sarstedt, M., Hair, J. F., & Ringle, C. M. (2016). Identifying and treating unobserved heterogeneity with FIMIX-PLS: Part II – A case study. European Business Review, 28(2), 208–224.


Patel, V. K., Manley, S. C., Hair, J. F., Ferrell, O. C., & Pieper, T. M. (2016). Is stakeholder theory relevant for European firms? European Management Journal, 36(6), 650–660.
Richter, N. F., Schubring, S., Hauff, S., Ringle, C. M., & Sarstedt, M. (2020). When predictors of outcomes are necessary: Guidelines for the combined use of PLS-SEM and NCA. Industrial Management & Data Systems, 120(12), 2243–2267.
Ringle, C. M., & Sarstedt, M. (2016). Gain more insight from your PLS-SEM results: The importance-performance map analysis. Industrial Management & Data Systems, 116(9), 1865–1886.
Ringle, C. M., Sarstedt, M., & Straub, D. W. (2012). A critical look at the use of PLS-SEM in MIS Quarterly. MIS Quarterly, 36(1), iii–xiv.
Sarstedt, M., Becker, J.-M., Ringle, C. M., & Schwaiger, M. (2011). Uncovering and treating unobserved heterogeneity with FIMIX-PLS: Which model selection criterion provides an appropriate number of segments? Schmalenbach Business Review, 63(1), 34–62.
Sarstedt, M., Hair, J. F., Cheah, J.-H., Becker, J.-M., & Ringle, C. M. (2019). How to specify, estimate, and validate higher-order constructs. Australasian Marketing Journal, 27(3), 197–211.
Sarstedt, M., Ringle, C. M., Cheah, J.-H., Ting, H., Moisescu, O. I., & Radomir, L. (2020). Structural model robustness checks in PLS-SEM. Tourism Economics, 26(4), 531–554.
Sarstedt, M., Ringle, C. M., & Hair, J. F. (2017). Treating unobserved heterogeneity in PLS-SEM: A multi-method approach. In R. Noonan & H. Latan (Eds.), Partial least squares structural equation modeling: Basic concepts, methodological issues and applications (pp. 197–217). Cham: Springer.
Schloderer, M. P., Sarstedt, M., & Ringle, C. M. (2014). The relevance of reputation in the nonprofit sector: The moderating effect of socio-demographic characteristics. International Journal of Nonprofit & Voluntary Sector Marketing, 19(3), 110–126.

Visit the companion site for this book at https://www.pls-sem.net/.

GLOSSARY

10 times rule: one way to determine the minimum sample size specific to the PLS path model that one needs for model estimation (i.e., 10 times the number of independent variables of the most complex ordinary least squares regression in the structural model or any formative measurement model). The 10 times rule is not a reliable indication of sample size requirements in PLS-SEM and should at best be seen as a rough estimate. While statistical power analyses provide more reliable minimum sample size estimates, researchers should primarily draw on the inverse square root method, which stands out in terms of precision and ease of use.
Absolute contribution: the information an indicator variable provides about the formatively measured construct, ignoring all other indicators. The absolute contribution is provided by the loading of the indicator (i.e., its bivariate correlation with the formatively measured construct).
Absolute importance: see Absolute contribution.
Akaike weights: the weight of evidence in favor of a certain model being the best model for the situation at hand given a set of alternative models.
Algorithmic options: offer different ways to run the PLS-SEM algorithm by, for example, selecting between alternative starting values, stop values, weighting schemes, and maximum number of iterations.
Alternating extreme pole responses: a suspicious survey response pattern where a respondent uses only the extreme poles of the scale (e.g., a 7-point scale) in an alternating order to answer the questions.
Artifacts: human-made concepts that are typically measured with formative indicators.
AVE: see Average variance extracted.
Average variance extracted (AVE): a measure of convergent validity. It is the degree to which a latent construct explains the variance of its indicators; see Communality (construct).
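Expressed as a formula (a standard rendering, with $l_m$ denoting the standardized outer loadings of a construct measured by $M$ indicators):

```latex
\mathrm{AVE} = \frac{1}{M}\sum_{m=1}^{M} l_m^{2}
```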

Bandwidth-fidelity dilemma: a practical dilemma resulting from the trade-off between using measures that will cover the majority of variation in a trait or measures that will assess a few specific traits more precisely.
Bayesian information criterion (BIC): a criterion for model selection among an alternative set of models. The model with the lowest BIC is preferred.
Bias-corrected and accelerated (BCa) bootstrap confidence intervals: a method for constructing confidence intervals that adjusts for biases and skewness in the bootstrap distribution. The method yields very low type I errors but is limited in terms of statistical power.
BIC: see Bayesian information criterion.
Blindfolding: a sample reuse technique that omits singular elements of the data matrix and uses the model estimates to predict the omitted part. It is used to compute the Q² statistic.
Bootstrap cases: these make up the number of observations drawn in every bootstrap run. The number is set equal to the number of valid observations in the original data set.
Bootstrap confidence interval: provides an estimated range of values that is likely to include an unknown population parameter. It is determined by its lower and upper bounds, which depend on a predefined probability of error and the standard error of the estimation for a given set of sample data. When zero does not fall into the confidence interval, an estimated parameter can be assumed to be significantly different from zero for the prespecified probability of error (e.g., 5%).
Bootstrap samples: the number of samples drawn in the bootstrapping procedure. Generally, 10,000 or more samples are recommended.
Bootstrapping: a resampling technique that draws a large number of subsamples from the original data (with replacement) and estimates models for each subsample. It is used to determine standard errors of coefficients to assess their statistical significance without relying on distributional assumptions.
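To illustrate how a simple percentile bootstrap confidence interval follows from the resampled estimates, consider this minimal Python sketch; the 10,000 "bootstrap estimates" are simulated stand-ins for the path coefficient estimates a real bootstrapping run would produce.

```python
import numpy as np

rng = np.random.default_rng(3)

def percentile_ci(estimates, alpha=0.05):
    """Percentile bootstrap confidence interval from resampled estimates.
    If zero lies outside the interval, the coefficient can be considered
    significantly different from zero at the chosen probability of error."""
    return tuple(np.quantile(estimates, [alpha / 2, 1 - alpha / 2]))

# Simulated stand-in for 10,000 bootstrap estimates of one path coefficient
boot_estimates = rng.normal(loc=0.32, scale=0.06, size=10_000)
lower, upper = percentile_ci(boot_estimates)
print(f"95% bootstrap CI: [{lower:.3f}, {upper:.3f}]")
```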

Cascaded moderator analysis: a type of moderator analysis in which the strength of a moderating effect is influenced by another variable (i.e., the moderating effect is again moderated).
Casewise deletion: an entire observation (i.e., a case or respondent) is removed from the data set because of missing data.
Categorical moderator variable: see Multigroup analysis.
Categorical scale: see Nominal scale.
Causal indicators: a type of indicator used in formative measurement models. Causal indicators do not fully form the latent variable but "cause" it. Therefore, causal indicators must correspond to a theoretical definition of the concept under investigation.
Causal links: directed relationships between constructs, which can be interpreted as causal if supported by strong theory.
CB-SEM: see Covariance-based structural equation modeling.
CCA: see Confirmatory composite analysis.
Centroid weighting scheme: in the first stage of the PLS-SEM algorithm, uses a value of +1 or −1 for relationships between the constructs in the structural model, depending on the sign of their correlations; see Weighting scheme.
Cluster analysis: see Clustering.
Clustering: a class of methods that partition a set of objects with the goal of obtaining high similarity within the formed groups and high dissimilarity across groups.
Coding: the assignment of numbers to scales in a manner that facilitates measurement.
Coefficient of determination (R²): a measure of the proportion of an endogenous construct's variance that is explained by its predictor constructs. It indicates a model's explanatory power with regard to a specific endogenous construct.
Collinearity: arises when two variables are highly correlated.
Common factor-based SEM: a type of SEM method, which considers the constructs as common factors that explain the covariation between their associated indicators.
Common factor model: assumes that only the variance shared by the indicators used to measure a construct (i.e., the common variance) should be used to estimate the construct and its relationship with other constructs in a model. Exploratory factor analysis (EFA), confirmatory factor analysis (CFA), and CB-SEM (also referred to as common factor-based SEM) are the three main types of analyses based on common factor models.
Communality (construct): see Average variance extracted (AVE).
Communality (item): see Indicator reliability.
Competitive mediation: a situation in mediation analysis that occurs when the indirect effect and the direct effect are both significant and point in opposite directions.
Complementary mediation: a situation in mediation analysis that occurs when the indirect effect and the direct effect are both significant and point in the same direction.
Composite-based SEM: a type of SEM method that represents the constructs as composites, formed by linear combinations of sets of indicator variables.
Composite indicators: a type of indicator used in formative measurement models. Composite indicators form the construct (or composite) fully by means of linear combinations.
Composite reliability (ρA): a measure of internal consistency reliability, which is considered a sound tradeoff between the conservative Cronbach's alpha and the liberal composite reliability (ρC).
Composite reliability (ρC): a measure of internal consistency reliability, which, unlike Cronbach's alpha, does not assume equal indicator loadings. It should be above 0.70 (in exploratory research, 0.60 to 0.70 is considered acceptable).
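For reference, the composite reliability ρC can be written as follows (a standard rendering, assuming standardized indicators so that each indicator's error variance equals $1 - l_m^2$):

```latex
\rho_C = \frac{\left(\sum_{m=1}^{M} l_m\right)^{2}}{\left(\sum_{m=1}^{M} l_m\right)^{2} + \sum_{m=1}^{M}\left(1 - l_m^{2}\right)}
```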

Composite scores: see Construct scores.
Composite variable: a linear combination of several variables.
Compositional invariance: exists when the composite scores across the groups are perfectly correlated.
Conditional indirect effect: see Moderated mediation.
Conditional process models: combine mediation and moderation analysis. See Mediated moderation and Moderated mediation.
Confidence interval: see Bootstrap confidence interval.
Configural invariance: exists when constructs are equally parameterized and estimated across groups.
Confirmatory: describes applications that aim at empirically testing theoretically developed models.
Confirmatory composite analysis (CCA): a set of analyses used to verify the quality of a composite measurement of a theoretical concept of interest.
Confirmatory tetrad analysis in PLS-SEM (CTA-PLS): a statistical procedure that allows for empirically testing the measurement model setup (i.e., whether the measures should be specified reflectively or formatively).
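The following Python sketch shows what a tetrad is computationally. It is a simplified illustration of the quantity CTA-PLS tests, not the full bootstrap-based procedure, and the data are simulated from a one-factor population in which all tetrads vanish.

```python
import numpy as np

def tetrads(X):
    """The three tetrads of a four-indicator block (their sum is zero, so one
    is redundant): tau = sigma_ab * sigma_cd - sigma_ac * sigma_bd.
    Under a reflective (one common factor) population model, all tetrads vanish."""
    S = np.cov(X, rowvar=False)
    t1 = S[0, 1] * S[2, 3] - S[0, 2] * S[1, 3]
    t2 = S[0, 2] * S[1, 3] - S[0, 3] * S[1, 2]
    t3 = S[0, 3] * S[1, 2] - S[0, 1] * S[2, 3]
    return t1, t2, t3

# Simulated reflective block: one common factor drives all four indicators
rng = np.random.default_rng(1)
factor = rng.normal(size=5000)
X = np.column_stack([0.8 * factor + rng.normal(scale=0.6, size=5000)
                     for _ in range(4)])
print([round(t, 4) for t in tetrads(X)])  # all close to zero
```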

Consistency at large: describes an improvement of the precision of PLS-SEM results when both the number of indicators per measurement model and the number of observations increase, assuming that the data stem from a common factor model.
Consistent PLS-SEM (PLSc-SEM): a variant of the standard PLS-SEM algorithm that provides consistent model estimates in common factor models by disattenuating the correlations between pairs of latent variables, thereby mimicking CB-SEM results.
Construct scores: columns of data (vectors) for each latent variable that represent a key result of the PLS-SEM algorithm. The length of every vector equals the number of observations in the data set used.
Constructs: measure theoretical concepts that are abstract, complex, and cannot be directly observed by means of (multiple) items. Constructs are represented in path models as circles or ovals and are also referred to as latent variables.
Content specification: the specification of the scope of the latent variable; that is, the domain of content the indicators are intended to capture.
Content validity: a subjective but systematic evaluation of how well the domain content of a construct is captured by its indicators.
Continuous moderator variable: a variable that affects the direction and/or strength of the relation between an exogenous latent variable and an endogenous latent variable. Continuous moderator variables can also be used to generate categories, which serve as the basis for a subsequent multigroup analysis.
Control variables: the variables that researchers seek to keep constant when conducting research.
Convergence: reached when the results of the PLS-SEM algorithm do not change much. In that case, the PLS-SEM algorithm stops when a prespecified stop criterion (i.e., a small number such as 0.00001) that indicates the minimal changes of PLS-SEM computations has been reached. Thus, convergence has been accomplished when the PLS-SEM algorithm stops because the prespecified stop criterion has been reached and not the maximum number of iterations.
Convergent validity: the degree to which a reflectively specified construct explains the variance of its indicators (see Average variance extracted). In formative measurement model evaluation, convergent validity refers to the degree to which the formatively measured construct correlates positively with an alternative (reflective or single-item) measure of the same concept (see Redundancy analysis).
Correlation weights: see Mode A.
Covariance-based structural equation modeling (CB-SEM): an approach for estimating structural equation models that assumes that the concepts of interest can be represented by common factors. It can be used for theory testing but has clear limitations in terms of testing a model's predictive power.
Coverage error: occurs when the bootstrapping confidence interval of a parameter does not correspond to its empirical confidence interval.
Critical t value: the cutoff or criterion on which the significance of a coefficient is determined. If the empirical t value is larger than the critical t value, the null hypothesis of no effect is rejected. Typical critical t values are 2.57, 1.96, and 1.65 for significance levels of 1%, 5%, and 10%, respectively (two-tailed tests).
Critical value: see Significance testing.
Cronbach's alpha: a measure of internal consistency reliability that assumes equal indicator loadings. Cronbach's alpha represents a conservative measure of internal consistency reliability.
Cross-loadings: an indicator's correlation with other constructs in the model.
CTA-PLS: see Confirmatory tetrad analysis in PLS-SEM (CTA-PLS).
Data matrix: includes the empirical data that are needed to estimate the PLS path model. The data matrix must have one column for every indicator in the PLS path model. The rows represent the observations with their responses to every indicator in the PLS path model.
Degrees of freedom (df): the number of values in the final calculation of the test statistic that are free to vary.
Diagonal lining: a suspicious survey response pattern in which a respondent uses the available points on a scale (e.g., a 7-point scale) to place the answers to the different questions on a diagonal line.
Direct effect: a relationship linking two constructs with a single arrow between the two.
Direct-only nonmediation: a situation in mediation analysis that occurs when the direct effect is significant but not the indirect effect.
Disattenuated correlation: the correlation between two constructs if they were perfectly measured (i.e., if they were perfectly reliable).
Discriminant validity: the extent to which a construct is empirically distinct from other constructs in the model.
Disjoint two-stage approach: uses only the lower-order components of a higher-order construct in its first stage to compute the construct scores, which serve as indicators of the higher-order component in the second stage.
Effect indicators: see Reflective measurement.
Embedded two-stage approach: uses the entire higher-order construct in its first stage to compute the construct scores, which serve as indicators of the higher-order component in the second stage.
Empirical t value: the test statistic value obtained from the data set at hand (here: the bootstrapping results). See Significance testing.
Endogeneity: occurs when a predictor construct is correlated with the error term of the dependent construct to which it is related.
Endogenous constructs: see Endogenous latent variables.
Endogenous latent variables: serve only as dependent variables or as both independent and dependent variables in a structural model.
Equality of composite mean values and variances: the final requirement to establish full measurement invariance.
Equidistance: when the distance between data points of a scale is identical.
Error terms: capture the unexplained variance in constructs and indicators when path models are estimated.
Evaluation criteria: used to evaluate the quality of the measurement models and the structural model results in PLS-SEM based on a set of nonparametric evaluation criteria and procedures such as bootstrapping.
Exact fit test: a model fit test that applies bootstrapping to derive p values of the Euclidean distance (dL) or geodesic distance (dG) between the observed correlations and the model-implied correlations. Research has shown that these measures are largely unsuitable for detecting model misspecification in situations commonly encountered in applied research.
Exogenous constructs: see Exogenous latent variables.
Exogenous latent variables: latent variables that serve only as independent variables in a structural model.
Explained variance: see Coefficient of determination (R²).
Explaining and predicting (EP) theories: a theory type that implies both understanding of underlying causes and prediction, as well as description of theoretical constructs and the relationships among them.
Explanatory power: provides information about the strength of the assumed causal relationships in a PLS path model. The primary measure for assessing a PLS path model's explanatory power is the coefficient of determination (R²).
Exploratory: describes applications that focus on exploring data patterns and identifying relationships.
Extended repeated indicators approach: a method for estimating a formatively specified higher-order construct whose higher-order component serves as an endogenous construct in the PLS path model. Also see Repeated indicators approach.
ƒ² effect size: a measure used to assess the relative impact of a predictor construct on an endogenous construct in terms of its explanatory power.
Factor (score) indeterminacy: means that one can compute an infinite number of sets of factor scores matching the specific requirements of a certain common factor model. In contrast to their explicit estimation in PLS-SEM, the scores of common factors as assumed in CB-SEM are indeterminate.
Factor weighting scheme: uses the correlations between constructs in the structural model to determine their relationships in the first stage of the PLS-SEM algorithm; see Weighting scheme.
FIMIX-PLS: see Finite mixture partial least squares (FIMIX-PLS).
Finite mixture partial least squares (FIMIX-PLS): a latent class approach that allows for identifying and treating unobserved heterogeneity in PLS path models. The approach applies mixture regressions to simultaneously estimate group-specific parameters and observations' probabilities of segment membership.
First-generation techniques: statistical methods traditionally used by researchers, such as regression and analysis of variance.
Formative measurement: see Formative measurement model.
Formative measurement model: a type of measurement model setup in which the indicators form the construct, and arrows point from the indicators to the construct. The outer weights estimation of formative measurement models usually uses Mode B in PLS-SEM.
Formative measures: see Formative measurement model.
Formative–formative higher-order construct: has formatively measured lower-order components and relationships from the lower-order components to the higher-order component.
Formative–reflective higher-order construct: has formatively measured lower-order components and relationships from the higher-order component to the lower-order components.
Fornell-Larcker criterion: a measure of discriminant validity that compares the square root of each construct's average variance extracted with its correlations with all other constructs in the model. The Fornell-Larcker criterion is largely unsuitable for detecting discriminant validity problems.
Full measurement invariance: confirmed when (1) configural invariance, (2) compositional invariance, and (3) equality of composite mean values and variances are demonstrated.
Full mediation: a situation in mediation analysis that occurs when the mediated effect is significant but not the direct effect. Hence, the mediator variable fully explains the relationship between an exogenous and an endogenous latent variable. Full mediation is also referred to as indirect-only mediation.
Gaussian copula approach: a method for diagnosing and treating endogeneity, which directly models the correlation of an antecedent construct with its endogenous construct's error term.
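A minimal Python sketch of the underlying idea (following the copula control function logic introduced by Park and Gupta, 2012): the potentially endogenous predictor, which must be non-normally distributed for identification, is supplemented by a copula term as an additional regressor. All data below are simulated.

```python
import numpy as np
from scipy import stats

def copula_term(p):
    """Gaussian copula control: c = Phi^{-1}(ECDF(P)). Identification requires
    the endogenous predictor P to be non-normally distributed."""
    ecdf = stats.rankdata(p) / (len(p) + 1)   # empirical CDF values in (0, 1)
    return stats.norm.ppf(ecdf)

# Simulated example: y depends on a non-normal predictor p
rng = np.random.default_rng(7)
p = rng.exponential(size=500)
y = 0.5 * p + rng.normal(size=500)
X = np.column_stack([p, copula_term(p), np.ones(500)])
slope, copula_coef, intercept = np.linalg.lstsq(X, y, rcond=None)[0]
# A copula coefficient significantly different from zero signals endogeneity;
# these data are generated without endogeneity, so it should be near zero.
print(round(slope, 3), round(copula_coef, 3))
```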

Genetic algorithm segmentation in PLS-SEM (PLS-GAS): a distance-based segmentation method in PLS-SEM that builds on genetic algorithms, a search heuristic, which aims to find a good (not necessarily the best) solution for the classification problem.
Geweke and Meese criterion (GM): a criterion for model selection among a set of alternative models. The model with the lowest GM is preferred.
GM: see Geweke and Meese criterion.
GoF: see Goodness-of-fit index.
Goodness-of-fit index (GoF): has been developed as an overall measure of model fit for PLS-SEM. However, as the GoF cannot reliably distinguish valid from invalid models and since its applicability is limited to certain model setups, researchers should avoid its use.
Heterogeneity: occurs when the data underlie groups of data characterized by significant differences in terms of model parameters. Heterogeneity can be either observed or unobserved, depending on whether its source can be traced back to observable characteristics (e.g., demographic variables) or whether the sources of heterogeneity are not fully known.
Heterotrait-heteromethod correlations: the correlations of the indicators across constructs measuring different constructs.
Heterotrait-monotrait ratio (HTMT): a measure of discriminant validity. The HTMT is the mean of all correlations of indicators across constructs measuring different constructs (i.e., the heterotrait-heteromethod correlations) relative to the (geometric) mean of the average correlations of indicators measuring the same construct (i.e., the monotrait-heteromethod correlations).
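In compact form (a simplified rendering for constructs $i$ and $j$, with the $\bar{r}$ terms denoting the respective average heterotrait-heteromethod and monotrait-heteromethod correlations):

```latex
\mathrm{HTMT}_{ij} = \frac{\bar{r}_{ij}^{\,\mathrm{heterotrait}}}{\sqrt{\bar{r}_{i}^{\,\mathrm{monotrait}} \cdot \bar{r}_{j}^{\,\mathrm{monotrait}}}}
```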

Hierarchical component models: see Higher-order constructs.
Higher-order component: represents a more abstract dimension of a concept in a higher-order construct.
Higher-order constructs: represent a higher-order structure (usually second-order) that contains several layers of constructs and involves a higher level of abstraction. Higher-order constructs involve a more abstract higher-order component related to two or more lower-order components in a reflective or formative way.
Higher-order models: see Higher-order constructs.
Holdout sample: a subset of a larger data set or a separate data set not used in model estimation.
HTMT: see Heterotrait-monotrait ratio (HTMT).
Hypothesized relationships: proposed explanations for constructs that define the path relationships in the structural model. The PLS-SEM results enable researchers to statistically test these hypotheses and thereby empirically substantiate the existence of the proposed path relationships.
Importance: a term used in the context of the IPMA. It is equivalent to the unstandardized total effect of some latent variable on the target variable.
Importance–performance map: a graphical representation of the importance–performance map analysis.
Importance–performance map analysis (IPMA): extends the standard PLS-SEM results reporting of path coefficient estimates by adding a dimension to the analysis that considers the average values of the latent variable scores. More precisely, the IPMA contrasts structural model total effects on a specific target construct with the average latent variable scores of this construct's predecessors.
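The following Python sketch illustrates the two quantities the IPMA combines. The construct names, total effects, and mean scores are hypothetical, and the rescaling assumes indicators measured on a 1 to 7 scale.

```python
def performance(score, scale_min, scale_max):
    """Rescale a mean latent variable score to a 0-100 performance index;
    scale_min and scale_max are the indicator scale endpoints (e.g., 1 and 7)."""
    return (score - scale_min) / (scale_max - scale_min) * 100

# Hypothetical predecessors of a target construct (names and numbers invented):
# importance = unstandardized total effect, paired with the mean LV score
constructs = {"QUAL": (0.55, 4.2), "SAT": (0.30, 5.6)}
for name, (importance, mean_score) in constructs.items():
    perf = performance(mean_score, 1, 7)
    print(f"{name}: importance={importance:.2f}, performance={perf:.1f}")
```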

Inconsistent mediation: see Competitive mediation.
Index: a set of formative indicators used to measure a construct.
Index of moderated mediation: quantifies the effect of a moderator on the indirect effect of an exogenous construct on an endogenous construct through a mediator.
Indicator reliability: the square of a standardized indicator's outer loading. It represents how much of the variation in an item is explained by the construct and is referred to as the variance extracted from the item.
Indicators: these are directly measured observations (raw data), also referred to as either items or manifest variables, which are represented in path models as rectangles. They are also available data (e.g., responses to survey questions or collected from company databases) used in measurement models to measure the latent variables.
Indirect effect: represents a relationship between two latent variables via a third (e.g., mediator) construct in the PLS path model. If p1 is the relationship between the exogenous latent variable and the mediator variable, and p2 is the relationship between the mediator variable and the endogenous latent variable, the indirect effect is the product of path p1 and path p2.
Indirect-only mediation: a situation in mediation analysis that occurs when the indirect effect is significant but not the direct effect. Hence, the mediator variable fully explains the relationship between an exogenous and an endogenous latent variable. Indirect-only mediation is also referred to as full mediation.
Individual mediating effect: a type of mediating effect in a multiple mediation model which only considers one mediator.
Initial values: the values for the relationships between the latent variables and the indicators in the first iteration of the PLS-SEM algorithm. Since the user typically has no information about which indicators are more important and which indicators are less important per measurement model, an equal weight for every indicator in the PLS path model serves well for the initialization of the PLS-SEM algorithm. Accordingly, all relationships in the measurement models have an initial value of +1.
Inner model: see Structural model.
In-sample predictive power: see Coefficient of determination.
Interaction effect: see Moderating effect.
Interaction term: an auxiliary variable entered into the path model to account for the interaction of the moderator variable and the exogenous construct.
Internal consistency reliability: a form of reliability used to judge the consistency of results across items on the same test. It determines whether the items measuring a construct are similar in their scores (i.e., if the correlations between the items are strong).
Interpretational confounding: a situation in which the empirical meaning of a construct departs from the theoretically implied meaning.
Interval scale: can be used to provide a rating of objects and has a constant unit of measurement, so the distance between the scale points is equal.
Inverse square root method: a method for determining the minimum sample size requirement, which uses the value of the path coefficient with the minimum magnitude in the PLS path model as input.
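As a worked version of this rule (the constant 2.486 corresponds to a 5% significance level, with $p_{\min}$ denoting the minimum-magnitude path coefficient):

```latex
n > \left( \frac{2.486}{\left| p_{\min} \right|} \right)^{2}
```

For example, a minimum path coefficient of 0.20 yields n > (2.486/0.20)² ≈ 154.5, that is, a minimum sample size of 155.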

IPMA: see Importance–performance map analysis (IPMA).
Items: see Indicators.
Iterative reweighted regressions segmentation (PLS-IRRS): a particularly fast and high-performing distance-based segmentation method for PLS-SEM.
Joint mediating effect: a type of mediating effect in a multiple mediation model which considers the total indirect effect of an exogenous on an endogenous construct via all mediators.
k-fold cross-validation: a model validation technique for assessing how the results of a PLS-SEM analysis will generalize to an independent data set. The technique combines k − 1 subsets into a single training sample that is used to predict the remaining subset.
Kurtosis: a measure of whether the distribution is too peaked (a very narrow distribution with most of the responses in the center).
Latent class techniques: statistical methods that facilitate uncovering and treating unobserved heterogeneity. Various approaches have been proposed, which generalize, for example, finite mixture, genetic algorithm, or hill-climbing approaches to PLS-SEM.
Latent variables: elements of a structural model that are used to represent theoretical concepts in statistical models. A latent variable that only explains other latent variables (only outgoing relationships in the structural model) is called exogenous, while latent variables with at least one incoming relationship in the structural model are called endogenous. Also see Constructs.
Latent variable scores: see Construct scores.
Linear regression model (LM) benchmark: a benchmark used in PLSpredict, derived from regressing an endogenous construct's indicators on the indicators of all exogenous constructs. The LM benchmark thereby neglects the measurement model and structural configurations. PLS-SEM results are assumed to outperform the LM benchmark.
Listwise deletion: see Casewise deletion.
Lower-order components: represent more concrete subdimensions of a concept in a higher-order construct.
MAE: see Mean absolute error (MAE).
Main effect: refers to the direct effect between an exogenous and an endogenous construct in the path model without the presence of a moderating effect. After inclusion of the moderator variable, the main effect typically changes in magnitude. Therefore, it is commonly referred to as the simple effect in the context of a moderator model.
Manifest variables: see Indicators.
Maximum number of iterations: needed to ensure that the PLS-SEM algorithm stops. The goal is to reach convergence, but if convergence cannot be reached, the algorithm should stop after a certain number of iterations. This maximum number of iterations (e.g., 300) should be sufficiently high to allow the PLS-SEM algorithm to converge based on the stop criterion. Also see Convergence.
Mean absolute error (MAE): a metric used in PLSpredict, defined as the average absolute differences between the predictions and the actual observations, with all the individual differences having equal weight.
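As a formula (a standard rendering, with $y_i$ the actual and $\hat{y}_i$ the predicted value across $n$ observations):

```latex
\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|
```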

Mean value replacement:  inserts the sample mean for the missing data. Should only be used when indicators have less than 5% missing values. Measurement:  the process of assigning numbers to a variable based on a set of rules.

316   A Primer on Partial Least Squares Structural Equation Modeling (PLS-SEM) Measurement equivalence: see Measurement invariance. Measurement error:  the difference between the true value of a variable and the value

obtained by a measurement. Measurement invariance: refers to whether or not, under different conditions of observing and studying phenomena (e.g., across different groups of respondents), measurement operations yield measures of the same attribute. Measurement invariance of composite models (MICOM) procedure:  a series of tests to

assess invariance of measures (constructs) across multiple groups of data. The procedure comprises three steps that test different aspects of measurement invariance: (1) configural invariance (i.e., equal parameterization and way of estimation), (2) compositional invariance (i.e., similar composite scores), and (3) equality of composite mean values and variances. Measurement model:  an element of a path model that contains the indicators and their

relationships with the constructs and is also called the outer model in PLS-SEM. Measurement model misspecification:  describes the use of a reflective measurement model when it is formative or the use of a formative measurement model when it is reflective. Measurement model misspecification usually yields invalid results and misleading conclusions. Measurement scale:  a tool with a predetermined number of closed-ended responses

that can be used to obtain an answer to a question. Measurement theory: specifies how constructs should be measured with (a set of) indicators. It determines which indicators to use for construct measurement and the directional relationship between construct and indicators. Mediated moderation:  combines a moderator model with a mediation model in that the continuous moderating effect is mediated. Mediating effect:  occurs when a third construct intervenes between two other related

constructs. Mediation:  represents a situation in which one or more mediator construct(s) explain

the processes through which an exogenous construct influences an endogenous construct. Mediation model: see Mediation. Mediator construct: a construct that intervenes between two other directly related

constructs. Metric scale: represents data on a ratio scale and interval scale; see Ratio scale,

Interval scale. Metrological uncertainty:  the dispersion of the measurement values that can be attributed to the object or concept being measured.

Glossary  317 MICOM: see Measurement invariance of composite models (MICOM) procedure. Minimum sample size requirements:  the number of observations needed to represent the underlying population and to meet the technical requirements of the multivariate analysis method used. See inverse square root method. Missing value treatment:  can employ different methods such as mean replacement, EM (expectation-maximization algorithm), and nearest neighbor to obtain values for missing data points in the set of data used for the analysis. As an alternative, researchers may consider deleting cases with missing values (i.e., casewise deletion). Mode A: uses correlation weights to compute composite scores from sets of indicators.

More specifically, the outer weights are the correlation (or single regression) between the construct and each of its indicators. See Reflective measurement. Mode B: uses regression weights to compute composite scores from sets of indicators.

To obtain the weights, the construct is regressed on its indicators. Hence, the outer weighs in Mode B are the coefficients of a multiple regression model. See Formative measurement. Model comparisons: involve establishing and empirically comparing a set of theoretically justified competing models that represent alternative explanations of the phenomenon under research. Model complexity: indicates how many latent variables, structural model relationships, and indicators exist in a PLS path model. Model-implied nonredundant vanishing tetrads: tetrads considered for significance

testing in CTA-PLS. Model overfit: occurs when the model estimates fit the data set used for model

estimation but do not generalize well to other data sets. Model parsimony: see Parsimonious models. Moderated mediation:  combines a mediation model with a moderator model in that the mediator relationship is moderated by a moderator construct. Moderating effect: see Moderation. Moderation:  occurs when the effect of an exogenous latent variable on an endogenous

latent variable depends on the values of a third variable, referred to as a moderator variable, which impacts the relationship. Moderator effect: see Moderation. Moderator variable: see Moderation. Monotrait-heteromethod correlations: the correlations of indicators measuring the

same construct. Multicollinearity: see Collinearity.
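To make the Mode A and Mode B entries above concrete, the following minimal Python sketch contrasts correlation weights (Mode A) with regression weights (Mode B) for a single indicator block. The data and the provisional construct scores are hypothetical stand-ins for what one iteration of the PLS-SEM algorithm would provide; this is an illustration of the two definitions, not the full algorithm.

```python
# A minimal sketch, assuming standardized indicators X and a provisional
# construct score vector y from one PLS-SEM iteration (hypothetical data).
import numpy as np

rng = np.random.default_rng(42)
X = rng.standard_normal((200, 3))           # three standardized indicators
y = X @ np.array([0.5, 0.3, 0.2])           # provisional construct scores
y = (y - y.mean()) / y.std()

# Mode A: correlation weights -- the bivariate correlation between the
# construct proxy and each indicator, taken one at a time.
mode_a = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])

# Mode B: regression weights -- the coefficients of a multiple regression
# of the construct proxy on all indicators of the block jointly.
mode_b, *_ = np.linalg.lstsq(X, y, rcond=None)

print("Mode A (correlation) weights:", mode_a.round(3))
print("Mode B (regression) weights: ", mode_b.round(3))
```

With (near-)uncorrelated indicators the two sets of weights coincide; the more collinear the block, the more the regression weights of Mode B diverge from the simple correlations of Mode A.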

Multigroup analysis: a type of moderator analysis where the moderator variable

is categorical (usually with two categories) and is assumed to potentially affect all relationships in the structural model; it tests whether parameters (mostly path coefficients) differ significantly between two or more groups. Research has proposed a range of approaches to multigroup analysis, which rely on the bootstrapping and permutation procedures. Multiple mediation analysis:  describes a mediation analysis in which multiple mediator variables are being included in the model. Multiple moderator model:  describes a moderation analysis in which multiple modera-

tors are being included in the model. Multivariate analysis: a statistical method that simultaneously analyzes multiple

variables. NCA: see Necessary condition analysis. Necessary condition analysis (NCA): a statistical method that facilitates analyzing

whether an outcome or a certain level of an outcome can only be achieved if the necessary cause is in place or is at a certain level. No-effect nonmediation:  a situation in mediation analysis that occurs when neither the direct nor the indirect effect is significant. Nominal scale:  a measurement scale in which numbers are assigned that can be used

to identify and classify objects (e.g., people, companies, products, etc.). Nomological validity:  the degree to which a construct behaves as it should in a system of related constructs. Observed heterogeneity:  occurs when the sources of heterogeneity are known and can

be traced back to observable characteristics such as demographics (e.g., gender, age, income). Omission distance D: determines which data points are deleted when applying the

blindfolding (see Blindfolding) procedure. One-tailed test: see Significance testing. Ordinal scale:  a measurement scale in which numbers are assigned that indicate rela-

tive positions of objects in an ordered series. Orthogonalizing approach:  an approach to model the interaction term when including a moderator variable in the model. It creates an interaction term with orthogonal indicators, which are uncorrelated with the indicators of the independent variable and the moderator variable in the moderator model. Outer loadings: the bivariate correlations between a construct and the indicators. They determine an item’s absolute contribution to its assigned construct. Loadings are of primary interest in the evaluation of reflective measurement models but are also interpreted when formative measures are involved.

Outer model: see Measurement model. Outer weights:  these are the results of a multiple regression of a construct on its set of

indicators. Weights are the primary criterion to assess each indicator’s relative importance in formative measurement models. Outlier: an extreme response to a particular question or extreme responses to all questions. Out-of-sample predictive power: see Predictive power. Pairwise deletion:  uses all observations with complete responses in the calculation of the model parameters. As a result, different calculations in the analysis may be based on different sample sizes, which can bias the results. The use of pairwise deletion should generally be avoided. Parameter settings: see Algorithmic options. Parametric approach:  a type of multigroup analysis representing a modified version of

a two independent samples t test. Parsimonious models:  models with as few parameters as possible for a given quality

of model estimation results. Partial least squares k-means (PLS-SEM-KM): a clustering method that maximizes group-specific latent variable score differences, while at the same time accounting for heterogeneity in the structural and measurement model relations. Partial least squares structural equation modeling (PLS-SEM): a composite-based method to estimate structural equation models. The goal is to maximize the explained variance of the endogenous latent variables. Partial measurement invariance:  this is confirmed when only (1) configural invariance

and (2) compositional invariance are demonstrated. Partial mediation:  occurs when a mediator variable partially explains the relationship

between an exogenous and an endogenous construct. Partial mediation can come in the form of complementary and competitive mediation, depending on the relationship between the direct and indirect effects. Path coefficients:  estimated path relationships in the structural model (i.e., between

the constructs in the model). They correspond to standardized betas in a regression analysis. Path model:  a diagram that visually displays the hypotheses and variable relationships that are examined when structural equation modeling is applied. Path weighting scheme:  uses the results of partial regression models to determine the relationships between the constructs in the structural model in the first stage of the PLS-SEM algorithm; see Weighting scheme.

Percentile method: an approach for constructing bootstrap confidence intervals. Using the ordered set of parameter estimates obtained from bootstrapping, the lower and upper bounds are directly computed by excluding a certain percentage of lowest and highest values (e.g., as determined by the 2.5% and 97.5% bounds in the case of the 95% bootstrap confidence interval). The percentile method should be preferred when constructing confidence intervals. Performance:  a term used in the context of IPMA. It is the mean value of the unstandardized (and rescaled) scores of a latent variable or an indicator. Permutation test:  a type of multigroup analysis. The test randomly permutes observations between the groups and re-estimates the model to derive a test statistic for the group differences. PLS path modeling: see Partial least squares structural equation modeling. PLS regression:  an analysis technique that explores the linear relationships between

multiple independent variables and a single or multiple dependent variable(s). In developing the regression model, it constructs composites from both the multiple independent variables and the dependent variable(s) by means of principal component analysis. PLS typological path modeling (PLS-TPM): a distance-based segmentation method

developed for PLS-SEM. PLSe2:  a variant of the original PLS-SEM algorithm. Similar to PLSc-SEM, it makes

the model estimates consistent in a common factor model sense. PLS-GAS: see Genetic algorithm segmentation in PLS-SEM. PLS-IRRS: see Iterative reweighted regressions segmentation method. PLS-MGA:  a bootstrap-based multigroup analysis technique. PLS-POS: see Prediction-oriented segmentation in PLS-SEM. PLSpredict procedure: a holdout-sample-based procedure that generates case-level

predictions on an item or a construct level to facilitate the assessment of a PLS path model’s predictive power. The PLSpredict procedure relies on the concept of k-fold cross-validation. PLS-SEM: see Partial least squares structural equation modeling. PLS-SEM algorithm:  the heart of the method. Based on the PLS path model and the

indicator data available, the algorithm estimates the scores of all latent variables in the model, which in turn serve for estimating all path model relationships. PLS-SEM bias:  refers to PLS-SEM’s property that structural model relationships are slightly underestimated and relationships in the measurement models are slightly overestimated compared to CB-SEM when using the method on common factor model data. This difference can be attributed to the methods’ different handling of the latent variables in the model estimation but is negligible in most settings typically encountered in empirical research.
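As a side note to the Percentile method entry above, its mechanics reduce to sorting the bootstrap estimates and cutting off the tails. The sketch below is hypothetical: a simple sample mean stands in for a PLS-SEM parameter such as a path coefficient.

```python
# A minimal sketch of the percentile method: resample, re-estimate,
# and read off the 2.5% and 97.5% bounds for a 95% confidence interval.
import numpy as np

rng = np.random.default_rng(7)
sample = rng.normal(loc=0.3, scale=1.0, size=150)    # hypothetical data

n_boot = 5000
boot_estimates = np.empty(n_boot)
for b in range(n_boot):
    resample = rng.choice(sample, size=sample.size, replace=True)
    boot_estimates[b] = resample.mean()              # stand-in statistic

# Percentile confidence interval: exclude the lowest and highest 2.5%.
lower, upper = np.percentile(boot_estimates, [2.5, 97.5])
print(f"95% percentile bootstrap CI: [{lower:.3f}, {upper:.3f}]")
```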

PLSc-SEM: see Consistent PLS-SEM. PLS-SEM-KM: see Partial least squares k-means. PLS-TPM: see PLS typological path modeling. Prediction: see Predictive power. Prediction error:  the difference between a variable's predicted and original value. Prediction statistics:  quantify the degree of prediction error. Prediction-oriented segmentation in PLS-SEM (PLS-POS):  a distance-based segmenta-

tion method for PLS-SEM. Predictive power:  indicates a model’s ability to predict new observations. Product indicator approach:  an approach to model the interaction term when including

a moderator variable in the model. It involves multiplying the indicators of the moderator with the indicators of the exogenous latent variable to establish a measurement model of the interaction term. The approach is only applicable when both moderator and exogenous latent variables are measured reflectively. Product indicators:  indicators of an interaction term, generated by multiplication of each indicator of the exogenous construct with each indicator of the moderator variable. See Product indicator approach. p value:  in the context of structural model assessment, it is the probability of error

for assuming that a path coefficient is significantly different from zero. In applications, researchers compare the p value of a coefficient with a significance level selected prior to the analysis to decide whether the path coefficient is statistically significant. Q²predict:  a metric used in PLSpredict to assess the model's predictive power. The metric compares the prediction errors of the PLS path model against those of a naïve benchmark. Values greater than zero indicate that the PLS-SEM estimation beats the naïve benchmark in terms of prediction.
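Assuming the definition commonly given in the PLSpredict literature, where the naïve benchmark predicts every holdout observation by the training sample mean, the metric can be written as

\[
Q^2_{\text{predict}} = 1 - \frac{\sum_i \left( y_i - \hat{y}_i \right)^2}{\sum_i \left( y_i - \bar{y}_{\text{train}} \right)^2},
\]

so that a value above zero means the model's squared prediction errors are smaller than those of the mean-value benchmark.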

Q² statistic:  a measure for evaluating structural models. The computation of Q² draws on the blindfolding technique, which uses a subset of the available data to estimate model parameters and then predicts the omitted data. Q² examines whether a model accurately predicts data points not used in the estimation of model parameters. As the measure blends in-sample and out-of-sample predictive power assessment, we advise against its use. R² value: see Coefficient of determination (R²). Ratio scale:  a measurement scale that has a constant unit of measurement and an

absolute zero point; a ratio can be calculated using the scale points. Raw data:  the unstandardized observations in the data matrix that is used for the PLS path model estimation. REBUS-PLS: see Response-based procedure for detecting unit segments in PLS path

modeling.

Redundancy analysis: a method used to assess a formative construct's convergent

validity. It tests whether a formatively measured construct is highly correlated with a reflective or single-item measure of the same construct. Reflective measure: see Reflective measurement. Reflective measurement: a type of measurement model setup in which measures

represent the effects (or manifestations) of an underlying construct. Causality is from the construct to its measures (indicators). The estimation of reflective measurement models in PLS-SEM usually uses Mode A. Reflective–formative higher-order construct: has reflectively measured lower-order components and relationships from the lower-order components to the higher-order component. Reflective measurement model: see Reflective measurement. Reflective–reflective higher-order construct: has reflectively measured lower-order

components and relationships from the higher-order component to the lower-order component. Regression weights: see Mode B. Relative contribution:  the unique importance of each indicator, obtained by partialing out the variance of the formatively measured construct that is predicted by the other indicators. An item's relative contribution is provided by its weight. Relevance of significant relationships: compares the relative importance of predictor constructs in explaining endogenous latent constructs in the structural model. Significance is a prerequisite for relevance, but not all significant path coefficients are highly relevant for explaining a selected target construct. Reliability:  the consistency of a measure. A measure is reliable (in the sense of test-

retest reliability) when it produces consistent outcomes under consistent conditions. The most commonly used measure of reliability is the internal consistency reliability. Reliability coefficient ρA:  a measure of internal consistency reliability. Repeated indicators approach: a type of measurement model setup in higher-order constructs that reuses the indicators of the lower-order components as indicators of the higher-order component to identify the higher-order construct. Rescaling:  the act of changing the values of a variable’s scale to fit a predefined range

(e.g., 0 to 100). Response-based procedure for detecting unit segments in PLS path modeling (REBUS-PLS):  a distance-based segmentation method for PLS-SEM that builds on the PLS-TPM method. Response-based segmentation techniques: see Latent class techniques. RMSE: see Root mean square error (RMSE).
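To make the Redundancy analysis entry above concrete, the following hypothetical Python sketch correlates the scores of a formatively measured construct with a global single-item measure of the same concept. The data, the names, and the approximate 0.70 guideline are illustrative assumptions, not output of an actual PLS-SEM estimation.

```python
# A minimal sketch of a redundancy analysis, assuming construct scores
# have already been extracted from two model estimations (hypothetical).
import numpy as np

rng = np.random.default_rng(1)
scores_formative = rng.standard_normal(300)           # hypothetical scores
scores_global = 0.8 * scores_formative + 0.6 * rng.standard_normal(300)

# Convergent validity: the formative construct should correlate strongly
# with the reflective (or single-item) measure of the same concept.
r = np.corrcoef(scores_formative, scores_global)[0, 1]
print(f"convergent validity correlation: {r:.2f}")   # ~0.70+ is often cited
```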

RMStheta: see Root mean square residual covariance. Root mean square residual covariance (RMStheta):  a model fit measure that is based on the (root mean square) discrepancy between the observed covariances and the model-implied correlations. In CB-SEM, a low SRMR value indicates good fit, but no threshold value has been introduced in a PLS-SEM context yet. Initial simulation results suggest a (conservative) threshold value for the root mean square residual covariance (RMStheta) of 0.12. That is, RMStheta values below 0.12 indicate a well-fitting model, whereas higher values indicate a lack of fit. However, model fit measures should generally be treated with extreme caution in PLS-SEM. Root mean square error (RMSE):  a metric used in PLSpredict, defined as the square root of the average of the squared differences between the predictions and the actual observations. Scale:  a set of reflective indicators used to measure a construct. Secondary data:  data that have already been gathered, often for a different research purpose and some time ago. Second-generation techniques: overcome the limitations of first-generation tech-

niques, for example, in terms of accounting for measurement error. SEM is the most prominent second-generation data analysis technique. Second-order constructs:  a type of higher-order construct with two levels of abstraction. SEM: see Structural equation modeling. Serial mediating effect:  a type of mediating effect in a multiple mediation model which considers a sequence of effects via two or more mediators simultaneously. Significance testing:  the process of testing whether a certain result likely has occurred

by chance (i.e., whether an effect can be assumed to truly exist in the population). Simple effect:  a cause–effect relationship in a moderator model. The parameter esti-

mate represents the size of the relationship between the exogenous and endogenous latent variable when the moderator variable is included in the model. For this reason, the main effect and the simple effect usually have different sizes. Single mediation analysis:  describes a mediation analysis in which only one mediator

variable is being included in the model. Single-item constructs:  constructs that have only a single item measuring them. Since a single-item construct is equal to its measure, the indicator loading is 1.00, making conventional reliability and convergent validity assessments inappropriate. Singular data matrix:  occurs when a variable in a measurement model is a linear com-

bination of another variable in the same measurement model or when a variable has identical values for all observations. In this case, the variable has no variance and the PLS-SEM approach cannot estimate the PLS path model.
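Pulling together the Prediction error and Root mean square error (RMSE) entries above, a minimal sketch with hypothetical observed and predicted values:

```python
# A minimal sketch: prediction errors are the differences between actual
# and predicted values; RMSE is the square root of their mean square.
import numpy as np

y_actual = np.array([3.0, 4.5, 5.0, 2.5, 4.0])       # hypothetical values
y_predicted = np.array([3.2, 4.1, 5.3, 2.9, 3.8])    # hypothetical predictions

prediction_errors = y_actual - y_predicted
rmse = np.sqrt(np.mean(prediction_errors ** 2))
print(f"RMSE = {rmse:.3f}")
```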

Skewness: the extent to which a variable's distribution is symmetrical around its

mean value. Slope plot:  a type of line chart used to detect changes in linear slopes between groups. Sobel test:  a test that has been proposed to assess the significance of the indirect

effect in a mediation model. However, research has dismissed the Sobel test for evaluating mediation analysis results of regression models and PLS-SEM. Specific indirect effect:  describes an indirect effect via one single mediator in a mul-

tiple mediation model. SRMR: see Standardized root mean square residual. Standard error:  the standard deviation of the sampling distribution of a given statistic.

Standard errors are important to show how much sampling fluctuation a statistic has. Standardized data:  have a mean value of 0 and a standard deviation of 1 (z-standardization). The PLS-SEM method usually uses standardized raw data. Most software tools automatically standardize the raw data when running the PLS-SEM algorithm. Standardized root mean square residual (SRMR):  a model fit measure, which is defined

as the root mean square discrepancy between the observed correlations and the model-implied correlations. Research has shown that the SRMR is largely unsuitable for detecting model misspecification in situations commonly encountered in applied research. Standardized values:  indicate how many standard deviations an observation is above

or below the mean. Statistical power:  the probability of detecting a relationship as significant when the relationship is in fact present in the population. Stop criterion: see Convergence. Straight lining:  describes a situation in which a respondent marks the same response for a high proportion of the questions. Structural equation modeling (SEM):  a set of statistical methods used to estimate relationships between constructs and indicators, while accounting for measurement error. Structural model:  includes the constructs and their relationships as derived from the-

ory and logic. Structural theory:  specifies how the latent variables are related to each other. That is,

it shows the constructs and the paths between them. Studentized bootstrap method: computes confidence intervals similarly to a confidence interval based on the t distribution, except that the standard error is derived from the bootstrapping results. Sum scores:  represent a naive way to determine the latent variable scores. Instead of

estimating the relationships in the measurement models, sum scores use the same


weight for each indicator per measurement model (equal weights) to determine the latent variable scores. As such, the sum scores approach does not account for measurement error. Suppressor variable:  describes the mediator variable in competitive mediation, which absorbs a significant share of, or even the entire, direct effect, thereby substantially decreasing the magnitude of the total effect.
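The Sum scores entry above amounts to nothing more than averaging standardized indicators with equal weights, as the following hypothetical sketch shows:

```python
# A minimal sketch of the sum scores approach: each standardized indicator
# of a block receives the same weight, so the latent variable score is the
# mean of the standardized indicators (hypothetical raw data).
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(loc=5.0, scale=1.5, size=(100, 4))    # four raw indicators

X_std = (X - X.mean(axis=0)) / X.std(axis=0)         # z-standardization
sum_scores = X_std.mean(axis=1)                      # equal weights

print("first five sum-score values:", sum_scores[:5].round(3))
```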

pair of covariances. In reflective measurement models, this difference is assumed to be zero or at least close to zero (i.e., they are expected to vanish). Nonvanishing tetrads in a latent variable’s measurement model cast doubt on its reflective specification, suggesting a formative specification. Theoretical t value: see Critical t value. Theory:  a set of systematically related hypotheses developed following the scientific

method that can be used to explain and predict outcomes and can be tested empirically. Three-way interaction: an extension of two-way interaction where the moderator effect is again moderated by another moderator variable. TOL: see Variance inflation factor. Tolerance (TOL): see Variance inflation factor. Total effect:  the sum of the direct effect and the indirect effect between an exogenous and an endogenous latent variable in the path model. Total indirect effect:  the sum of all specific indirect effects in a multiple mediation

model. Training sample:  a subset of a larger data set used for model estimation. Two-stage approach (higher-order constructs): an approach to modeling and esti-

mating an higher-order constructs in PLS-SEM, which is particularly useful when a reflective-formative or formative-formative higher-order construct serves as an endogenous construct in the PLS path model. Two-stage approach (moderation analysis):  an approach to model the interaction term when including a moderator variable in the model. The approach can also be used when the exogenous construct and/or the moderator variable are measured formatively. Two-tailed test: see Significance testing. Two-way interaction:  the standard approach to moderator analysis where the modera-

tor variable interacts with one other exogenous latent variable. Unobserved heterogeneity: occurs when the sources of heterogeneous data struc-

tures are not (fully) known. Validity:  the extent to which a construct’s indicators jointly measure what they are supposed to measure.

326   A Primer on Partial Least Squares Structural Equation Modeling (PLS-SEM) Vanishing tetrads: see Tetrad. Variance inflation factor (VIF):  quantifies the severity of collinearity among the indica-

tors in a formative measurement model. The VIF of a certain formative measurement model’s indicator i is directly related to the tolerance value (VIFi = 1/tolerancei). Variance-based SEM: see Partial least squares structural equation modeling. Variate: see Composite variable. VIF: see Variance inflation factor. Weighted PLS-SEM (WPLS):  a modified version of the original PLS-SEM algorithm that

allows the researcher to incorporate sampling weights. Weighting scheme:  describes a particular method to determine the relationships in

the structural model when running the PLS-SEM algorithm. Standard options are the centroid, factor, and path weighting schemes. The final results do not differ much, and one should use the path weighting scheme as a default option since it aims at maximizing the R² values of the PLS path model estimation. WPLS: see Weighted PLS-SEM.

REFERENCES Abdi, H. (2010). Partial least squares regression and projection on latent structure regression (PLS-Regression). Wiley Interdisciplinary Reviews: Computational Statistics, 2(1), 97–106. Aguinis, H., Beaty, J. C., Boik, R. J., & Pierce, C. A. (2005). Effect size and power in assessing moderating effects of categorical variables using multiple regression: A 30-year review. Journal of Applied Psychology, 90(1), 94–107. Aguirre-Urreta, M. I., & Rönkkö, M. (2018). Statistical inference with PLSc using bootstrap confidence intervals. MIS Quarterly, 42(3), 1001–1020. Aguirre-Urreta, M. I., Rönkkö, M., & Marakas, G. M. (2016). Omission of causal indicators: Consequences and implications for measurement. Measurement: Interdisciplinary Research and Perspectives, 14(3), 75–97. Ahrholdt, D. C., Gudergan, S. P., & Ringle, C. M. (2019). Enhancing loyalty: When improving consumer satisfaction and delight matters. Journal of Business Research, 94, 18–27. Albers, S. (2010). PLS and success factor studies in marketing. In V. Esposito Vinzi, W. W. Chin, J. Henseler, & H. Wang (Eds.), Handbook of partial least squares: Concepts, methods and applications in marketing and related fields (Springer Handbooks of Computational Statistics Series, Vol. II, pp. 409–425). Berlin: Springer. Ali, F., Rasoolimanesh, S. M., Sarstedt, M., Ringle, C. M., & Ryu, K. (2018). An assessment of the use of partial least squares structural equation modeling (PLS-SEM) in hospitality research. The International Journal of Contemporary Hospitality Management, 30(1), 514–538. Avkiran, N. K., & Ringle, C. M. (Eds.) (2018). Partial least squares structural equation modeling: Recent advances in banking and finance. Cham: Springer. Bagozzi, R. P. (2007). On the meaning of formative measurement and how it differs from reflective measurement: Comment on Howell, Breivik, and Wilcox (2007). Psychological Methods, 12(2), 229–237. Bagozzi, R. P., & Philipps, L. W. (1982). Representing and testing organizational theories: A holistic construal. Administrative Science Quarterly, 27(3), 459–489. Bagozzi, R. P., Yi, Y., & Philipps, L. W. (1991). Assessing construct validity in organizational research. Administrative Science Quarterly, 36(3), 421–458. Barclay, D. W., Higgins, C. A., & Thompson, R. (1995). The partial least squares approach to causal modeling: Personal computer adoption and use as illustration. Technology Studies, 2(2), 285–309.

327

328   A Primer on Partial Least Squares Structural Equation Modeling (PLS-SEM) Baron, R. M., & Kenny, D. A. (1986). The moderator-mediator variable distinction in social psychological research: Conceptual, strategic, and statistical considerations. Journal of Personality and Social Psychology, 51(6), 1173–1182. Bascle, G. (2008). Controlling for endogeneity with instrumental variables in strategic management research. Strategic Organization, 6(3), 285–327. Bayonne, E., Marin-Garcia, J. A., & Alfalla-Luque, R. (2020). Partial least squares (PLS) in operations management research: Insights from a systematic literature review. Journal of Industrial Engineering and Management, 13(3), 565–597. Bearden, W. O., Netemeyer, R. G., & Haws, K. L. (2011). Handbook of marketing scales: Multi-item measures of marketing and consumer behavior research. Thousand Oaks, CA: Sage. Becker, J.-M., & Ismail, I. R. (2016). Accounting for sampling weights in PLS path modeling: Simulations and empirical examples. European Management Journal, 34(6), 606–617. Becker, J.-M., Klein, K., & Wetzels, M. (2012). Hierarchical latent variable models in PLS-SEM: Guidelines for using reflective-formative type models. Long Range Planning, 45(5–6), 359–394. Becker, J.-M., Rai, A., & Rigdon, E. E. (2013). Predictive validity and formative measurement in structural equation modeling: Embracing practical relevance. In Proceedings of the 34th International Conference on Information Systems, Milan, Italy. Becker, J.-M., Rai, A., Ringle, C. M., & Völckner, F. (2013). Discovering unobserved heterogeneity in structural equation models to avert validity threats. MIS Quarterly, 37(3), 665–694. Becker, J.-M., Ringle, C. M., & Sarstedt, M. (2018). Estimating moderating effects in PLS-SEM and PLSc-SEM: Interaction term generation*data treatment. Journal of Applied Structural Equation Modeling, 2(2), 1–21. Becker, J.-M., Ringle, C. M., Sarstedt, M., & Völckner, F. (2015). How collinearity affects mixture regression results. Marketing Letters, 26(4), 643–659. Bentler, P. M., & Huang, W. (2014). On components, latent variables, PLS and simple methods: Reactions to Rigdon’s rethinking of PLS. Long Range Planning, 47(3), 138–145. Berneth, J. B., & Aguinis, H. (2016). A critical review and best-practice recommendations for control variable usage. Personnel Psychology, 69(1), 229–283. Binz Astrachan, C., Patel, V. K., & Wanzenried, G. (2014). A comparative study of CB-SEM and PLS-SEM for theory development in family firm research. Journal of Family Business Strategy, 5(1), 116–128. Bollen, K. A. (2011). Evaluating effect, composite, and causal indicators in structural equation models. MIS Quarterly, 35(2), 359–372. Bollen, K. A., & Bauldry, S. (2011). Three Cs in measurement models: Causal indicators, composite indicators, and covariates. Psychological Methods, 16(3), 265–284.

References  329

Bollen, K. A., & Davies, W. R. (2009). Causal indicator models: Identification, estimation, and testing. Structural Equation Modeling, 16(3), 498–522. Bollen, K. A., & Diamantopoulos, A. (2017). In defense of causal-formative indicators: A minority report. Psychological Methods, 22(3), 581–596. Bollen, K. A., & Lennox, R. (1991). Conventional wisdom on measurement: A structural equation perspective. Psychological Bulletin, 110(2), 305–314. Bollen, K. A., & Ting, K.-F. (1993). Confirmatory tetrad analysis. In P. V. Marsden (Ed.), Sociological methodology (pp. 147–175). Washington, DC: American Sociological Association. Bollen, K. A., & Ting, K.-F. (2000). A tetrad test for causal indicators. Psychological Methods, 5(1), 3–22. Bruner, G. C. (2019). Marketing scales handbook: Multi-item measures for consumer insight research (Volume 10). Fort Worth, TX: CreateSpace Independent Publishing Platform. Burnham, K. P., & Anderson, D. R. (2002). Model selection and multimodel inference: A practical information-theoretic approach. Heidelberg: Springer. Cadogan, J. W., & Lee, N. (2013). Improper use of endogenous formative variables. Journal of Business Research, 66(2), 233–241. Cassel, C., Hackl, P., & Westlund, A. H. (1999). Robustness of partial least squares method for estimating latent variable quality structures. Journal of Applied Statistics, 26(4), 435–446. Cenfetelli, R. T., & Bassellier, G. (2009). Interpretation of formative measurement in information systems research. MIS Quarterly, 33(4), 689–708. Cepeda-Carrión, G., Cegarra-Navarro, J.-G., & Cillo, V. (2019). Tips to use partial least squares structural equation modelling (PLS-SEM) in knowledge management. Journal of Knowledge Management, 23(1), 67–89. Cepeda-Carrión, G., Nitzl, C., & Roldán, J. L. (2017). Mediation analyses in partial least squares structural equation modeling: Guidelines and empirical examples. In H. Latan, & R. Noonan (Eds.), Partial least squares path modeling: Basic concepts, methodological issues and applications (pp. 173–195). Cham: Springer. Cheah, J.-H., Roldán, J. L., Ciavolino, E., Ting, H., & Ramayah, T. (2020). Sampling weight adjustments in partial least squares structural equation modeling: Guidelines and illustrations. Total Quality Management & Business Excellence, forthcoming. Cheah, J.-H., Sarstedt, M., Ringle, C. M., Ramayah, T., & Ting, H. (2018). Convergent validity assessment of formatively measured constructs in PLS-SEM. International Journal of Contemporary Hospitality Management, 30(11), 3192–3210. Cheah, J.-H., Ting, H., Ramayah, T., Memon, M. A., Cham, T.-H., & Ciavolino, E. (2019). A comparison of five reflective–formative estimation approaches: Reconsideration and recommendations for tourism research. Quality & Quantity, 53(3), 1421–1458.

330   A Primer on Partial Least Squares Structural Equation Modeling (PLS-SEM) Chernick, M. R. (2008). Bootstrap methods: A guide for practitioners and researchers (2nd ed.). New York, NY: Wiley. Chin, W. W. (1998). The partial least squares approach to structural equation modeling. In G. A. Marcoulides (Ed.), Modern methods for business research (pp. 295–358). Mahwah, NJ: Erlbaum. Chin, W. W. (2003). PLS-Graph 3.0. Houston, TX: Soft Modeling, Inc. Chin, W. W. (2010). How to write up and report PLS analyses. In V. Esposito Vinzi, W. W. Chin, J. Henseler, & H. Wang (Eds.), Handbook of partial least squares: Concepts, methods and applications in marketing and related fields (Springer Handbooks of Computational Statistics Series, Vol. II, pp. 655–690). Berlin: Springer. Chin, W. W., Cheah, J.-H., Liu, Y., Ting, H., Lim, X.-J., & Cham, T. H. (2020). Demystifying the role of causal-predictive modeling using partial least squares structural equation modeling in information systems research. Industrial Management & Data Systems, 120(12), 2161–2209. Chin, W. W., & Dibbern, J. (2010). A permutation-based procedure for multi-group PLS analysis: Results of tests of differences on simulated data and a cross cultural analysis of the sourcing of information system services between Germany and the USA. In V. Esposito Vinzi, W. W. Chin, J. Henseler, & H. Wang (Eds.), Handbook of partial least squares: Concepts, methods and applications in marketing and related fields (Springer Handbooks of Computational Statistics Series, Vol. II, pp. 171–193). Berlin: Springer. Chin, W. W., Kim, Y. J., & Lee, G. (2013). Testing the differential impact of structural paths in PLS analysis: A bootstrapping approach. In H. Abdi, W. W. Chin, V. Esposito Vinzi, G. Russolillo, & L. Trinchera (Eds.), New perspectives in partial least squares and related methods (pp. 221–229). New York, NY: Springer. Chin, W. W., Marcolin, B. L., & Newsted, P. R. (2003). A partial least squares latent variable modeling approach for measuring interaction effects: Results from a Monte Carlo simulation study and an electronic-mail emotion/adoption study. Information Systems Research, 14(2), 189–217. Chin, W. W., & Newsted, P. R. (1999). Structural equation modeling analysis with small samples using partial least squares. In R. H. Hoyle (Ed.), Statistical strategies for small sample research (pp. 307–341). Thousand Oaks, CA: Sage. Cho, G., Hwang, H., Sarstedt, M., & Ringle, C. M. (2020). Cutoff criteria for overall model fit indexes in generalized structured component analysis. Journal of Marketing Analytics, 8, 189–202. Cochran, W. G. (1977). Sampling techniques. New York, NY: Wiley. Cohen, J. (1988). Statistical power analysis for the behavioral sciences. Mahwah, NJ: Erlbaum. Cohen, J. (1992). A power primer. Psychological Bulletin, 112(1), 155–159.

References  331

Cole, D. A., & Preacher, K. J. (2014). Manifest variable path analysis: Potentially serious and misleading consequences due to uncorrected measurement error. Psychological Methods, 19(2), 300–315. Cook, R. D., & Forzani, L. (2020). Fundamentals of path analysis in the social sciences. Working Paper. https://arxiv.org/pdf/2011.06436.pdf Crittenden, V., Astrachan, C., Sarstedt, M., Lourenco, C., & Hair, J. F. (2020). Guest editorial: Measurement and scaling methodologies on brand management. Journal of Product and Brand Management, 29(4), 409–414. Cronbach, L. J., & Gleser, G. C. (1965). Psychological tests and personnel decisions. Oxford, England: University of Illinois Press. Danks, N., & Ray, S. (2018). Predictions from partial least squares models. In F. Ali, S. M. Rasoolimanesh, & C. Cobanoglu (Eds.), Applying partial least squares in tourism and hospitality research (pp. 35–52). Bingley: Emerald. Danks, N., Ray, S., & Shmueli, G. (2017). Evaluating the predictive performance of composites in PLS path modeling. Working Paper, available at SSRN: https://ssrn.com/ abstract=3055222 Danks, N. P., Sharma, P. N., & Sarstedt, M. (2020). Model selection uncertainty and multimodel inference in partial least squares structural equation modeling (PLSSEM). Journal of Business Research, 113, 13–24. Davis, F. D. (1989). Perceived usefulness, perceived ease-of-use, and user acceptance of information technology. MIS Quarterly, 13(3), 319–349. Davison, A. C., & Hinkley, D. V. (1997). Bootstrap methods and their application. Cambridge, UK: Cambridge University Press. De Soete, G., & Carroll, J. D. (1994). K-means clustering in a low-dimensional Euclidean space. In E. Diday, Y. Lechevallier, M. Schader, P. Bertrand, & B. Burtschy (Eds.), New approaches in classification and data analysis (Studies in Classification, Data Analysis, and Knowledge Organization Book Series, pp. 212–219). Berlin: Springer DeVellis, R. F. (2017). Scale development: Theory and applications (4th ed.). Thousand Oaks, CA: Sage. Diamantopoulos, A. (2006). The error term in formative measurement models: Interpretation and modeling implications. Journal of Modelling in Management, 1(1), 7–17. Diamantopoulos, A., & Riefler, P. (2011). Using formative measures in international marketing models: A cautionary tale using consumer animosity as an example. In M. Sarstedt, M. Schwaiger, & C. R. Taylor (Eds.), Measurement and research methods in international marketing (Advances in International Marketing, 22, pp. 11–30). Bingley: Emerald. Diamantopoulos, A., Riefler, P., & Roth, K. P. (2008). Advancing formative measurement models. Journal of Business Research, 61(12), 1203–1218.

332   A Primer on Partial Least Squares Structural Equation Modeling (PLS-SEM) Diamantopoulos, A., Sarstedt, M., Fuchs, C., Kaiser, S., & Wilczynski, P. (2012). Guidelines for choosing between multi-item and single-item scales for construct measurement: A predictive validity perspective. Journal of the Academy of Marketing Science, 40(3), 434–449. Diamantopoulos, A., & Siguaw, J. A. (2006). Formative versus reflective indicators in organizational measure development: A comparison and empirical illustration. British Journal of Management, 17(4), 263–282. Diamantopoulos, A., & Winklhofer, H. M. (2001). Index construction with formative indicators: An alternative to scale development. Journal of Marketing Research, 38(2), 269–277. Dijkstra, T. K. (2010). Latent variables and indices: Herman Wold’s basic design and partial least squares. In V. Esposito Vinzi, W. W. Chin, J. Henseler, & H. Wang (Eds.), Handbook of partial least squares: Concepts, methods and applications (Springer Handbooks of Computational Statistics Series, Vol. II, pp. 23–46). Berlin: Springer. Dijkstra, T. K. (2014). PLS’ Janus face—response to Professor Rigdon’s “Rethinking partial least squares modeling: In praise of simple methods.” Long Range Planning, 47(3), 146–153. Dijkstra, T. K., & Henseler, J. (2015a). Consistent and asymptotically normal PLS estimators for linear structural equations. Computational Statistics & Data Analysis, 81(1), 10–23. Dijkstra, T. K., & Henseler, J. (2015b). Consistent partial least squares path modeling. MIS Quarterly, 39(2), 297–316. do Valle, P. O., & Assaker, G. (2016). Using partial least squares structural equation modeling in tourism research: A review of past research and recommendations for future applications. Journal of Travel Research, 55(6), 695–708. Drolet, A. L., & Morrison, D. G. (2001). Do we really need multiple-item measures in service research? Journal of Service Research, 3(3), 196–204. Dul, J. (2016). Necessary condition analysis (NCA): Logic and methodology of “necessary but not sufficient” causality. Organizational Research Methods, 19(1), 10–52. Dul, J. (2020a). Conducting necessary condition analysis. London: Sage. Dul, J. (2020b). R package NCA: Necessary condition analysis version 3.03 [computer software]. Retrieved from https://cran.r-project.org/web/packages/NCA/ Ebbes, P., Papies, D., & van Heerde, H. J. (2016). Dealing with endogeneity: A nontechnical guide for marketing researchers. In C. Homburg, M. Klarmann, & A. Vomberg (Eds.), Handbook of market research. Cham: Springer. Eberl, M. (2010). An application of PLS in multi-group analysis: The need for differentiated corporate-level marketing in the mobile communications industry. In V. Esposito Vinzi, W. W. Chin, J. Henseler, & H. Wang (Eds.), Handbook of partial least squares: Concepts, methods and applications in marketing and related fields (Springer Handbooks of Computational Statistics Series, Vol. II, pp. 487–514). Berlin: Springer.

References  333

Eberl, M., & Schwaiger, M. (2005). Corporate reputation: Disentangling the effects on financial performance. European Journal of Marketing, 39(7/8), 838–854. Edwards, J. R., & Bagozzi, R. P. (2000). On the nature and direction of relationships between constructs and measures. Psychological Methods, 5(2), 155–174. Efron, B. (1987). Better bootstrap confidence intervals. Journal of the American Statistical Association, 82(397), 171–185. Efron, B., & Tibshirani, R. (1986). Bootstrap methods for standard errors, confidence intervals, and other measures of statistical accuracy. Statistical Science, 1(1), 54–75. Esposito Vinzi, V., Chin, W. W., Henseler, J., & Wang, H. (Eds.). (2010). Handbook of partial least squares: Concepts, methods and applications (Springer Handbooks of Computational Statistics Series, Vol. II). Berlin: Springer. Esposito Vinzi, V., Trinchera, L., Squillacciotti, S., & Tenenhaus, M. (2008). REBUSPLS: A response-based procedure for detecting unit segments in PLS path modelling. Applied Stochastic Models in Business and Industry, 24(5), 439–458. Evermann, Jörg, & Rönkkö, M. (2021). Recent developments in PLS. Communications of the Association for Information Systems, forthcoming. Evermann, J., & Tate, M. (2016). Assessing the predictive performance of structural equation model estimators. Journal of Business Research, 69(10), 4565–4582. Falk, R. F., & Miller, N. B. (1992). A primer for soft modeling. Akron, OH: University of Akron Press. Faul, F., Erdfelder, E., Buchner, A., & Lang, A.-G. (2009). Statistical power analyses using G*Power 3.1: Tests for correlation and regression analyses. Behavior Research Methods, 41(4), 1149–1160. Festinger, L. (1957). A theory of cognitive dissonance. Stanford, CA: Stanford University Press. Fordellone, M., & Vichi, M. (2020). Finding groups in structural equation modeling through the partial least squares algorithm. Computational Statistics & Data Analysis, 147, 106957. Fornell, C., & Larcker, D. F. (1981). Evaluating structural equation models with unobservable variables and measurement error. Journal of Marketing Research, 18(1), 39–50. Fornell, C. G. (1982). A second generation of multivariate analysis: An overview. In C. G. Fornell (Ed.), A second generation of multivariate analysis (pp. 1–21). New York, NY: Praeger. Fornell, C. G. (1987). A second generation of multivariate analysis: Classification of methods and implications for marketing research. In M. J. Houston (Ed.), Review of marketing (pp. 407–450). Chicago: American Marketing Association. Fornell, C. G., & Bookstein, F. L. (1982). Two structural equation models: LISREL and PLS applied to consumer exit-voice theory. Journal of Marketing Research, 19(4), 440–452.

334   A Primer on Partial Least Squares Structural Equation Modeling (PLS-SEM) Fornell, C. G., Johnson, M. D., Anderson, E. W., Cha, J., & Bryant, B. E. (1996). The American Customer Satisfaction Index: Nature, purpose, and findings. Journal of Marketing, 60(4), 7–18. Franke, G., & Sarstedt, M. (2019). Heuristics versus statistics in discriminant validity testing: A comparison of four procedures. Internet Research, 29(3), 430–447. Fu, J.-R. (2006). VisualPLS—An enhanced GUI for LVPLS (PLS 1.8 PC) version 1.04 [computer software]. Retrieved from http://www2.kuas.edu.tw/prof/fred/vpls/ Fuchs, C., & Diamantopoulos, A. (2009). Using single-item measures for construct measurement in management research: Conceptual issues and application guidelines. Business Administration Review, 69(2), 195–210. Garson, G. D. (2016). Partial least squares: Regression and structural equation models. Asheboro, NC: Statistical Associates Publishers. Gefen, D., Rigdon, E. E., & Straub, D. W. (2011). Editor’s comment: An update and extension to SEM guidelines for administrative and social science research. MIS Quarterly, 35(2), iii–xiv. Geisser, S. (1974). A predictive approach to the random effects model. Biometrika, 61(1), 101–107. George, D., & Mallery, P. (2019). IBM SPSS Statistics 25 step by step: A simple guide and reference (15th ed.). New York, NY: Routledge. Geweke, J., & Meese, R. (1981). Estimating regression models of finite but unknown order. International Economic Review, 22(1), 55–70. Ghasemy, M., Jamil, H., & Gaskin, J. E. (2021). Have your cake and eat it too: PLSe2 = ML + PLS. Quality & Quantity, 55, 497–541. Ghasemy, M., Teeroovengadum, V., Becker, J.-M., & Ringle, C. M. (2020). This fast car can move faster: A review of PLS-SEM application in higher education research. Higher Education, 80, 1121–1152. Goodhue, D. L., Lewis, W., & Thompson, R. (2012). Does PLS have advantages for small sample size or non-normal data? MIS Quarterly, 36(3), 981–1001. Götz, O., Liehr-Gobbers, K., & Krafft, M. (2010). Evaluation of structural equation models using the partial least squares (PLS) approach. In V. Esposito Vinzi, W. W. Chin, J. Henseler, & H. Wang (Eds.), Handbook of partial least squares: Concepts, methods and applications in marketing and related fields (Springer Handbooks of Computational Statistics Series, Vol. II, pp. 691–711). Berlin: Springer. Gregor, S. (2006). The nature of theory in information systems. MIS Quarterly, 30(3), 611–642. Grimm, M. S., & Wagner, R. (2020). The impact of missing values on PLS, ML and FIML model fit. Archives of Data Science, Series A, 6(1), 04. Gudergan, S. P., Ringle, C. M., Wende, S., & Will, A. (2008). Confirmatory tetrad analysis in PLS path modeling. Journal of Business Research, 61(12), 1238–1249.

References  335

Guttman, L. (1955). The determinacy of factor score matrices with implications for five other basic problems of common-factor theory. British Journal of Statistical Psychology, 8(2), 65–81. Haenlein, M., & Kaplan, A. M. (2004). A beginner’s guide to partial least squares analysis. Understanding Statistics, 3(4), 283–297. Hahn, C., Johnson, M. D., Herrmann, A., & Huber, F. (2002). Capturing customer heterogeneity using a finite mixture PLS approach. Schmalenbach Business Review, 54(3), 243–269. Hair, J. F. (2020). Next generation prediction metrics for composite-based PLS-SEM. Industrial Management & Data Systems, 121(1), 5–11. Hair, J. F., Binz Astrachan, C., Moisescu, O. I., Radomir, L., Sarstedt, M., Vaithilingam, S., & Ringle, C. M. (2020). Executing and interpreting applications of PLS-SEM: Updates for family business researchers. Journal of Family Business Strategy, forthcoming. Hair, J. F., Black, W. C., Babin, B. J., & Anderson, R. E. (2019). Multivariate data analysis (8th ed.). London: Cengage Learning. Hair, J. F., Hollingsworth, C. L., Randolph, A. B., & Chong, A. Y. L. (2017). An updated and expanded assessment of PLS-SEM in information systems research. Industrial Management & Data Systems, 117(3), 442–458. Hair, J. F., Howard, M. C., & Nitzl, C. (2020). Assessing measurement model quality in PLS-SEM using confirmatory composite analysis. Journal of Business Research, 109, 101–110. Hair, J. F., Hult, G. T. M., Ringle, C. M., & Sarstedt, M. (2014). A primer on partial least squares structural equation modeling (PLS-SEM). Thousand Oaks, CA: Sage. Hair, J. F., Hult, G. T. M., Ringle, C. M., & Sarstedt, M. (2017). A primer on partial least squares structural equation modeling (PLS-SEM) (2nd ed.). Thousand Oaks, CA: Sage. Hair, J. F., Hult, G. T. M., Ringle, C. M., Sarstedt, M., Castillo Apraiz, J., Cepeda-Carrión, G., & Roldán, J. L. (2019). Manual de Partial Least Squares Structural Equation Modeling (PLS-SEM) (Segunda Edición). Barcelona: OmniaScience. Hair, J. F., Hult, G. T. M., Ringle, C. M., Sarstedt, M., Dank, N., & Ray, S. (2022). Partial least squares structural equation modeling (PLS-SEM) using R. Cham: Springer. Hair, J. F., Hult, G. T. M., Ringle, C. M., Sarstedt, M., Magno, F., Cassia, F., & Scafarto, F. (2020). Le Equazioni Strutturali Partial Least Squares: Introduzione alla PLS-SEM (Seconda Edizione). Milano: Franco Angeli. Hair, J. F., Hult, G. T. M., Ringle, C. M., Sarstedt, M., Richter, N. F., & Hauff, S. (2017). Partial Least Squares Strukturgleichungsmodellierung (PLS-SEM): Eine anwendungsorientierte Einführung. München: Vahlen. Hair, J. F., Hult, G. T. M., Ringle, C. M., Sarstedt, M., & Thiele, K. O. (2017). Mirror, mirror on the wall: A comparative evaluation of composite-based structural equation modeling methods. Journal of the Academy of Marketing Science, 45(5), 616–632.

336   A Primer on Partial Least Squares Structural Equation Modeling (PLS-SEM) Hair, J. F., Matthews, L., Matthews, R., & Sarstedt, M. (2017). PLS-SEM or CB-SEM: Updated guidelines on which method to use. International Journal of Multivariate Data Analysis, 1(2), 107–123. Hair, J. F., Page, M. J., & Brunsveld, N. (2020). Essentials of business research methods (4th ed.). New York, NY: Routledge. Hair, J. F., Ringle, C. M., Gudergan, S. P., Fischer, A., Nitzl, C., & Menictas, C. (2019). Partial least squares structural equation modeling-based discrete choice modeling: An illustration in modeling retailer choice. Business Research, 12(1), 115–142. Hair, J. F., Ringle, C. M., & Sarstedt, M. (2011). PLS-SEM: Indeed a silver bullet. Journal of Marketing Theory and Practice, 19(2), 139–151. Hair, J. F., Ringle, C. M., & Sarstedt, M. (2012). Partial least squares: The better approach to structural equation modeling? Long Range Planning, 45(5–6), 312–319. Hair, J. F., Ringle, C. M., & Sarstedt, M. (2013). Partial least squares structural equation modeling: Rigorous applications, better results and higher acceptance. Long Range Planning, 46(1–2), 1–12. Hair, J. F., Risher, J. J., Sarstedt, M., & Ringle, C. M. (2019). When to use and how to report the results of PLS-SEM. European Business Review, 31(1), 2–24. Hair, J. F., & Sarstedt, M. (2019). Composites vs. factors: Implications for choosing the Right SEM method. Project Management Journal, 50(6), 1–6. Hair, J. F., & Sarstedt, M. (2021a). Data, measurement, and causal inferences in machine learning: Opportunities and challenges for marketing. Journal of Marketing Theory & Practice, 29(1), 65–77. Hair, J. F., & Sarstedt, M. (2021b). Explanation plus prediction – the logical focus of project management research. Project Management Journal, forthcoming. Hair, J. F., Sarstedt, M., Matthews, L., & Ringle, C. M. (2016). Identifying and treating unobserved heterogeneity with FIMIX-PLS: Part I - method. European Business Review, 28(1), 63–76. Hair, J. F., Sarstedt, M., Pieper, T., & Ringle, C. M. (2012). The use of partial least squares structural equation modeling in strategic management research: A review of past practices and recommendations for future applications. Long Range Planning, 45(5–6), 320–340. Hair, J. F., Sarstedt, M., & Ringle, C. M. (2019). Rethinking some of the rethinking of partial least squares. European Journal of Marketing, 53(4), 566–584. Hair, J. F., Sarstedt, M., Ringle, C. M., & Gudergan, S. P. (2018). Advanced issues in partial least squares structural equation modeling (PLS-SEM). Thousand Oaks, CA: Sage. Hair, J. F., Sarstedt, M., Ringle, C. M., & Mena, J. A. (2012). An assessment of the use of partial least squares structural equation modeling in marketing research. Journal of the Academy of Marketing Science, 40(3), 414–433.

References  337

Hanafi, M. (2007). PLS path modelling: Computation of latent variables with the estimation mode B. Computational Statistics, 22(2), 275–292. Hanafi, M., Dolce, P., & El Hadri, Z. (2021). Generalized properties for Hanafi–Wold’s procedure in partial least squares path modeling. Computational Statistics, 36, 603–614. Hayduk, L. A., & Littvay, L. (2012). Should researchers use single indicators, best indicators, or multiple indicators in structural equation models? BMC Medical Research Methodology, 12(159), 12–159. Hayes, A. F. (2015). An index and test of linear moderated mediation. Multivariate Behavioral Research, 50(1), 1–22. Hayes, A. F. (2018). Introduction to mediation, moderation, and conditional process analysis: A regression-based approach (2nd ed.). New York, NY: Guilford. Hayes, A. F., & Rockwood, N. J. (2020). Conditional process analysis: Concepts, computation, and advances in the modeling of the contingencies of mechanisms. American Behavioral Scientist, 64(1), 19–54. Helm, S., Eggert, A., & Garnefeld, I. (2010). Modelling the impact of corporate reputation on customer satisfaction and loyalty using PLS. In V. Esposito Vinzi, W. W. Chin, J. Henseler, & H. Wang (Eds.), Handbook of partial least squares: Concepts, methods and applications in marketing and related fields (Springer Handbooks of Computational Statistics Series, Vol. II, pp. 515–534). Berlin: Springer. Henseler, J. (2010). On the convergence of the partial least squares path modeling algorithm. Computational Statistics, 25(1), 107–120. Henseler, J. (2017a). ADANCO 2.0.1 user manual. Kleve: Composite Modeling. Henseler, J. (2017b). Bridging design and behavioral research with variance-based structural equation modeling. Journal of Advertising, 46(1), 178–192. Henseler, J. (2020). Composite-based structural equation modeling: Analyzing latent and emergent variables. New York, NY: Guilford Press. Henseler, J., & Chin, W. W. (2010). A comparison of approaches for the analysis of interaction effects between latent variables using partial least squares path modeling. Structural Equation Modeling, 17(1), 82–109. Henseler, J., Dijkstra, T. K., Sarstedt, M., Ringle, C. M., Diamantopoulos, A., Straub, D. W., et al. (2014). Common beliefs and reality about partial least squares: Comments on Rönkkö & Evermann (2013). Organizational Research Methods, 17(1), 182–209. Henseler, J., & Fassott, G. (2010). Testing moderating effects in PLS path models: An illustration of available procedures. In V. Esposito Vinzi, W. W. Chin, J. Henseler, & H. Wang (Eds.), Handbook of partial least squares: Concepts, methods and applications in marketing and related fields (Springer Handbooks of Computational Statistics Series, Vol. II, pp. 713–735). Berlin: Springer.

338   A Primer on Partial Least Squares Structural Equation Modeling (PLS-SEM) Henseler, J., Hubona, G. S., & Ray, P. A. (2016). Using PLS path modeling in new technology research: Updated guidelines. Industrial Management & Data Systems, 116(1), 1–19. Henseler, J., Ringle, C. M., & Sarstedt, M. (2012). Using partial least squares path modeling in international advertising research: Basic concepts and recent issues. In S. Okazaki (Ed.), Handbook of research in international advertising (pp. 252–276). Cheltenham: Edward Elgar. Henseler, J., Ringle, C. M., & Sarstedt, M. (2015). A new criterion for assessing discriminant validity in variance-based structural equation modeling. Journal of the Academy of Marketing Science, 43(1), 115–135. Henseler, J., Ringle, C. M., & Sarstedt, M. (2016). Testing measurement invariance of composites using partial least squares. International Marketing Review, 33(3), 405–431. Henseler, J., Ringle, C. M., & Sinkovics, R. R. (2009). The use of partial least squares path modeling in international marketing. In R. R. Sinkovics & P. N. Ghauri (Eds.), New challenges to international marketing (Advances in International Marketing, 20, pp. 277–319). Bingley: Emerald. Henseler, J., & Sarstedt, M. (2013). Goodness-of-fit indices for partial least squares path modeling. Computational Statistics, 28(2), 565–580. Henseler, J., & Schuberth, F. (2020). Using confirmatory composite analysis to assess emergent variables in business research. Journal of Business Research, 120, 147–156. Höck, C., Ringle, C. M., & Sarstedt, M. (2010). Management of multi-purpose stadiums: Importance and performance measurement of service interfaces. International Journal of Services Technology and Management, 14(2–3), 188–207. Homburg, C., & Giering, A. (2001). Personal characteristics as moderators of the relationship between customer satisfaction and loyalty—An empirical analysis. Psychology & Marketing, 18(1), 43–66. Hu, L.-T., & Bentler, P. M. (1998). Fit indices in covariance structure modeling: Sensitivity to underparameterized model misspecification. Psychological Methods, 3(4), 424–453. Hui, B. S., & Wold, H. (1982). Consistency and consistency at large of partial least squares estimates. In H. Wold & K. G. Jöreskog (Eds.), Systems under indirect observation, Part II (pp. 119–130). Amsterdam: North Holland. Hulland, J. (1999). Use of partial least squares (PLS) in strategic management research: A review of four recent studies. Strategic Management Journal, 20(2), 195–204. Hulland, J., Baumgartner, H., & Smith, K. M. (2018). Marketing survey research best practices: Evidence and recommendations from a review of JAMS articles. Journal of the Academy of Marketing Science, 46(1), 92–108. Hult, G. T. M., Hair, J. F., Dorian, P., Ringle, C. M., Sarstedt, M., & Pinkwart, A. (2018). Addressing endogeneity in marketing applications of partial least squares structural equation modeling. Journal of International Marketing, 26(3), 1–21.

References  339

Hult, G. T. M., Ketchen, D. J., Griffith, D. A., Finnegan, C. A., Gonzalez-Padron, T., Harmancioglu, N., Huang, Y., Talay, M. B., & Cavusgil, S. T. (2008). Data equivalence in cross-cultural international business research: Assessment and guidelines. Journal of International Business Studies, 39(6), 1027–1044. Hwang, H., Sarstedt, M., Cheah, J.-H., & Ringle, C. M. (2020). A concept analysis of methodological research on composite-based structural equation modeling: Bridging PLSPM and GSCA. Behaviormetrika, 47(1), 219–241. Hwang, H., & Takane, Y. (2004). Generalized structured component analysis. Psychometrika, 69(1), 81–99. Jarvis, C. B., MacKenzie, S. B., & Podsakoff, P. M. (2003). A critical review of construct indicators and measurement model misspecification in marketing and consumer research. Journal of Consumer Research, 30(2), 199–218. JCGM/WG1. (2008). Joint Committee for Guides in Metrology/Working Group on the Expression of Uncertainty in Measurement (JCGM/WG1): Evaluation of measurement data—Guide to the expression of uncertainty in measurement. Retrieved from https:// www.bipm.org/utils/common/documents/jcgm/JCGM_100_2008_E.pdf Jones, M. A., Mothersbaugh, D. L., & Beatty, S. E. (2000). Switching barriers and repurchase intentions in services. Journal of Retailing, 76(2), 259–274. Jöreskog, K. G. (1973). A general method for estimating a linear structural equation system. In A. S. Goldberger & O. D. Duncan (Eds.), Structural equation models in the social sciences (pp. 255–284). New York, NY: Seminar Press. Jöreskog, K. G., & Wold, H. (1982). The ML and PLS techniques for modeling with latent variables: Historical and comparative aspects. In K. G. Jöreskog & H. Wold (Eds.), Systems under indirect observation, Part I (pp. 263–270). Amsterdam: North Holland. Kamakura, W. A. (2015). Measure twice and cut once: The carpenter’s rule still applies. Marketing Letters, 26(3), 237–243. Kaufmann, L., & Gaeckler, J. (2015). A structured review of partial least squares in supply chain management research. Journal of Purchasing and Supply Management, 21(4), 259–272. Keil, M., Saarinen, T., Tan, B. C. Y., Tuunainen, V., Wassenaar, A., & Wei, K.-K. (2000). A cross-cultural study on escalation of commitment behavior in software projects. MIS Quarterly, 24(2), 299–325. Kenny, D. A. (2018). Moderation. Retrieved from http://davidakenny.net/cm/moderation .htm Khan, G., Sarstedt, M., Shiau, W.-L., Hair, J. F., Ringle, C. M., & Fritze, M. (2019). Methodological research on partial least squares structural equation modeling (PLS-SEM): A social network analysis. Internet Research, 29(3), 407–429.

Kim, G., Shin, B., & Grover, V. (2010). Research note: Investigating two contradictory views of formative measurement in information systems research. MIS Quarterly, 34(2), 345–365.
Klarmann, M., & Feurer, S. (2018). Control variables in marketing research. Marketing ZFP – Journal of Research and Management, 40(2), 26–40.
Klesel, M., Schuberth, F., Henseler, J., & Niehaves, B. (2019). A test for multigroup comparison using partial least squares path modeling. Internet Research, 29(3), 464–477.
Klesel, M., Schuberth, F., Niehaves, B., & Henseler, J. (2020). Multigroup analysis in information systems research using PLS-PM: A systematic investigation of approaches. Working paper.
Kock, N. (2020). WarpPLS 7.0 user manual. Laredo, TX: ScriptWarp Systems.
Kock, N., & Hadaya, P. (2018). Minimum sample size estimation in PLS-SEM: The inverse square root and gamma-exponential methods. Information Systems Journal, 28(1), 227–261.
Kocyigit, O., & Ringle, C. M. (2011). The impact of brand confusion on sustainable brand satisfaction and private label proneness: A subtle decay of brand equity. Journal of Brand Management, 19(3), 195–212.
Kristensen, K., Martensen, A., & Grønholdt, L. (2000). Customer satisfaction measurement at Post Denmark: Results of application of the European Customer Satisfaction Index Methodology. Total Quality Management, 11(7), 1007–1015.
Latan, H., & Noonan, R. (Eds.). (2017). Partial least squares structural equation modeling: Basic concepts, methodological issues and applications. Cham: Springer.
Lee, L., Petter, S., Fayard, D., & Robinson, S. (2011). On the use of partial least squares path modeling in accounting research. International Journal of Accounting Information Systems, 12(4), 305–328.
Liengaard, B., Sharma, P. N., Hult, G. T. M., Jensen, M. B., Sarstedt, M., Hair, J. F., & Ringle, C. M. (2021). Prediction: Coveted, yet forsaken? Introducing a cross-validated predictive ability test in partial least squares path modeling. Decision Sciences, 52(2), 362–392.
Little, R. J. A., & Rubin, D. B. (2002). Statistical analysis with missing data. New York, NY: Wiley.
Little, T. D., Bovaird, J. A., & Widaman, K. F. (2006). On the merits of orthogonalizing powered and product terms: Implications for modeling interaction terms among latent variables. Structural Equation Modeling, 13(4), 497–519.
Lohmöller, J.-B. (1987). LVPLS 1.8 [computer software]. Cologne: Zentralarchiv für Empirische Sozialforschung.
Lohmöller, J.-B. (1989). Latent variable path modeling with partial least squares. Heidelberg: Physica.


Loo, R. (2002). A caveat on using single-item versus multiple-item scales. Journal of Managerial Psychology, 17(1), 68–75.
MacKenzie, S. B., Podsakoff, P. M., & Podsakoff, N. P. (2011). Construct measurement and validation procedures in MIS and behavioral research: Integrating new and existing techniques. MIS Quarterly, 35(2), 293–334.
MacKinnon, D. P., Krull, J. L., & Lockwood, C. M. (2000). Equivalence of the mediation, confounding and suppression effect. Prevention Science, 1(4), 173–181.
Manley, S. C., Hair, J. F., Williams, R. I., & McDowell, W. C. (2020). Essential new PLS-SEM analysis methods for your entrepreneurship analytical toolbox. International Entrepreneurship and Management Journal, forthcoming.
Marcoulides, G. A., & Chin, W. W. (2013). You write but others read: Common methodological misunderstandings in PLS and related methods. In H. Abdi, W. W. Chin, V. Esposito Vinzi, G. Russolillo, & L. Trinchera (Eds.), New perspectives in partial least squares and related methods (pp. 31–64). New York, NY: Springer.
Marcoulides, G. A., & Saunders, C. (2006). PLS: A silver bullet? MIS Quarterly, 30(2), iii–ix.
Mason, C. H., & Perreault, W. D. (1991). Collinearity, power, and interpretation of multiple regression analysis. Journal of Marketing Research, 28(3), 268–280.
Mateos-Aparicio, G. (2011). Partial least squares (PLS) methods: Origins, evolution, and application to social sciences. Communications in Statistics–Theory and Methods, 40(13), 2305–2317.
Matthews, L. (2017). Applying multigroup analysis in PLS-SEM: A step-by-step process. In H. Latan & R. Noonan (Eds.), Partial least squares path modeling: Basic concepts, methodological issues and applications (pp. 219–243). Cham: Springer.
Matthews, L., Sarstedt, M., Hair, J. F., & Ringle, C. M. (2016). Identifying and treating unobserved heterogeneity with FIMIX-PLS: Part II – A case study. European Business Review, 28(2), 208–224.
McDonald, R. P. (1996). Path analysis with composite variables. Multivariate Behavioral Research, 31(2), 239–270.
Memon, M. A., Cheah, J.-H., Ramayah, T., Ting, H., & Chuah, F. (2018). Mediation analysis: Issues and recommendations. Journal of Applied Structural Equation Modeling, 2(1), i–ix.
Memon, M. A., Cheah, J.-H., Ramayah, T., Ting, H., Chuah, F., & Cham, T. H. (2019). Moderation analysis: Issues and guidelines. Journal of Applied Structural Equation Modeling, 3(1), i–ix.
Monecke, A., & Leisch, F. (2012). semPLS: Structural equation modeling using partial least squares. Journal of Statistical Software, 48(3), 1–32.

Monecke, A., & Leisch, F. (2013). R package semPLS: Structural equation modeling using partial least squares version 1.0-10 [computer software]. Retrieved from https://cran.r-project.org/web/packages/semPLS/
Muller, D., Judd, C. M., & Yzerbyt, V. Y. (2005). When moderation is mediated and mediation is moderated. Journal of Personality and Social Psychology, 89(6), 852–863.
Ng, S. I., Lim, Q. H., Cheah, J.-H., Ho, J. A., & Tee, K. K. (2020). A moderated-mediation model of career adaptability and life satisfaction among working adults in Malaysia. Current Psychology, forthcoming.
Nitzl, C. (2016). The use of partial least squares structural equation modelling (PLS-SEM) in management accounting research: Directions for future theory development. Journal of Accounting Literature, 37(December), 19–35.
Nitzl, C., & Chin, W. W. (2017). The case of partial least squares (PLS) path modeling in managerial accounting research. Journal of Management Control, 28(2), 137–156.
Nitzl, C., Roldán, J. L., & Cepeda-Carrión, G. (2016). Mediation analyses in partial least squares structural equation modeling: Helping researchers discuss more sophisticated models. Industrial Management & Data Systems, 116(9), 1849–1864.
Nunnally, J. C., & Bernstein, I. (1994). Psychometric theory. New York, NY: McGraw-Hill.
Oliver, R. L. (1980). A cognitive model for the antecedents and consequences of satisfaction. Journal of Marketing Research, 17(4), 460–469.
Papies, D., Ebbes, P., & van Heerde, H. J. (2016). Addressing endogeneity in marketing models. In P. S. H. Leeflang, J. E. Wieringa, T. H. A. Bijmolt, & K. H. Pauwels (Eds.), Advanced methods in modeling markets (pp. 581–627). Cham: Springer.
Park, S., & Gupta, S. (2012). Handling endogenous regressors by joint estimation using copulas. Marketing Science, 31(4), 567–586.
Patel, V. K., Manley, S. C., Hair, J. F., Ferrell, O. C., & Pieper, T. M. (2016). Is stakeholder theory relevant for European firms? European Management Journal, 36(6), 650–660.
Peng, D. X., & Lai, F. (2012). Using partial least squares in operations management research: A practical guideline and summary of past research. Journal of Operations Management, 30(6), 467–480.
Petter, S. (2018). "Haters gonna hate": PLS and information systems research. ACM SIGMIS Database: The DATABASE for Advances in Information Systems, 49(2), 10–13.
Petter, S., Straub, D., & Rai, A. (2007). Specifying formative constructs in information systems research. MIS Quarterly, 31(4), 623–656.
Pontius, R. G., Thontteh, O., & Chen, H. (2008). Components of information for multiple resolution comparison between maps that share a real variable. Environmental and Ecological Statistics, 15(2), 111–142.
Preacher, K. J., & Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4), 717–731.


Preacher, K. J., & Hayes, A. F. (2008a). Asymptotic and resampling strategies for assessing and comparing indirect effects in simple and multiple mediator models. Behavior Research Methods, 40(3), 879–891.
Preacher, K. J., & Hayes, A. F. (2008b). Contemporary approaches to assessing mediation in communication research. In A. F. Hayes, D. Slater, & L. B. Snyder (Eds.), The SAGE sourcebook of advanced data analysis methods for communication research (pp. 13–54). Thousand Oaks, CA: Sage.
Preacher, K. J., Rucker, D. D., & Hayes, A. F. (2007). Assessing moderated mediation hypotheses: Theory, methods, and prescriptions. Multivariate Behavioral Research, 42(1), 185–227.
Rademaker, M. E., Schuberth, F., Schamberger, T., Klesel, M., Dijkstra, T. K., & Henseler, J. (2020). R package cSEM: Composite-based structural equation modeling version 0.3.0 [computer software]. Retrieved from https://cran.r-project.org/web/packages/cSEM/
Radomir, L., & Moisescu, O. I. (2019). Discriminant validity of the customer-based corporate reputation scale: Some causes for concern. Journal of Product & Brand Management, 29(4), 457–469.
Radomir, L., & Wilson, A. (2018). Corporate reputation: The importance of service quality and relationship investment. In N. Avkiran & C. M. Ringle (Eds.), Partial least squares structural equation modeling (International Series in Operations Research & Management Science, Vol. 267, pp. 77–123). Cham: Springer.
Raithel, S., Sarstedt, M., Scharf, S. M., & Schwaiger, M. (2012). On the value relevance of customer satisfaction: Multiple drivers and multiple markets. Journal of the Academy of Marketing Science, 40(5), 509–525.
Raithel, S., & Schwaiger, M. (2015). The effects of corporate reputation perceptions of the general public on shareholder value. Strategic Management Journal, 36(6), 945–956.
Raithel, S., Wilczynski, P., Schloderer, M. P., & Schwaiger, M. (2010). The value-relevance of corporate reputation during the financial crisis. Journal of Product & Brand Management, 19(6), 389–400.
Ramayah, T., Cheah, J.-H., Chuah, F., Ting, H., & Memon, M. A. (2016). Partial least squares structural equation modeling (PLS-SEM) using SmartPLS 3.0: An updated and practical guide to statistical analysis. Kuala Lumpur: Pearson Malaysia.
Ramirez, E., David, M. E., & Brusco, M. J. (2013). Marketing's SEM based nomological network: Constructs and research streams in 1987–1997 and in 1998–2008. Journal of Business Research, 66(9), 1255–1260.
Ray, S., Danks, N. P., Velasquez Estrada, J. M., Uanhoro, J., & Bejar, A. H. C. (2020). R package seminr: Domain-specific language for building and estimating structural equation models version 1.1.0 [computer software]. Retrieved from https://cran.r-project.org/web/packages/seminr/

Reinartz, W., Haenlein, M., & Henseler, J. (2009). An empirical comparison of the efficacy of covariance-based and variance-based SEM. International Journal of Research in Marketing, 26(4), 332–344.
Rhemtulla, M., van Bork, R., & Borsboom, D. (2020). Worse than measurement error: Consequences of inappropriate latent variable measurement models. Psychological Methods, 25(1), 30–45.
Richter, N. F., Cepeda Carrión, G., Roldán, J. L., & Ringle, C. M. (2016). European management research using partial least squares structural equation modeling (PLS-SEM): Editorial. European Management Journal, 34(6), 589–597.
Richter, N. F., Schubring, S., Hauff, S., Ringle, C. M., & Sarstedt, M. (2020). When predictors of outcomes are necessary: Guidelines for the combined use of PLS-SEM and NCA. Industrial Management & Data Systems, 120(12), 2243–2267.
Richter, N. F., Sinkovics, R. R., Ringle, C. M., & Schlägel, C. (2016). A critical look at the use of SEM in international business research. International Marketing Review, 33(3), 376–404.
Rigdon, E. E. (2012). Rethinking partial least squares path modeling: In praise of simple methods. Long Range Planning, 45(5–6), 341–358.
Rigdon, E. E. (2013). Partial least squares path modeling. In G. R. Hancock & R. D. Mueller (Eds.), Structural equation modeling: A second course (2nd ed., pp. 81–116). Charlotte, NC: Information Age.
Rigdon, E. E. (2014a). Comment on "Improper use of endogenous formative variables." Journal of Business Research, 67(1), 2800–2802.
Rigdon, E. E. (2014b). Rethinking partial least squares path modeling: Breaking chains and forging ahead. Long Range Planning, 47(3), 161–167.
Rigdon, E. E. (2016). Choosing PLS path modeling as analytical method in European management research: A realist perspective. European Management Journal, 34(6), 598–605.
Rigdon, E. E., Becker, J.-M., & Sarstedt, M. (2019a). Factor indeterminacy as metrological uncertainty: Implications for advancing psychological measurement. Multivariate Behavioral Research, 54(3), 429–443.
Rigdon, E. E., Becker, J.-M., & Sarstedt, M. (2019b). Parceling cannot reduce factor indeterminacy in factor analysis: A research note. Psychometrika, 84(3), 772–780.
Rigdon, E. E., Becker, J.-M., Rai, A., Ringle, C. M., Diamantopoulos, A., Karahanna, E., Straub, D., & Dijkstra, T. K. (2014). Conflating antecedents and formative indicators: A comment on Aguirre-Urreta and Marakas. Information Systems Research, 25(4), 780–784.
Rigdon, E. E., Ringle, C. M., & Sarstedt, M. (2010). Structural modeling of heterogeneous data with partial least squares. In N. K. Malhotra (Ed.), Review of marketing research (pp. 255–296). Armonk, NY: Sharpe.


Rigdon, E. E., Ringle, C. M., Sarstedt, M., & Gudergan, S. P. (2011). Assessing heterogeneity in customer satisfaction studies: Across industry similarities and within industry differences. In M. Sarstedt, M. Schwaiger, & C. R. Taylor (Eds.), Measurement and research methods in international marketing (Advances in International Marketing, 22, pp. 169–194). Bingley: Emerald.
Rigdon, E. E., Sarstedt, M., & Becker, J.-M. (2020). Quantify uncertainty in behavioral research. Nature Human Behaviour, 4(4), 329–331.
Rigdon, E. E., Sarstedt, M., & Ringle, C. M. (2017). On comparing results from CB-SEM and PLS-SEM: Five perspectives and five recommendations. Marketing ZFP, 39(3), 4–16.
Ringle, C. M., & Sarstedt, M. (2016). Gain more insight from your PLS-SEM results: The importance-performance map analysis. Industrial Management & Data Systems, 116(9), 1865–1886.
Ringle, C. M., Sarstedt, M., Mitchell, R., & Gudergan, S. P. (2020). Partial least squares structural equation modeling in HRM research. International Journal of Human Resource Management, 31(12), 1617–1643.
Ringle, C. M., Sarstedt, M., & Mooi, E. A. (2010). Response-based segmentation using finite mixture partial least squares: Theoretical foundations and an application to American customer satisfaction index data. Annals of Information Systems, 8, 19–49.
Ringle, C. M., Sarstedt, M., & Schlittgen, R. (2014). Genetic algorithm segmentation in partial least squares structural equation modeling. OR Spectrum, 36(1), 251–276.
Ringle, C. M., Sarstedt, M., Schlittgen, R., & Taylor, C. R. (2013). PLS path modeling and evolutionary segmentation. Journal of Business Research, 66(9), 1318–1324.
Ringle, C. M., Sarstedt, M., & Straub, D. W. (2012). A critical look at the use of PLS-SEM in MIS Quarterly. MIS Quarterly, 36(1), iii–xiv.
Ringle, C. M., Sarstedt, M., & Zimmermann, L. (2011). Customer satisfaction with commercial airlines: The role of perceived safety and purpose of travel. Journal of Marketing Theory and Practice, 19(4), 459–472.
Ringle, C. M., Wende, S., & Becker, J.-M. (2015). SmartPLS 3 [computer software]. Bönningstedt: SmartPLS. Retrieved from https://www.smartpls.com
Ringle, C. M., Wende, S., & Will, A. (2005). SmartPLS 2 [computer software]. Bönningstedt: SmartPLS. Retrieved from https://www.smartpls.com
Ringle, C. M., Wende, S., & Will, A. (2010). Finite mixture partial least squares analysis: Methodology and numerical examples. In V. Esposito Vinzi, W. W. Chin, J. Henseler, & H. Wang (Eds.), Handbook of partial least squares: Concepts, methods and applications in marketing and related fields (Springer Handbooks of Computational Statistics Series, Vol. II, pp. 195–218). Berlin: Springer.
Roldán, J. L., & Sánchez-Franco, M. J. (2012). Variance-based structural equation modeling: Guidelines for using partial least squares in information systems research. In M. Mora, O. Gelman, A. L. Steenkamp, & M. Raisinghani (Eds.), Research methodologies, innovations and philosophies in software systems engineering and information systems (pp. 193–221). Hershey, PA: IGI Global.
Rönkkö, M., & Evermann, J. (2013). A critical examination of common beliefs about partial least squares path modeling. Organizational Research Methods, 16(3), 425–448.
Rönkkö, M., McIntosh, C. N., & Antonakis, J. (2015). On the adoption of partial least squares in psychological research: Caveat emptor. Personality and Individual Differences, 87, 76–84.
Rönkkö, M., McIntosh, C. N., Antonakis, J., & Edwards, J. R. (2016). Partial least squares path modeling: Time for some serious thoughts. Journal of Operations Management, 47–48, 9–27.
Rossiter, J. R. (2002). The C-OAR-SE procedure for scale development in marketing. International Journal of Research in Marketing, 19(4), 305–335.
Rossiter, J. R. (2011). Measurement for the social sciences: The C-OAR-SE method and why it must replace psychometrics. Berlin: Springer.
Rossiter, J. R. (2016). How to use C-OAR-SE to design optimal standard measures. European Journal of Marketing, 50(11), 1924–1941.
Russo, D., & Stol, K.-J. (2021). PLS-SEM for software engineering research: An introduction and survey. ACM Computing Surveys, 54(4).
Salzberger, T., Sarstedt, M., & Diamantopoulos, A. (2016). Measurement in the social sciences: Where C-OAR-SE delivers and where it does not. European Journal of Marketing, 50(11), 1942–1952.
Sarstedt, M. (2008). A review of recent approaches for capturing heterogeneity in partial least squares path modelling. Journal of Modelling in Management, 3(2), 140–161.
Sarstedt, M., Becker, J.-M., Ringle, C. M., & Schwaiger, M. (2011). Uncovering and treating unobserved heterogeneity with FIMIX-PLS: Which model selection criterion provides an appropriate number of segments? Schmalenbach Business Review, 63(1), 34–62.
Sarstedt, M., Bengart, P., Shaltoni, A. M., & Lehmann, S. (2018). The use of sampling methods in advertising research: A gap between theory and practice. International Journal of Advertising, 37(4), 650–663.
Sarstedt, M., & Cheah, J.-H. (2019). Partial least squares structural equation modeling using SmartPLS: A software review. Journal of Marketing Analytics, 7(3), 196–202.
Sarstedt, M., Diamantopoulos, A., Salzberger, T., & Baumgartner, P. (2016). Selecting single items to measure doubly-concrete constructs: A cautionary tale. Journal of Business Research, 69(8), 3159–3167.
Sarstedt, M., Hair, J. F., Cheah, J.-H., Becker, J.-M., & Ringle, C. M. (2019). How to specify, estimate, and validate higher-order models. Australasian Marketing Journal, 27(3), 197–211.


Sarstedt, M., Hair, J. F., Nitzl, C., Ringle, C. M., & Howard, M. C. (2020). Beyond a tandem analysis of SEM and PROCESS: Use PLS-SEM for mediation analyses! International Journal of Market Research, 62(3), 288–299.
Sarstedt, M., Hair, J. F., Ringle, C. M., Thiele, K. O., & Gudergan, S. P. (2016). Estimation issues with PLS and CBSEM: Where the bias lies! Journal of Business Research, 69(10), 3998–4010.
Sarstedt, M., Henseler, J., & Ringle, C. M. (2011). Multigroup analysis in partial least squares (PLS) path modeling: Alternative methods and empirical results. In M. Sarstedt, M. Schwaiger, & C. R. Taylor (Eds.), Measurement and research methods in international marketing (Advances in International Marketing, 22, pp. 195–218). Bingley: Emerald.
Sarstedt, M., & Mooi, E. A. (2019). A concise guide to market research: The process, data, and methods using IBM SPSS statistics (3rd ed.). Berlin: Springer.
Sarstedt, M., & Ringle, C. M. (2010). Treating unobserved heterogeneity in PLS path modelling: A comparison of FIMIX-PLS with different data analysis strategies. Journal of Applied Statistics, 37(8), 1299–1318.
Sarstedt, M., Ringle, C. M., Cheah, J.-H., Ting, H., Moisescu, O. I., & Radomir, L. (2020). Structural model robustness checks in PLS-SEM. Tourism Economics, 26(4), 531–554.
Sarstedt, M., Ringle, C. M., & Gudergan, S. P. (2016). Guidelines for treating unobserved heterogeneity in tourism research: A comment on Marques and Reis (2015). Annals of Tourism Research, 57(March), 279–284.
Sarstedt, M., Ringle, C. M., & Hair, J. F. (2017a). Partial least squares structural equation modeling. In C. Homburg, M. Klarmann, & A. Vomberg (Eds.), Handbook of market research. Cham: Springer.
Sarstedt, M., Ringle, C. M., & Hair, J. F. (2017b). Treating unobserved heterogeneity in PLS-SEM: A multi-method approach. In R. Noonan & H. Latan (Eds.), Partial least squares structural equation modeling: Basic concepts, methodological issues and applications (pp. 197–217). Cham: Springer.
Sarstedt, M., Ringle, C. M., Henseler, J., & Hair, J. F. (2014). On the emancipation of PLS-SEM: A commentary on Rigdon (2012). Long Range Planning, 47(3), 154–160.
Sarstedt, M., Ringle, C. M., Smith, D., Reams, R., & Hair, J. F. (2014). Partial least squares structural equation modeling (PLS-SEM): A useful tool for family business researchers. Journal of Family Business Strategy, 5(1), 105–115.
Sarstedt, M., & Schloderer, M. P. (2010). Developing a measurement approach for reputation of non-profit organizations. International Journal of Nonprofit & Voluntary Sector Marketing, 15(3), 276–299.
Sarstedt, M., & Wilczynski, P. (2009). More for less? A comparison of single-item and multi-item measures. Business Administration Review, 69(2), 211–227.

Sarstedt, M., Wilczynski, P., & Melewar, T. C. (2013). Measuring reputation in global markets: A comparison of reputation measures' convergent and criterion validities. Journal of World Business, 48(3), 329–339.
Sattler, H., Völckner, F., Riediger, C., & Ringle, C. M. (2010). The impact of brand extension success factors on brand extension price premium. International Journal of Research in Marketing, 27(4), 319–328.
Schafer, J. L., & Graham, J. W. (2002). Missing data: Our view of the state of the art. Psychological Methods, 7(2), 147–177.
Schlittgen, R. (2011). A weighted least-squares approach to clusterwise regression. Advances in Statistical Analysis, 95(2), 205–217.
Schlittgen, R., Ringle, C. M., Sarstedt, M., & Becker, J.-M. (2016). Segmentation of PLS path models by iterative reweighted regressions. Journal of Business Research, 69(10), 4583–4592.
Schlittgen, R., Sarstedt, M., & Ringle, C. M. (2020). Data generation for composite-based structural equation modeling methods. Advances in Data Analysis and Classification, 14, 747–757.
Schloderer, M. P., Sarstedt, M., & Ringle, C. M. (2014). The relevance of reputation in the nonprofit sector: The moderating effect of socio-demographic characteristics. International Journal of Nonprofit & Voluntary Sector Marketing, 19(2), 110–126.
Schneeweiß, H. (1991). Models with latent variables: LISREL versus PLS. Statistica Neerlandica, 45(2), 145–157.
Schuberth, F., Henseler, J., & Dijkstra, T. K. (2018). Confirmatory composite analysis. Frontiers in Psychology, 9, 2541.
Schubring, S., Lorscheid, I., Meyer, M., & Ringle, C. M. (2016). The PLS agent: Predictive modeling with PLS-SEM and agent-based simulation. Journal of Business Research, 69(10), 4604–4612.
Schwaiger, M. (2004). Components and parameters of corporate reputation: An empirical study. Schmalenbach Business Review, 56(1), 46–71.
Schwaiger, M., Raithel, S., & Schloderer, M. P. (2009). Recognition or rejection: How a company's reputation influences stakeholder behavior. In J. Klewes & R. Wreschniok (Eds.), Reputation capital: Building and maintaining trust in the 21st century (pp. 39–55). Berlin: Springer.
Schwaiger, M., Sarstedt, M., & Taylor, C. R. (2010). Art for the sake of the corporation: Audi, BMW Group, DaimlerChrysler, Montblanc, Siemens, and Volkswagen help explore the effect of sponsorship on corporate reputations. Journal of Advertising Research, 50(1), 77–90.
Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics, 6(2), 461–464.


Sharma, P. N., Sarstedt, M., Shmueli, G., Kim, K. H., & Thiele, K. O. (2019). PLS-based model selection: The role of alternative explanations in information systems research. Journal of the Association for Information Systems, 20(4), 346–397.
Sharma, P. N., Shmueli, G., Sarstedt, M., Danks, N., & Ray, S. (2021). Prediction-oriented model selection in partial least squares path modeling. Decision Sciences, 52(3), 567–607.
Shmueli, G. (2010). To explain or to predict? Statistical Science, 25(3), 289–310.
Shmueli, G., & Koppius, O. R. (2011). Predictive analytics in information systems research. MIS Quarterly, 35(3), 553–572.
Shmueli, G., Ray, S., Velasquez Estrada, J. M., & Chatla, S. B. (2016). The elephant in the room: Evaluating the predictive performance of PLS models. Journal of Business Research, 69(10), 4552–4564.
Shmueli, G., Sarstedt, M., Hair, J. F., Cheah, J.-H., Ting, H., Vaithilingam, S., & Ringle, C. M. (2019). Predictive model assessment in PLS-SEM: Guidelines for using PLSpredict. European Journal of Marketing, 53(11), 2322–2347.
Slack, N. (1994). The importance-performance matrix as a determinant of improvement priority. International Journal of Operations and Production Management, 14(5), 59–75.
Sobel, M. E. (1982). Asymptotic confidence intervals for indirect effects in structural equation models. Sociological Methodology, 13, 290–312.
Spector, P. E., & Brannick, M. T. (2011). Methodological urban legends: The misuse of statistical control variables. Organizational Research Methods, 14(2), 287–305.
Squillacciotti, S. (2010). Prediction-oriented classification in PLS path modeling. In V. Esposito Vinzi, W. W. Chin, J. Henseler, & H. Wang (Eds.), Handbook of partial least squares: Concepts, methods and applications in marketing and related fields (Springer Handbooks of Computational Statistics Series, Vol. II, pp. 219–233). Berlin: Springer.
Steenkamp, J. B. E. M., & Baumgartner, H. (1998). Assessing measurement invariance in cross-national consumer research. Journal of Consumer Research, 25(1), 78–107.
Stone, M. (1974). Cross-validatory choice and assessment of statistical predictions. Journal of the Royal Statistical Society, 36(2), 111–147.
Streukens, S., & Leroi-Werelds, S. (2016). Bootstrapping and PLS-SEM: A step-by-step guide to get more out of your bootstrapping results. European Management Journal, 34(6), 618–632.
Svensson, G., Ferro, C., Hogevold, N., Padin, C., Varela, J. C. S., & Sarstedt, M. (2018). Framing the triple bottom line approach: Direct and mediation effects between economic, social, and environmental elements. Journal of Cleaner Production, 197(1), 972–991.

Temme, D., Kreis, H., & Hildebrandt, L. (2010). A comparison of current PLS path modeling software: Features, ease-of-use, and performance. In V. Esposito Vinzi, W. W. Chin, J. Henseler, & H. Wang (Eds.), Handbook of partial least squares: Concepts, methods and applications (Springer Handbooks of Computational Statistics Series, Vol. II, pp. 737–756). Berlin: Springer.
Tenenhaus, A., & Tenenhaus, M. (2011). Regularized generalized canonical correlation analysis. Psychometrika, 76(2), 257–284.
Tenenhaus, M., Esposito Vinzi, V., Chatelin, Y.-M., & Lauro, C. (2005). PLS path modeling. Computational Statistics & Data Analysis, 48(1), 159–205.
Usakli, A., & Kucukergin, K. G. (2018). Using partial least squares structural equation modeling in hospitality and tourism: Do researchers follow practical guidelines? International Journal of Contemporary Hospitality Management, 30(11), 3462–3512.
Vandenberg, R. J., & Lance, C. E. (2000). A review and synthesis of the measurement invariance literature: Suggestions, practices, and recommendations for organizational research. Organizational Research Methods, 3(1), 4–70.
Vanwinckelen, G., & Blockeel, H. (2012). On estimating model accuracy with repeated cross-validation. In BeneLearn 2012: 21st Belgian-Dutch conference on machine learning (pp. 39–44). Ghent.
Venkatesh, V., Morris, M. G., Davis, G. B., & Davis, F. D. (2003). User acceptance of information technology: Toward a unified view. MIS Quarterly, 27(3), 425–478.
Völckner, F., Sattler, H., Hennig-Thurau, T., & Ringle, C. M. (2010). The role of parent brand quality for service brand extension success. Journal of Service Research, 13(4), 359–361.
Walsh, G., Mitchell, V.-W., Jackson, P. R., & Beatty, S. E. (2009). Examining the antecedents and consequences of corporate reputation: A customer perspective. British Journal of Management, 20(2), 187–203.
Wanous, J. P., Reichers, A. E., & Hudy, M. J. (1997). Overall job satisfaction: How good are single-item measures? Journal of Applied Psychology, 82(2), 247–252.
Weijters, B., & Baumgartner, H. (2012). Misresponse to reversed and negated items in surveys: A review. Journal of Marketing Research, 49(5), 737–747.
Wetzels, M., Odekerken-Schröder, G., & van Oppen, C. (2009). Using PLS path modeling for assessing hierarchical construct models: Guidelines and empirical illustration. MIS Quarterly, 33(1), 177–195.
Wilden, R., & Gudergan, S. (2015). The impact of dynamic capabilities on operational marketing and technological capabilities: Investigating the role of environmental turbulence. Journal of the Academy of Marketing Science, 43(2), 181–199.
Willaby, H., Costa, D., Burns, B., MacCann, C., & Roberts, R. (2015). Testing complex models with small sample sizes: A historical overview and empirical demonstration of what partial least squares (PLS) can offer differential psychology. Personality and Individual Differences, 84, 73–78.


Willmott, C. J., & Matsuura, K. (2005). Advantages of the mean absolute error (MAE) over the root mean square error (RMSE) in assessing average model performance. Climate Research, 30(1), 79–82.
Witten, I. H., Frank, E., Hall, M. A., & Pal, C. (2016). Data mining: Practical machine learning tools and techniques (4th ed.). Burlington, MA: Morgan Kaufmann.
Wold, H. (1975). Path models with latent variables: The NIPALS approach. In H. M. Blalock, A. Aganbegian, F. M. Borodkin, R. Boudon, & V. Capecchi (Eds.), Quantitative sociology: International perspectives on mathematical and statistical modeling (pp. 307–357). New York, NY: Academic Press.
Wold, H. (1982). Soft modeling: The basic design and some extensions. In H. Wold & K. G. Jöreskog (Eds.), Systems under indirect observations: Part II (pp. 1–54). Amsterdam: North Holland.
Wold, H. (1985). Partial least squares. In S. Kotz & N. L. Johnson (Eds.), Encyclopedia of statistical sciences (pp. 581–591). New York, NY: Wiley.
Wold, S., Sjöström, M., & Eriksson, L. (2001). PLS-regression: A basic tool of chemometrics. Chemometrics and Intelligent Laboratory Systems, 58(2), 109–130.
Wong, K. K.-K. (2019). Mastering partial least squares structural equation modeling (PLS-SEM) with SmartPLS in 38 hours. Bloomington, IN: iUniverse.
Yuan, K.-H., Wen, Y., & Tang, J. (2020). Regression analysis with latent variables by partial least squares and four other composite scores: Consistency, bias and correction. Structural Equation Modeling: A Multidisciplinary Journal, 27(3), 333–350.
Yun, L., Kim, K., & Cheong, Y. (2020). Sports sponsorship and the risks of ambush marketing: The moderating role of corporate reputation in the effects of disclosure of ambush marketers on attitudes and beliefs towards corporations. International Journal of Advertising, 39(7), 921–942.
Zarantonello, L., & Pauwels-Delassus, V. (2015). The handbook of brand management scales. London: Taylor & Francis.
Zeng, N., Liu, Y., Gong, P., Hertogh, M., & König, M. (2021). Do right PLS and do PLS right: A critical review of the application of PLS in construction management research. Frontiers of Engineering Management, forthcoming.
Zhang, Y., & Schwaiger, M. (2012). A comparative study of corporate reputation between China and developed Western countries. In S. Okazaki (Ed.), Handbook of research on international advertising (pp. 353–375). Cheltenham: Edward Elgar.
Zhao, X., Lynch, J. G., & Chen, Q. (2010). Reconsidering Baron and Kenny: Myths and truths about mediation analysis. Journal of Consumer Research, 37(2), 197–206.

INDEX

Absolute contribution, 151
Aguinis, H., 50, 254
Aguirre-Urreta, M. I., 53, 158
Akaike weights, 206–208
Algorithmic options, 94–96
Alternating extreme pole data responses, 64
Anderson, R. E., 50, 62, 66
Articles, 4–5
Artifacts, 53
Attractiveness (ATTR), 50–51, 51 (exhibit), 161 (exhibit)
Average variance extracted (AVE), 117, 119, 141
Babin, B. J., 50, 62, 66
Bandwidth-fidelity dilemma, 278
Baron, R. M., 234
Bassellier, G., 150
Baumgartner, H., 29
Bayesian information criterion (BIC), 205
Beaty, J. C., 254
Becker, J.-M., 17, 91, 280, 291, 293
Bentler, P. M., 86, 297
Bernerth, J. B., 50
Bias-corrected and accelerated (BCa) bootstrap confidence intervals, 158
Binary coded data, 28
Binary (dummy) variables, 48–49
Binz Astrachan, C., 280
Black, W. C., 50, 62, 66
Boik, R. J., 254
Bollen, K. A., 282
Bootstrap cases, 153
Bootstrapping, 123, 133
  outer weights, results for, 179 (exhibit)
  p values in modeling window, 178 (exhibit)
  rules of, 159 (exhibit)


Bootstrapping confidence interval
  formative measurement models, 153–159, 155 (exhibit)
  reflective measurement models, 123–124
Bootstrap samples, 153
Bovaird, J. A., 249
Cascaded moderator analysis, 248
Casewise deletion, 62
Categorical control variable, 48–49, 49 (exhibit)
Categorical moderator variable, 245
Categorical scale, 8
Causal indicators, 53–54, 92
Causal links, 43
Causal-predictive paradigm, 22, 43
CB-SEM. See Covariance-based structural equation modeling (CB-SEM)
Cenfetelli, R. T., 150
Centroid weighting scheme, 90
Cepeda-Carrión, G., 232
Chatelin, Y.-M., 90
Chatla, S. B., 188
Cheah, J.-H., 73, 90, 91, 144, 259
Chen, Q., 234
Chin, W. W., 90, 251, 253
Ciavolino, E., 91
Cluster analysis, 290
Coding, 10
Coefficient of determination (R²), 195
Comma-separated value (.csv), 73
Common factor-based SEM, 15
Common factor model, 297
Communality, 117, 119


Competence (COMP), 50–51, 51 (exhibit), 67–71
  mediation, 240–243
  model estimation, PLS-SEM, 100–103
  moderation, 261–267
  reflective measurement model assessment, 127–135
  structural model assessment, 210–223
Competitive mediation, 234
Complementary mediation, 234
Composite-based SEM, 16, 23
  PLS-SEM algorithm, 86
Composite indicators, 53–54
Composite reliability, 119
Composite score, 8
Composite variables, 6, 7 (exhibit), 16
Compositional invariance, 294, 295 (exhibit)
Conditional indirect effect, 257
Conditional process models, 257
Configural invariance, 294, 295 (exhibit)
Confirmatory composite analysis (CCA), 111, 114, 114–115 (exhibit)
Confirmatory factor analysis, 3
Confirmatory tetrad analysis in PLS-SEM (CTA-PLS), 56, 281–285
  indicators, 282 (exhibit)
  model-implied nonredundant vanishing tetrads, 283
  results, 285 (exhibit)
  steps, 282–283
  vanishing tetrads, 282
Consistency at large, 93
Consistent PLS-SEM (PLSc-SEM), 296–297
Constructs, 7
  PLS-SEM algorithm, 86
Content validity, 117
Continuous moderator variable, 246
Convergence, 96
Convergent validity, 120
  assessment, 143–145

Corporate reputation on customer satisfaction (CUSA), 67–71
  mediation, 240–243
  model estimation, PLS-SEM, 100–103
  moderation, 261–267
  reflective measurement model assessment, 127–135
  structural model assessment, 210–223
Corporate social responsibility (CSOR), 50–51, 51 (exhibit), 145, 161 (exhibit)
Correlation weights, 89
Covariance-based structural equation modeling (CB-SEM), 4
  common factor-based SEM, 15
  common variance and, 18
  guidelines for choosing, 31, 32 (exhibit)
  PLS-SEM and, 15
  vs. PLS-SEM selection, 31, 32 (exhibit)
  sum scores and, 15–18
Coverage error, 157
Critical t values, 154, 192
Cronbach's alpha, 118–120, 129
Cross-loadings, 122
Cross-validated predictive ability test (CVPAT), 208
Customer loyalty (CUSL), 67–71, 210–223
  mediation, 240–243
  model estimation, PLS-SEM, 100–103
  moderation, 261–267
  reflective measurement model assessment, 127–135
Danks, N., 205, 221
Data characteristics, 29 (exhibit)
  metric scale, 28
  minimum sample size requirements, 24–27
  missing value treatment, 27
  nonnormal data, 28
  secondary data, 28–29
Data collection and examination, 61
  case study, 71–72
  distribution, 65–66
  guidelines for examining, 66–67 (exhibit)
  missing data, 61–63
  outliers, 64–65
  questionnaires and, 61–62
  response patterns, 64
Data distributions, 10–11, 65–66
Data matrix, 86 (exhibit), 87
Dawson, J., 256
Degrees of freedom (df), 154
DeVellis, R. F., 50
Diagonal lining, data response, 64
Diamantopoulos, A., 50, 57, 59
Dijkstra, T. K., 86, 114, 119, 189 (exhibit), 190 (exhibit)
Direct effect, 44, 194
Direct-only nonmediation, 234
Disattenuated correlation, 122
Discriminant validity, 120–126
  bootstrap confidence interval, 123–124, 133
  cross-loadings, 122
  disattenuated correlation, 122
  Fornell-Larcker criterion, 121–122, 121 (exhibit), 132 (exhibit)
  heterotrait-heteromethod correlations, 122
  heterotrait-monotrait ratio (HTMT), 122–124, 123 (exhibit), 132 (exhibit)
  monotrait-heteromethod correlations, 122, 124
Disjoint two-stage approach, 280
Eberl, M., 67
Effect indicators. See Reflective measurement
Efron, B., 158
Embedded two-stage approach, 280
Empirical t value, 154
Endogeneity, 285–286
Endogenous latent variables, 13, 42
  PLS-SEM algorithm, 88
EP-theoretic approach, 22
Equality of composite mean values and variances, 294, 295 (exhibit)

Equidistance, 9
Error terms, 13
Esposito Vinzi, V., 90
Evaluation criteria, 109–110. See also PLS-SEM results
Exact fit test, 189 (exhibit)
Exogenous latent variables, 13, 42
  PLS-SEM algorithm, 88
Explaining and predicting (EP) theories, 22
Explanatory power, structural model, 194–196
Exploratory factor analysis, 3
Extended repeated indicators approach, 280
Factor (score) indeterminacy, 17
Factor weighting scheme, 90
Falk, R. F., 43
f2 effect size, 195
  interpretation of, 196 (exhibit)
Finite mixture partial least squares (FIMIX-PLS), 291
First-generation techniques, 3
  limitations of, 3–4
Fordellone, M., 293
Formative index, 52
Formative indicators
  absolute contribution, 151
  decision-making process, 152 (exhibit)
  relative contribution, 149
  rules, 153 (exhibit)
Formative measurement, 14, 51–57
  PLS-SEM algorithm, 88
  PLS-SEM results, 113
  vs. reflective measurement, 52 (exhibit), 54–56
Formative measurement model assessment, 171–181
  bootstrapping procedure, 152–159, 155 (exhibit)
  case study, 159–181
  collinearity issues, 145–148, 148 (exhibit)
  content specification, 141
  convergent validity, 143–145
  corporate reputation, theoretical model of, 159
  correlation matrix, 146
  global single item, 144
  higher-order construct, 150
  indicators of, 148–152, 160–161 (exhibit)
  PLS-SEM algorithm, 166 (exhibit), 167
  procedure, 142 (exhibit)
  redundancy analysis, 143, 172 (exhibit)
  results, 141
  rules of, 153–159
  simple path model, 159
  SmartPLS 3 software, 156
  tolerance (TOL), 146
  variance inflation factor (VIF), 147, 148 (exhibit)
  VIF values, 173 (exhibit)
Formative-reflective higher-order construct, 278
Fornell-Larcker criterion, 121–122
  discriminant validity, 132 (exhibit)
  visual representation of, 121 (exhibit)
Fuchs, C., 57
Full measurement invariance, 295
Full mediation, 234
Gaussian copula approach, 286
Genetic algorithm segmentation in PLS-SEM (PLS-GAS), 291
Geweke and Meese criterion (GM), 205
Global single item, validation of, 144, 144 (exhibit)
Goodness-of-fit index (GoF), 189 (exhibit)
Graphical interfaces, 91–92
Gregor, S., 22
Grimm, M. S., 62–63
Gudergan, S. P., 24, 56, 115, 158, 276, 280, 284
Gupta, S., 286
Hadaya, P., 25, 88
Haenlein, M., 25

Hair, J. F., 15, 23, 24, 25, 31, 50, 54, 59, 62, 66, 111, 114, 115, 190 (exhibit), 276, 280, 293, 297
Hauff, S., 277
Hayes, A. F., 237 (exhibit), 258, 259
Henseler, J., 25, 54, 114, 121, 122, 123, 124, 158, 189 (exhibit), 190 (exhibit), 253, 289, 290, 294, 295
Heterogeneity, 46, 229, 288 (exhibit)
Heterotrait-heteromethod correlations, 122
Heterotrait-monotrait (HTMT) ratio, 117, 122–124, 123 (exhibit)
  confidence intervals, 134 (exhibit)
Hierarchical component model, 277. See also Higher-order constructs
Higher-order component, 60, 278
Higher-order constructs, 59–61, 277–281
  example, 60 (exhibit)
  types of, 278, 279 (exhibit)
Higher-order models. See Higher-order constructs
Hildebrandt, L., 91
Ho, J. A., 259
Holdout sample, 197
Howard, M. C., 111, 114
Huang, W., 86, 297
Hult, G. T. M., 25, 59, 286, 294
Hwang, H., 90
Hypothesis tests, 50
Hypothesized relationships, 192
Importance-performance map analysis (IPMA), 272, 273
  direct, indirect, and total effects, 275 (exhibit)
  model, 273 (exhibit)
  rescaling, 274
Index of moderated mediation, 258
Indicators, 7–8
  causal, 53–54
  composite, 53–54
  formative measurement models assessment, 148–152
  path models and, 12
  PLS-SEM algorithm, 87
  for reflective measurement, 70 (exhibit)
  reliability, 117–118
Indirect effect, 44, 194
Indirect-only mediation, 234
Individual mediating effect, 239
Initial values, PLS-SEM algorithm, 95
Inner model. See Structural model
In-sample predictive power, 195
Interaction effect, 245
Interaction term, 248, 249
  guidelines for creating, 253
  model evaluation, 253–254
  results interpretation, 254–257
Internal consistency reliability, 118–120
Interpretational confounding, 149
Interval scale, 9
Inverse square root method, 25–27
Ismail, I. R., 91
Item non-response, 62
Items/manifest variables, 7
Iterative reweighted regressions segmentation (PLS-IRRS), 293
Jarvis, C. B., 281
Joint mediating effect, 239
Kaiser, S., 57
Kamakura, W. A., 57
Kenny, D. A., 233–234
K-fold cross-validation, 198
Kim, K. H., 43
Klein, K., 280
Klesel, M., 289
Kock, N., 25, 88
Kolmogorov-Smirnov test, 11
Kreis, H., 91
Kurtosis, 11, 66
Latent class techniques, 287
Latent variables, 7
  scores, 86
Lauro, C., 90

Leroi-Werelds, S., 153–154
Likeability (LIKE), 67–71
  mediation, 240–243
  model estimation, PLS-SEM, 100–103
  moderation, 261–267
  reflective measurement model assessment, 127–135
  structural model assessment, 210–223
Likert scales, 10, 71, 95, 221
Lim, Q. H., 259
Linear regression model (LM) benchmark, 201–202
Listwise deletion, 62
Little, T. D., 249
Lohmöller, J.-B., 30, 86, 91
Lower-order components, 60, 278
Loyalty construct, 42, 42 (exhibit)
  mediating effect, 43–44, 43 (exhibit)
Lynch, J. G., 234
MacKenzie, S. B., 50, 281
Management competence, 150
Manifest variables, 7
Marakas, G. M., 53
Marcolin, B. L., 251
Maximum number of iterations, 96
Mean absolute error (MAE), 200–204
Mean value replacement, 62
Measurement, 7–8
  composite score, 8
  concept of, 7
  error, 8
  indicators, 7–8
  invariance, 294–295
  items/manifest variables, 7
  of latent variables/constructs, 7
  process, 7
  single-item constructs, 8
Measurement invariance of composite models (MICOM) procedure, 294, 295 (exhibit)
Measurement model, 13, 41
  case study, 69–71
  confirmatory tetrad analysis in PLS-SEM (CTA-PLS), 56
  guidelines for choosing, 56 (exhibit)
  higher-order component, 60
  higher-order constructs, 59–61
  lower-order components, 60
  nomological validity, 113
  path model, 50, 61 (exhibit)
  PLS-SEM algorithm, 87
  PLS-SEM results, 110 (exhibit), 113
  reflective and formative measurement models, 51–57
  reliability, measurement error, 111–113
  second-order constructs, 60
  single-item measures, 57–59
  specification, 50–61
  sum scores, 57–59
  validation, 111–113
Measurement model misspecification, 281
Measurement scale, 8
  categorical, 8
  equidistance, 9
  interval, 9
  nominal, 8
  ordinal, 9
  ratio scale, 9
Measurement theory, 13, 14
Mediated moderation, 259
Mediation, 44–45, 44 (exhibit), 228, 229
  analysis procedure, 235 (exhibit)
  case study, 240–243
  cause-effect relationship, 230 (exhibit), 231 (exhibit)
  evaluation in, 233
  model, 230 (exhibit), 232 (exhibit)
  multiple, 238–240
  PROCESS vs. PLS-SEM, 237–238 (exhibit)
  rules, 240 (exhibit)
  testing, 236–238
  types of, 233–236
Mediator construct, 228
Mena, J. A., 15
Metric scale, 28
Metrological uncertainty, 17

Miller, N. B., 43
Minimum sample size requirements, 24–27, 27 (exhibit)
Missing data, 61–63
  alternating extreme pole responses, 64
  diagonal lining response, 64
  kurtosis, 66
  levels, 62
  outlier response, 64–65
  skewness, 66
  straight lining response, 64
  survey non-response, 62
  surveys, 64
Missing values
  model estimation, PLS-SEM, 99, 99 (exhibit)
  treatment, 27, 62
Model characteristics, 19–20 (exhibit), 30–31, 31 (exhibit)
Model comparisons, 43, 205–209, 222 (exhibit)
Model complexity, 94
Model estimation, PLS-SEM, 97–99
  missing values, 99, 99 (exhibit)
  path coefficients, 100–101, 100–101 (exhibit)
  PLS-SEM algorithm settings, 98 (exhibit)
  results, 99–103
  singular data matrix, 99
Model overfit, 195
Model parsimony, 190
Moderated mediation, 257
Moderation, 45–47, 229
  case study, 260–267
  categorical moderator variable, 245
  continuous moderator variable, 246
  heterogeneity and, 46
  interaction term, 249
  modeling effects, 247–248
  multigroup analysis, 46, 47 (exhibit), 245 (exhibit)
  orthogonalizing approach, 250–251
  overview, 243–245
  product indicator approach, 249–250
  rules, 259–260 (exhibit)
  theoretical model, 46 (exhibit), 244 (exhibit)
  two-stage approach, 251–252
  types of, 245–246
Moderator effect, 45, 247
Moderator variable, 243
Monotrait-heteromethod correlations, 122, 124
Mooi, E. A., 63, 65, 72
Multicollinearity, 145
Multigroup analysis, 46, 47 (exhibit), 287–290
Multiple mediation, 238–240
  analysis, 238
Multiple moderator model, 248
Multiple regression analysis, 3
  with sum scores, 16
Multivariate analysis, 2
  coding, 10
  organization of, 2 (exhibit)
  PLS regression, 18
  See also Covariance-based structural equation modeling (CB-SEM); Partial least squares structural equation modeling (PLS-SEM)
Necessary condition analysis (NCA), 276–277
Newsted, P. R., 251
Ng, S. I., 259
Niehaves, B., 289, 290
Nitzl, C., 111, 114, 232
No-effect nonmediation, 234
Nominal scale, 8
Nomological validity, 113
Nonnormal data, 28
Non-response data, 62
Normal data distributions, 11
Observed heterogeneity, 286–294
Oliver, R. L., 22
One-tailed test, 192
Ordinal scale, 9
Orthogonalizing approach, 249–250

Outer loadings, 88
  reflective measurement model assessment, 128
  relevance test, 118 (exhibit)
Outer model in PLS-SEM, 13
Outer weights, 88
Outliers, 64–65
Out-of-sample predictive power, 196
Pairwise deletion, missing data, 63
Parameter settings, 94–96
Parametric approach, 289
Park, S., 286
Parsimonious models, 205
Partial least squares k-means (PLS-SEM-KM), 293
Partial least squares structural equation modeling (PLS-SEM), 2
  algorithm, 22
  case studies, 67–72
  vs. CB-SEM selection, 31, 32 (exhibit)
  characteristics of, 18–24, 19–21 (exhibit)
  common factor-based SEM and, 15–16
  composite-based SEM and, 16, 23
  data characteristics, 19 (exhibit), 24–29, 27 (exhibit)
  guidelines for choosing, 31, 32 (exhibit)
  limitations of, 22
  metric scale, 28
  minimum sample size requirements, 24–27
  missing value treatment, 27
  model characteristics, 19–20 (exhibit), 30–31, 31 (exhibit)
  model estimation, 20 (exhibit), 23
  model evaluation, 20–21 (exhibit)
  nonnormal data, 28
  procedure for applying, 33 (exhibit)
  R² values, 18
  secondary data, 28–29
  sum scores and, 15–18
  variance-based SEM approach, 18


Partial measurement invariance, 295
Partial mediation, 234
Path coefficients, 90
  structural model assessment, 217 (exhibit)
Path model, 12, 12 (exhibit)
  measurement, 13, 41
  SmartPLS 3 software and, 62, 63, 67, 72–79
  structural, 13, 41
  theoretical relationships, 13–15
Path weighting scheme, 90
Percentile method, 157
Performance (PERF), 159, 161 (exhibit)
Permutation test, 289
Pierce, C. A., 254
PLSpredict procedure, 188, 196, 199 (exhibit)
  guidelines, 203 (exhibit)
PLS regression, 18
PLS-SEM algorithm, 86
  composite-based SEM method, 86
  constructs, 86
  convergence, 96
  data matrix for, 86 (exhibit), 87
  exogenous constructs, 88
  factors, 89
  formative measurement models assessment, 166 (exhibit), 167
  indicators, 87
  initializing rules, 96 (exhibit)
  initial values, 95
  latent variables, 86, 88
  maximum number of iterations, 96
  measurement model, 87
  options and parameter, 94–96
  path coefficients, 90
  path model and data for, 87 (exhibit)
  prediction, 90
  raw data, 86
  results, 96–97
  R² value, 90
  secondary data, 87
  statistical properties, 92–94
  stop criterion, 96
  structural model, 88
  weighted, 90, 91
PLS-SEM bias, 23, 93
PLS-SEM results
  confirmatory composite analysis (CCA), 114, 114–115 (exhibit)
  evaluation criteria, 109–110
  formative measurement model, 113
  measurement error, 111
  measurement models, 110 (exhibit)
  nomological validity, 113
  outer loadings, 170 (exhibit)
  reflective measurement model, 112
  reliability, 111–113
  rules, 116 (exhibit)
  structural model, 110 (exhibit)
  validity, 111, 112
PLS typological path modeling (PLS-TPM), 291
Podsakoff, N. P., 50
Podsakoff, P. M., 50, 281
Preacher, K. J., 238, 256
Prediction, 90
  error, 201
Prediction-oriented segmentation in PLS-SEM (PLS-POS), 291
Predictive power model, 196–205
  folds, number of, 198–199
  interpretation, results, 200–204
  issues, 204–205
  linear regression model (LM) benchmark, 201–202
  repetitions, 199–200
  statistic, 200
Procedure for applying PLS-SEM, 33 (exhibit)
PROCESS vs. PLS-SEM, mediation analysis, 237–238 (exhibit)
Product indicator approach, 249
Q2 statistic, 197
Quality (QUAL), 159, 160–161 (exhibit), 210–223
Questionnaires and data collection, 61–62

Ramayah, T., 91
Ratio scale, 9
Ray, S., 188, 205
Redundancy analysis, 143–144
Reflective-formative higher-order construct, 278
Reflective measurement
  evaluation criteria, 112
  vs. formative measurement, 52 (exhibit), 54–56
  indicators for, 70 (exhibit)
  model, 11, 51–57, 114
  PLS-SEM algorithm, 88
Reflective measurement model assessment, 116–117, 169–170
  case study, 127–135
  composite reliability, 119
  construct reliability and validity, 130 (exhibit)
  content validity, 117
  convergent validity, 120
  Cronbach's alpha, 118–120, 129
  discriminant validity, 120–126
  Fornell-Larcker criterion, 121–122, 121 (exhibit)
  heterotrait-monotrait ratio (HTMT), 122–124, 123 (exhibit)
  indicator reliability, 117–118
  internal consistency reliability, 118–120
  outer loadings, 118 (exhibit), 128
  PLS-SEM algorithm, 127–135
  quality criteria, 129
  reliability coefficient, 119
  rules, 126 (exhibit)
  SmartPLS 3 software, 127–135
Reflective-reflective higher-order construct, 278
Regression analysis, 3–4
Regression weights, 89
Reinartz, W., 25
Relative contribution, indicators, 149
Relevance of significant relationships, 193
Reliability coefficient, 119
Reliability, measurement error, 111–113

Repeated indicators approach, 279
Reputation construct, 42, 42 (exhibit)
  categorical variable, 49, 49 (exhibit)
  mediating effect, 43–44, 43 (exhibit)
Rescaling, 274
Response-based procedure for detecting unit segments in PLS path modeling (REBUS-PLS), 291
Response-based segmentation techniques, 291
Response patterns, data collection, 64
Retrospective approach, 25–26
Review articles, 5 (exhibit)
Richter, N. F., 277
Rigdon, E. E., 17, 54, 276
Ringle, C. M., 15, 24, 25, 31, 54, 56, 59, 90, 115, 121, 122, 123, 158, 190 (exhibit), 273, 276, 277, 280, 289, 291, 293, 294, 295, 297
Risher, J. J., 31
Rockwood, N. J., 259
Roldán, J. L., 91, 232
Rönkkö, M., 53, 158
Root mean square error (RMSE), 200–204
Root mean square residual covariance (RMStheta), 189 (exhibit)
R² values, 18, 195
  PLS-SEM algorithm, 90
Sarstedt, M., 15, 17, 23, 24, 25, 31, 43, 54, 57, 59, 63, 65, 72, 73, 90, 115, 121, 122, 123, 158, 190 (exhibit), 205, 221, 273, 276, 277, 280, 291, 293, 294, 295
Satisfaction construct, 42, 42 (exhibit)
  categorical variable, 49, 49 (exhibit)
  mediating effect, 43–44, 43 (exhibit)
Scales, 52
  of measurement, 28–29
Schlittgen, R., 293
Schuberth, F., 114, 190 (exhibit), 289, 290
Schubring, S., 277
Schwaiger, M., 67, 69, 159
Secondary data, 28–29
  PLS-SEM algorithm, 87


Second-generation techniques, 4
  tools, 2
Second-order constructs, 60, 278
SEM. See Structural equation modeling (SEM)
Serial mediating effect, 239
Shapiro-Wilk test, 11, 65
Sharma, P. N., 43, 205, 221
Shmueli, G., 43, 188, 190 (exhibit), 196, 205
Simple corporate reputation model, 68
Single-item constructs, 8
Single-item measures, 57–59
  guidelines for, 58
  sum scores and, 59
Single mediation analysis, 238
Singular data matrix, 99
Sinkovics, R. R., 158, 289
Skewness, 11, 66
Slope plot, 256
SmartPLS 3 software
  bootstrapping options in, 175 (exhibit)
  to create new project, 73–79
  data view in, 74 (exhibit)
  extended model in, 163 (exhibit), 241 (exhibit)
  formative measurement models assessment, 156, 162, 163–165 (exhibit)
  initial model, 76 (exhibit)
  mediation, 240
  model estimation, 97–103
  path model and, 62, 63, 67, 72–79
  reflective measurement model, assessment of, 127–135
  simple model with names and data assigned, 78 (exhibit)
Sobel, M. E., 236
Sobel test, 236
"Soft modeling," 28–29
Software packages, statistical, 92
Specific indirect effect, 238
Squillacciotti, S., 291
Standard error, 192

Standardized root mean square residual (SRMR), 189 (exhibit)
Standardized values, 192
Statistical modeling technique, 4
Statistical power, 22
Statistical properties, 92–94
Steenkamp, J. B. E. M., 29
Stop criterion, PLS-SEM algorithm, 96
Straight lining, data response, 64
Streukens, S., 153–154
Structural equation modeling (SEM), 4
  coding, 10
  common factor-based, 15
  composite variables, 6, 7 (exhibit)
  covariance-based structural equation modeling (CB-SEM), 4
  data distributions, 10–11
  elements, 6
  measurement, 7–9
  PLS path modeling, 4
  types of, 4
Structural equation modeling, principles of
  measurement theory, 14
  path model, 12–13
  structural theory, 14–15
  theoretical relationships, 13–14
Structural model, 13
  case study, 67–69
  causal links, 43
  control variables, 47–50
  endogenous latent variables, 42
  exogenous latent variables, 42
  guidelines for, 61 (exhibit)
  mediating effect, 44–45, 44 (exhibit)
  model comparisons, 43
  moderation, 45–47
  PLS-SEM algorithm, 88
  PLS-SEM results, 110 (exhibit)
  specification, 41
  theoretical model, 67–69
  types of constructs, 42, 42 (exhibit)
Structural model assessment
  BIC values, 223
  bootstrapping results, 214–215 (exhibit)
  bootstrapping samples, 216 (exhibit)
  case study, 209–223
  for collinearity, 191
  comparisons, 205–209
  critical value, 192
  direct, indirect, and total effects, 194
  explanatory power, 194–196
  f2 effect sizes, 219 (exhibit)
  hypothesized relationships, 192
  modeling window, results in, 211 (exhibit)
  overview, 187
  path coefficients, 217 (exhibit)
  PLSpredict results report, 220 (exhibit), 221 (exhibit)
  predictive power, 196–205
  procedure, 188 (exhibit)
  relevance of significant relationships, 193
  rules of, 208–209 (exhibit)
  significance and relevance of, 192–194
  standard error, 192
  standardized values, 192
  total effects, 213 (exhibit)
  VIF values, 212 (exhibit)
Structural theory, 13, 14–15
Studentized bootstrap method, 158
Sum scores, 16, 57–59
  PLS-SEM algorithm, 94
Suppressor variable, 234
Survey non-response, 62
Surveys, missing data, 64
Techniques, advanced
  confirmatory tetrad analysis in PLS-SEM (CTA-PLS), 281–285
  consistent PLS-SEM, 296–297
  endogeneity, 285–286
  higher-order construct, 277–281
  measurement invariance, 294–295
  observed heterogeneity, 286–294
  unobserved heterogeneity, 286–294

Tee, K. K., 259
Temme, D., 91
Tenenhaus, M., 90
10 times rule, 25
Tetrad, 281
Text (.txt) data sets, 73
Theoretical t values, 154
Theory, 13
Thiele, K. O., 25, 43, 59, 205
Three-way interaction, 248
Ting, H., 91
Ting, K.-F., 282
Tolerance (TOL), 146
Total effects, 194
  structural model assessment, 213 (exhibit)
Total indirect effect, 238
Training sample, 197
Two-stage approach, 251, 280
Two-tailed test, 192
Two-way interaction, 248
Unit non-response. See Survey non-response
Unobserved heterogeneity, 286–287
  cluster analysis, 290
  finite mixture partial least squares (FIMIX-PLS), 291
  genetic algorithm segmentation in PLS-SEM (PLS-GAS), 291
  iterative reweighted regressions segmentation (PLS-IRRS), 292
  partial least squares k-means (PLS-SEM-KM), 293
  PLS typological path modeling (PLS-TPM), 291
  prediction-oriented segmentation in PLS-SEM (PLS-POS), 291
  response-based procedure for detecting unit segments in PLS path modeling (REBUS-PLS), 291
  response-based segmentation techniques, 291
"Unwise deletion," data, 63


Validation, 111
  measurement error, 111–113
Vandenberg, R. J., 294
Vanishing tetrads, 282
Variance-based SEM, 18
Variance inflation factor (VIF), 147
  collinearity assessment using, 148 (exhibit)
Variate. See Composite variables
Velasquez Estrada, J. M., 188
Vichi, M., 293
Völckner, F., 291

Wagner, R., 62–63
Weighted PLS-SEM (WPLS) algorithm, 91
Wende, S., 56, 158, 284
Wetzels, M., 280
Widaman, K. F., 249
Wilczynski, P., 57
Will, A., 56, 158, 284
Winklhofer, H. M., 50
Wold, H., 28, 30, 86
Zhao, X., 234