Perspectives in Business Informatics Research: 19th International Conference on Business Informatics Research, BIR 2020, Vienna, Austria, September 21–23, 2020, Proceedings [1st ed.] 9783030611392, 9783030611408

This book constitutes the proceedings of the 19th International Conference on Perspectives in Business Informatics Resea

691 104 11MB

English Pages XII, 221 [224] Year 2020

Report DMCA / Copyright

DOWNLOAD FILE

Polecaj historie

Perspectives in Business Informatics Research: 19th International Conference on Business Informatics Research, BIR 2020, Vienna, Austria, September 21–23, 2020, Proceedings [1st ed.]
 9783030611392, 9783030611408

Table of contents :
Front Matter ....Pages i-xii
Front Matter ....Pages 1-1
The Acceptance of Smart Home Technology (Christina Gross, Markus Siepermann, Richard Lackes)....Pages 3-18
Measuring the Barriers to the Digital Transformation in Management Courses – A Mixed Methods Study (Kristin Vogelsang, Henning Brink, Sven Packmohr)....Pages 19-34
Experiences of Applying the Second Step of the Digital Innovation and Transformation Process in Zoological Institutions (Johannes Wichmann, Matthias Wißotzki, Patrick Góralski)....Pages 35-49
Front Matter ....Pages 51-51
Cyber Security Resilience in Business Informatics: An Exploratory Paper (Haralambos Mouratidis, Jelena Zdravkovic, Janis Stirna)....Pages 53-66
The Model for Continuous IT Solution Engineering for Supporting Legal Entity Analysis (Marite Kirikova, Zane Miltina, Arnis Stasko, Marina Pincuka, Marina Jegermane, Daiga Kiopa)....Pages 67-81
Fields of Action to Advance the Digital Transformation of NPOs – Development of a Framework (Henning Brink, Sven Packmohr, Kristin Vogelsang)....Pages 82-97
Front Matter ....Pages 99-99
Is Team Always Right: Producing Risk Aware Effort Estimates in Agile Development (Jānis Grabis, Vineta Minkēviča, Bohdan Haidabrus, Rolands Popovs)....Pages 101-110
Design Decisions and Their Implications: An Ontology Quality Perspective (Achim Reiz, Kurt Sandkuhl)....Pages 111-127
Service Dependency Graph Analysis in Microservice Architecture (Edgars Gaidels, Marite Kirikova)....Pages 128-139
Front Matter ....Pages 141-141
Text Mining the Variety of Trends in the Field of Simulation Modeling Research (Mario Jadrić, Tea Mijač, Maja Ćukušić)....Pages 143-158
Service Quality Evaluation Using Text Mining: A Systematic Literature Review (Filip Vencovský)....Pages 159-173
Making Use of the Capability and Process Concepts – A Structured Comparison Method (Anders W. Tell, Martin Henkel)....Pages 174-188
Front Matter ....Pages 189-189
Designing Causal Inference Systems for Value-Based Spare Parts Pricing (Tiemo Thiess, Oliver Müller)....Pages 191-204
Organizational Change Toward IT-Supported Personal Advisory in Incumbent Banks (Maik Dehnert)....Pages 205-219
Back Matter ....Pages 221-221

Citation preview

LNBIP 398

Robert Andrei Buchmann Andrea Polini Björn Johansson Dimitris Karagiannis (Eds.)

Perspectives in Business Informatics Research 19th International Conference on Business Informatics Research, BIR 2020 Vienna, Austria, September 21–23, 2020 Proceedings

123

Lecture Notes in Business Information Processing Series Editors Wil van der Aalst RWTH Aachen University, Aachen, Germany John Mylopoulos University of Trento, Trento, Italy Michael Rosemann Queensland University of Technology, Brisbane, QLD, Australia Michael J. Shaw University of Illinois, Urbana-Champaign, IL, USA Clemens Szyperski Microsoft Research, Redmond, WA, USA

398

More information about this series at http://www.springer.com/series/7911

Robert Andrei Buchmann Andrea Polini Björn Johansson Dimitris Karagiannis (Eds.) •





Perspectives in Business Informatics Research 19th International Conference on Business Informatics Research, BIR 2020 Vienna, Austria, September 21–23, 2020 Proceedings

123

Editors Robert Andrei Buchmann Babeș-Bolyai University Cluj Napoca, Romania

Andrea Polini University of Camerino Camerino, Italy

Björn Johansson Linköping University Linköping, Sweden

Dimitris Karagiannis University of Vienna Vienna, Austria

ISSN 1865-1348 ISSN 1865-1356 (electronic) Lecture Notes in Business Information Processing ISBN 978-3-030-61139-2 ISBN 978-3-030-61140-8 (eBook) https://doi.org/10.1007/978-3-030-61140-8 © Springer Nature Switzerland AG 2020 This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Switzerland AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

Preface

The 19th edition of the International Conference on Perspectives in Business Informatics Research (BIR 2020) – initially set to take place at the University of Vienna during September 2020 – was deterred, as many scientific events around the globe, by the COVID-19 pandemic. The conference – with its numerous satellite events (workshops, doctoral consortium) – has always been a scientific networking hub, encouraging elaborated critical discussion and providing knowledge exchange opportunities, therefore the organization team decided to postpone BIR 2020 as part of an extended event to take place during BIR 2021. The joint BIR 2020-2021 event will thus include presentations of both years’ submissions, potentially expanding the BIR community and the range of discussed topics, while preserving the networking nature of the conference. Although this volume will have been published prior to the extended event of 2021, the peer-review process was applied as usual for submissions received during 2020, leading to the selection of 14 papers based on a minimum of 3 reviews for each paper. The acceptance rate was 29%, from a base of 48 submissions in total. The papers are grouped by the following topics: Digital Transformation and Technology Acceptance, Multi-perspective Enterprise Models and Frameworks, Supporting Information Systems Development, Literature and Conceptual Analysis, Value Creation, and Value Management. Business Informatics is an established research area at the intersection of Business Administration and Computer Science, subordinated to the Information Systems field. Since 2000, when BIR was initiated in Rostock, Germany, the topics investigated by this community have gained wide adoption and have stimulated a convergence of business and technical views towards establishing novel methods for supporting business decisions with the help of socio-technical systems. The teaching agenda of Business Informatics is steadily spreading across Europe, with numerous study programs emerging under this name, typically inspired by the diversity of research topics covered by this umbrella term, e.g., Enterprise Modeling, Business Process Management, Information Systems Development, Decision Support Systems, and E-business. The scientific community fostered around these topics is continuously expanding the relevance of the BIR conference series, making it a prestigious European event that will soon have its 20th edition, coalescing the presentation of papers selected from both the 2020 and 2021 submissions. We express our gratitude to the BIR Steering Committee members who agreed to have this edition managed by our team. In particular, we thank Prof. Marite Kirikova and Prof. Kurt Sandkuhl for providing assistance during the difficult decision-making process related to the pandemic. We thank all the authors who submitted their work and the Program Committee members who contributed reviews to the paper selection process. We also thank the

vi

Preface

global community of the Open Models Laboratory (OMiLAB, www.omilab.org) for contributing either submissions or reviews to the selection process of BIR 2020. The technical support regarding the production of this volume was ensured by the Springer team, lead by Ralf Gerstner and Christine Reiss, to whom we are thankful for the continuous partnership with BIR. Last but not least, we’d like to thank the Vienna organization team, lead by Victoria Döller, for their hard work on managing the communication, website, and registration process that made the publication of this volume possible. We are looking forward to attending the presentations of the hereby selected works during the extended event planned for 2021. August 2020

Björn Johansson Dimitris Karagiannis Robert Andrei Buchmann Andrea Polini

Organization

BIR 2020 is managed by an organization team affilitated to the University of Vienna, Austria. The actual event was postponed to 2021 due to the pandemic situation during 2020 and this volume contains the pre-event selection of accepted papers.

General Chairs Björn Johansson Dimitris Karagiannis

Lund University and Linköping, Sweden University of Vienna, Austria

Program and Publication Co-chairs Robert Andrei Buchmann Andrea Polini

Babeș-Bolyai University of Cluj Napoca, Romania University of Camerino, Italy

Local Organizing Chair Victoria Döller

University of Vienna, Austria

Steering Committee Kurt Sandkuhl (Chair) Eduard Babkin Rimantas Butleris Sven Carlsson Peter Forbrig Björn Johansson Marite Kirikova Andrzej Kobyliñski Lina Nemuraite Jyrki Nummenmaa Raimundas Matulevicius Vaclav Repa Benkt Wangler Stanislaw Wrycza

Rostock University, Germany State University of Nizhny Novgorod, Russia Kaunas Technical University, Lithuania Lund University, Sweden Rostock University, Germany Lund University and Linköping University, Sweden Riga Technical University, Latvia Warsaw School of Economics, Poland Kaunas Technical University, Lithuania University of Tampere, Finland Tartu University, Estonia University of Economics in Prague, Czech Republic University of Skövde, Sweden University of Gdansk, Poland

Program Committee Gundars Alksnis Bo Andersson Said Assar

Riga Technical University, Latvia Lund University, Sweden Institut Mines Telecom Business School, France

viii

Organization

Eduard Babkin Per Backlund Amelia Bădică Peter Bellström Meral Binbasioglu Cătălin Boja Dominik Bork Tomas Bruckner Robert Andrei Buchmann Witold Chmielarz Michal Choras Chiara Di Francescomarino Hans-Georg Fill Peter Forbrig Ana-Maria Ghiran Janis Grabis Janis Grundspenkis Knut Hinkelmann Adrian Iftene Emilio Insfran Amin Jalali Florian Johannsen Björn Johansson Gustaf Juell-Skielse Dimitris Karagiannis Christina Keller Sybren De Kinderen Marite Kirikova Michael Lang Birger Lantow Michael Le Duc Massimiliano De Leoni Ginta Majore Raimundas Matulevicius Patrik Mikalef Andrea Morichetta Jens Myrup Pedersen Jacob Norbjerg Jyrki Nummenmaa Cyril Onwubiko Malgorzata Pankowska Victoria Paulsson

National Research University, Russia University of Skövde, Sweden University of Craiova, Romania Karlstad University, Sweden Hofstra University, USA Bucharest University of Economic Studies, Romania University of Vienna, Austria University of Economics in Prague, Czech Republic Babeș-Bolyai University, Romania University of Warsaw, Poland University of Science and Technology Bydgoszcz, Poland Bruno Kessler Foundation, Italy Freiburg University, Switzerland Rostock University, Germany Babeș-Bolyai University, Romania Riga Technical University, Latvia Riga Technical University, Latvia University of Applied Sciences Northwestern Switzerland, Switzerland Alexandru loan Cuza University, Romania Polytechnic University of Valencia, Spain Stockholm University, Sweden University of Applied Sciences Schmalkalden, Germany Lund University and Linköping, Sweden Stockholm University, Sweden University of Vienna, Austria Jönköping University, Sweden University of Duisburg-Essen, Germany Riga Technical University, Latvia NUI Galway, Ireland Rostock University, Germany Mälardalen University, Sweden University of Padua, Italy Vidzeme University of Applied Sciences, Latvia University of Tartu, Estonia Norwegian University of Science and Technology, Norway University of Camerino, Italy Aalborg University, Denmark Copenhagen Business School, Denmark University of Tampere, Finland Research Series Ltd, UK University of Economics in Katowice, Poland Dublin City University, Ireland

Organization

Data Petcu John Sören Pettersson Tomas Pitner Pierluigi Plebani Paul Pocatilu Andrea Polini Dorina Rajanen Barbara Re Iris Reinhartz-Berger Vaclav Repa Stefanie Rinderle-Ma Ben Roelens Kurt Sandkuhl Rainer Schmidt Manuel Serrano Gheorghe Cosmin Silaghi Piotr Soja Janis Stirna Stefan Strecker Frantisek Sudzina Ann Svensson Torben Tambo Filip Vencovsky Gianluigi Viscusi Anna Wingkvist Stanislaw Wrycza Jelena Zdravkovic Alfred Zimmermann Wieslaw Wolny

ix

West University of Timisoara, Romania Karlstad University, Sweden Masaryk University, Czech Republic Polytechnic University of Milan, Italy Bucharest University of Economic Studies, Romania University of Camerino, Italy University of Oulu, Finland University of Camerino, Italy University of Haifa, Israel University of Economics in Prague, Czech Republic University of Vienna, Austria Open University of the Netherlands, The Netherlands Rostock University, Germany Munich University of Applied Sciences, Germany University of Castilla-La Mancha, Spain Babeș-Bolyai University, Romania Krakow University of Economics, Poland Stockholm University, Sweden University of Hagen, Germany Aalborg University, Denmark University West, Sweden Aarhus University, Denmark University of Economics in Prague, Czech Republic EPFL-CDM-CSI, Switzerland Linnaeus University, Sweden University of Gdansk, Poland University of Stockholm, Sweden Reutlingen University, Germany University of Economics in Katowice, Poland

Contents

Digital Transformation and Technology Acceptance The Acceptance of Smart Home Technology . . . . . . . . . . . . . . . . . . . . . . . Christina Gross, Markus Siepermann, and Richard Lackes

3

Measuring the Barriers to the Digital Transformation in Management Courses – A Mixed Methods Study. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Kristin Vogelsang, Henning Brink, and Sven Packmohr

19

Experiences of Applying the Second Step of the Digital Innovation and Transformation Process in Zoological Institutions . . . . . . . . . . . . . . . . . Johannes Wichmann, Matthias Wißotzki, and Patrick Góralski

35

Multi-perspective Enterprise Models and Frameworks Cyber Security Resilience in Business Informatics: An Exploratory Paper . . . Haralambos Mouratidis, Jelena Zdravkovic, and Janis Stirna The Model for Continuous IT Solution Engineering for Supporting Legal Entity Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Marite Kirikova, Zane Miltina, Arnis Stasko, Marina Pincuka, Marina Jegermane, and Daiga Kiopa Fields of Action to Advance the Digital Transformation of NPOs – Development of a Framework. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Henning Brink, Sven Packmohr, and Kristin Vogelsang

53

67

82

Supporting Information Systems Development Is Team Always Right: Producing Risk Aware Effort Estimates in Agile Development . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Jānis Grabis, Vineta Minkēviča, Bohdan Haidabrus, and Rolands Popovs Design Decisions and Their Implications: An Ontology Quality Perspective . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Achim Reiz and Kurt Sandkuhl Service Dependency Graph Analysis in Microservice Architecture. . . . . . . . . Edgars Gaidels and Marite Kirikova

101

111 128

xii

Contents

Literature and Conceptual Analysis Text Mining the Variety of Trends in the Field of Simulation Modeling Research . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Mario Jadrić, Tea Mijač, and Maja Ćukušić

143

Service Quality Evaluation Using Text Mining: A Systematic Literature Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Filip Vencovský

159

Making Use of the Capability and Process Concepts – A Structured Comparison Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Anders W. Tell and Martin Henkel

174

Value Creation and Value Management Designing Causal Inference Systems for Value-Based Spare Parts Pricing: An ADR Study at MAN Energy Solutions . . . . . . . . . . . . . . . . . . . . . . . . . Tiemo Thiess and Oliver Müller

191

Organizational Change Toward IT-Supported Personal Advisory in Incumbent Banks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Maik Dehnert

205

Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

221

Digital Transformation and Technology Acceptance

The Acceptance of Smart Home Technology Christina Gross(B)

, Markus Siepermann , and Richard Lackes

TU Dortmund, Otto-Hahn-Str. 12, 44227 Dortmund, Germany {christina.gross,markus.siepermann, richard.lackes}@tu-dortmund.de

Abstract. The market for smart home technology (SHT) has increased rapidly and is said to do so during the next years. In particular, comfort and security features are the main focus of vendors. This paper aims to examine the different influencing factors that have an impact on the adoption decision of consumers. For this, a survey was conducted among 327 German consumers. Results show that perceived security and comfort are significant influencing factors. In particular, control functions play an important role. In contrast, neither usability of SHT nor costs show a noteworthy impact on the adoption decision, although costs are expected to be high. Keywords: Smart home technology · Perceived security · Perceived comfort · Perceived costs · Technology acceptance model

1 Introduction Advancing digitalization does not only comprise IT businesses: in private households, digitalization and home automation are evolving trends. A recent representative survey among German users found that half (46%) of the Germans use at least one smart home application [1]. Otherwise almost one quarter (26%) refuses to use smart home technology (SHT) [1]. Comparing these results to the results from a survey of 2016, usage increased by more than 15% (29.4% to 46%). However, denial increased by about 5% points (20.4% to 26%) [2]. Forecasts for the year 2024 expect the SHT market volume to be e 6,686 million. Therefore, growth in sales would be up 11.8% from today (e 4.272 million in 2020) [3]. The number of smart homes is expected to increase from 7.8 million in 2020 to 10.7 million households in 2024 [3]. Although these numbers seem to indicate a huge interest in SHT, they also mean that the share of smart homes among all households in Germany just exceeds 25%. In the context of a smart home, several terms are used simultaneously: smart living, smart environment, home automation, domotics, intelligent home, or adaptive home, but usually mean the same: “A smart home is a residence equipped with a high-tech network, linking sensors and domestic devices, appliances, and features that can be remotely monitored, accessed or controlled, and provide services that respond to the needs of its inhabitants” [4]. It combines home automation with advanced security services and energy management measures [4, 5]. With the help of smart home technology (SHT), © Springer Nature Switzerland AG 2020 R. A. Buchmann et al. (Eds.): BIR 2020, LNBIP 398, pp. 3–18, 2020. https://doi.org/10.1007/978-3-030-61140-8_1

4

C. Gross et al.

households are able to analyze the condition and state of various parameters, anytime and anywhere. In general, three different kinds of services can be realized: lifestyle support, energy consumption and management, and security [4, 6]. Lifestyle support services aim to simplify people’s everyday lives with learning devices that adapt to the habits of users (e.g. turning lights and heating on when present). Additionally, they comprise entertainment services (e.g. voice-controlled music players) or ambient assisted-living, which provides monitoring tools to reduce the follow-up risks of incidents for elderly people (e.g. sensors on the floor, which recognize falls). Energy consumption and management services aim to reduce the total energy consumption of users. This is done by synchronizing the state of different devices, like roller shutters, heating, window opening etc. (e.g. Bosch smart home products). Security services focus on safeguarding by surveilling the home, monitoring the closed state of doors and windows, implementing different alarms, etc. (e.g. Magenta or Innogy smart home products). Recent years have brought a set of intelligent personal assistants, like Amazon Alexa, Apple Siri, or Google Assistant, that aim to improve the comfort of users. These assistants can serve as a central control unit for a smart home. While in 2019 about 3.25 billion digital voice assistants were sold, the demand is set to double in 2022 [7]. The global smart home security market is also expected to grow in the future [8]. In Germany, the volume of home security technology will increase from the forecast of e 790 million in 2020 to about e 1.3 billion e in 2024 [9]. The demand for security products is thus immense. Therefore, this paper aims to reveal the factors that influence the adoption of comfort and security in SHT. In this way, the following research questions will be pursued: RQ1: Which factors influence the adoption of smart home technology? RQ2: What is the contribution of different security and comfort features to the adoption of SHT? To answer these questions, we conducted a survey among 327 consumers in Germany. This survey focused on the comfort and security features of SHT, and incorporated a cost perspective. To the best of our knowledge, this paper is the first to utilize these constructs together. The remainder of this paper is organized as follows: in the next section, we briefly give an overview of the related literature, and demonstrate the contribution of this paper. The third section develops the research model, which serves as the basis for our survey, and is analyzed in section four. The paper closes with a discussion of the results in section five and points out implications and limitations in section six.

2 Literature Review Although SHT and research regarding SHT applications exists since many years [10], papers investigating the adoption behavior of consumers are still scarce. Apart from specialized studies with a focus on the energy sector [11] or qualitative studies [4, 12], only eight studies can be found which use multivariate statistics to analyze the relations between different factors and the adoption of SHT (see Table 1).

The Acceptance of Smart Home Technology

5

Table 1. Related literature Paper

Objective

Basic theory

Sample

Findings

Bao et al. (2014), [23]

Determinants of the adoption of mobile smart home

TAM

310 Chinese

Social influence and compatibility directly influence adoption, perceived costs and perceived technology security risk have no influence on adoption

Gaul and Ziefle (2009), [18]

Acceptance motives of eHealth technologies

TAM

280 Germans

High acceptance among all age groups. Advantages have a stronger impact than disadvantages

Hubert et al. (2019), [6]

Creating a comprehensive adoption model

TAM, DOI, perceived risk theory

409 Germans & Internationals

Perceived usefulness, compatibility, and risk perception have a strong influence on use intention Correct and concrete communication of benefits is essential. Security risk is a strong predictor for the overall risk

Jin Noh and Seong Kim (2010), [17]

Determinants of the adoption of smart home services

Binary logit model

600 Koreans

Acceptance of infra services is mainly affected by age, self-employment, and house property

Nguyen et al. (2018), [19]

Determinants of the adoption of smart home devices

UTAUT

304 Vietnamese

Perceived value, perceived usefulness, trust, and social influence are the main determinants for usage intention

Park et al. (2018), [21]

Determinants of the adoption of smart home services

TAM

799 South Koreans

Perceived usefulness, compatibility, perceived connectedness, and perceived control are the main influencing factors. Costs, enjoyment, and reliability play only a minor role

(continued)

6

C. Gross et al. Table 1. (continued)

Paper

Objective

Basic theory

Sample

Findings

Shih (2013), [22]

Determinants of the adoption of smart home services

TAM, DOI

580 Taiwanese

Usefulness and compatibility are the most influential factors for attitude, followed by interest, observability, and relative advantages. Ease of use has no direct impact on attitude

Yang et al. (2017), [20]

Determinants of the adoption of smart home services

TPB

216 South Koreans

Subjective norm, perceived behavioral control, and attitude influence the usage intention. Mobility, interoperability, privacy risk, and trust in service provider are the main predictors for attitude

Most papers are based on the technology acceptance model (TAM) [13], which is extended by additional constructs. Two papers are based on the successive model the Unified Theory of Acceptance and Use of Technology [14, 15] and their predecessor, the Theory of Planned Behavior [16]. The basic structure of the research models is therefore mostly the same. There is only one paper, which is not based on structural equation modeling, using a binary logit model which examines the impact of user characteristics on the use of different SHT services [17]. The research application areas are various. Gaul and Ziefle (2009) focused on eHealth technologies, in particular on a stent which needs to be implanted [18]. Therefore, their results are difficult to generalize. Although the application area is sometimes labelled differently, the other papers focus on what is commonly referred to as SHT. Some authors [19, 20] focus on the adoption of smart home devices and the associated services, while other autors [6, 21, 22] place an emphasis on the services provided by the manufacturer. In Bao et al. (2014), the focus is on the remote control aspects of SHT, realized with the help of smartphones [23]. The main goal of Hubert et al. (2019) was the creation of a new comprehensive adoption model that combines aspects of the TAM and diffusion of innovations (DOI) [6, 24] with risk aspects [25]. The usage of smart home technology was “only” the application to test their model. In particular, they used a prototype system that mainly made use of security services like automated alarms for fire, water leaks, or burglars, warnings for severe weather conditions and reminders to close windows and doors. However, the system did not provide home automation services like closing the window or providing shade. The detailed advantages of home automation were thus not considered. In fact,

The Acceptance of Smart Home Technology

7

the latter point holds for all the papers. The advantages of a smart home were mostly considered in general, but not discussed in detail within the questionnaires. Only Yang et al. (2017) went into more detail, using separate constructs for automation and mobility [20]. The enhanced security that is associated with SHT was considered [21, 23], but again without going into detail about the factors that were most influential. Concerning the risk associated with SHT three papers [6, 20, 23] focused on this aspect and others investigated trust in SHT [19]. There are two publications which analyze the determinants for the adoption of SHT among real users [17, 22]. All other studies, including this paper, focus on the selfreported future usage intentions. Unlike existing studies, this paper takes a different direction and places emphasis more on the details of SHT benefits. In particular, the security and the comfort that SHT are intended to enhance are the focus of our investigation. SHT provides different security and comfort functions that may contribute to the adoption decision of consumers. Hence, these different functions are analyzed separately, instead of in an aggregated view on the advantages of SHT.

3 Research Model The acceptance of an innovation can be seen as the willingness of people to approve it. In general, three stages of acceptance can be distinguished [26]. In the first, people need to be mentally prepared for the innovation, so that they are positive attuned towards its. Then, if people are ready, the second step comprises the desire and intention to use it. If this is strong enough, people will decide to use the innovation, which is the third stage of acceptance. Obviously, acceptance of stages 1 and 2 can be strong among people, even if they have never seen or tried the innovation. The permanent usage of stage 3 is only possible if the innovation already exists [27]. The acceptance of an innovation depends on its characteristics, the characteristics of the intended users, and their personal situations. The first determinant corresponds to the usefulness of the innovation. The more people perceive the innovation as useful, the more they are inclined to use it. The usage intention is highly correlated with the educational and social backgrounds of the users. The higher the level of education, the more easily the user spots the advantages that lead to acceptance. The more people in a social environment accept the innovation the more likely another person will do so too [14]. The widespread technology acceptance model [13, 28] incorporates these relations. People’s usage is explained by their intention to use it, which in turn is influenced by their attitude towards the innovation. People’s attitude is formed by their perception of the innovation’s usefulness and ease of use, while the latter also influences its perceived usefulness. The TAM is an approved model [29, 30], but is also criticized for its simplicity [31]. However, it possesses a very high explanatory power [15, 32, 33]. Therefore, we will also use the core of the TAM for our study. Though, as we want to learn about the motivational aspects among users and non-users, we will not use the Actual System Use construct. This is not really a restriction, as the behavioral intention to use an innovation has been proven to be a very good predictor for later usage [14, 49]. As a result, we use the constructs behavioral intention to use (BI), attitude (AT), perceived usefulness (PU) as well as perceived ease of use (PEOU), and hypothesize:

8

C. Gross et al.

H 1 : Attitude positively influences the behavioral intention to use SHT. H 2 : Perceived usefulness positively influences the behavioral intention to use SHT. H 3 : Perceived usefulness positively influences the attitude towards SHT. H 4 : Perceived ease of use positively influences the attitude towards SHT. H 5 : Perceived ease of use positively influences the perceived usefulness of SHT. There are various SHT applications that cannot be equalized. As mentioned above, lifestyle support services, energy consumption and management services, and security services are usually distinguished [4]. Lifestyle support services again comprise a set of very different services, like entertainment or ambient assisted living. These services are often targeted to certain specific groups like elderly people. Comfort and security are said to be the most important reasons for using SHT [11, 12, 34]. For a comprehensive overview and research examples, see [50]. This paper will therefore focus on services of energy management and security services. We will use two additional constructs. Perceived security (PS) is defined as the degree of improved security protection after using SHT, e.g. simulated presence, remote surveillance, fire warning, or intrusion detection. Perceived comfort (PF) is the degree of improved convenience from SHT. The comfort could rise from home automation features in heating, ventilation and air conditioning (HVAC), shading, and lighting. These services provide different advantages to their users, and therefore improve the usefulness of SHT. We hypothesize: H 6 : Perceived security positively influences the perceived usefulness of SHT. H 7 : Perceived comfort positively influences the perceived usefulness of SHT. SHT does not only provide benefits to users, but also bear several disadvantages. In particular, the costs of the various devices for building a smart home may keep people from buying SHT devices. There are initial installation costs, and (monthly) fees for the services provided by the manufacturers. Only a few studies have investigated the costs associated with SHT, and paint an ambiguous picture. While Bao et al. (2014) could not confirm the influence of costs [23], Park et al. (2018) could find a slight impact on usage intention [21]. We will also investigate the impact of perceived costs (PC) for SHT and hypothesize that: H 8 : Perceived costs negatively influence the behavioral intention to use SHT. The resulting research model is depicted in Fig. 1. To incorporate the different services into the model, all the constructs are measured formatively, except for perceived ease of use and perceived costs. The different services are then captured by the formative constructs’ items. A formative measurement bears several advantages and disadvantages, we’ll explicate later.

4 Analysis To test the research model, we conducted a survey between March and June 2018. The questionnaire consisted of 28 questions for the model (cf. Table 2, measured in a 5-point

The Acceptance of Smart Home Technology

9

Perceived Costs Perceived Security

H8(-)

H6(+)

Perceived Usefulness Perceived Comfort

H2(+)

Behavioral Intention to Use

H7(+) H5(+)

Perceived Ease Perceived Easeof of Use Use

H3(+)

H4(+)

H1(+)

Attitude Technology Acceptance Model

Fig. 1. Research model

Likert-Scale) and nine demographic questions. It was open to every German speaking person and therefore distributed online via Facebook and empirio. In addition, it was shared by manufacturers of smart home technology, Gira and Emansio. To supplement this process, a paper-based survey was done in the cities of Dortmund and Iserlohn. In total 327 participants (52% females, 45% males, 3% not specified) answered the questionnaire (71.3% online and 28.7% offline). 50.8% of the participants were between 20 and 29 years old, 4% were younger, 12.5% were in their thirties, 8.9% in their forties, 16.8% in their sixties, and 4.6% of the participants were older than sixty (2.4% not specified). 20.8% had an income of under e 1,000. 14.4% had an income between e 1,000 and e 2,000, 11.6% between e 2,000 and e 3,000, 9.8% earned between e 3,000 and e 4,000 and 15.3% lived on more than e 4,000 (28.1 not specified). Most of the participants (72.9%) had no experience with smart home technology. All the observations have less than 15% missing values [35] so that no observation had to be eliminated. With 327 samples, the sample size is beyond the sample size based on the number of arrows pointing to the latent variable constructs recommended by Chin for receiving stable results of the model estimation [36]. The considered structural equation model (SEM) consists of two elements: the measurement model, which specifies the relationship between the constructs and their indicators, and the structural model, in which the relations between the constructs will be analyzed [36, 37]. For the evaluation of the theoretical SEM, we used Smart PLS software Version 3.2.9 [38]. The software is based on the Partial Least Squares (PLS) algorithm, and used for variance-based analysis [35]. In contrast to covariance-based alternatives, such as LISREL, the sample size was not restricted, and it was not necessary to make assumptions about the distribution. This approach was chosen over other approaches, for a number of reasons. First, this study is exploratory, meaning that the influences of perceived security, perceived comfort, and perceived costs are not yet proven, and this research focuses on predicting a model for SHT acceptance. Second, Smart PLS is suitable for smaller sets, and does not require normal distribution, since it is a non-parametric method. Third,

10

C. Gross et al.

PLS-SEM is used to enhance the explanatory capacity of key target variables and their relationships in complex behavior research [39]. 4.1 Measurement Model In addition to the PLS algorithm, a bootstrapping of 5,000 samples was used for the determination of the significance of weights, loadings and path coefficients [35]. For missing values, case-wise replacement was applied. The program was set to 300 maximum iterations for calculating the PLS results. To assure that the maximum number of iterations is reached, the stop criterion is set to 10−7 . Within the measurement model, two kinds of constructs can be distinguished: reflective and formative constructs [40]. Our model consists of two reflective constructs: PEOU and PC, which implies that the construct affects the indicators. If there are poor results for a single indicator, elimination is possible and the algorithm could be recalculated. Our initial test results made no modification necessary; our loadings fit the model requirements. Moreover, our model uses five formative constructs PS, PF, PU, AT and BI. A formative measurement means that (all) indicators affect the construct, elimination is not possible. As a latent structure model, linkages between constructs are hypothesized, not directly observed. To examine the internal consistency for the reflective constructs, the convergence criterion, the discriminant validity, the indicator reliability and the predictive validity were examined [35, 41]. The results for the constructs PC (0.905) and PEOU (0.862) were greater than 0.7, confirming the internal consistency [42, 43]. Furthermore, the average variance extracted (AVE) exceeded the threshold of 0.5 (PC: 0.819, PF: 0.742), so it can be concluded that the convergent validity was confirmed [44]. The square root value of the AVE of each construct was greater than its correlation values with other Table 2. Results of the research model Construct

Item

Questionnaire item focus

Loadings/Weights

AVE/VIF

Perceived securityF (PS)

PS1

Time-controlled 0.440*** automatic roller shutters

PS2

Simulated presence

0.245*

1.384

PS3

Fire detection, emergency exit signals, etc.

0.040ns

1.559

PS4

Panic switch functions, e.g. Alarm etc.

0.226ns

1.791

PS5

Communicating with a door camera

0.262*

1.527

PS6

Roller shutdown in case 0.209* of a cullet

1.200

1.368

(continued)

The Acceptance of Smart Home Technology

11

Table 2. (continued) Construct

Item

Questionnaire item focus

Loadings/Weights

AVE/VIF

Perceived comfortF (PF)

PF1

Automatic lighting

0.184**

1.209

PF2

Automatic heating

0.624***

1.392

Humidity controllers

0.239**

1.364

PF4

Automatic roller shutter due to sun

0.365**

2.172

PF5

Automatic roller shutter −0.063ns (temperature)

2.285

PEOU1

I would learn to use SHT quickly

0.742

PEOU2

It is easy learning to use 0.757 SHT

PU1

SHT increases comfort

0.648***

1.401

SHT saves energy

0.211*

1.990

PU3

SHT reduces costs

0.105ns

1.842

PU4

SHT is good for the environment

−0.118ns

1.675

PU5

SHT increases security/safety

0.414***

1.285

AT1

Heating, ventilation, AC 0.311** (HVAC)

1.511

AT2

Lighting

0.330**

1.822

Providing shade

0.021ns

1.634

AT4

Smart security

0.608***

1.275

BI1

Heating, ventilation, AC 0.086ns (HVAC)

1.652

BI2

Lighting

0.369**

1.717

BI3

Providing shade

0.033ns

1.635

BI4

Smart security

0.711***

1.390

PC1

SHT causes a lot of extra Costs for me

0.882

0.819

PC2

SHT is expensive

0.928

PF3

Perceived ease of useR (PEOU)

Perceived usefulnessF (PU)

AttitudeF (AT)

PU2

AT3 Behavioral intention to useF (BI)

Perceived costsR (PC)

0.955

Significance of indicators: *p < 0.05, **p < 0.01, ***p < 0.001, R = reflective, F = formative

constructs, confirming discriminant validity, according to Fornell and Larcker’s criterion [44]. Moreover, the heterotrait-monotrait ratio of correlations (HTMT) value between

12

C. Gross et al.

PEOU and PC was 0.063, and does not exceed the threshold of 0.9 [45]. It can be concluded that discriminant validity has been established among all constructs. The bootstrapping results for the outer loadings revealed the suitability and relevance for the formative measurement model [46]. The composite reliability exceeded the threshold of 0.7, and the AVE exceeded the minimum of 0.5, so the convergent validity of the measurement model was proven. The significance of indicators was tested using the p-value, which must be below the known thresholds (0.1, 0.05, 0.01). The variance inflation factors (VIFs) were lower than threshold of 5, confirming the absence of multicollinearity problem [46]. First, the reflective constructs (PEOU, PC) were examined. The indicator reliability is below 1% significance level for all reflective constructs (see Table 2). The convergence criterion was also met, since the AVE for each construct was greater than 0.5, the composite reliability (CRPEOU = 0.851, CRPC = 0.901) was above 0.7, and Cronbach’s alpha (CAPEOU = 0.692, CAPC = 0.782) was almost above the critical level of 0.7. As stated above, the square root value of the AVE of each construct was greater than its correlation value with other constructs confirming discriminant validity according to Fornell and Larcker’s criterion. Table 3 shows that all loadings of the indicators were highest in the corresponding construct. Thus, the reflective constructs differ sufficiently from each other. The predictive validity was also fulfilled for each construct. Thus, a prediction of the constructs by their indicators was obtained. Table 3. Cross loadings Item

AT

BI

PF

PC

PEOU PS

PU

PEOU1 0.168 0.122 0.126 −0.051 0.955

0.116 0.239

PEOU2 0.044 0.031 0.103 −0.022 0.757

0.031 0.129

PC1

0.076 0.129 0.073 −0.882 0.063

0.134 0.080

PC2

0.117 0.163 0.059 −0.928 0.002

0.196 0.121

For formative constructs, the outer weights of eight indicators (PS3, PS4, AT3, PF5, BI1, BI3, PU3, PU4) were not significant. We conducted a significance test on the outer loadings. Since the p-Values were significant (p = 0.000) and there was no evidence of multicollinearity (VIF < 5), all indicators were sufficiently different and no indicator had to be eliminated. 4.2 Structural Model The standardized root mean square residual (SRMR) criterion was used to ensure the absence of misspecification in the model. SRMR assesses the differences between the actual correlation matrix (observed from the sample) and the expected one (predicted by the model). SRMR value is saturated at 0.071 and estimated to be 0.087, which is less than the threshold of 0.08 and indicates a good fit of the model [44]. The exogenous

The Acceptance of Smart Home Technology

13

variables moderate explain 43.2 to 47.3% (R2A = 0.473, R2BI = 0.33, R2PU = 0.432) of the total variance in AT, BI, and PU. A common method bias (CMB) test is essential, since endogenous and exogenous variables are collected together using one questionnaire [39]. For PLS-SEM, CMB is detected through a full collinearity assessment approach [47]. VIF values should be lower than the 3.3 threshold [35, 47]. Our VIF values confirm that the model is free from CMB. These results were confirmed by the Harman’s single factor test for reflective constructs and Pearson’s correlations matrix for the formative indicators. Figure 2 shows the hypotheses with their path coefficients, significance, and effect sizes. For each construct, the R2 and the predictive relevance Q2 are provided.

Perceived Costs Perceived Security

H8(-): 0.090ns

H6(+): 0.361

f2 = 0.014ns

f2 = 0.157

Perceived Usefulness

H7(+): 0.343

Perceived Comfort

H2(+):

0.105ns

f2 = 0.010ns

(R2 = 0.432, Q2 = 0.182)

Behavioral Intention to Use (R2 = 0.433, Q2 = 0.205)

f2 = 0.140

H5(+): 0.147 f2 = 0.037ns

H3(+): 0.691

H1(+): 0.565

f2 = 0.859

Perceived Ease Perceived H4(+): -0.013ns 2 ns Easeof of Use Use

f2 = 0.296

Attitude (R2 = 0.473, Q2 = 0.240)

f = 0.000

Technology Acceptance Model

Fig. 2. Results of the research model

5 Discussion This paper aimed to shed light on the extent to which security and comfort features of SHT contribute to its adoption by consumers. For this, it is one of the first studies to conceptualize PS and PF as formative constructs, and antecedents for the PU of SHT. As our results show (see Table 4), both factors could be proven to be significant parameters of the usefulness of SHT. Looking at the beneficial factors of PS and PF, fire warnings which have been part of several previous investigations [4, 6, 23] do not show a noteworthy impact in our case. The reason for this may be that most people rarely face fires, and rate this problem as low in importance. In particular, “overview over closed doors and windows” and “heating control” found great acceptance and showed a high impact, in line with previous research [10]. This is also underlined by the assessment of SHT for usefulness and attitude. While providing shade, saving costs, and environmental advantages did not play a role, comfort benefits and security were proven to have a great influence. The picture for comfort and security is ambiguous. The impact of the security in all TAM constructs is very high, while the assessment (about 60%) among people is quite low, when compared to comfort (about 80%). A reason for this may be that younger people,

14

C. Gross et al. Table 4. Overview of hypotheses and summary of results

Hypothesis

Result

H1 : Attitude positively influences the behavioral intention to use SHT

Supported

H2 : Perceived usefulness positively influences the behavioral intention to use SHT

Not supported

H3 : Perceived usefulness positively influences the attitude towards SHT

Supported

H4 : Perceived ease of use positively influences the attitude towards SHT

Not supported

H5 : Perceived ease of use positively influences the perceived usefulness of SHT

Supported

H6 : Perceived security positively influences the perceived usefulness of SHT

Supported

H7 : Perceived comfort positively influences the perceived usefulness of SH

Supported

H8 : Perceived costs negatively influence the behavioral intention to use SHT

Not supported

who accounted for the majority of the participants, ascribe less importance to the security aspect than elder people do. Therefore, while being of high significance, the acceptance of security functions and usage intention of security SHT might be lower. Another interesting result is that the costs have no impact on the usage intention which contrasts with [21], but is in line with [23]. SHT is perceived as pricey (>55% of the respondents agree or highly agree vs. Tg = I Tk > Tg , K

(5)

k=1

where   I Tk∗ > Tg =



1, if Tk∗ > Tg . 0, if Tk∗ ≤ Tg

(6)

The value of risk adjustment that the real effort will exceed the initial estimate can be determined as   ∗  K  ∗ k=1 Tk − Tg · Tk > Tg . (7) R=  ∗  K k=1 I Tk > Tg Now the initial fixed value Tg can be adjusted by taking into account the risk: RA = Tg + R.

(8)

The confidence intervals based risk aware estimate CI allows for fine-tuning risk tolerance by changing the desired level of confidence. The risk adjustment based risk aware estimate RA is probably more familiar and easier to use by the team members.

4 Experimental Evaluation Simulation studies are conducted to gain insights about properties of the risk aware estimation. The objective of the studies is to identify situations when the risk aware estimation yields significantly different results than the traditional planning poker estimation. The simulation procedure is organized as follows:

106

J. Grabis et al.

1. A sprint consists of 20 tasks and there are 7 team members; 2. Generate the traditional estimate for every task to be included in the sprint. Estimates assume effort estimation values as used in planning poker, i.e. (0.5, 1, 2, 3, 5, 8, 13, 20, 40, 100); 3. Generate individual team members’ estimates for every task; 4. Use the generated traditional estimates to obtain the overall effort estimate for the sprint; 5. Use the resampling procedure and the generated individual team members’ estimates to calculate CI and RA estimates for the sprint. The log-normal distribution is used to generate the traditional estimates (most of the tasks are small with a few larger tasks). Unfortunately, the simulation procedure does not resemble the team members negotiation process nor provides the actual effort. The individual team members’ estimates are generate using a probability distribution whose mean value (m) is set equal to the generated traditional estimate and standard deviation is a fraction of the mean (standard deviation is m divided by s). Therefore, the estimates are generated around the traditional estimate. The normal and lognormal distributions are considered and are referred as to agreement distributions. The former represents situation when team members are not inclined either optimistically or pessimistically. The latter represents situations when there are a few pessimistically-oriented team members inclined to provide large estimates. The variance characterizes the level of disagreement among the team members. 1000 simulation replications are performed for every experimental cell. The results of the experimental studies are distribution of the sprint effort estimate (Fig. 1) and values of the risk aware effort estimates (Table 1). If the team members have differing opinions (s is small) the effort distribution is wider than in the case of more similar judgments. That is amplified if the log-normal distribution is used model variability of team members’ judgments. The shape of the effort distribution visualizes variability of judgments in the team and their impact on the total effort needed to complete tasks considered for the sprint. Table 2 reports the experimental results. The risk aware effort estimates are obviously larger than the simulated traditional estimate. The difference is smaller (as shown by CI/T g ) in the case of normal distribution due to fewer pessimistic effort estimates. If variability of team members’ judgments is higher and pessimistic effort estimates are present (as in the case of using lognormal distribution), then the risk aware estimate is significantly higher than the traditional estimate. CI is significantly higher than RA because a high confidence level is required. The coefficient of variation CV shows that variability is higher for smaller m (tasks are smaller on average). That could be explained by low variability of values of planning poker for large tasks and the adjustment of the resampling producer might be required.

5 Empirical Evidence The risk aware estimation is applied using data accumulated in a real-life agile development project [20]. An e-commerce site is developed in the project. It is based on the

Is Team Always Right: Producing Risk Aware Effort Estimates

107

Fig. 1. Simulated distribution of effort estimates X according to the agreement distribution function a) normal and b) lognormal

Magento platform and the Vue.js framework is used for front-end development. The Scrum framework is used to guide development processes, and one of the authors of the paper served as a Scrum master for the project. The project was done in 2018 and its duration was 70 days. The development work was done in five sprints preceded by an initiation cycle and succeeded by final integration testing and hand-over. The core development team had four members and additional professionals involved for specific tasks. The sprints are planned during the preceding development cycle. All team members were experienced developers though they have not worked together as a team. The

108

J. Grabis et al.

Table 2. Effort estimation results according to characteristics of the agreement distribution function. T g = 150 if m = 5 and T g = 606 if m = 13. Distribution m s T¯ ∗

Stdev(T¯ ∗ ) CV

RA CI

CI/T g

Normal

5 1 174

52

0.30 201 281 1.88

Normal

5 2 152

28

0.18 171 208 1.39

Normal

5 3 148

21

0.14 164 191 1.27

Normal

13 1 523 117

0.22 670 762 1.26

Normal

13 2 539

94

0.17 647 731 1.21

Normal

13 3 537

80

0.15 620 700 1.15

Lognormal

5 1 382 124

0.33 387 637 4.25

Lognormal

5 2 313 107

0.34 325 533 3.55

Lognormal

5 3 268

94

0.35 284 461 3.07

Lognormal

13 1 615 142

0.23 724 905 1.49

Lognormal

13 2 579 144

0.25 715 875 1.44

Lognormal

13 3 558 138

0.25 699 842 1.39

Stdev(T¯ ∗ ) – standard deviation of T¯ ∗ , CV = Stdev(T¯ ∗ )/T¯ ∗

team members are involved in the planning process what includes effort estimation. The planning poker was used in the effort estimation process and estimates provided by individual team members were also recorded. The Jira project management system was used to keep the records. The team velocity is determined by considering previously recorded performance of the team members. The actual time spent on the work items are recorded. Retrospective meetings are organized after every sprint and prior to the next planning event. Inability to complete the whole scope and underestimation of the effort were quickly established as frequently observed problems. The estimated and actual effort measured in function points is reported in Table 3. Table 3. Comparison of actual effort with team and risk aware estimates in hours Sprint

1

2

3

4

Actual effort

389 390 261 374 101

Team estimate 204 216 174 304

5 94

CI

222 250 184 320 108

AR

212 232 179 315 101

The risk aware estimates CI and RA are also calculated (Table 3). Although the riskaware estimates are by 15% and 8%, respectively, more accurate than the team estimates (judged by the estimation mean square error), the underestimation is still substantial. Apparently, none of the team members has been able to apprehend complexity of the

Is Team Always Right: Producing Risk Aware Effort Estimates

109

tasks or other factors affecting development. The risk aware estimates are also calculated if a fifth artificial, pessimistic team member is added (this team member estimates every task to be twice as demanding as the most pessimistic estimated among the actual team members). In this case, the estimation accuracy is by 60% more accurate than the team estimates but such an ex post adjustment is problematic in practice. One can observe that the estimation accuracy improves from one iteration to another suggesting that the team has successfully adapted the estimation and development processes what should be expected in agile development.

6 Discussion and Conclusion The new simulation based effort estimation method has been developed. It allows to estimate the risk of underestimating development effort chiefly because of failing to account for all judgments expressed by the team members. The novelty of the method is supplying of additional estimation insights by reliance on the existing agile development processes and data. The method does not require additional information. It is not intended to provide point estimates of the development effort but rather than as a complementary tool available during sprint retrospectives to understand reasons of inaccurate estimation and to improve product development processes. The method is intended as a supplementary tool to facilitate team’s improvement and should be used jointly with other practices of improvement of agile development processes. If the actual effort was larger than the team estimate and smaller than the risk aware estimate, then the method signals underrepresentation of some team members’ judgment in the team estimate. That was actually observed in the empirical study that opinion of senior developers dominated the estimation process while the ability of junior developers greatly affected the actual performance. If the actual effort was larger than both the team estimate and the simulated estimate, then there are other factors affecting the estimation accuracy. At the current stage of the investigation, the simulation results show that the resampling procedure might require improvements. The empirical evidence accumulated currently is limited. In the case considered, clearly there are also other factors affecting the estimation accuracy. However, the Scrum master of the development team confirmed that the risk aware estimates are useful to understand causes of the underestimation and to lead discussions during the sprint retrospectives. More empirical data will be gathered in further research and these data also will be used to guide further elaboration of the resampling procedure.


Design Decisions and Their Implications: An Ontology Quality Perspective

Achim Reiz and Kurt Sandkuhl

Rostock University, 18051 Rostock, Germany
{achim.reiz,kurt.sandkuhl}@uni-rostock.de

Abstract. Objective, reproducible, and quantifiable measurements based on well-defined metrics are a widespread instrument for quality assurance in engineering disciplines, including ontology engineering. Ontology metrics allow for the assessment of ontology quality and the comparison of different versions of the same ontology. We argue that such a comparison, and especially the view on the evolution over time, yields valuable insights into the effect of explicit and implicit design decisions. This paper examines the use of quality metrics in the evolution of an ontology that is used in an image recognition context in the fashion domain. Overall, 51 incremental versions were analyzed using the OntoMetrics framework by Rostock University. Using 13 selected criteria, the evolution of the ontology is quantified and the effect of design decisions on the analyzed criteria is outlined. The critical assessment of ontology metrics is further used to uncover weak spots in the ontology. These weak spots enabled the derivation of improvement recommendations.

Keywords: Ontology metrics · Ontology evaluation · Ontology quality

1 Introduction

Image recognition as an application of artificial intelligence is increasingly used in various contexts and enterprises. Tasks that previously required repetitive human labor can now be performed automatically. This motivated the use case underlying this research: Our industrial partner, a marketing company from the North of Germany, produces and distributes online videos in the fashion domain targeted towards end-consumers. Revenue is mostly created through the advertising of fashion items that are similar to the ones shown in the videos. But ads that are fitted to the specific contents of the videos can only be placed if the content is known. Previously, this was ensured by manual tagging. This approach requires a lot of expensive labor and offers a usage scenario for the implementation of image recognition technology. The examination of moving pictures, though, comes with an additional challenge: running a recognition classifier on a picture requires substantial computational resources. Videos, consisting of many scenes with a lot of different possible pictures, increase the computational requirements further.



This research follows a new approach. Instead of running all available classifiers at once for each picture, the classification is performed based on an ontology defining a shared taxonomy augmented by semantic knowledge. This semantic support enables the exclusion of classifiers that do not fit the current situation (e.g. bathing clothes in an airplane). This paper examines the supporting ontology. Using the OntoMetrics tool developed at Rostock University [1], the history of the ontology is assessed by calculating different metrics for the 51 versions of the ontology and interpreting the development of the metrics in the light of design decisions. The main purpose of the work is to investigate whether the development of metric values over time can contribute to the quality assurance of ontologies. Our conjecture is that the interpretation of metrics development should be done in the light of the explicit design decisions motivating the evolution steps of the ontology. In the long term, our aim is to contribute to the quality assurance of ontologies by deriving recommendations for ontology quality improvements from metric development and design objectives. The paper is structured as follows: The next section is concerned with a detailed description of the mechanics of the newly developed recognition approach and the environmental conditions of the development of the augmentation ontology. Section 3 discusses the theoretical background from ontology quality assurance. Section 4 motivates the need for ontology metrics. Section 5 presents and interprets the metrics, including the derivation of recommendations, followed by a conclusion and an outlook on the next research steps.

2 Semantic Driven Image Recognition

This section illustrates the mechanics of a semantically enhanced image detection process. It presents the orchestration of the services and argues the advantages of the new approach.

2.1 Semantically Enhanced Image Recognition

The new image recognition approach is developed in a cooperative project with three partners and a clear division of responsibilities. One partner, a research institute (RI) specializing in graphical data processing, is developing the neural net for the image classification; the chair for business information systems (BIS) at Rostock University develops the semantic engine. The resulting services from both partners are orchestrated and used by the industrial partner (IP) mentioned in the introduction. The detailed process works as follows (Fig. 1):


The process starts with a video upload by the IP to the servers of the RI. This enables them to run an initial detection of scene, gender, and body area¹. The scene component contains the probability of a background situation out of 392 possibilities. Examples of scenes are car_interrior, airport_terminal, or ice_skating_rink_outdoor. The triple is returned to the IP and forwarded to the semantic API of BIS. Based on these scenes, the semantic service can perform the first filtering: While formal clothing like a suitjacket is possible in a car_interrior or airport_terminal, the occurrence of that clothing in an ice_skating_rink_outdoor is highly unlikely.

Fig. 1. The division of competencies of the three project partners

The scenes are not connected directly to the fashion elements, but via an additional aggregation element called occasion. Examples of occasions are business with the sub-classes bluecolor-business, causal-business, and formalbusiness, as well as family (plus subclasses) and sports (plus subclasses). The occasions prevent the need to create connections from all 392 detectable scenes to the over 700 fashion items and reduce the complexity to a manageable amount (see Fig. 3). The body_area encodes where the item is worn, namely TOP for the upper part of the body, BOTTOM for the lower part, or DRESS for items that cover the whole body. gender encodes the target group of the fashion items into male and female.

¹ The ontology items are named exactly like in the ontology and are therefore not grammatically correct in the context of the given sentences.


Fig. 2. Excerpt of shared taxonomy for image recognition ontology

All project partners share a common taxonomy. An excerpt from it is given in Fig. 2. Based on the first result set from the image recognition service, e.g. bodyarea=TOP, scene=car_interrior and gender=female, the semantic service can rule out all fashion items belonging to the LOWERLAYER. It returns only TOPLAYER and MIDLAYER to the requesting actor. The IP can subsequently initiate a new refinement iteration without taking the items of the LOWERLAYER into account, thus saving computational resources.

2.2 Beyond Image Recognition Through Semantic Reasoning

The ontology consists of two parts. The first part is aligned with the shared taxonomy of the image recognition classifiers of the RI and contains 63 items. For the rest of the paper, it is called Oimage. All of these items have an image recognition classifier counterpart. The training of image recognition classifiers is very labor-intensive and expensive; for every new classifier, large amounts of training data have to be collected. Creating new ontology classes, in contrast, does not require a data basis with large training sets. By building on the results of the image recognition service, further, more specific fashion items can be derived. For this refinement, the shared image recognition ontology Oimage is complemented with a larger fashion knowledge base containing 659 fashion items, later called Ofashion. The leaf of each item of the Oimage ontology contains a link towards the larger Ofashion knowledge base, thus providing further information even if there are no more image recognition classifiers available.
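The filtering idea can be made concrete with a minimal sketch that builds a toy RDF graph in the spirit of the described ontology and queries it for fashion items that fit a detected scene. The namespace and the property names (occursIn, suitableFor, FashionItem) are illustrative assumptions, not the actual vocabulary of the project ontology.

```python
from rdflib import Graph, Namespace, RDF

# Illustrative namespace and property names; the real project ontology may differ.
FA = Namespace("http://example.org/fashion#")

g = Graph()
g.bind("fa", FA)

# Detectable scenes linked to aggregating occasions (scene -> occasion)
g.add((FA.airport_terminal, FA.occursIn, FA.business))
g.add((FA.ice_skating_rink_outdoor, FA.occursIn, FA.sports))

# Fashion items linked to occasions (item -> occasion)
g.add((FA.SUITJACKET, RDF.type, FA.FashionItem))
g.add((FA.SKISUIT, RDF.type, FA.FashionItem))
g.add((FA.SUITJACKET, FA.suitableFor, FA.business))
g.add((FA.SKISUIT, FA.suitableFor, FA.sports))

# Which fashion items (and hence classifiers) are plausible for a detected scene?
query = """
PREFIX fa: <http://example.org/fashion#>
SELECT ?item WHERE {
    ?scene fa:occursIn ?occasion .
    ?item  fa:suitableFor ?occasion ;
           a fa:FashionItem .
    FILTER (?scene = fa:airport_terminal)
}
"""

for row in g.query(query):
    print(row.item)   # only fa:SUITJACKET; SKISUIT classifiers can be skipped
```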


2.3 Development Circumstances of the Ontology

To interpret the development of the ontology, we argue that the environmental circumstances influence the ontology itself. The next section therefore describes the level of experience and the turnover rate of the modeling staff, as well as key design decisions. Especially these design decisions have a large influence on the design of the ontology and therefore also on its metrics.

Development History. The initial ontology was created based on the fashion-brain project, an E.U. funded research endeavor for developing A.I. based marketing for the fashion industry [2]. One deliverable of this research was a rich taxonomy of fashion items. The taxonomy itself is provided in a JavaScript data representation and was converted towards an ontology serialization. This reuse effort resulted in the first part of the ontology, Ofashion. In the first version, the newly created ontology consisted of just the class hierarchy. After the conversion, further attributes like brands, gender, and bodyarea were added. In the early stages, the development of the semantic fashion knowledge by BIS and of the image recognition classifier by the RI was performed rather in isolation. During this stage, both research partners focused on the development of fundamental technology. Adding detectable scenes to the Ofashion ontology was the first step towards the alignment of the two services, first connected directly to the Ofashion elements, later connected through the aggregation property occasion. But while the scene detection is mandatory for further processing, we learned that the taxonomy and categorization of Ofashion does not represent and support the process of the image detection service. On the one hand, the number of classes exceeds the targeted number of image recognition classifiers; on the other hand, the structure of the conceptualization itself does not represent the iterative process of the image recognition service. Therefore, in a joint workshop with all project partners, Oimage was created and later connected to the already existing knowledge base. Figure 3 displays the different parts of the ontology and how they are connected. For better readability, some connected simple classes like gender and layer were omitted.


Fig. 3. Reduced exemplary ontology query.

The Development Team. Over the lifetime of the ontology, three different knowledge engineers worked on the fashion ontology. The first one created the initial model Ofashion, including the gender, body area, and the connections with detectable scenes. The second knowledge engineer served only for a short period in an interim position. The third ontology engineer performed the alignment with the image recognition service, including the creation of Oimage. All ontology engineers were supported by a student research assistant. None of the modelers had significant ontology modeling experience or formal ontology modeling training.

3 Theoretical Background

The background for our work primarily stems from ontology quality assurance, but our work is also related to ontology evolution [18]. Work in the area of quality assurance for ontologies includes different perspectives, such as the quality of the ontology as such, the quality of the process of ontology construction, and tools supporting the ontology engineer in achieving high quality. Quality assessment of ontologies as such has been the subject of many research activities [4], but the quality criteria vary considerably between different approaches and often address structural, logical, and computational aspects of ontologies. Furthermore, metrics originating from software quality evaluation have been investigated [14] and specific metrics for ontologies were proposed [7, 8]. Many metrics lack sufficient empirical validation, i.e., which metric values can be considered "good" or "bad" has often not been defined due to an insufficient number of reported applications.


Evaluation of the accuracy of ontology content, i.e., its suitability and conformance with the domain to be represented, can be performed using a gold standard. In this context, similarity metrics, as proposed for example by [15], are used to measure the deviation from the gold standard. These approaches are criticized for mainly using structural graph similarity and for not taking into account the semantics of class definitions, or because different kinds of deviations should be weighted differently. Furthermore, a (single) gold standard is often difficult to develop due to the very limited number of experts available in the domain. Metrics measure the quality of one or more attributes in an ontology [3] and therefore, in combination with each other, also the quality of the ontology itself. Applied during engineering and application, they are a necessary precondition for quality assurance and improvements [4]. But ontologies are not static artifacts; they are dynamic in nature. The measurement of an ontological knowledge representation merely captures a snapshot of a constantly changing artifact at a given time. As ontologies evolve, so do the corresponding metrics. Measuring ontology quality at every change can give insights into the improvements and alterations that are made [5]. The use of metrics for quality assurance has also been investigated for different purposes, for example in the context of domain ontologies [17] or for ontology design patterns [19]. Furthermore, approaches were proposed for evaluating "ontologies in use", i.e., to evaluate the fitness for a task to be performed with an ontology in a defined scenario. An ontology of high quality "helps the application in question produce good results on the given task" [16]. However, it is difficult to generalize the results from such approaches, since they can hardly capture all potentially relevant aspects.

4 Evaluation Toolset and Metrics Definitions

As stated in Sect. 2, the developers of the ontology were inexperienced and had a high turnover rate. Also, the use cases to be supported by the ontology were subject to several changes in direction. These unstable conditions threaten the goal of developing a high-quality ontology and amplify the need for quality measurement and assurance. This paper takes a retrospective look at the development of the ontology quality attributes of the fashion knowledge base. In total, 51 storage points were analyzed. During the first 13 versions, a storage management system was not yet established. It is therefore possible that major updates were pooled into one snapshot. After the first 13 versions, the git versioning system was used for change tracking and a new version was created for every revision.


The metrics themselves were calculated using the OntoMetrics platform by Rostock University [6]. For processing, the files were uploaded using the web interface and exported as an XML representation. This XML serialization allowed the import into spreadsheet software for further analysis and data visualization. This online tool offers a total of 81 metrics in five categories. These metrics are mostly based on the frameworks proposed by Gangemi et al. and Tartir et al. [7, 8]. Out of these 81 available metrics, 18 were selected and evaluated in detail. Table 1 explains the underlying calculations of the analyzed metrics. For better understandability, the heterogeneous styles of the metric definitions in the literature were aligned and simplified.
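As an illustration of this analysis step, the short script below flattens such per-version XML exports into CSV rows for spreadsheet analysis. It assumes a simple, hypothetical export structure in which every leaf element carries one metric value; the actual OntoMetrics export schema may differ, so the tag handling would need to be adapted.

```python
import csv
import xml.etree.ElementTree as ET
from pathlib import Path

def exports_to_csv(xml_files, csv_path):
    """Flatten per-version metric XML exports into one long-format CSV.

    Assumed (hypothetical) structure: every leaf element's tag is a metric
    name and its text is the value; one CSV row per version and metric.
    """
    with open(csv_path, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["version", "metric", "value"])
        for version, path in enumerate(xml_files, start=1):
            root = ET.parse(path).getroot()
            for elem in root.iter():
                if len(elem) == 0 and elem.text and elem.text.strip():
                    writer.writerow([version, elem.tag, elem.text.strip()])

if __name__ == "__main__":
    files = sorted(Path("exports").glob("version_*.xml"))  # hypothetical file layout
    exports_to_csv(files, "ontology_metrics.csv")
```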

5 Analysis of Ontology Metrics

The following section analyzes the development of the ontology metrics. The first part is concerned with a general interpretation of the metrics, especially the influence of design decisions on measurement developments. The latter part derives current shortcomings and improvement recommendations. When describing the different developments, we also highlight relevant design decisions (DD).

5.1 General Interpretation of Metrics

The Development of Class Count vs. Axiom Count. The first analysis is concerned with the development of the class count compared to the number of axioms in general. The class count remains rather stable: It starts with the conversion of the fashion-brain taxonomy [2], comprising 673 classes. While a few describing classes like size, brands, or gender were added, the class count remained rather stable up to version 17. The axiom count, though, shows that many items were linked with each other, enriching the ontology. In version 17, the scenes were introduced (DD-2), increasing the class count from 749 to 1110 classes. At first, an attempt was made to link the scenes directly with the fashion objects. Due to the high number of fashion and scene classes, this soon revealed itself as too complex and inflexible for future maintenance. Therefore, in version 22, occasions were introduced as an intermediate class between fashion items and scenes (DD-3). Direct links between scenes and fashion items became obsolete and were removed, resulting in the sudden drop of axioms. Over the following versions, the class count remained rather stable, with just a little spike in version 39, when the shared sub-ontology Oimage was introduced. This, in turn, led to new links between Oimage, Ofashion, and the scenes, resulting in a strong increase in axioms (Fig. 4).


Table 1. Evaluated metrics

| Metric | Explanation |
| Axiom | Any type of statement in the ontology, e.g. definitions for classes, properties, datatypes, individuals, etc. |
| Class | Abstract concepts. Can contain individuals or other classes in the form of an "isa" hierarchy |
| Individual | Instance of a class |
| Object property | Link between classes (and, as a result, individuals). It can further be described using characteristics like (inverse-) functional, transitive, (a-)symmetric, or (ir-)reflexive |
| Data property | Relation between individuals and data values (literals) |
| Annotation property | Additional non-structured, textual information for ontology items (e.g. classes, data properties, object properties, etc.) |
| Class/relation ratio | The proportion of classes compared to the number of relations (relations are e.g. object properties, equivalent classes, disjoint classes, etc.): class relation ratio = number of classes / number of relations |
| Inheritance richness | Measures the ratio of sub-class relations. The higher, the more horizontal an ontology is, with a coverage of more general knowledge: inheritance richness = number of inheritance relationships / number of classes |
| Average depth | Measures the depth of the paths compared to the number of paths. It only considers "isa" relations: average depth = (1 / count of paths) · Σ_j depth of path_j |
| Average breadth | Measures the number of levels in a graph compared to the number of classes in these levels. Only "isa" arcs are considered: average breadth = (1 / count of levels) · Σ_j number of classes in level_j |
| Tangledness | Ratio of classes that inherit from more than one upper class: tangledness = number of classes with multiple inheritance / number of classes |
| Relationship richness | Measures the diversity of relationship definitions compared to the definition of sub-class relations. The number of non-inheritance relationships is limited to the definitions of object properties, equivalent and disjoint classes, not their usage as relations between classes: relationship richness = number of non-inheritance relationships / total number of relationships |
| Expressivity | The currently supported and used expressivity for the statements within the ontology |
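To ground these definitions, the following sketch computes a few of the schema metrics (inheritance richness, tangledness, and an approximation of average depth) on a toy "isa" hierarchy. It illustrates the formulas above and is not the OntoMetrics implementation; the class names are made up.

```python
# Toy class hierarchy: child class -> list of direct superclasses ("isa" edges).
# MINISKIRT having two parents illustrates multiple inheritance (tangledness).
hierarchy = {
    "TOPLAYER": ["FASHIONITEM"],
    "LOWERLAYER": ["FASHIONITEM"],
    "SUITJACKET": ["TOPLAYER"],
    "MINISKIRT": ["LOWERLAYER", "TOPLAYER"],
}

classes = set(hierarchy) | {p for parents in hierarchy.values() for p in parents}
isa_edges = sum(len(parents) for parents in hierarchy.values())

# Inheritance richness: inheritance relationships per class.
inheritance_richness = isa_edges / len(classes)

# Tangledness: share of classes with more than one direct superclass.
multi_parent = sum(1 for parents in hierarchy.values() if len(parents) > 1)
tangledness = multi_parent / len(classes)

# Average depth, approximated here as the mean of the longest root path per leaf.
children = {c: [k for k, parents in hierarchy.items() if c in parents] for c in classes}
leaves = [c for c in classes if not children[c]]

def depth(cls):
    parents = hierarchy.get(cls, [])
    return 1 + max((depth(p) for p in parents), default=0)

average_depth = sum(depth(leaf) for leaf in leaves) / len(leaves)

print(f"classes={len(classes)}, inheritance richness={inheritance_richness:.2f}, "
      f"tangledness={tangledness:.2f}, average depth={average_depth:.2f}")
```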


Fig. 4. Count of axioms and classes

The developments depicted above are also confirmed by the class/relation ratio shown in Fig. 5. With increasing maturity of the approach, each class gets more relations and is therefore enriched by linked attributes like gender or scenes. It also confirms that the relations are the main driver for the increasing axiom count.

Fig. 5. Class/relation ratio


The Changing Structure of the Ontology. The introduction of scenes (DD-2) had a large impact on the structure of the ontology. At first, they were stored without any sub-hierarchies: one single scene class stored in the root of the ontology contained 273 situations. This resulted in a sudden drop in average depth and an increase in the average breadth (Fig. 6). Later in the history of the ontology, these scenes were sorted into categories (DD-4) like inside, outside, hotel, home, etc., explaining the drop of the average breadth in version 25.

Fig. 6. Inheritance richness, average depth & breadth

The relationship between the average depth and breadth (Fig. 6) at version 39, when Oimage was introduced, shows that this newly added sub-ontology has a depth very similar to the previous versions, but it lowered the wide-breadth effect of the scenes due to its distinctive hierarchy. The increasing inheritance richness, in contrast, is not mainly driven by the hierarchical attributes of the ontology, even though the name would suggest otherwise. This measurement is instead determined by the growing number of object property relations between the fashion items and describing attributes. Figure 7 shows a screenshot of the tool Protégé for the class MINISKIRT. The object properties are treated as subclasses, therefore directly impacting the inheritance richness metric. This is also reflected in the tangledness (Fig. 8). As long as the ontology is used as a mere taxonomy, the tangledness measure remains zero. As soon as object properties are linked, they are counted as multi-hierarchical relationships, therefore increasing the tangledness measurement. Even though the name suggests otherwise, the increasing use of object properties does not have a positive impact on the relationship richness. This metric measures the variance of object properties compared to the number of inheritance relationships. It therefore measures not the usage of various relationships, but their mere existence, and it remains low, decreasing with an increase of isa relations.

Fig. 7. Object-property relations are stored as sub-classes

Fig. 8. Relationship richness & tangledness

Evolving Expressivity. Not only did the metrics experience constant change; the tool Protégé also constantly adapted the underlying description logic of the ontology code (Fig. 9). At first, the expressivity is set to AL (attribute language). It is the minimal expressivity, allowing the definition of (class) hierarchies, conjunctions, atomic negations, and value restrictions. The addition of an H in the next version indicates the existence of role hierarchies, and (D) indicates the usage of data properties. Shortly after, the expressivity is extended with inverse roles I, complex negation C, and functionality F. In version 14, a large expansion in expressivity is carried out with the introduction of S (DD-1). It contains AL's functionality and additionally transitivity, disjunction, existential restrictions, and complex negation (C). After the expansion of complexity towards S, the expressivity remains rather stable, with little fluctuation regarding role hierarchies (H), inverses (I), data properties (D), and complex role inclusion (R) [9, 10].

Fig. 9. The description logic expressivity of the ontology versions

5.2 Identifying Shortcomings

So far, the influence of design decisions on ontology metrics has been discussed. It is further argued that these metrics can be used for deriving modeling recommendations by identifying weak spots and underdeveloped areas.

Obsolete Individuals. The classes and their relations through object properties developed steadily towards a more meaningful data collection. Through the data analysis, however,

we discovered that the individuals lack that kind of development. While the first 5 versions contained 671 individuals, these were just a side product of the conversion of the fashion-brain taxonomy towards the ontology, identical to the classes, and they were deleted shortly after. Since version 22, a small proportion of individuals has remained. 51 of the 119 individuals are instances of the class brands; the rest are scattered instances throughout the whole ontology without a fixed pattern. Currently, the ontology is merely used as a factual database about the classes, and the use of individuals is not planned. The 119 individuals currently decrease the understandability of the ontology and can be marked as obsolete.

Missing Disjointness Axioms. Further, Fig. 10 shows the lack of development of disjoint classes. Even though this has no immediate negative effect, it can lead to further problems if the ontology is enriched with individuals. An example can be found in the classes UNDERWEAR and JEANSPANT. As these are not marked as disjoint, it is possible to add an individual that is part of both; in that case, a reasoner would mark both items as equivalent [11].

Data Properties. Data properties offer a way to encode further knowledge into individuals by using literals. These literals can either be limited to data ranges, to XML data types, or not limited to a certain type at all [12]. Data properties in the given ontology share a similar history to the individuals: their use was planned in the initial phase of the ontology development for encoding color, gender, material, and size, but later disregarded. Due to the focus on the concepts rather than instantiation through individuals, the data properties do not offer a useful way to encode further information and can therefore be deleted.

Annotation Properties. The number of annotations per class can influence the understandability of ontologies [13]. As we can see in Fig. 11 (b), the count of annotations remains rather low. The first peak is caused by relicts of the taxonomy conversion and does not convey any useful information. The adding of annotations to the ontology was not in focus during development but could increase the comprehensibility for the next modeler.

Fig. 10. Individuals lag behind the development of the classes
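The recommendations on disjointness and annotations can be illustrated with a minimal rdflib sketch that adds a disjointness axiom between UNDERWEAR and JEANSPANT and an explanatory comment. The namespace is an illustrative assumption, not the project ontology's actual IRI.

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import OWL, RDF, RDFS

# Illustrative namespace; the real fashion ontology uses its own IRIs.
FA = Namespace("http://example.org/fashion#")

g = Graph()
g.bind("fa", FA)
g.bind("owl", OWL)

# Declare the two classes and state that they are disjoint, so a reasoner
# would flag an individual asserted to be both UNDERWEAR and JEANSPANT.
g.add((FA.UNDERWEAR, RDF.type, OWL.Class))
g.add((FA.JEANSPANT, RDF.type, OWL.Class))
g.add((FA.UNDERWEAR, OWL.disjointWith, FA.JEANSPANT))

# Add an annotation to improve comprehensibility for the next modeler.
g.add((FA.JEANSPANT, RDFS.comment,
       Literal("Jeans-style trousers; linked to casual occasions.", lang="en")))

print(g.serialize(format="turtle"))
```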

Object Property Enrichment. In the latest version, there are 8 object properties, a number that has remained rather stable over the development of the ontology. However, in contrast to the data property and individual counts, this stability does not indicate a halt in development. As shown in Fig. 12, the domain and range attributes are commonly used for the object properties. Inverse relations, in contrast, are less frequently used for encoding. In the specific case of this ontology, the object properties consist of one "isLinkableWith" relation, while the rest are "has" relations. In the context of this ontology, it is argued that the encoding of inverse relations does not significantly enrich the factual knowledge of the ontology.

Fig. 11. Count of data properties (a) and annotation properties (b)

A larger problem, though, can be identified from the metrics that describe the characteristics of the object properties. All measurements concerning characteristics like disjointness, equality, (inverse) functionality, (a-)symmetry, (ir-)reflexivity, and transitivity drop to zero in the later versions. Further enrichment of these relationships would increase the semantic precision of the ontology.

Fig. 12. Object properties and their attributes

6 Conclusion and Future Work

Ontologies enable the sharing of knowledge among various digital or human actors, thus creating a knowledge base with a common terminology. But designing ontologies is far from a trivial task, and quality assurance is an integral part of the ontology design process. This paper presented a quality assessment, based on the OntoMetrics tool of Rostock University, for an ontology in the fashion domain that is used for the augmentation of image recognition technologies. It was shown that each design decision has a distinctive impact on the measured ontology metrics. The evolving character of an ontology can be measured and can uncover current and past development hotspots. However, when the development of the metrics' values is analyzed, this should be done in the light of the design decisions made. Further, it is shown that the metrics also unveil shortcomings. As seen in the analysis, especially obsolete items show missing evolution over time and can be identified with a rigorous analysis. A disproportion between related measurements also enables the identification of shortcomings. An example in this analysis can be found in the development of relations through object property assertions. This should also trigger an increase of logical object descriptions, e.g. inverse, symmetric, or transitive characteristics. The missing evolution of these properties reveals their need for improvement. The interpretation of these metrics, though, is at times ambiguous and does not match the intuitively perceived meaning. Examples can be found in the inheritance richness, which is not mainly driven by hierarchical relationships but by the distribution of object property assertions, or the relationship richness, which does not reflect the variance of object properties itself, but the variance of the object properties in comparison to the number of inheritance relations. That motivates further research in this area. Even though the calculation of evolutional ontology metrics is a sound foundation for the interpretation of the quality


of an ontology, these metrics need a rigorous interpretation. Further research should therefore consider the simplification of these metrics into easily understandable quality attributes, to enable the automatic rating of ontologies without the need for complex human interpretation.

References

1. Lantow, B.: OntoMetrics: putting metrics into use for ontology evaluation. In: Proceedings of the 8th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management, Porto, Portugal, 9-11 November 2016, pp. 186-191. SCITEPRESS (2016). https://doi.org/10.5220/0006084601860191
2. Checco, A., et al.: FashionBrain project: a vision for understanding Europe's fashion data universe. http://arxiv.org/pdf/1710.09788v1 (2017)
3. Lourdusamy, R., John, A.: A review on metrics for ontology evaluation. In: 2018 2nd International Conference on Inventive Systems and Control (ICISC), Coimbatore, 19-20 January 2018, pp. 1415-1421. IEEE (2018)
4. Vrandečić, D., Sure, Y.: How to design better ontology metrics. In: Franconi, E., Kifer, M., May, W. (eds.) ESWC 2007. LNCS, vol. 4519, pp. 311-325. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-72667-8_23
5. Djedidi, R., Aufaure, M.-A.: ONTO-EVOAL an ontology evolution approach guided by pattern modeling and quality evaluation. In: Link, S., Prade, H. (eds.) FoIKS 2010. LNCS, vol. 5956, pp. 286-305. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-11829-6_19
6. Lantow, B., Sandkuhl, K.: An analysis of applicability using quality metrics for ontologies on ontology design patterns. Intell. Syst. Account. Finance Manage. (2015). https://doi.org/10.1002/isaf.1360
7. Tartir, S., Arpinar, B., Moore, M., Sheth, A.P., Aleman-Meza, B.: OntoQA: metric-based ontology quality analysis. In: IEEE Workshop on Knowledge Acquisition from Distributed, Autonomous, Semantically Heterogeneous Data and Knowledge Sources, Houston, 27 November 2005 (2005)
8. Gangemi, A., Catenacci, C., Ciaramita, M., Lehmann, J.: A theoretical framework for ontology evaluation and validation. In: Bouquet, P., Tummarello, G. (eds.) Semantic Web Applications and Perspectives, SWAP 2005, Italy, 14-16 December 2005. CEUR (2005)
9. Baader, F., Horrocks, I., Lutz, C., Sattler, U. (eds.): An Introduction to Description Logic. Cambridge University Press, Cambridge (2017)
10. Baader, F., Horrocks, I., Lutz, C., Sattler, U.: Description logic terminology. In: Baader, F., Horrocks, I., Lutz, C., Sattler, U. (eds.) An Introduction to Description Logic, pp. 228-233. Cambridge University Press, Cambridge (2017)
11. Allemang, D., Hendler, J.: Counting and sets in OWL. In: Allemang, D., Hendler, J.A. (eds.) Semantic Web for the Working Ontologist: Modeling in RDF, RDFS and OWL, 2nd edn., pp. 249-278. Morgan Kaufmann/Elsevier, Amsterdam, Boston (2012)
12. Antoniou, G., van Harmelen, F.: A Semantic Web Primer, 2nd edn. Cooperative Information Systems. MIT Press, Cambridge, MA, London (2008)
13. McDaniel, M., Storey, V.C.: Domain modeling for the semantic web: assessing the pragmatics of ontologies. In: CEUR Workshop Proceedings 1979 (2017)
14. Duque-Ramos, A., Fernandez-Breis, J.T., Stevens, R., Aussenac-Gilles, N.: OQuaRE: a SQuaRE-based approach for evaluating the quality of ontologies. J. Res. Pract. Inf. Technol. 43, 159 (2011)


15. Maedche, A., Staab, S.: Measuring similarity between ontologies. In: Gómez-Pérez, A., Benjamins, V.R. (eds.) EKAW 2002. LNCS (LNAI), vol. 2473, pp. 251-263. Springer, Heidelberg (2002). https://doi.org/10.1007/3-540-45810-7_24
16. Brank, J., Grobelnik, M., Mladenić, D.: A survey of ontology evaluation techniques. In: Proceedings of the Conference on Data Mining and Data Warehouses (SIGKDD 2005) (2005)
17. McDaniel, M., Storey, V.C., Sugumaran, V.: Assessing the quality of domain ontologies: metrics and an automated ranking system. Data Knowl. Eng. 115, 32-47 (2018)
18. Cardoso, S.D., et al.: Leveraging the impact of ontology evolution on semantic annotations. In: Blomqvist, E., Ciancarini, P., Poggi, F., Vitali, F. (eds.) EKAW 2016. LNCS (LNAI), vol. 10024, pp. 68-82. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-49004-5_5
19. Hammar, K.: Content Ontology Design Patterns: Qualities, Methods, and Tools, vol. 1879. Linköping University Electronic Press, Linköping (2017)

Service Dependency Graph Analysis in Microservice Architecture

Edgars Gaidels and Marite Kirikova

Institute of Applied Computer Systems, Riga Technical University, Riga, Latvia
[email protected], [email protected]

Abstract. Since microservices have become popular, we see increasingly complex service architectures emerging. Microservice systems, while being well scalable, flexible, and highly available, do not fully tackle the reliability aspects. While container orchestration solutions like Kubernetes contribute to fault tolerance against network or infrastructural failures, the probability of a mistake in the architectural design still exists. Several graph-based anomaly detection approaches could be employed to proactively detect critical nodes and design anti-patterns. On a simplified example of a microservices-based system, the paper demonstrates the application of graph algorithms that provide direction in assessing the quality of the underlying architecture.

Keywords: Microservices · SOA · MSA · Service dependency graph · Neo4j

1 Introduction

Microservice architecture (MSA) appeared recently as a new paradigm for programming applications through the composition of small services, each running its own processes and communicating via lightweight mechanisms. This approach has been built on top of the concepts of service-oriented architecture (SOA), inheriting principles of domain-driven design, service modularity, and object-oriented programming by programming from the large to the small [1]. MSA is now a new trend in distributed software development that takes modularity to the next level by making services completely independent in development and deployment, emphasizing loose coupling and high cohesion. This approach delivers many benefits, such as maintainability and scalability [2]. It also inherits a set of challenges from SOA and distributed systems in general [3]. Additionally, the freedom in development processes has led to an undesired side-effect of a microservices explosion. One of the extreme examples is JD.com, the world's third-largest and China's largest e-commerce site. JD provides 8000 applications and approximately 34,000 microservices running on a cluster of 500,000 containers that supports over 250 billion calls daily [3]. Like in every complex distributed system, there is a higher chance of a network, hardware, or application-level failure. Fine-grained entity services are now becoming coupled to tens or hundreds of other services. Therefore, the failure of a single service


increases the risk of triggering a cascading effect, bringing down several processes and making the system unavailable. On such a large scale, the conventional governance and monitoring strategies might not be sustainable anymore. Given the increasing number of services, occurrences of failure become more commonplace and, unfortunately, most likely to arise in the production setting. There is no unified framework for analyzing large-scale systems to detect the weak spots in the architecture proactively. That requires organizations to come up with novel approaches for coping with microservices at scale. In other related domains, such as network analysis, graph-based methods are used for analyzing the architectures. To test whether such methods can be applied for MSA, the research question addressed in this paper is: "What are the graph-based methods that can be useful in MSA analysis?". For answering this question, the experimental design method was applied. The authors of this paper designed and developed a simplified MSA application for experimental purposes. Different graph algorithms were applied, and their benefit in MSA analysis was assessed. The paper is organized as follows. In Sect. 2, the peculiarities of MSA are discussed. In Sect. 3, the potential of graph algorithms is explored. In Sect. 4, the experiments with a sample architecture are carried out. Section 5 consists of brief conclusions.

2 The Challenges of MSA

MSA gained popularity recently and can be considered as the next iteration of SOA [4]. MSA tries to tackle the unnecessary complexities of SOA in order to improve several aspects, such as flexibility, interoperability, independence, and scalability [5]. MSA splits an application into small components and narrows down their domains; loosens interdependency through the formal interface between microservices; and allows teams to manage their own development pace independently, thereby improving the code quality [6]. Lastly, when services are separated, each of them can be scaled most appropriately [7]. The MSA approach to system design also has its challenges. From the sources discussed above, we derived the following limitations of MSA:

1. Debugging of the overall system becomes more difficult.
2. The blurring of technological expertise caused by heterogeneous interoperability and the lack of governance in the technology selection.
3. Decentralized data management. Heterogeneity in the data representation, persistence, and processing mechanisms.
4. Duplication of functionality, data, rules, and processing algorithms.
5. Inconsistency of life cycles of individual components of the system.
6. The immaturity of modern approaches to service supervision and the exchange of teams and data in the large enterprise.

Given the factors mentioned above, significant concern about the predictability of the RAS (Reliability, Availability, Serviceability) metrics arises. It is possible to improve the reliability of individual components by decoupling the entire system into microservices


and having more control over the DevOps and CI/CD processes, but at the cost of the overall system's reliability [8]. In large-scale architectures, a large number of coupled services might amplify the effect. Therefore, it is crucial to have means for analyzing the MSA and finding the critical nodes in the entire system. As a starting point, it is important to understand which existing technologies support service dependency detection and visualization in MSA.

3 Exploring Service Dependency Graphs

A variety of tools exist for visualizing microservices and their dependencies. Usually, they represent the MSA in a simplified service dependency graph (SDG) view that visualizes services as nodes and their dependencies as edges. In some cases, an SDG can also be dynamic if it is built on top of distributed tracing technologies. Tracing data is used in various application performance management (APM) tools, some of which also provide service map functionality out of the box. We performed an analysis of the available solutions to find the tools for large-scale graph modeling. At this moment (May 2020), several solutions such as AWS X-Ray, Datadog, Elastic APM, and NewRelic provide the most advanced features. However, they are license-based and therefore increase the operational costs of IT. Spekt8 is an open-source visualization tool that automatically builds logical topologies of the application and infrastructure after deployment to the Kubernetes cluster. Another solution worth mentioning is Vizceral, provided in the Netflix OSS suite. It generates an SDG and delivers a real-time visualization of the traffic flow within the system. This view provides a split-second sense of the state of the system, with which an engineer can intuitively detect anomalies in the services and their traffic behavior. Vizceral is a common example of a tool used in the Intuition Engineering discipline that Netflix actively promotes. The successful application of graph modeling and the resulting benefits have spawned an entire graph technology landscape, including graph-oriented communities and professional conferences. One of the most popular technologies nowadays is Neo4j, an open-source graph DBMS used not only for data persistence purposes but also as a visualization tool for organizing data into complex ontologies in areas such as visual analytics, knowledge graph solutions, fraud detection, cybersecurity, machine learning, human resources, natural language processing, and social network analytics [9-11]. Neo4j contains a wide variety of extensions and libraries that allow developers to analyze graph structures. In comparison to the purely intuitive understanding of the graph that the tools mentioned above provide, we see an opportunity in the application of Neo4j for obtaining quantitative information about the SDG structure. Several attempts in this area were made earlier. In 2015, Neo4j published an article on analyzing microservice application pools [12]. Despite not being an academic publication, the article still demonstrated custom visualization possibilities together with techniques for the detection of a single point of failure among the services hosted on a common virtual server. To advance the application of Neo4j for anomaly detection in MSA, the authors of this paper propose a data-driven solution that uses centrality and community detection graph algorithms.


Graph algorithms use the relationships between nodes to infer the organization and dynamics of complex systems, allowing them to uncover hidden information, test hypotheses, and make predictions about their behavior. Several papers address the application of graph algorithms in the system analysis context. The survey developed in 2014 [13] highlights four main reasons that make graph-based approaches to anomaly detection vital and necessary: the interdependent nature of the data, powerful representation, the relational nature of problem domains, and robust machinery. There are different sets of graph-based techniques that are applied to static and dynamic graphs. In this paper, we focus on static graph analysis, as a representation of the defined system architecture. Static graph data is divided into plain graphs and attributed graphs. "An attributed graph is a graph where nodes or edges have features associated with them, … while the relational links may have various strengths, types, and frequency. A plain graph, on the other hand, consists of only nodes and edges among those nodes, i.e., the graph structure" [13]. The problem of static graph anomaly detection can be formulated as an analysis of a graph snapshot in order to find nodes, edges, and their substructures that are "few and different" or deviate significantly from the patterns observed in the graph [13]. Several papers address the problem of static graphs, more specifically, plain graph anomaly detection. In [14], the authors mainly focus on the cyclic dependency problem between microservices, describing it as a "potential design error, which may lead to resource dependency and competition and even cause unlimited service calls to make the system function improperly or even crash." The ability to detect cyclic dependencies in an SDG is a vital reliability requirement when developing an MSA. In this paper, we will build a plain graph using Neo4j and use the supported graph algorithms for anomaly detection. The authors of [17] primarily divide graph algorithms into three types: pathfinding, centrality, and community detection. Pathfinding algorithms are not used in this paper since the graph structure will be defined and represented using a plain SDG. In this work, the authors focus on centrality and community detection algorithms, using the related metrics and applying them to the MSA. For describing the algorithms in more detail, information from [16, 17] is used.

Centrality Algorithms
Centrality algorithms are chosen because they identify the most critical nodes and help to understand properties such as credibility, accessibility, the speed at which things spread, and the bridges between groups. Table 1 describes what each centrality algorithm does, based on findings in [16, 17].

Community Detection Algorithms
Community detection algorithms look at groups and partitions and are used to produce network visualizations for general inspection. Based on [16, 17], Table 2 summarizes what each community detection algorithm does.

Table 1. Overview of centrality algorithms

| Algorithm type | What it does |
| Degree centrality | Measures the number of direct relationships a node has |
| Betweenness centrality | Measures the number of shortest paths passing through a node; influence over the flow of information in a graph |
| Closeness centrality | Calculates which nodes have the shortest paths to all other nodes |
| Harmonic centrality | A variant of closeness centrality optimized for dealing with unconnected graphs |
| Eigenvector | Measures the transitive influence or connectivity of nodes |
| Page Rank | Measures the transitive influence or connectivity of nodes |
| Article Rank | A variant of the Page Rank algorithm |

Table 2. Overview of community detection algorithms

| Algorithm type | What it does |
| Clustering coefficient, Triangle count | Measures how many nodes form triangles and the degree to which nodes tend to cluster together |
| Weakly connected components, Strongly connected components | Finds groups where each node is reachable from every other node in that same group, following or regardless of the direction of relationships |
| Label propagation | Infers clusters by spreading labels based on neighborhood majorities |
| Louvain modularity | Maximizes the presumed accuracy of grouping by comparing relationship weights and densities to a defined estimate or average. Finds clusters by moving nodes into higher relationship density groups and aggregating them into super communities |

4 The Experiment

In October 2019, Watt [18] demonstrated the application of graph theory for exploring MSA at the GOTO Berlin conference. The presentation contained a theoretical model of a system consisting of 20 microservices, to which only three algorithms were applied: degree centrality, clustering coefficient, and Louvain. Inspired by this approach, we have employed a design of an online banking application. The application consists of 3 frontend services; 24 core backend services, each responsible for a particular bounded business function, e.g., Account, Statistics, Notifications; 10 integration adapter services; and 24 external APIs.


It is assumed that a polyglot persistence strategy was employed and that data is interchanged between services only through the service APIs. For simplicity and better visibility, infrastructure supporting services such as service discovery, load balancers, and circuit breakers are not represented in the figures and tables below. In comparison to the previous scientific experiments [15, 19], the scope of the algorithms is extended as well. In this paper, the SDG will be challenged by most of the graph algorithms that are available in the Neo4j libraries:

1. Centrality algorithms: degree centrality, eigenvector, Page Rank, Article Rank, betweenness, closeness.
2. Community detection algorithms: clustering coefficient (CC), triangle count, Louvain, strongly connected components (SCC).

The hardware and software used in the experiments were as follows:

• CPU: Intel Core i7-9750H 2.6 GHz 6-Core
• RAM: 16 GB 2667 MHz DDR4
• OS: macOS Catalina 10.15.4
• Software: Neo4j Desktop 1.2.4 + APOC + Graph Algorithms

After setting up the graph database, the following SDG was created (see Fig. 1).

Fig. 1. Microservice architecture in Neo4j.

The model in Fig. 1 consists of 37 internal microservices that communicate through their own RESTful APIs. On the outer layer of the architecture, there are three frontend


services, as well as shipping and payment adapter services connected to the external APIs. The entire graph consists of 61 nodes and 80 edges. The SDG visualization provides an intuitive overview of the MSA. Heuristics, however, is not a precise method for critical component detection; therefore, several graph algorithms were applied, and the results are demonstrated in Table 3 and Table 4.

Table 3. Centralities: experiment results, ordered by degree

| No. | Service | Degree | Eigenvector | Page Rank | Article Rank | Betweenness | Closeness |
| 1 | Account | 9 | 39.37 | 2.63 | 0.54 | 1087.47 | 0.37 |
| 2 | BrokerAPI | 9 | 9.29 | 3.33 | 0.57 | 288 | 0.18 |
| 3 | GatewayAPI | 7 | 24.00 | 2.30 | 0.47 | 208.81 | 0.29 |
| 4 | Payments | 7 | 8.42 | 2.89 | 0.47 | 636 | 0.25 |
| 5 | ShippingAPI | 6 | 19.03 | 2.20 | 0.42 | 440 | 0.28 |
| 6 | Investments | 6 | 17.22 | 1.86 | 0.42 | 503 | 0.25 |
| 7 | Notifications | 5 | 24.90 | 1.43 | 0.36 | 180.94 | 0.28 |
| 8 | RegionA | 5 | 1.93 | 2.37 | 0.39 | 230 | 0.20 |
| 9 | idCheck | 4 | 4.03 | 1.70 | 0.33 | 531 | 0.25 |
| 10 | Orders | 4 | 23.81 | 1.21 | 0.32 | 212.81 | 0.31 |
| 11 | Products | 4 | 23.33 | 1.15 | 0.32 | 356.77 | 0.29 |
| 12 | Transactions | 4 | 20.23 | 1.27 | 0.32 | 611.99 | 0.30 |
| 13 | KYC | 3 | 10.93 | 1.15 | 0.28 | 677 | 0.29 |
| 14 | Offers | 3 | 20.04 | 0.88 | 0.27 | 4.77 | 0.28 |
| 15 | Reports | 3 | 15.79 | 0.96 | 0.28 | 27.87 | 0.28 |
| Average | | 5.27 | 17.49 | 1.82 | 0.38 | 399.76 | 0.27 |

Table 3 summarizes the results of applying six centrality algorithms to the SDG. For readability, the list was limited to 15 services and ordered by the highest degree centrality. Bold results in the original table represent values higher than the sample's average, while italic results are significantly higher than the average. We are interested in the services for which the algorithms with the highest values overlap. According to the results of this experiment, it is worth underlining the high centrality values for ShippingAPI, Payments, GatewayAPI, and especially the Account and BrokerAPI services. In the previous experiments with Neo4j, for instance [20], only the degree centrality metric was used for analyzing the critical nodes in the system. While degree centrality calculates the number of direct dependencies, the other results might serve as supporting metrics that describe the downstream dependencies. In such an approach, we suggest comparing the results of the same metric relative to the other results. For instance, both Account and BrokerAPI have the highest degree value. However, for the Account service, we see significantly higher results for the eigenvector, betweenness, and closeness metrics. In the MSA context, the centrality algorithm definitions from Table 1 could be interpreted to mean that a service with higher centrality metrics might have a higher impact on the downstream services and the rest of the graph. This is visible in Fig. 1, where BrokerAPI is located on the right side of the graph and is adjacent to marginal services with no downstream dependencies. Meanwhile, Account is a central service with a set of dependent services on multiple levels. When services have to be categorized by criticality, centrality algorithms allow evaluating the service criticality level by providing quantitative data for a relative comparison. The results could serve as a guideline for prioritizing services for RAS metrics improvement. More experiments on various architectures would be needed to support this statement with greater confidence. At this point, we acknowledge the results and take them as input for the next step. All the mentioned services will be taken into consideration and observed closely.

In the next experiment, community detection algorithms were applied to the SDG and visualized with the help of the Neo4j Graph Algorithms Playgrounds application. The first step is CC detection. A triangle count algorithm was additionally applied, and the findings are listed in Table 4.
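Before turning to the community detection results, the sketch below illustrates how centrality metrics of the kind reported in Table 3 could be recomputed outside Neo4j with the NetworkX library. The edge list is a small, hypothetical excerpt of the dependency graph, so the values will not match the paper's 61-node model.

```python
# Sketch: recomputing Table 3-style centrality metrics with NetworkX on a toy
# subset of the dependency graph. The edge list is a hypothetical excerpt.
import networkx as nx

edges = [
    ("GatewayAPI", "Account"), ("GatewayAPI", "Products"),
    ("Account", "Transactions"), ("Account", "Notifications"),
    ("Account", "Payments"), ("Payments", "ShippingAPI"),
    ("Orders", "Products"), ("Orders", "Payments"),
]
g = nx.DiGraph(edges)

metrics = {
    "degree": dict(g.degree()),                   # direct dependencies
    "pagerank": nx.pagerank(g),                   # influence of linked services
    "betweenness": nx.betweenness_centrality(g),  # bridging position
    "closeness": nx.closeness_centrality(g),      # distance to the rest
}

for name, values in metrics.items():
    top = sorted(values.items(), key=lambda kv: kv[1], reverse=True)[:3]
    print(name, top)
```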

Table 4. Community detection: ordered by the clustering coefficient.

No. | Service       | Coefficient | Triangles
1   | Offers        | 0.66        | 2
2   | Products      | 0.50        | 3
3   | Orders        | 0.33        | 2
4   | Transactions  | 0.33        | 2
5   | Reports       | 0.33        | 1
6   | Notifications | 0.2         | 2
7   | Account       | 0.11        | 4
8   | Investments   | 0.07        | 1
9   | ShippingAPI   | 0.07        | 1
    | Average       | 0.29        | 2

The color-coding logic in Table 4 is the same as in the previous experiment. Only nine services are part of the node triangles. The Offers and Products services form concentrated clusters with a high probability of nodes being interconnected inside the cluster. While the Account service has a low CC, its high triangle count implies cyclic dependencies, which are considered a severe anomaly in an SDG. Keeping that in mind, we applied the Louvain algorithm to separate the SDG into larger clusters that could be interpreted as service boundaries within the MSA. The Louvain algorithm breaks the system into five main domains. In Fig. 2, communities are visualized using unique colors and sized by the triangle count. Services in the red community are responsible for client identification. The blue services are the frontend facade consisting of the frontend API, the gateway API, and the two bounded Statistics and News services. The green community represents Investments and its underlying services. The purple services represent the payment domain. Every service node in the domains above is equally small, which implies that they are loosely coupled and relatively independent. The incoming nodes of every domain, i.e., frontend API, KYC, Investments, and Payments, are bridges whose shutdown would cause a cascading failure of all underlying services. Therefore, the system architect should apply stringent reliability requirements to these services to guarantee the highest uptime.

Fig. 2. Community detection with the Louvain algorithm. (Color figure online)

Another compelling case is the yellow domain. It is hard to define its context precisely. At first impression, the nodes are organized around the Account service. However, the cluster includes other core services such as Products, Orders, and Transactions. Therefore, it is fair to assume that the yellow community represents the core of the entire MSA. The node size of each service is notably larger, which indicates high coupling among the services inside the core domain. The SCC algorithm was applied to the SDG to verify this (see Fig. 3). The same color of the nodes indicates strong connectedness among the services. Due to the high number of services, colors might repeat; therefore, it is necessary to analyze only sets of linked nodes of the same color. Figure 3 confirms the tight coupling of the core services. From that, it is possible to conclude that the Account, Products, Transactions, Offers, Orders, Payments, and ShippingAPI services form a situation called a distributed monolith, which conflicts with the entire purpose of the MSA. This situation requires the immediate attention of the system's architect.
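A minimal sketch of how such cyclic dependency clusters could be flagged programmatically is shown below; the edge list is again a hypothetical excerpt containing an artificial cycle, not the paper's full graph.

```python
# Sketch: flagging "distributed monolith" candidates by looking for strongly
# connected components larger than one service. The edges are illustrative.
import networkx as nx

edges = [
    ("Account", "Transactions"), ("Transactions", "Orders"),
    ("Orders", "Products"), ("Products", "Account"),   # cycle back to Account
    ("Payments", "ShippingAPI"),
]
g = nx.DiGraph(edges)

for component in nx.strongly_connected_components(g):
    if len(component) > 1:
        # Every service in this set is reachable from every other one,
        # i.e. the services are cyclically dependent.
        print("Tightly coupled cluster:", sorted(component))
```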


Community detection brings insights that, in a specific context, might trigger a re-evaluation of the overall system design, leading to the refactoring of some of its components. The conclusions above are based on a theoretical static graph model, and it would be valuable to apply these principles to larger dynamic systems.

Fig. 3. Community detection with strongly connected components algorithm. (Color figure online)

It is possible to automate Neo4j graph modeling by using a tool called Mercator [20]. It works as a Docker image that scans a particular AWS infrastructure and generates a graph of all the related entities within the runtime. The container installs a Neo4j database inside itself and then invokes Mercator against AWS with the provided read-only credentials. Mercator is a powerful solution that allows companies of any size to visualize their AWS infrastructure automatically in a matter of seconds or minutes, depending on the graph size. Alternatively, a combination of Neo4j and open source distributed tracing tools such as Zipkin or Jaeger could be used to collect and store the service call data automatically. When the infrastructure snapshot is loaded into the graph database, the graph algorithms demonstrated in this section can be applied for in-depth analysis, similarly to what was done for the static graph model.
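As a sketch of the tracing-based alternative, the snippet below aggregates caller-callee pairs from Zipkin-style spans. The span structure, field names, and span contents are assumptions about the tracing format made for illustration; the resulting weighted pairs could then be merged into Neo4j with the same MERGE pattern shown earlier.

```python
# Sketch: deriving dependency edges from distributed-tracing data. The span
# dictionaries mimic Zipkin-style spans with local/remote service names;
# the field names and the spans themselves are illustrative assumptions.
from collections import Counter

spans = [
    {"localEndpoint": {"serviceName": "GatewayAPI"},
     "remoteEndpoint": {"serviceName": "Account"}},
    {"localEndpoint": {"serviceName": "Account"},
     "remoteEndpoint": {"serviceName": "Transactions"}},
    {"localEndpoint": {"serviceName": "GatewayAPI"},
     "remoteEndpoint": {"serviceName": "Account"}},
]

# Count observed caller -> callee pairs; the counts could become relationship
# weights when the edges are merged into the graph database.
calls = Counter(
    (s["localEndpoint"]["serviceName"], s["remoteEndpoint"]["serviceName"])
    for s in spans
    if s.get("remoteEndpoint")
)
for (caller, callee), weight in calls.items():
    print(f"{caller} -> {callee} (observed {weight} times)")
```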

5 Conclusions

This paper addressed MSA analysis using graph algorithms. Moving forward in MSA reliability assessment, the service dependency graph was evaluated in experiments with a single system consisting of 61 services, which is a realistic number of services for medium and small size enterprise service systems. The main conclusions after the experiments are the following:
1. The service dependency graph (SDG) advances the observability and governance of the MSA at scale. Neo4j highlights the dependency relationships among microservices, making it easier to visualize complex service interactions compared to proprietary service mapping technologies. Tools such as Mercator could simplify the Neo4j graph generation process by fully automating it in the AWS infrastructure.
2. Neo4j libraries support various centrality and community detection algorithms that can be applied for the qualitative assessment of the SDG. We found that centrality algorithms help to evaluate service criticality by providing quantitative data for a relative service comparison. Community detection algorithms allow identifying highly coupled clusters of services, in some situations revealing hidden architectural anti-patterns, such as a distributed monolith.
3. The application of the graph algorithms contributes to architectural anti-pattern, critical component, and cyclic dependency detection in the SDG. The obtained results could serve as an indicator for microservices-based system refactoring and resiliency strategy improvements.
The experiment demonstrates the technical feasibility of applying graph algorithms to SDGs. The proposed Neo4j solution is lightweight and might find its applications in medium-size enterprise systems. Note that the limitations of our experimental method include: (1) insufficient statistical validation, with a risk of over-generalization due to the analysis of a single MSA example; and (2) the conclusions are based on a graph of a local, non-production-grade system. Further research is intended to combine graph algorithms with the imitation of service failures, as in chaos engineering, in large-scale distributed environments, and to use other simulation methods to seek predictive measures for ensuring reliability in MSA-based systems.

References
1. Familiar, B.: Microservices, IoT, and Azure. Apress, Berkeley, CA (2015). https://doi.org/10.1007/978-1-4842-1275-2
2. Fowler, S.J.: Production-Ready Microservices: Building Standardized Systems Across an Engineering Organization. O'Reilly Media Inc., Sebastopol (2016)
3. Liu, H., et al.: JCallGraph: tracing microservices in very large scale container cloud platforms. In: Da Silva, D., Wang, Q., Zhang, L.-J. (eds.) CLOUD 2019. LNCS, vol. 11513, pp. 287–302. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-23502-4_20
4. Newman, S.: Monolith to Microservices: Evolutionary Patterns to Transform Your Monolith. O'Reilly Media Inc., Sebastopol (2019)
5. Hadellis, L., Koubias, S.: An approach to interoperability in a heterogeneous control network environment. IFAC Proc. Vol. 33(20), 105–112 (2000). https://doi.org/10.1016/S1474-6670(17)38034-5
6. Dragoni, N., et al.: Microservices: yesterday, today, and tomorrow. Present and Ulterior Software Engineering, pp. 195–216. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-67425-4_12
7. Gammelgaard, C.H.: Microservices in .NET Core: with examples in Nancy (2016)
8. Bloch, H., et al.: A microservice-based architecture approach for the automation of modular process plants. In: IEEE International Conference on Emerging Technologies and Factory Automation (ETFA), pp. 1–8 (2018)
9. Szendi-Varga, J.: Graph Technology Landscape 2020 (2020). https://graphaware.com/graphaware/2020/02/17/graph-technology-landscape-2020.html
10. Graph Visualization. https://neo4j.com/developer/graph-visualization/
11. Negro, A., Kus, V.: Bring Order to Chaos: A Graph-Based Journey from Textual Data to Wisdom (2018). https://graphaware.com/nlp/2018/09/26/bring-order-to-chaos.html
12. Neo4j, Inc.: Managing Microservices with Neo4j (2015). https://neo4j.com/blog/managing-microservices-neo4j/
13. Akoglu, L., Tong, H., Koutra, D.: Graph based anomaly detection and description: a survey. Data Min. Knowl. Disc. 29(3), 626–688 (2014). https://doi.org/10.1007/s10618-014-0365-y
14. Ma, S., Fan, C., Chuang, Y., Lee, W., Lee, S., Hsueh, N.: Using service dependency graph to analyze and test microservices. In: 2018 IEEE 42nd Annual Computer Software and Applications Conference, pp. 81–86 (2018). https://ieeexplore.ieee.org/document/8377834
15. Ma, S., Fan, C., Chuang, Y., Liu, I., Lan, C.: Graph-based and scenario-driven microservice analysis, retrieval, and testing. Future Gener. Comput. Syst. 100, 724–735 (2019)
16. Needham, M., Hodler, A.E.: Graph Algorithms: Practical Examples in Apache Spark and Neo4j. O'Reilly Media Inc., Sebastopol (2019)
17. Neo4j, Inc.: The Neo4j Graph Data Science Library Manual v1.2. https://neo4j.com/docs/graph-data-science/1.2/
18. Watt, N.: Using Graph Theory and Network Science to Explore your Microservices Architecture (2019). https://gotober.com/2019/sessions/1128/using-graph-theory-and-network-science-to-explore-your-microservices-architecture
19. Nurmela, T., Nevavuori, P., Rahman, I.: Qualitative evaluation of dependency graph representativeness. In: CEUR Workshop Proceedings, vol. 2520, pp. 37–44 (2019). http://ceur-ws.org/Vol-2520/paper5a.pdf
20. Mercator. https://github.com/LendingClub/mercator

Literature and Conceptual Analysis

Text Mining the Variety of Trends in the Field of Simulation Modeling Research

Mario Jadrić, Tea Mijač(B), and Maja Ćukušić

Faculty of Economics, Business and Tourism, University of Split, Split, Croatia {mario.jadric,tea.mijac,maja.cukusic}@efst.hr

Abstract. Simulation modeling as a research field is engaging and multidisciplinary, drawing on a wide range of techniques and tools. It has reached its 60th anniversary, amassing an enormous corpus of literature. To date, many studies have explored its advancements in specific disciplines. This paper aims to present a more structured and cross-disciplinary approach by identifying trends and extracting patterns from text files using text mining techniques. For that purpose, abstracts from over 5,000 papers were downloaded from the WoS database and analyzed. The results are complemented and visualized through the process of science mapping and positioned against existing frameworks (Plan-Do-Check-Act and Bloom's taxonomy in particular). The source with the most journal papers related to simulation modeling has been Ecological Modelling; however, the recent research focus has shifted to healthcare issues. Overall, the results confirm the relevance of the simulation modeling field and acknowledge the trend among research papers towards the highest cognitive level of Bloom's taxonomy – knowledge.

Keywords: Simulation modeling · Text mining · Science mapping · PDCA

1 Introduction

Like any other complex concept or term, simulation modeling is defined in many ways. Some definitions of simulation modeling are determined by the current state of underlying technologies and some by the author's specific field, failing to provide a universal definition. In that regard, a systematic approach, such as the one presented in this paper based on text analysis, would be needed to derive such a definition and outline the essential determinants of the concept. The purpose of simulation modeling should be emphasized in all cases: the ability to acquire knowledge of the simulated system.

Computer-aided or automatic text analysis refers to a set of novel techniques used to answer questions frequently related to psychology, political science, sociology, and other social sciences. In these areas, "computational linguistics is a field that is primarily concerned with language in the text", providing comparisons and finding the patterns that researchers cannot easily detect [1]. The advantage is that data mining tools can be used to organize the body of literature in a reasonable time [2].

Simulation modeling, or modeling & simulation, is a multidisciplinary field that draws on a wide range of techniques and tools [3], celebrating its 60th anniversary since the development of the first simulation modeling language, the General Simulation Program [3, 4]. Since then, it has been considered a tool, method, or application used by scientists and engineers to investigate problems in their domain. More specifically, it has been the basis for analyzing the behavior of various real-world systems in a wide range of areas such as commerce, computer networks, defense, health, manufacturing, and transportation [3]. As a tool, simulation modeling can support decision making since simulation allows replicating processes virtually and examining behavior, the impact of possible changes, or comparing different alternatives without high costs [5, 6]. At the company level, business process simulation can be beneficial for making strategic decisions, but also tactical and operational ones [7]. Specifically, simulation techniques can support understanding and analyzing processes for strategic management process improvement, forecasting, or predicting [5]. By applying this simulation-based approach, grounded in statistical data, potential risks can also be analyzed [6]. Nowadays, a standard approach is to set up the control flow of business processes, including the engaged resources and used documents, and to record the instructions for the execution of steps in the business processes [8]. Being able to quantitatively estimate the impact of process design on its performance is one of the biggest benefits of simulation modeling [5]. Although the field is mostly multidisciplinary, there is some agreement that simulation modeling is rooted in computer science and systems engineering [9].

Given the mentioned complexity and cross-disciplinary nature of the term simulation modeling, the field has attracted the attention of many authors. A study of the first 50 years of the field found that there are three different types of simulation modeling literature reviews [3]: (1) historical reviews, (2) methodology reviews, and (3) application reviews. However, most of the review papers focus on a narrower area. For example, numerous review papers examine simulation modeling in healthcare [10–16]. Other authors have reviewed papers on simulation modeling in the area of sustainability [17, 18], human behavior [19], the construction industry [20], real-time services [21], and so on. Apart from these reviews, there are also many review papers on the software used for simulation modeling; for example, [5, 22]. The already mentioned study that took into account 50 years of research [3] was limited to papers published in specialized journals for simulation modeling. The authors based their findings on 576 journal papers, concluding that about two-thirds were methodology-related papers and a minority related to real-world applications. One decade later, this paper aims to expand the research and explore the trends in simulation modeling without excluding journal titles from other fields. Given the broad subject of simulation modeling, one of the goals of this paper is to explore the extent to which this field is represented by business domain applications. For that purpose, the review includes all papers where the title, abstract, and keywords contain "simulation modeling", from 1990 to 2018.
Several indicative research questions were raised to analyze the specific trends, such as: (1) which authors and sources influenced the simulation modeling literature the most?; (2) is there a difference in terms of the geographical distribution of simulation modeling studies?; (3) which themes/topics have instigated the most publications?; (4) is there a specific trend that can be recognized using text mining?; (5) what has been the focus of recent studies in the field of simulation modeling?


The paper follows the standard structure: Sect. 2 contains a detailed explanation of the study design, with the procedure of data retrieval and the tools and techniques used. Section 3 presents the results, followed by a discussion in Sect. 4, including an outlook on the future of simulation modeling research. By presenting the limitations of the study and closing remarks, Sect. 5 concludes the paper.

2 Research Methodology

2.1 Methods Used

As is customary in a systematic review, the process was based on a search of electronic databases to retrieve relevant literature [23] and on the identification of related studies, leading to a search that yields large numbers of "hits" [24]. After that, titles and abstracts had to be downloaded manually for further screening. Due to the emergence of research repositories in recent years containing textual and complete data sources, along with the increase in the volume and variety of those sources, screening has become more complex [24, 25]. It is the most time-consuming part of the process as it can involve screening tens of thousands of titles and abstracts. Text mining was used to optimize the process [24]; it is a data mining technique where the input data is text [26]. The technique has become essential in support of knowledge discovery, in particular in Knowledge Discovery in Texts [2]. It is commonly used to extract useful hidden information and patterns from a text [27], or more specifically, to derive implicit knowledge that hides in unstructured text and present it in an explicit form. Standard phases of text mining – data retrieval, data extraction, and knowledge discovery [28] – were followed to this end. As of recently, text mining results are sometimes complemented by systematic mapping, a particular type of systematic literature review [25] that can be used when the field of interest is broad and the objective is to get an overview of what is being developed in the research field. The main differences between a traditional systematic review and a systematic mapping are their breadth and depth – while the systematic review analyses a low number of primary studies in depth, the systematic mapping analyzes a greater number of studies, but in less detail [25]. As such, science mapping represents the use of computational techniques for visualization, analysis, and modeling of a broad range of scientific and technological activities [29]. The scope of the science mapping in this study is the simulation modeling field of research; in other words, the unit of analysis is the domain of scientific knowledge in that particular field, reflected through an aggregated collection of intellectual contributions [30].

2.2 Data Retrieval and Procedure

For selecting and retrieving the sources, a set of guidelines [31] was followed, starting with the search of the scholarly database. Similar to other studies, the database selected for the review was exclusively Web of Science (WoS) [32, 33]. Scopus was not considered due to the fact that the top 10 WoS journals in this research are indexed by Scopus but not vice versa (the top 5 results for sources are the same for both databases). Since "simulation modeling" can also denote a method in any research area, further filtering or exclusion of journals was not done, i.e., journals from specific research areas (medicine, engineering, education, and more) were kept in for the analysis. Still, the scope was limited to journal papers, with the view to analyze and compare the results with the papers from specialized conferences in the future. The data identification phase included a search in the WoS Core Collection digital library. The topic search for "simulation model*" resulted in 62,960 publications in the period from 1986 to 2018. In order to limit the search to papers that focus on simulation modeling as the primary research method or methodology, the following search expression was applied: (TITLE: ("simulation model*") Refined by: DOCUMENT TYPES=(ARTICLE) AND LANGUAGES=(ENGLISH) Timespan=1986–2018. Indexes=SCI-EXPANDED, SSCI, A&HCI, CPCI-S, CPCI-SSH, BKCI-S, BKCI-SSH, ESCI, CCR-EXPANDED, IC.). The query resulted in 6,656 publications. The next step was downloading the abstracts to extract a dataset from a collection of texts using text mining [2]. By using only abstracts, the impact of the paper size on the representation of a particular topic in the analysis has been reduced [34]. However, abstracts in WoS are only available for publications from 1990 onwards. In that regard, the search query was modified once again, setting the timespan from 1990 to 2018. The result is a database of 5,495 papers. Downloaded abstracts are generally considered to meet the following criteria [34]: (1) abstracts are available for automatic download from WoS, (2) the access to abstracts is open, (3) abstracts are a concise summary of the article, and (4) abstracts are comparable in size and style regardless of the scope and style of different journals. The abstracts of the papers were downloaded in batches from WoS, one txt file per year of publication (1990 to 2018, N = 29). For science mapping, the bibliographic database (N = 5,495) was also downloaded from WoS.

2.3 Tools and Techniques Used

For text mining, RapidMiner Studio was used; it is an open-source application providing an integrated environment for data mining [35]. The standard process was modeled (Fig. 1) using the operators [36]: (1) Tokenize, (2) Filter Stopwords, (3) Stem (Porter), and (4) Transform Cases; cf. [37].

Fig. 1. Text processing the documents in RapidMiner Studio

The Tokenize operator splits the text of a document into a sequence of tokens. The simplest token is a character, although the simplest meaningful token is a word [38]. This action resulted in tokens consisting of one single word, as it is the most appropriate option before building the word vector. All words from documents were thus transferred to attributes through tokenization. Further data processing was required to get meaning and relationships from that. The Filter Stopwords (English) operator removes all tokens that are equal to stopwords from the built-in list. Stopwords are mostly conjunctions and articles that enable readability of texts but do not contribute to the analysis. After this reduction, by using Porter's word stemming algorithm, an iterative, rule-based replacement of word suffixes was performed to reduce the length of the words until a minimum length was reached. Stemming is needed to find the words that share a common root and to combine stems representing a phrase that may have more meanings than a single word or stem. In text mining, this is the process of reducing like-words down to a single, common token. For example [36]: country, countries, country's, countryman, and similar → countr. Finally, cases of characters were transformed to lower case using the Transform Cases operator. To process the documents, a word vector was created using the Term Frequency method. Infrequent words were ignored using the absolute prune method to identify words that do not appear in the 29 documents (representing one document per year). There were 279 regular attributes (most common words) observed through all 29 years. The sum of relative term frequency was used as a criterion to reduce the number of regular attributes, whereby the sum of relative frequency had to be more than 1, resulting in 61 attributes for easier interpretability.

For science mapping, VOSviewer (VOS) was used; it is an application for analyzing bibliometric network data, for example, citation relations between publications, collaboration relations between researchers, and co-occurrence between scientific terms [39]. Since 2011, VOS has had a text mining functionality, used here to complement the results from RapidMiner. Several steps were performed to create a term map based on the corpus of documents [39]: (1) identification of noun phrases, (2) selecting the most relevant noun phrases, (3) mapping and clustering, and (4) visualization of mapping and clustering results. During the first step, the software identifies noun phrases by filtering all word sequences that consist exclusively of nouns and adjectives and that end with a noun. It then converts plural noun phrases into singular ones. The selection of relevant terms is based on measuring the Kullback-Leibler distance [39]. For mapping and clustering, the visualization of similarities technique was used to locate items in such a way that the distance between any two items reflects the similarity or relatedness of the items as accurately as possible [40]. Similar to other studies [33, 41–43], several bibliometric maps have been constructed and visualized based on the bibliographic dataset downloaded from WoS.
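For illustration, an equivalent preprocessing chain can be sketched outside RapidMiner with NLTK: tokenize, filter English stopwords, apply Porter stemming, lower-case, and compute relative term frequencies. The two example abstracts are placeholders, not WoS data, and the NLTK punkt and stopwords resources must be downloaded beforehand.

```python
# Sketch of the RapidMiner-style preprocessing with NLTK: tokenize, remove
# English stopwords, Porter-stem, lower-case, then build relative term
# frequencies per document. Requires the "punkt" and "stopwords" resources.
from collections import Counter
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize

documents = {
    "1990": "Simulation models support decision making in manufacturing systems.",
    "1991": "A simulation model of hospital processes predicts patient waiting times.",
}

stop = set(stopwords.words("english"))
stemmer = PorterStemmer()

def relative_term_frequency(text):
    # Keep alphabetic tokens only, lower-case them, drop stopwords, then stem.
    tokens = [t.lower() for t in word_tokenize(text) if t.isalpha()]
    stems = [stemmer.stem(t) for t in tokens if t not in stop]
    counts = Counter(stems)
    total = sum(counts.values())
    return {stem: count / total for stem, count in counts.items()}

for year, text in documents.items():
    print(year, relative_term_frequency(text))
```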

3 Results of the Analysis

3.1 General Overview of Collected Papers from 1990 to 2018

The database of 5,495 documents represents a growing corpus of research from the simulation modeling field. The number of total publications increased from 1990, when 103 papers were published, to 2000 with 142 papers, then to 2010 with 236 papers, and by 2018 the number reached 335. Using the WoS Treemap (WoS Categories for "simulation model*" 1990–2018), the top 5 science categories with the highest number of records (N = 1,781) were Computer science interdisciplinary applications (N = 459), Environmental sciences (N = 371), Engineering, civil (N = 352), Ecology (N = 323), and Engineering, electrical electronic (N = 276). The top 5 source titles for "simulation model*" in the same period were the journals Ecological Modelling, Agricultural Systems, Transportation Research, Simulation Modelling Practice and Theory, and Simulation.


After importing the bibliographic database into VOS, the total number of sources was 2,363. The minimum number of publications per source was set to 3, and the minimum number of citations was set to 1, with 461 sources meeting the thresholds. Overlay visualization was used to present a "distribution of dates" for each source to determine the time when it was cited. The results show that Ecological Modelling has the highest number of citations (4,030) as well as the largest number of publications (148). However, among the most recent sources meeting the criterion are PLOS ONE, Applied Energy, and others (Fig. 2). The highest number of papers was authored in the USA (1,714), followed by the UK (555), China (407), and Germany (347).

Fig. 2. Overlay visualization of sources based on citations

Co-citation analysis results (presented in Fig. 3) illustrate a network where the relatedness of authors is determined based on the number of times they were cited together. The minimum number of citations was set to 20, and from the total number of 92,111 authors, 333 meet the threshold. Scholar A. M. Law (writing on simulation modeling methodology) has the largest number of citations, followed by D. Levy (active in the area of health) and J. Ritchie (focusing on soil-related topics). At the same time, the citation analysis revealed that D. Levy also has the leading number of authored publications.

Fig. 3. Network visualization of co-citation analysis


3.2 Relative Term Frequencies and Term Maps

A measure of how important a word may be is its term frequency, denoting how frequently a word occurs in a document [44]. A sum of relative term frequency for the period from 1990 to 2018 was calculated using RapidMiner (and is illustrated in Fig. 4). The total number of regular attributes from RapidMiner was reduced to 61 attributes (as already elaborated above, where the sum of relative frequency > 1). For readability, the number of terms is limited to 36.

Fig. 4. Sum of relative term frequency

The idea then was to limit the number of the most frequent terms in VOS as well, so the results would be validated and easier to compare. The overlay visualization (Fig. 5) contains items colored differently based on characteristics (year published), and the color of an item ranges from blue (before 2001) to green and to yellow (after 2015). Even though it is possible to set years manually, the software by default sets ranges automatically to provide the optimal visualization. The output from both tools helps identify several common terms related to a research domain even though the periods do not strictly match; these are population, growth, cost, temperature, and water.


Fig. 5. Overlay visualization of the term map (limited number of terms) (Color figure online)

Another term map was created using VOS, but without applying the restriction on the number of terms (in Fig. 5, it had been set to match the number from RapidMiner) to better identify the "research front" in the field of simulation modeling. The term map (Fig. 6) highlights the relative "frequency" (the size of a node) and "recency" of topics based on a temporal analysis of their occurrence in the reviewed papers. This visualization thus creates the "distribution of dates" for each term to determine the period when the topic attracted the most considerable interest [41]. It provides a better overview since it visualizes all terms. It can be concluded that topical foci in recent years shifted to real-world problems from medicine (as evidenced by the terms: hospital, patient, surgery, intervention, harm, life expectancy, and others).

Fig. 6. Overlay visualization of the term map


3.3 Clustering the Periods and the Concepts

The aim of the clustering was to locate similar groups, i.e., to group objects that are similar to each other and dissimilar to the objects belonging to other clusters [35]. The partitioning was based on the k-means method, using a similarity function to measure the proximity among the text objects [2]. K-means is an exclusive clustering algorithm: each object is assigned to precisely one cluster from the set of clusters. Clustering was done in RapidMiner with the squared Euclidean distance used as a similarity measure. Three clusters were identified based on the years, differentiating between three periods: the early period (1990), the period until 2002, and the most recent period (2003–2018). Details regarding the top 12 concepts for each cluster are given in Table 1. Both "model" and "simul" have the same rank across all clusters, and there is no difference between the top 5 concepts across the three clusters (time periods). The most recent cluster (2003–2018) introduces two new concepts, "predict" and "soil", that were not included in the top 12 concepts of the other clusters (1990 and 1991–2002). Also, the results show that the relative term frequency of the word "method" declines over the years.

Table 1. Cross comparison of top 12 concepts across 3 clusters, from RapidMiner

               | Cluster_2 (1990) | Cluster_1 (1991–2002) | Cluster_0 (2003–2018)
Absolute count | 1                | 12                    | 16
Fraction       | 0.034            | 0.414                 | 0.552

Cluster_2 (1990)  | Cluster_1 (1991–2002) | Cluster_0 (2003–2018)
Concept | Rtf*    | Concept | Rtf*        | Concept | Rtf*
model   | 0.655   | model   | 0.668       | model   | 0.676
simul   | 0.434   | simul   | 0.437       | simul   | 0.408
result  | 0.166   | system  | 0.149       | system  | 0.132
system  | 0.155   | result  | 0.148       | develop | 0.131
develop | 0.134   | develop | 0.123       | result  | 0.127
base    | 0.132   | base    | 0.116       | data    | 0.113
method  | 0.117   | data    | 0.108       | effect  | 0.100
studi   | 0.116   | studi   | 0.104       | time    | 0.087
perform | 0.102   | effect  | 0.098       | process | 0.085
effect  | 0.096   | time    | 0.089       | predict | 0.082
process | 0.093   | process | 0.088       | soil    | 0.082
time    | 0.093   | method  | 0.084       | studi   | 0.082

*Relative term frequency
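As a sketch of the clustering step outside RapidMiner, the snippet below applies k-means with k = 3 to a tiny, made-up year-by-term matrix of relative term frequencies; the real input has 29 yearly documents and 61 attributes.

```python
# Sketch: clustering yearly documents by their relative term frequencies
# with k-means (k = 3), mirroring the RapidMiner setup. The matrix values
# and the selected terms are illustrative placeholders.
import numpy as np
from sklearn.cluster import KMeans

years = ["1990", "2000", "2010", "2018"]
terms = ["model", "simul", "predict", "soil"]
# Rows = years, columns = relative term frequencies (made-up numbers).
X = np.array([
    [0.66, 0.43, 0.02, 0.01],
    [0.67, 0.44, 0.04, 0.03],
    [0.68, 0.41, 0.08, 0.08],
    [0.67, 0.40, 0.09, 0.08],
])

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
for year, label in zip(years, kmeans.labels_):
    print(year, "-> cluster", label)
```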

As already mentioned in Sect. 2, VOS uses an alternative approach for clustering, and some authors argue that this provides a more satisfactory representation of a data set [40]. The results of text mining in VOS are visualized in Fig. 7. After setting the minimum number of occurrences to 10, out of 97,889 terms, 1,885 meet the threshold. The total number of terms included in the visualization is 1,131, amounting to the top 60% of the most relevant terms. There are five clusters, each presented with a different color. Each cluster may be seen as a different topic [39]. In that regard, and based on a more detailed analysis of the map, five main topics are acknowledged: (1) simulation modeling process – in red, with 346 items, (2) environment – in green, with 273 items, (3) simulation modeling settings – in blue, with 252 items, (4) healthcare – in yellow, with 188 items, and (5) population – in purple, with 72 items.

Fig. 7. VOSviewer cluster visualization (Color figure online)

3.4 Positioning the Results to Other Frameworks

In the 1950s, W.E. Deming proposed that processes should be analyzed and measured to identify the sources of variations that cause products to deviate from customer requirements. He recommended that processes be viewed through a continuous feedback loop (Plan–Do–Check–Act, PDCA) so that managers can identify and change the parts that need improvement [45]. The framework has been used on numerous occasions and across disciplines, demonstrating the importance of continuous improvement and development. As Deming's PDCA phases can be used in various scenarios, in this case they were mapped to distinct phases of simulation modeling. Specifically: 1) Plan (design or revise process components to improve results) was mapped to the stemmed term "propos"; 2) Do (implement the plan and measure its performance) to the term "develop"; 3) Check (assess the measurements and report the results to decision-makers) to the term "evalu"; 4) Act (decide on changes needed to improve the process) to the term "improve." The use of the terms over time is illustrated in Fig. 8.


Fig. 8. Frequencies of PDCA-related terms (propos, develop, evalu and improve), 1990–2018
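A minimal sketch of the counting behind such a figure is given below; the per-year stem lists are placeholders standing in for the stemmed abstracts of each year's papers, and the exact stem spellings follow the mapping stated above.

```python
# Sketch: counting PDCA-related terms per year, as behind Fig. 8.
# The yearly stem lists are placeholders; in the study they come from the
# stemmed abstracts of each year's downloaded papers.
from collections import Counter

PDCA_TERMS = {"plan": "propos", "do": "develop", "check": "evalu", "act": "improve"}

stems_per_year = {
    1990: ["develop", "model", "propos", "develop", "evalu"],
    1991: ["develop", "improve", "model", "propos"],
}

for year, stems in sorted(stems_per_year.items()):
    counts = Counter(stems)
    total = len(stems)
    # Relative frequency of each phase's mapped term in that year's corpus.
    row = {phase: counts[term] / total for phase, term in PDCA_TERMS.items()}
    print(year, row)
```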

The terms matching the phases "plan", "check", and "act" increase and converge with time. The term matching the phase "do" (develop), after a significant decrease from 1990 to 1994, oscillates in a predictable range and tends to stabilize, with significantly more use than the other three phases throughout the observed period. Apart from the well-known PDCA management framework, another framework was assessed as useful to review the development of the research field over time. The first and most famous taxonomy of educational objectives is the one proposed in 1956 by B. Bloom [46]. In one of the domains, the cognitive domain, there are different levels: (1) knowledge, (2) comprehension, (3) application, (4) analysis, (5) synthesis, and (6) evaluation. The taxonomy is hierarchical, meaning that learning at higher levels depends on the prerequisite knowledge and skills achieved at lower levels. Bloom's Rose or Bloom's Wheel displays the levels along with associated activities. Each of the six levels in the cognitive domain is presented through activities (such as organize, generalize, prepare, select, etc.) and resources that best contribute to the achievement of the objectives. This taxonomy has been used in many domains as well; it is considered foundational and indispensable in the education community [47], but authors have identified its cross-disciplinary use in business, the social sciences, and the applied sciences [48]. Through a review of the activities corresponding to each of the six Bloom's cognitive levels and of the stemmed terms resulting from the text mining analysis, the words were mapped thoroughly. The most frequent term, "develop", corresponds to the level synthesis (with a relative frequency of more than 3.5). A more detailed analysis has been done for each level. Over the period of time (1990 to 2018), based on the frequencies, it can be seen that linear growth over time characterizes four out of six levels (Fig. 9). A mild linear fall characterizes the levels synthesis and knowledge over time; more precisely, the frequency of the stemmed terms design, develop, increase, predict, describe, include and require does not increase over time.


Fig. 9. Frequencies of the terms corresponding to six levels of Bloom’s taxonomy, 1990–2018

4 Discussion

This paper aimed to identify which authors, sources, and documents influenced the simulation modeling literature the most. According to the results, scholar A. M. Law has the largest number of citations, followed by D. Levy, who has been the most productive author. The second most productive author is P. Carberry, also originating from the USA; the results are consistent with the geographical distribution, indicating that the majority of papers come from the USA. Based on the overview of the analyzed papers, it became apparent that the source with the most documents, as well as the highest number of citations, is Ecological Modelling. However, when the total link strength (the number of publications in which two sources occur together) is taken into consideration, Agricultural Systems has the biggest strength. These results are in line with previous research stating that many of the present-day research problems in sustainability can only be addressed effectively through simulation modeling [17]; therefore, it is not surprising that simulation modeling has made a significant contribution to the field.

Based on the text mining results in RapidMiner Studio and VOS, clustering analysis was used for additional insights into trends and main areas of interest. For that purpose, clustering based on the time dimension demonstrated that the trends reflected in the relative term frequencies of the top 5 concepts remained the same in the three clusters (the stemmed terms being: model, simul, system, result, and develop). However, in the most recent cluster (from 2003 onwards), the term predict came about, implying that simulation models have recently been used more for predicting. Looking at the topics, and by using a different tool and clustering approach, it was possible to differentiate between five clusters. The biggest cluster addresses the simulation modeling process, followed by clusters addressing environment topics, simulation modeling settings, healthcare, and population. Even though the ecological topics are the most cited sources, the cluster pertaining to them is the second biggest, surpassed by modeling-related research. This is understandable, as even if many of the reviewed studies use simulation models in applied settings (such as ecology), research results are expected to contribute to the advancement of simulation techniques themselves. In recent years, health-related research has become an important topic supported by simulation modeling, as this study has shown. To follow up on this, a recent review from 2018 suggested that simulation studies of physical activity are an emerging area in this regard [10].


Apart from the trend analysis, the aim of the study was to get insights into the maturity of the field. For that purpose, based on a filtered and reduced number of relevant terms, it was possible to draw conclusions by positioning the results against relevant management and educational frameworks. Similar to other authors [49], by looking at the PDCA management cycle, the phases of simulation modeling were identified and compared over time. It is apparent that most of the studies in the simulation modeling field align with the "do" phase, where the goal is to implement the plan and measure its performance. This is in line with the majority of papers using modeling to support real-world cases in different domains, as listed above. The smallest share among the reviewed papers corresponds to the last phase, "act." However, the frequencies of terms corresponding to this phase have the biggest linear growth. This can indicate that the trend in the field of simulation modeling is moving towards using models for decision support for system improvements. In favor of this analysis is the rise of the term "predict", which stands out in the most recent cluster (2003–2018). After positioning the simulation modeling terms against Bloom's taxonomy, or more specifically its cognitive domain, it can be concluded that terms related to the "knowledge" level (the highest domain) are the least represented. Despite the significant corpus, the field of simulation modeling is still very relevant and current, with progress yet to be achieved. It also remains evident that, due to the changing and competitive environment that natural and artificial systems, organizations, and individuals face, it is necessary to keep investigating the concept of continuous improvement towards excellence [7], well supported by simulation modeling.

5 Conclusion

This research paper has focused on more than 5,000 abstracts addressing the field of simulation modeling since 1990. Several trends have been identified by applying several text mining techniques, an approach that has the potential to revolutionize the way a research synthesis is done. The topics of recent papers in the field confirm that a lot of research is still done on the methodology itself, leaving space for improvement despite the longevity of the field. Further to that, the most important domains that frequently and successfully apply simulation modeling concepts in solving real-world cases have been identified. Simulation modeling allows users to predict future states by tracking changes in the system over time. It provides decision-makers with an opportunity to experiment in ways impossible in the real world [13, 16], and that makes it, at the same time, an enduring research methodology and research focus. The field of simulation modeling is very active, and the corpus has been continuously expanding for the past 30 years, as evidenced here and in previous studies [3]. The main limitation of this research is the scope of the retrieved data, limited by the source, i.e., the WoS database. Another limitation of this paper is caused by certain weaknesses of the visualization tool VOS: it creates a standard output, no information from the user can be added, and the navigation possibilities are quite limited [2]. Also, using abstracts as the primary source of text represents a possible limitation since they are very short, and authors may not always use the exact terms to describe concepts. Further, though the decision not to limit the number of journals, in order to get a wider picture of the field, is rational, it would be interesting to follow the same procedure with the papers published exclusively in journals specialized in simulation modeling. The plan for future research thus includes analyses and comparisons of the results with the outlets published by specialized conferences, such as the Winter Simulation Conference (which has been providing open access for more than 50 years). Also, one of the possible directions of future work would be to conduct topic modelling to complement the results of this paper.

Acknowledgment. This work has been supported by the Croatian Science Foundation (project No. UIP-2017-05-7625).

References
1. Humphreys, A., Jen-Hui Wang, R.: Automated text analysis for consumer research. J. Consum. Res. 44(6), 1274–1306 (2018)
2. Justicia De La Torre, C., Sánchez, D., Blanco, I., Martín-Bautista, M.J.: Text mining: techniques, applications, and challenges. Int. J. Uncertainty, Fuzziness Knowl. Based Syst. 26(4), 553–582 (2018)
3. Taylor, S.J.E., Eldabi, T., Riley, G., Paul, R.J., Pidd, M.: Simulation modelling is 50! Do we need a reality check? J. Oper. Res. Soc. 60(1), 69–82 (2009)
4. Tocher, K., Owen, D.: The automatic programming of simulation. In: Proceedings of the Second International Conference on Operational Research, pp. 58–60 (1960)
5. García-García, J.A., Enríquez, J.G., Ruiz, M., Arévalo, C., Jiménez-Ramírez, A.: Software process simulation modeling: systematic literature review. Comput. Stand. Interfaces 70, 103425 (2020)
6. Lunesu, M.I., Münch, J., Marchesi, M., Kuhrmann, M.: Using simulation for understanding and reproducing distributed software development processes in the cloud. Inf. Softw. Technol. 103, 226–238 (2018)
7. Garcia, M.T., Barcelona, M.A., Ruiz, M., Garcia-Borgonon, L., Ramos, I.: A discrete-event simulation metamodel for obtaining simulation models from business process models. In: José Escalona, M., Aragón, G., Linger, H., Lang, M., Barry, C., Schneider, C. (eds.) Information System Development, pp. 307–317. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-07215-9_25
8. Jansen-Vullers, M.H., Netjes, M.: Business Process Simulation - A Tool Survey (2006)
9. Diallo, S.Y., Gore, R.J., Padilla, J.J., Lynch, C.J.: An overview of modeling and simulation using content analysis. Scientometrics 103(3), 977–1002 (2015)
10. Frerichs, L., Smith, N.R., Hassmiller, K., Bendor, T.K., Evenson, K.R.: A scoping review of simulation modeling in built environment and physical activity research. Heal. Place 57, 122–130 (2019)
11. Onggo, B.S.S.: Proceedings of the 2012 Winter Simulation Conference (2012)
12. Anjomshoae, A., Hassan, A., Rebi, M., Rani, A.: A review of ergonomics and simulation modeling in healthcare delivery system. Adv. Mater. Res. 845, 604–608 (2014)
13. Fone, D., et al.: Systematic review of the use and value of computer simulation modelling in population health and health care delivery. J. Pub. Heal. Med. 25(4), 325–335 (2003)
14. Salleh, S., Thokala, P., Brennan, A., Hughes, R., Booth, A.: Simulation modelling in healthcare: an umbrella review of systematic literature reviews. PharmacoEconomics 35, 937–949 (2017)
15. Handel, A., La Gruta, N.L., Thomas, P.G.: Simulation modelling for immunologists. Nat. Rev. Immunol. 20, 186–195 (2019)
16. Long, K.M., Meadows, G.M.: Simulation modelling in mental health: a systematic review. J. Simul. 12(1), 76–85 (2017)
17. Moon, Y.B.: Simulation modelling for sustainability: a review of the literature. Int. J. Sustain. Eng. 10(1), 2–19 (2017)
18. Kirimtat, A., Kundakci, B., Chatzikonstantinou, I., Sariyildiz, S.: Review of simulation modeling for shading devices in buildings. Renew. Sust. Energy Rev. 53, 23–49 (2016)
19. Cheng, Y., Liu, D., Chen, J., Namilae, S.: Human behavior under emergency and its simulation modeling: a review. In: Cassenti, D. (ed.) Advances in Human Factors in Simulation and Modeling, vol. 780. Springer, Cham (2019). https://doi.org/10.1007/978-3-319-94223-0_30
20. Abdelmegid, M.A., González, V.A., Poshdar, M., Sullivan, M.O., Walker, C.G., Ying, F.: Barriers to adopting simulation modelling in construction industry. Autom. Constr. 111, 103046 (2020)
21. Faingloz, L., Tolujew, J.: Simulation modelling application in real-time service systems: review of the literature. Procedia Eng. 178(112), 200–205 (2017)
22. Zhang, H., Kitchenham, B., Pfahl, D.: Software process simulation modeling: an extended systematic review. In: Münch, J., Yang, Y., Schäfer, W. (eds.) ICSP 2010. LNCS, vol. 6195, pp. 309–320. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-14347-2_27
23. Creswell, J.W., Creswell, J.D.: Research Design, 5th edn. SAGE, London (2018)
24. Ananiadou, S., Procter, R., Thomas, J.: Supporting systematic reviews using text mining. Soc. Sci. Comput. Rev. 27, 509–523 (2009)
25. Sinoara, R.A., Antunes, J., Rezende, S.O.: Text mining and semantics: a systematic mapping study. J. Braz. Comput. Soc. 23(1), 1–20 (2017). https://doi.org/10.1186/s13173-017-0058-7
26. Kotu, V., Deshpande, B.: Predictive Analytics and Data Mining. Elsevier, Amsterdam (2015)
27. Kaur, A., Chopra, D.: Comparison of text mining tools. In: 5th International Conference on Reliability, Infocom Technologies and Optimization (ICRITO), pp. 365–376 (2016)
28. Zhu, F., Patumcharoenpol, P., Zhang, C., Yang, Y., Chan, J.: Biomedical text mining and its applications in cancer research. J. Biomed. Inform. 46, 200–211 (2013)
29. Chen, C., Dubin, R., Schultz, T.: Science mapping. In: Encyclopedia of Information Science and Technology, 3rd edn., pp. 4171–4184 (2014)
30. Chen, C.: Science mapping: a systematic review of the literature. J. Data Inf. Sci. 2(2), 1–40 (2017)
31. Andres, A.: Measuring Academic Research. Chandos Publishing, Oxford (2009)
32. Rojas-Sola, J.I., Aguilera-García, A.I.: Global bibliometric analysis of building information modeling through the Web of Science core collection (2003–2017). Inf. de La Con. 72(557), 1–12 (2020). https://doi.org/10.3989/IC.66768
33. Teixeira, S., Pocinho, M.: The bibliometric perspective of hotel industry and regional competitiveness. J. Spat. Org. Dyn. 8(2), 129–147 (2020)
34. Daenekindt, S., Huisman, J.: Mapping the scattered field of research on higher education: a correlated topic model of 17,000 articles, 1991–2018. High. Educ. 80(516), 1–17 (2020)
35. RapidMiner: RapidMiner (2020). http://docs.rapidminer.com/
36. North, M.: Data Mining for the Masses. Global Text Project (2012)
37. Bayhaqy, A., Sfenrianto, S., Nainggolan, K., Kaburuan, E.: Sentiment analysis about e-commerce from tweets using decision tree, k-nearest neighbor, and Naïve Bayes. In: ICOT 2018 (2019)
38. Hrcka, L., Simoncicova, V., Tadanai, O., Tanuska, P., Vazan, P.: Using text mining methods for analysis of production data in automotive industry. In: Silhavy, R., Senkerik, R., Kominkova Oplatkova, Z., Prokopova, Z., Silhavy, P. (eds.) CSOC 2017. AISC, vol. 573, pp. 393–403. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-57261-1_39
39. Van Eck, N.J., Waltman, L.: Text mining and visualization using VOSviewer (2011)
40. Van Eck, N.J., Waltman, L., Dekker, R., Van Den Berg, J.: A comparison of two techniques for bibliometric mapping: multidimensional scaling and VOS. J. Am. Soc. Inf. Sci. Technol. 61(12), 2405–2416 (2010)
41. Hallinger, P., Gümüş, S., Bellibaş, M.: 'Are principals instructional leaders yet?' A science map of the knowledge base on instructional leadership, 1940–2018. Scientometrics 122, 1629–1650 (2020)
42. Zurita, G., Shukla, A.K., Pino, J.A., Merigó, J.M., Lobos-Ossandón, V., Muhuri, P.K.: A bibliometric overview of the journal of network and computer applications between 1997 and 2019. J. Network Comput. Appl. 165, 102695 (2020)
43. Obileke, K., Onyeaka, H., Omoregbe, O., Makaka, G., Nwokolo, N., Mukumba, P.: Bioenergy from bio-waste: a bibliometric analysis of the trend in scientific research from 1998–2018. Bio. Convers. Biorefinery (2020). https://link.springer.com/article/10.1007/s13399-020-00832-9
44. Silge, J., Robinson, R.: Welcome to Text Mining with R. O'Reilly, Sebastopol (2017)
45. Arveson, P.: The Deming Cycle (2020). https://balancedscorecard.org/bsc-basics/articles-videos/the-deming-cycle. Accessed 10 Feb 2020
46. Bloom, B.S.: Taxonomy of Educational Objectives: The Classification of Educational Goals: Handbook I, Cognitive Domain. Longmans, Green, New York (1965)
47. Britto, R., Usman, M.: Bloom's taxonomy in software engineering education: a systematic mapping study (2015)
48. Nentl, N., Zietlow, R.: Using Bloom's taxonomy to teach critical thinking skills to business students. Coll. Undergrad. Libr. 15(1–2), 159–172 (2008)
49. Ninčević Pašalić, I., Ćukušić, M., Jadrić, M.: Smart city research advances in Southeast Europe. Int. J. Inf. Manage. (2020, in press)

Service Quality Evaluation Using Text Mining: A Systematic Literature Review

Filip Vencovský(B)

University of Economics, 130 67 Prague 3, Prague, Czechia
[email protected]

Abstract. The volume of customer feedback that is available online is rising every year. Traditional approaches to service quality evaluation use questionnaires and leave existing online feedback from consumers aside. The possible reason is that harnessing consumers' feedback is a difficult task that requires employing text mining methods. Therefore, we decided to shed light on service quality research that uses consumers' feedback as a source of information and text mining methods as a part of the quality evaluation. We conducted a systematic literature review of journal articles that focus on service quality evaluation with the use of text mining methods. We found that text mining is a promising method for service quality research. On the other hand, we identified four challenges that arose from inconsistencies in the reviewed articles regarding the quality measure, quality dimensions, the level of analysis, and sentiment analysis methods. Future research is needed to validate quality evaluation measures and separate them from customer satisfaction measures, to argue when it is suitable to use quality dimensions from the literature and when to identify service-specific quality dimensions, and to focus more on the aspect level of quality analysis. There were also no attempts among the studies to use current state-of-the-art classification technologies, such as deep learning, or to experiment with word embeddings.

Keywords: Service quality · Quality evaluation · Customer feedback · Text mining

1 Introduction

In contrast to product quality, service quality is harder to evaluate due to its intangibility, variability and other characteristics [16]. These characteristics mean that service quality can only be measured from the customers' point of view. The main purpose of service quality evaluation is to unveil what consumers feel about a service they consume and what the strengths and weaknesses of the service are. Only then can the service be brought to the level that the majority of consumers perceive as expected.


Many techniques of service quality evaluation have been described. The majority of research papers propose the SERVQUAL or SERVPERF instruments for quality evaluation in different fields of service [2,8,12,22–24,27–29,38,39,41,47,48]. In contrast, only a few papers use full-text or online sources to gather information about service quality [5,10,21,31,37,46]. On the other hand, there is a plethora of research articles that focus on customer satisfaction and use text mining techniques [13,20,30,34]. If a service is to deliver the promised value at the quality that consumers expect, knowing only that consumers feel weakly or strongly satisfied with the service is not enough. To enable any service improvement, it is necessary to understand single quality attributes or dimensions and their importance for customers. Traditional structured service quality surveys offer a multi-dimensional view, but they limit consumers in expressing themselves [42]. Methods like interviews that allow customers to express themselves fully are time and resource consuming, and many disadvantages have long been associated with unstructured data, including the necessity of painstaking content analyses and coding procedures [9,46].

We think that there is a way of service quality evaluation that is less time-consuming than interviews and allows customers to express their opinions without any limitations. Therefore, we decided to investigate the possibilities of computer-aided text analysis in the service quality field by reviewing the current state of knowledge in the academic literature. Since the topic is not well established, we also explore the journals that could be suitable for a discussion of the topic in the future. In this paper, we address two research questions:

1. What scientific journals are the most related to service quality evaluation using text mining?
2. What are the development and current state of knowledge about service quality evaluation using text mining?

2 Research Method

We used a systematic literature review (SLR) as the method to answer the two research questions. Prior to this paper, we found six studies that fit the topic of this research perfectly [5,10,21,31,37,46]. We used these studies for keyword identification and for the evaluation of the search results. We set the following inclusion criteria to help answer the research questions and lower the volume of search results:

– journal article,
– English language,
– service quality topic,
– text-mining approach.


We used Scopus as the catalogue for this systematic literature review. The search results were gathered in September 2019. The search query was a conjunction of the service quality and text mining topics in English. We searched in the title, abstract, and keywords of articles. We excluded entries from the subject areas that are less likely to contain relevant articles: Health Sciences (“MEDI”, “NURS”, “DENT”, “VETE”, “HEAL”), Life Sciences (“NEUR”, “IMMU”, “BIOC”, “PHAR”, “AGRI”), Physical Sciences (“ENVI”, “MATE”, “PHYS”, “ENER”, “EART”, “CHEM”, “CENG”), and Arts and Humanities (“ARTS”). Multidisciplinary subjects were kept.

The area of service quality is well established because the topic has existed since the 1980s, and thus the basic keywords service quality were used. This search resulted in 180,676 entries. The area of text mining is still evolving, and thus we extended the search to the whole area of data mining. The conjunction of the service quality area with these two areas resulted in 1,869 entries. Unfortunately, one of the control articles was not present in the search results, so an additional analysis of the control articles was necessary. The analysis of the keywords of the control articles showed that adding the phrase sentiment analysis finally covered the desired area in full. The complete query, which contains service quality on one side and text mining together with data mining and sentiment analysis on the other side, finally resulted in 1,925 entries.

TITLE-ABS-KEY ( service AND quality AND ( "text mining" OR "data mining" OR "sentiment analysis" ) ) AND ( EXCLUDE ( SUBJAREA ( "MEDI" ) ... )

We identified 559 entries as journal articles. The articles were catalogued as:

– article (521),
– review (26),
– undefined (12).

We reviewed the abstracts of the journal articles and selected 81 articles for a content review. The articles that were not selected for the content review focused on different topics or were only indirectly linked to service quality (in the sense of “our approach will improve service quality”). The excluded articles contributed to the following industries: telco, healthcare, hospitality, transportation, insurance, education, e-commerce, and IT (web services, networks, computing, warehousing), and focused on the following topics: prediction, segmentation, personalisation, process improvement, pattern analysis, QoS, data quality, service composition, and service matchmaking. Based on the content review, we categorised the articles into four groups based on their relevance to the service quality and text mining topics:


– Fourteen articles on service quality and text mining met the inclusion criteria.
– The second group consisted of 48 articles that used text mining but explored concepts related to service quality, such as satisfaction. This group did not fulfil the inclusion criteria.
– The third group consisted of 12 articles focused on service quality and the broader area of data mining but not text mining. This group did not fulfil the inclusion criteria.
– The last group of seven articles did not satisfy the inclusion criteria because the articles explored only concepts related to service quality and the broader area of data mining but not text mining.

The content of one article [6] was unavailable, and it was therefore also excluded from further research. Table 1 shows the final distribution of articles into the four described groups.

Table 1. Distribution of articles among four groups

Area                   | Service quality | Related to service quality
Text mining            | 13              | 48
Data mining in general | 12              | 7

3 Relevant Journals

The articles focused on e-service, hospitality, transportation, healthcare, banking and retail. Although the first article [31] was published in 2008, the majority of articles were issued after the year 2015. The most papers (five) were published in 2018.

In order to answer the research question of which scientific journals are the most related to service quality evaluation using text mining, we analysed where the selected articles were published and what the nature of each journal is. Only one journal published more than one of the reviewed articles: the journal Expert Systems with Applications, with two published articles. We conducted a further analysis of the journals and verified the multidisciplinary nature of the topic. The journals were either from the service area or the information technology (IT) area. Table 2 shows the detailed categorisation of journals. We consider the journals Expert Systems with Applications and the Journal of Theoretical and Applied Information Technology the most suitable for holding the discussion on this multidisciplinary topic due to their emphasis on the application of technology to different fields, including service science.

4 Current State of Knowledge

We focused on similarities and differences in the approach to quality evaluation and the use of text mining in the thirteen remaining journal articles to identify the development and current state of knowledge about service quality evaluation using text mining. The articles differ in the level of background in the service quality literature. Since traditional service quality research uses validated questionnaires where respondents choose how much they agree with a statement on a scale, the main challenge for service quality researchers is to determine what the right quality measure in free-form textual feedback is.

Table 2. Journals by subject area

Subject area                                | Journal
Service-related – Service in general        | Service Business, International Journal of Services and Standards
Service-related – Transportation service    | Public Transport, Travel Behaviour and Society
Service-related – Other service industries  | Cornell Hospitality Quarterly, International Journal of Bank Marketing
IT-related – Applied technology             | Expert Systems with Applications, Journal of Theoretical and Applied Information Technology
IT-related – Information management         | International Journal of Information Management, Information Systems and E-Business Management
IT-related – Computer science               | IEEE Transactions on Knowledge and Data Engineering
Methodology                                 | Quality and Quantity

We identified ten aspects in which the selected journal articles differ:

– source of service quality information,
– service quality evaluation measures,
– service quality dimensions,
– level of analysis,
– quality dimension mining,
– sentiment analysis,
– contextual information,
– quality monitoring,
– used tools,
– feature preparation.

4.1 Source of Service Quality Information

As follows from the inclusion criteria, all of the papers use some kind of textual customer feedback; the sources of the feedback vary and offer different levels of information to work with. Table 3 summarises the used sources of information. The vast majority of researchers used online reviews [7,11,21,25,36,46,52]. Online reviews consist of a review body that contains the customer's opinion in textual form and a review rating that contains one or more rating scales. Those scales bring researchers valuable information about the overall attitude of the reviewer towards the service. The question is how to interpret the value, especially when no guiding information is provided to reviewers. Three studies [36,46,52] used the rating as one of the inputs for quality evaluation.

The other sources are purely textual. Ashton et al. [4,5] worked with an open-ended question from a questionnaire administered after service cancellation. Lo [31] gathered data in the form of messages to a web manager on a discussion forum. Three other studies [17,19,35] build the quality evaluation on Twitter data. Researchers who decide to use Twitter data have, in contrast to other sources, the difficult task of separating service quality-related messages from the others. Mogaji et al. [35] and He et al. [19] used the company name as a keyword for message extraction. Haghighi et al. [17] used a more sophisticated approach: they filtered messages according to the company name and location and, moreover, collected messages categorised using topic analysis as service-related and company-project related. Although the information sources provide metadata such as date, user or location, only Zhao et al. [52] incorporated those data in the quality evaluation.

Table 3. Source of information for service quality evaluation

Source of information | Count | Articles
Online review         | 7     | [7, 11, 21, 25, 36, 46, 52]
Questionnaire         | 2     | [4, 5]
Messageboard          | 1     | [31]
Twitter               | 3     | [17, 19, 35]

4.2 Service Quality Evaluation Measures

The reviewed literature mirrors the discussion in service quality research. According to expectation confirmation theory, one group of researchers captures expectation and actual quality separately. The other group is convinced that the actual perceived quality is already a result of expectation confirmation.
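For orientation, the expectation-confirmation view is often summarised as a gap score in the SERVQUAL tradition [39]. The formula below is a generic illustration of that idea rather than a measure taken from any of the reviewed articles, with $P_i$ and $E_i$ denoting perceived and expected performance on quality item $i$:

$$Q_i = P_i - E_i$$

A positive gap indicates that perceptions exceed expectations for that item, while a negative gap signals a quality shortfall.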


The main challenge is to define the right measure that can be extracted from a text. Table 4 shows the used measures. The majority of articles [7,11,17,19,21,25,35] used sentiment as a measure. Zhao et al. [52] combined sentiment with a rating to calculate review confidence; the rating of confident reviews was then taken as the quality measure. Palese et al. [36] used the rating alone. Song et al. [46] combined sentiment with the volume of mentions in an expectation confirmation formula. In this formula, the sentiment was taken as perceived actual quality, and the volume of mentions was a measure of consumers' expectations. Three studies [4,5,31] used the volume of mentions alone: Ashton et al. [4,5] as quality attribute volume and Lo [31] as a complaint rate.

Table 4. Measures for service quality evaluation

Measure              | Count | Articles
Sentiment            | 7     | [7, 11, 17, 19, 21, 25, 35]
Sentiment and rating | 1     | [52]
Rating               | 1     | [36]
Sentiment and volume | 1     | [46]
Volume               | 3     | [4, 5, 31]

4.3 Service Quality Dimensions

Information about overall service quality alone is not enough. The true potential for continual service improvement is enabled through knowledge of which part of the service is perceived as lower than expected. Service quality research shows that quality can be separated into different dimensions [39]. Table 5 summarises the used sources of service quality dimensions. The majority of the reviewed papers [7,11,35,36] used the theoretically well-established SERVQUAL dimensions (tangibles, reliability, assurance, responsiveness, empathy). Literature as a source of quality dimension identification was also used by He et al. [19]. They employed the reliability, responsiveness, flexibility, availability, assurance, personnel contact quality, and tangibles dimensions from Ramanathan and Karpuzcu [43].

Besides literature, the reviewed articles used topic mining for discovering latent quality dimensions. The approach was used in the studies [4,5,25,31]. Topic mining, in the case of the James et al. study [21], resulted in a match with the literature: the systems, interpersonal, and technical dimensions coincide with López et al. [32]. Kim [25] used the Operations, Sales, Amenities, Facilities and Experiences dimensions found using topic mining in their previous study [26]. Song et al. [46] used information gathered from service quality experts. Two studies [17,52] did not use any quality dimension.

4.4 Level of Analysis

Another important factor in service quality evaluation is the level of analysis. Table 6 shows the different levels of analysis. The vast majority of articles [5,7,17,19,31,35,36,52] used a whole document as the unit to which sentiment or rating and quality dimensions are connected. This approach, on the other hand, has a significant disadvantage because a document may contain judgements about more than one quality dimension. This issue was solved in two studies [5,7] with the use of multiple-class categorisation.

Table 5. Sources of service quality dimensions

Dimension source                | Count | Articles
Literature                      | 5     | [7, 11, 19, 35, 36]
Topic mining                    | 4     | [4, 5, 25, 31]
Topic mining + literature match | 1     | [21]
Experts                         | 1     | [46]
No dimensions                   | 2     | [17, 52]

Even with multiple classes, a review has only one sentiment or rating. Two studies [11,36] proposed a sentence level of analysis: sentiment and quality dimension are assigned to each sentence in a document. Song et al. [46] assigned phrases to a service hierarchy and performed the analyses at the word level. A high level of analysis was performed in [21] by concatenating all documents that relate to one entity; on the concatenated documents, they performed multiple-class categorisation. From one article [25] it is not clear which level of analysis was used.

Table 6. Levels of analysis

Level of analysis           | Count | Articles
Document – Unclear classes  | 5     | [17, 19, 35, 36, 52]
Document – One class        | 1     | [31]
Document – Multiple classes | 2     | [5, 7]
Sentence – One class        | 1     | [11]
Sentence – Multiple classes | 1     | [36]
Entity – Multiple classes   | 1     | [21]
Word                        | 1     | [46]
Unclear                     | 1     | [25]

4.5 Quality Dimension Mining

An important part of computer-aided quality evaluation is to determine how service quality dimensions and quality attributes are to be extracted from a text. Table 7 summarises the used methods for quality dimension mining. The majority of articles [4,5,21,25,36] used topic mining as an unsupervised or semi-supervised learning method (a minimal sketch of this approach follows Table 7). Ashton et al. [4,5] used Latent Semantic Analysis (LSA) with Singular Value Decomposition (SVD). Three studies [21,25,36] employed Latent Dirichlet Allocation (LDA). Palese et al. [36] used seed words in addition to LDA. Two studies [11,31] used supervised learning methods: Lo [31] employed a Support Vector Machine (SVM) and Duan et al. [11] Naïve Bayes classification. The rest of the research articles did not use machine learning for quality attribute extraction. Two studies [7,19] searched for quality-related keywords. Song et al. [46] employed a quality attribute dictionary. Mogaji et al. [35] used manual coding, and Haghighi et al. [17] did not use any attribute extraction.

Table 7. Quality dimension mining methods

Quality dimension mining method   | Count | Articles
Unsupervised learning – LDA       | 3     | [21, 25, 36]
Unsupervised learning – LSA       | 2     | [4, 5]
Supervised learning – SVM         | 1     | [31]
Supervised learning – Naïve Bayes | 1     | [11]
Keyword search                    | 2     | [7, 19]
Dictionary                        | 1     | [46]
Manual coding                     | 1     | [35]
No attribute mining               | 1     | [17]
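As an illustration of the topic-mining route described above, the following sketch shows how latent quality dimensions could be discovered from review texts with LDA. It is a minimal example using scikit-learn, not a reconstruction of any of the reviewed studies; the number of topics and the sample reviews are arbitrary assumptions.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Toy corpus of customer reviews (placeholders, not data from the reviewed studies)
reviews = [
    "The staff was friendly and the room was clean",
    "Check-in was slow and the booking system crashed",
    "Great location, but the breakfast was disappointing",
    "The mobile app is easy to use and support answered quickly",
]

# Bag-of-words representation of the reviews
vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(reviews)

# Fit LDA with an assumed number of latent quality dimensions
lda = LatentDirichletAllocation(n_components=3, random_state=0)
lda.fit(X)

# Print the top words of each topic as candidate quality dimensions
terms = vectorizer.get_feature_names_out()
for topic_id, weights in enumerate(lda.components_):
    top_terms = [terms[i] for i in weights.argsort()[-5:][::-1]]
    print(f"Topic {topic_id}: {', '.join(top_terms)}")
```

In the reviewed studies, the resulting topics would then be inspected and either mapped onto literature-based dimensions or kept as service-specific dimensions.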

4.6 Sentiment Analysis

Sentiment analysis is a popular text mining method for discovering a one-dimensional emotion expressed in a text. In the context of service quality, sentiment is an important measure for qualitative judgements. Table 8 summarises the used methods for sentiment analysis. Most articles [7,19,21,25,46] used a dictionary approach: each analysed word is compared with the words in a sentiment dictionary, the words in the dictionary have assigned sentiment values, and the sentiment of the text is then calculated from these individual values. Haghighi et al. [17] used a kind of sentiment analysis where context is taken into consideration with the use of specific linguistic rules. Only one article [11] employed supervised learning and analysed sentiment using a Naïve Bayes classifier. The goal of supervised learning sentiment analysis


is to train a model to return sentiment values as close as possible to the values assigned by human coders. Mogaji et al. [35] used the TextBlob Python library without specifying which implementation was used for sentiment analysis, because TextBlob works either with Naïve Bayes or with the dictionary-based PatternAnalyzer.
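A minimal sketch of the dictionary approach that most of the reviewed articles rely on is shown below. The word list and its scores are illustrative assumptions, not an excerpt from any published sentiment lexicon.

```python
# Illustrative sentiment dictionary: word -> polarity score (made-up values)
SENTIMENT_DICT = {
    "friendly": 1.0, "clean": 0.8, "great": 1.0, "easy": 0.6,
    "slow": -0.7, "crashed": -1.0, "disappointing": -0.9,
}

def dictionary_sentiment(text: str) -> float:
    """Average the scores of all dictionary words found in the text."""
    tokens = text.lower().split()
    scores = [SENTIMENT_DICT[t] for t in tokens if t in SENTIMENT_DICT]
    return sum(scores) / len(scores) if scores else 0.0

print(dictionary_sentiment("the booking system was slow and then crashed"))   # negative score
print(dictionary_sentiment("the staff was friendly and the room was clean"))  # positive score
```

In a supervised setting, the same texts would instead be labelled by coders and used to train a classifier such as Naïve Bayes, as done in [11].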

4.7 Contextual Information

Zhao et al. [52] were the only researchers who took contextual information into consideration in service quality evaluation. As online reviews are seen by some researchers as problematic, contextual information might help to separate honest reviews from fake or influenced ones. They used geographical distance, time and sentiment to calculate review confidence.

Table 8. Sentiment analysis methods

Sentiment analysis method | Count | Articles
Supervised learning       | 1     | [11]
Dictionary                | 5     | [7, 19, 21, 25, 46]
Dictionary + rules        | 1     | [17]
Unclear                   | 1     | [35]

4.8 Quality Monitoring

Four of the studies [4,5,11,31] paid attention to quality monitoring. Ashton et al. [4,5] and Lo [31] proposed quality charts, and Duan et al. [11] proposed a dynamic score.

4.9 Used Tools

Some reviewed articles also revealed the tools used for text analysis. Song et al. [46] used the Stanford NLP library. Haghighi et al. [17] and Palese et al. [36] employed R, namely the packages topicmodels for topic mining, tm for preprocessing, textcat for language detection, MC tokenizer for tokenization, and Rsentiment for sentiment analysis. Mogaji et al. [35] used Python with twitterscraper for scraping and TextBlob for sentiment analysis. James et al. [21] used DICTION for sentiment analysis and MALLET for dimension mining. He et al. [19] used Lexalytics for sentiment analysis.

4.10 Feature Preparation

Feature preparation is a crucial part of machine learning and is needed for sentiment analysis or quality dimension classification. Only two authors gave details about this part: Lo [31] used the TF-IDF metric, known from the information retrieval field, and Ashton et al. [4,5] used absolute term frequency.
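To make the two reported feature preparation choices concrete, the sketch below contrasts raw term counts with TF-IDF weighting. It is a generic illustration with made-up documents, not code from the cited studies, and the scikit-learn defaults differ in detail from any particular published setup.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Placeholder feedback documents
docs = [
    "the staff was helpful and the service was fast",
    "the service was slow and the staff was rude",
]

# Absolute term frequency, in the spirit of Ashton et al. [4, 5]
counts = CountVectorizer().fit_transform(docs)

# TF-IDF weighting, in the spirit of Lo [31]: terms frequent in every document are down-weighted
tfidf = TfidfVectorizer().fit_transform(docs)

print(counts.toarray())  # raw counts per document and term
print(tfidf.toarray())   # normalised TF-IDF weights per document and term
```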

5 Conclusion

We conducted a systematic literature review that focused on service quality evaluation using text mining methods. We addressed two research questions: What scientific journals are the most related to service quality evaluation using text mining? What are the development and current state of knowledge about service quality evaluation using text mining? In order to answer these questions, we proposed a set of inclusion criteria. The search result consisted of 559 journal articles. We reviewed the abstracts of these articles and selected 81 articles for further analysis. We categorised these articles into four groups according to their relevance to service quality and text mining. Finally, we identified the 13 most relevant articles for detailed review.

Regarding the first research question, we found that the focal articles are published in either service- or IT-related journals. The service journals focused on service in general or on a specific service industry such as transportation. The IT-related journals were oriented towards applied technologies, information management and computer science. One journal focused solely on the area of methodology. We propose that the journals Expert Systems with Applications and the Journal of Theoretical and Applied Information Technology are the most relevant for holding the discussion on this topic due to their emphasis on the application of technology to different fields, including service science.

The review of the development and current state of knowledge showed that the attempts to use text mining for service quality evaluation are promising but not yet satisfactory. Established service quality methods derived from SERVQUAL or SERVPERF use tools consisting of rating scales and factual statements related to the service. These tools use meticulously identified dimensions. Text mining of textual feedback brings new possibilities and challenges in this matter:

Problem 1. Although the choice of the right measure is crucial for service quality evaluation, the reviewed studies showed inconsistency in the adopted approaches: the majority of articles used expressed sentiment, part of the articles used a rating scale and part the volume of feedback. Unfortunately, no article brought sufficient evidence to support its measures. The used measures also collide with the customer satisfaction concept, to which the same measures are applied in the literature [14,40]. Although customer satisfaction and service quality are related concepts [15], they cannot be interchanged.

Problem 2. In order to evaluate service quality, measures need to be combined with quality dimensions. Quality dimensions were traditionally identified using interviews with service managers and customers. Textual feedback and text mining methods can also be used for dimension identification [51]. A part of the articles used dimensions from the service quality literature, and another part identified dimensions from the gathered content using topic mining. The question that remains is when it is more suitable to use traditional dimensions and when to identify service-specific dimensions from the gathered content.


Problem 3. The articles varied in the level of analysis. The level of analysis is essential for the ability of a method to link expressed opinions with aspects and quality dimensions. The most frequent was the document level, even though a document is too coarse for quality analysis because it may contain more than one dimension. The same problem remains at the sentence level. The only solution is to use multiple-class classifiers [5,7,21,36] or the aspect (entity) level [21,33,44].

Problem 4. In the case of sentiment analysis as a measure, the articles were diverse in the methods used to capture the expressed sentiment from text. Most of the articles used sentiment dictionaries, although machine learning methods [18] or their combination with a dictionary approach [45] can provide better accuracy. Remarkably, there were no attempts among the studies to use current state-of-the-art classification technologies, such as deep learning [1,50], or to experiment with word embeddings [3,45,49].

References

1. Abdi, A., Shamsuddin, S.M., Hasan, S., Piran, J.: Deep learning-based sentiment classification of evaluative text based on multi-feature fusion. Inf. Process. Manage. 56(4), 1245–1259 (2019). https://doi.org/10.1016/j.ipm.2019.02.018
2. Alanezi, M.A., Kamil, A., Basri, S.: A proposed instrument dimensions for measuring e-government service quality. Int. J. U- E-Serv. Sci. Technol. 3(4), 1–17 (2010)
3. Ali, F., et al.: Transportation sentiment analysis using word embedding and ontology-based topic modeling. Knowl.-Based Syst. 174, 27–42 (2019). https://doi.org/10.1016/j.knosys.2019.02.033
4. Ashton, T., Evangelopoulos, N., Prybutok, V.R.: Exponentially weighted moving average control charts for monitoring customer service quality comments. Int. J. Serv. Stand. 8(3), 230 (2013). https://doi.org/10.1504/IJSS.2013.057237
5. Ashton, T., Evangelopoulos, N., Prybutok, V.R.: Quantitative quality control from qualitative data: control charts with latent semantic analysis. Qual. Quant. 49(3), 1081–1099 (2014). https://doi.org/10.1007/s11135-014-0036-5
6. Bogicevic, V., Yang, W., Bujisic, M., Bilgihan, A.: Visual data mining: analysis of airline service quality attributes. J. Qual. Assur. Hospitality Tourism 18(4), 509–530 (2017). https://doi.org/10.1080/1528008X.2017.1314799
7. Chakrabarti, S., Trehan, D., Makhija, M.: Assessment of service quality using text mining - evidence from private sector banks in India. Int. J. Bank Market. 36(4), 594–615 (2018). https://doi.org/10.1108/IJBM-04-2017-0070
8. Choudhury, K.: Service quality and word of mouth: a study of the banking sector. Int. J. Bank Market. 32(7), 612–627 (2014). https://doi.org/10.1108/IJBM-12-2012-0122
9. Chowdhury, J., Reardon, J., Srivastava, R.: Alternative modes of measuring store image: an empirical assessment of structured versus unstructured measures. J. Market. Theory Pract. 6(2), 72–86 (1998). https://doi.org/10.1080/10696679.1998.11501797
10. Duan, W., Cao, Q., Yu, Y., Levy, S.: Mining online user-generated content: using sentiment analysis technique to study hotel service quality. In: 2013 46th Hawaii International Conference on System Sciences, pp. 3119–3128 (2013). https://doi.org/10.1109/HICSS.2013.400


11. Duan, W., Yu, Y., Cao, Q., Levy, S.: Exploring the impact of social media on hotel service performance. Cornell Hospitality Q. 57(3), 282–296 (2016). https://doi.org/10.1177/1938965515620483
12. El-Bayoumi, J.G.: Evaluating IT service quality using SERVQUAL. In: Proceedings of the ACM SIGUCCS 40th Annual Conference on Special Interest Group on University and College Computing Services - SIGUCCS 2012, p. 15. ACM Press, New York (2012). https://doi.org/10.1145/2382456.2382461
13. Gitto, S., Mancuso, P.: Improving airport services using sentiment analysis of the websites. Tourism Manage. Persp. 22, 132–136 (2017). https://doi.org/10.1016/j.tmp.2017.03.008
14. Gnewuch, U., Morana, S., Adam, M., Maedche, A.: Measuring service encounter satisfaction with customer service chatbots using sentiment analysis. In: Proceedings of the 14th International Conference on Wirtschaftsinformatik (WI2019), pp. 0–11 (2019)
15. Gotlieb, J.B., Grewal, D., Brown, S.W.: Consumer satisfaction and perceived quality: complementary or divergent constructs? J. Appl. Psychol. 79(6), 875–885 (1994)
16. Grönroos, C.: A service quality model and its marketing implications. Eur. J. Market. 18(4), 36–44 (1984). https://doi.org/10.1108/EUM0000000004784
17. Haghighi, N.N., Liu, X.C., Wei, R., Li, W., Shao, H.: Using Twitter data for transit performance assessment: a framework for evaluating transit riders' opinions about quality of service. Public Transp. 10(2), 363–377 (2018). https://doi.org/10.1007/s12469-018-0184-4
18. Hailong, Z., Wenyan, G., Bo, J.: Machine learning and lexicon based methods for sentiment classification: a survey. In: Proceedings - 11th Web Information System and Application Conference, WISA 2014, pp. 262–265 (2014). https://doi.org/10.1109/WISA.2014.55
19. He, W., Tian, X., Hung, A., Akula, V., Zhang, W.: Measuring and comparing service quality metrics through social media analytics: a case study. IseB 16(3), 579–600 (2017). https://doi.org/10.1007/s10257-017-0360-0
20. Bogicevic, V., Yang, W., Bilgihan, A., Bujisic, M.: Airport service quality drivers of passenger satisfaction. Tourism Rev. 68(4), 3–18 (2013). https://doi.org/10.1108/TR-09-2013-0047
21. James, T.L., Villacis Calderon, E.D., Cook, D.F.: Exploring patient perceptions of healthcare service quality through analysis of unstructured feedback. Expert Syst. Appl. 71, 479–492 (2017). https://doi.org/10.1016/j.eswa.2016.11.004
22. Jiang, J.J., Klein, G., Carr, C.L.: Measuring information system service quality: SERVQUAL from the other side. MIS Q. 26(2), 145–166 (2002). https://doi.org/10.2307/4132324
23. Kettinger, W.J., Lee, C.C.: Perceived service quality and user satisfaction with the information services function. Decis. Sci. 25(5–6), 737–766 (1994). https://doi.org/10.1111/j.1540-5915.1994.tb01868.x
24. Kettinger, W.J., Lee, C.C.: Pragmatic perspectives on the measurement of information systems service quality. MIS Q. 21(2), 223–240 (1997). https://doi.org/10.2307/249421
25. Kim, D.: CTQ for service quality management using web-based VOC: with focus on hotel business. J. Theoret. Appl. Inf. Technol. 96(22), 7464–7472 (2018)
26. Kim, D., Yu, S.J.: Hotel review mining for targeting strategy: focusing on Chinese free independent traveler. J. Theoret. Appl. Inf. Technol. 95(18), 4436–4445 (2017)
27. Ladhari, R.: Alternative measures of service quality: a review. Managing Serv. Qual. 18(1), 65–86 (2008). https://doi.org/10.1108/09604520810842849


28. Leong, L.Y., Hew, T.S., Lee, V.H., Ooi, K.B.: An SEM-artificial-neural-network analysis of the relationships between SERVPERF, customer satisfaction and loyalty among low-cost and full-service airline. Expert Syst. Appl. 42(19) (2015). https://doi.org/10.1016/j.eswa.2015.04.043
29. Li, Y.N., Tan, K.C., Xie, M.: Measuring web-based service quality. Total Qual. Manage. 13(5), 685–700 (2002). https://doi.org/10.1080/0954412022000002072
30. Lin, S.M.: Analysis of service satisfaction in web auction logistics service using a combination of Fruit fly optimization algorithm and general regression neural network. Neural Comput. Appl. 22(3–4), 783–791 (2013). https://doi.org/10.1007/s00521-011-0769-1
31. Lo, S.: Web service quality control based on text mining using support vector machine. Expert Syst. Appl. 34(1), 603–610 (2008). https://doi.org/10.1016/j.eswa.2006.09.026
32. López, A., Detz, A., Ratanawongsa, N., Sarkar, U.: What patients say about their doctors online: a qualitative content analysis. J. Gen. Intern. Med. 27(6), 685–692 (2012). https://doi.org/10.1007/s11606-011-1958-4
33. Ma, X., Zeng, J., Peng, L., Fortino, G., Zhang, Y.: Modeling multi-aspects within one opinionated sentence simultaneously for aspect-level sentiment analysis. Future Gener. Comput. Syst. 93, 304–311 (2019). https://doi.org/10.1016/j.future.2018.10.041
34. Miranda, M.D., Sassi, R.J.: Using sentiment analysis to assess customer satisfaction in an online job search company. In: Abramowicz, W., Kokkinaki, A. (eds.) BIS 2014. LNBIP, vol. 183, pp. 17–27. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-11460-6_2
35. Mogaji, E., Erkan, I.: Insight into consumer experience on UK train transportation services. Travel Behav. Soc. 14, 21–33 (2019). https://doi.org/10.1016/j.tbs.2018.09.004
36. Palese, B., Usai, A.: The relative importance of service quality dimensions in E-commerce experiences. Int. J. Inf. Manage. 40, 132–140 (2018). https://doi.org/10.1016/j.ijinfomgt.2018.02.001
37. Palese, B., Piccoli, G.: Online reviews as a measure of service quality. In: 2016 Pre-ICIS SIGDSA/IFIP WG8.3 Symposium, Dublin 2016 (2016)
38. Parasuraman, A., Berry, L.L., Zeithaml, V.A.: Refinement and reassessment of the SERVQUAL scale. J. Retail. 67(4), 420–450 (1991)
39. Parasuraman, A., Zeithaml, V.A., Berry, L.L.: SERVQUAL: a multiple-item scale for measuring consumer perceptions of service quality. J. Retail. 64(1), 12–40 (1988)
40. Park, E., Jang, Y., Kim, J., Jeong, N.J., Bae, K., del Pobil, A.P.: Determinants of customer satisfaction with airline services: an analysis of customer feedback big data. J. Retail. Consum. Serv. 51(April), 186–190 (2019). https://doi.org/10.1016/j.jretconser.2019.06.009
41. Pitt, L.F., Watson, R.T., Kavan, C.B.: Measuring information systems service quality: concerns for a complete canvas. MIS Q. 21(2), 209–221 (1997). https://doi.org/10.2307/249420
42. Qu, Z., Zhang, H., Li, H.: Determinants of online merchant rating: content analysis of consumer comments about Yahoo merchants. Decis. Support Syst. 46(1), 440–449 (2008). https://doi.org/10.1016/j.dss.2008.08.004
43. Ramanathan, R., Karpuzcu, H.: Comparing perceived and expected service using an AHP model: an application to measure service quality of a company engaged in pharmaceutical distribution. Opsearch 48(2), 136–152 (2011). https://doi.org/10.1007/s12597-010-0022-1


44. Ray, P., Chakrabarti, A.: A mixed approach of deep learning method and rule-based method to improve aspect level sentiment analysis. Appl. Comput. Inform. (2019). https://doi.org/10.1016/j.aci.2019.02.002
45. Rezaeinia, S.M., Rahmani, R., Ghodsi, A., Veisi, H.: Sentiment analysis based on improved pre-trained word embeddings. Expert Syst. Appl. 117, 139–147 (2019). https://doi.org/10.1016/j.eswa.2018.08.044
46. Song, B., Lee, C., Yoon, B., Park, Y.: Diagnosing service quality using customer reviews: an index approach based on sentiment and gap analyses. Serv. Bus. 10(4), 775–798 (2015). https://doi.org/10.1007/s11628-015-0290-1
47. VanDyke, T.P., Kappelman, L.A., Prybutok, V.R.: Measuring information systems service quality: concerns on the use of the SERVQUAL questionnaire. MIS Q. 21(2), 195–208 (1997). https://doi.org/10.2307/249419
48. Vanparia, B., Tsoukatos, E.: Comparision of SERVQUAL, SERVPERF, BSQ and BANKQUAL scale in banking sector. In: Confronting Contemporary Business Challenges Through Management Innovation, pp. 2405–2430 (2013)
49. Xiong, S., Lv, H., Zhao, W., Ji, D.: Towards Twitter sentiment classification by multi-level sentiment-enriched word embeddings. Neurocomputing 275, 2459–2466 (2018). https://doi.org/10.1016/j.neucom.2017.11.023
50. Yadav, A., Vishwakarma, D.K.: Sentiment analysis using deep learning architectures: a review. Artif. Intell. Rev. 53(6), 4335–4385 (2019). https://doi.org/10.1007/s10462-019-09794-5
51. Yang, Z., Jun, M., Peterson, R.T.: Measuring customer perceived online service quality: scale development and managerial implications. Int. J. Oper. Prod. Manage. 24(11), 1149–1174 (2004). https://doi.org/10.1108/01443570410563278
52. Zhao, G., Qian, X., Lei, X., Mei, T.: Service quality evaluation by exploring social users' contextual information. IEEE Trans. Knowl. Data Eng. 1–1 (2016). https://doi.org/10.1109/TKDE.2016.2607172

Making Use of the Capability and Process Concepts – A Structured Comparison Method

Anders W. Tell and Martin Henkel
Department of Computer and Systems Sciences, Stockholm University, Stockholm, Sweden
{anderswt,martinh}@dsv.su.se

Abstract. There are discussions in the business architecture, enterprise architecture, process management, and IT communities regarding the relation between the concepts of capability and process, their similarities and differences. Some practitioners claim that capability and process are the same concept, while others emphasize the differences. In this paper, we present a method that generates an explanatory model of the similarities and differences between the two concepts. To demonstrate the method, we apply it to well-known definitions of both concepts. The aim of this paper is to present a pragmatic method that enables explanations of the differences between the two concepts of process and capability and thereby supports comparative discussions by business modelers, enterprise analysts and architecture framework designers. Besides explaining the differences between the two concepts, the explanatory model highlights the particular benefits of using the capability concept.

Keywords: Capability · Capability management · Process · Business process · Process management · Enterprise modeling · Enterprise architecture

1 Introduction

The concepts of capability and capability management are relatively new in comparison with the more mature concept of process. As a sign of longevity, the term “process” has taken on a large number of senses over the years, ranging from 5–6-level classification frameworks (APQC) [1] to an ontological meaning - an occurrence in time and space [2]. The concept of capability has been used to describe business structures and business advantages in terms of core capabilities [3]. More recently, it has also been promoted as a way to perform structured analysis when designing IT systems [4].

Amongst practitioners and business architecture framework builders [5], the debate rages about the best use of the two concepts, their similarities, and differences. For some, capability is the same as an abstract process. The on-going debate illustrates a proliferation of process and capability definitions and approaches with no broad agreement on their shared nature. A problem is that researchers and participants in comparative discussions may not be clear on the assumptions and the usage situations that underpin their reasoning.


This may lead to the creation of definitions that include several specific additions or mix-ins from other concepts that go far beyond a general sense of what a process or capability is in essence. Being unclear in this way may lead to continued proliferation of conceptualizations.

The research question addressed in this paper is: Can perceived similarities and differences between the two concepts of capability and process be analyzed and explained in a structured way? This question is part of a design science inquiry into the requirements and design aspects of multi-perspective and architectural viewpoints [6] that address varying aspects, including capability concerns, of an enterprise or organization. Such a capability-oriented perspective must be well integrated with adjacent and related viewpoints and concepts, such as process-oriented viewpoints, which provide central parts of enterprise architecture frameworks [7].

The contribution of this paper is a general method that generates a transformation model that can be used to interpret and explain the similarities, differences and distance between the concepts of process and capability. The analysis and explanations are formulated in an explanatory model. The method and explanatory model aim to support comparative and formative discussions by business modelers, enterprise analysts and architecture framework designers, by enabling them to be clear about their concepts in their own usage situations and frameworks.

The approach of this paper is firstly to define a method that enables comparisons between concepts and their corresponding conceptual models, and secondly to apply the method to the two concepts under analysis, ‘capability’ and ‘process’. The method includes several steps, with specific comparisons and analyses, which are used to explain similarities and differences, and to discuss intuitions held by communities of practice. In the application of the method, a set of well-known and used definitions has been chosen as input to the method and the creation of the explanatory model.

An examination of the transformation model shows that a small number of belief revisions bridge the semantic distance between the concepts of process and capability. In particular, the explanatory model shows that there are differences in terms of the thematic roles [8] (input, output) vs. (source, result), modality (capability describes a possibility - CAN, while process focuses on action - DO) and substantiality (capability explicitly includes capacity, while process is often only intuitively related to capacity).

The method and transformation model can be applied in several different ways. Firstly, the method can be used as a tool for incorporating the concept of capability into existing business or EA frameworks by enabling structured comparisons with existing concepts. Secondly, it can be used to explain the utility of the concept of capability compared to the concept of process.

The remainder of the paper is structured as follows: In Sect. 2 the method is described, and in Sect. 3 the method is applied to compare the concepts of capability and process. Section 3 also contains the result of the comparison in terms of the parts of the transformation model. In Sect. 4 implications of the model are examined, while Sect. 5 adds conclusions.


2 Overview of the Method

The purpose of the method is to enable analysis and explanation of similarities, differences and distances between two concepts. The method is built on the principles of belief revision [10, 11] and the lattice-of-theories [9, 12], which provide a structure for formulating and formally analyzing how concepts and their senses relate to each other. We selected this approach because it provides a straightforward tool for exploring and explaining similarities and differences between two conceptual models through a set of belief revisions. Such a set of belief revisions can be viewed as the epistemic or conceptual distance a modeler or framework builder needs to travel from one concept to another.

The approach addresses the narrow conceptual analysis task of comparing two defined concepts. As such, the task is performed after methods that build concepts from observations (e.g. grounded theory) or from sets of objects and their properties (e.g. formal concept analysis). Furthermore, a lattice of theories provides a straightforward mapping to the underlying meanings (senses) of words or morphemes in a natural language [9], which makes it possible to integrate textual conceptual definitions with conceptual models. Thirdly, the lattice-of-theories is based on first-order logic, which is argued to be easier to use when reasoning about the axioms of a theory [9, 10].

The inputs to the method are two concepts, a set of comparative questions, and a top-level theory. The output of the method is an explanatory model.

2.1 Structural Parts of the Method

The method is built on the following structural parts:

• A lattice-of-theories containing the concepts and transformations of concepts into other concepts by belief revisions.
• Definitions, in text form, of the concepts under analysis.
• A set of comparative questions that guides the type of comparisons performed in the method.
• A set of conceptual models (theories) that populate the lattice-of-theories. A conceptual model represents a concept and consists of essential entities, relations and propositions [13].
• A top-level theory and model that provides the comparative base.
• A set of transformation models that represents the distance between two conceptual models. It is formulated as the belief revisions that form the path between one conceptual model and another conceptual model.

The result of the method is an explanatory model that consists of the concept definitions, their conceptual models, the populated lattice-of-theories, questions, transformation models, and the analysis of similarities, differences and distances.


Fig. 1. Illustration of the parts of the method

Figure 1 illustrates the structural parts of the method when comparing the two concepts of capability and process.

Foundation of the Method. The foundation of the method is built on the principles of belief revision and the lattice-of-theories [9], which provide a mechanism for formulating how theories relate through a series of step-wise refinements and revisions from one theory to another. By applying (belief) revisions in this way, it is possible to create a lattice of related theories, leading to a lattice-of-theories. In order to create the lattice, theories need to be represented by belief-representing sentences, possibly expressed in a formal language such as Common Logic [14]. According to John Sowa [9], four key belief revisions can be defined (a minimal sketch of these operations is given below):

• Contraction: a sentence is removed, forming another, simpler belief set and theory.
• Expansion: a sentence is added, and nothing is removed.
• Revision: a combination of contraction and expansion where a sentence is added at the same time as other sentences are removed.
• Analogy: the structure of the belief sentence is maintained but the names of types, relations and individuals in the belief sentence, or axiom, are changed.

Concepts and Corresponding Conceptual Models. The starting point for the method is definitions of the analyzed concepts. These definitions are interpreted, translated and formalized into conceptual models. An alternative starting point is to directly use two conceptual models of the concepts that should be compared. This alternative starting point avoids the interpretation and translation steps, thus making the method less susceptible to subjectivity.

A conceptual model is treated as a micro-theory and entered into the lattice-of-theories. Here, a conceptual model is linearized into a list of belief-representing sentences incorporating essential concepts, relations, and propositions. When a new conceptual model is added to the lattice-of-theories, it can also be reformulated as a set of belief revisions starting from a more general conceptual model and theory in the lattice. These belief revisions are used to form transformation models.
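The following sketch illustrates, under simplifying assumptions, how contractions and expansions can be derived mechanically once each conceptual model is linearized into a set of belief-representing sentences. The sentences are hypothetical simplifications, not the actual belief sets used in the paper, and the set-difference view ignores analogy, which requires structural matching rather than simple addition or removal.

```python
# A theory is represented as a set of belief-representing sentences (toy strings).
GeneralProcess = {
    "x is a bfo:occurrent",
    "x has participant some continuant",
}
GeneralCapability = {
    "x is a bfo:continuant",
    "x has possessor some entity",
    "x has possibility",
    "x has substantiality",
    "some source can lead-to some result",
}

def revisions(from_theory: set, to_theory: set) -> dict:
    """Derive the contractions and expansions that bridge two belief sets."""
    return {
        "contractions": sorted(from_theory - to_theory),  # sentences removed
        "expansions": sorted(to_theory - from_theory),    # sentences added
    }

print(revisions(GeneralProcess, GeneralCapability))
```

Read in the paper's terms, a contraction paired with an expansion (for example, removing "x is a bfo:occurrent" while adding "x is a bfo:continuant") corresponds roughly to the single revision step reported later in Sect. 3.3.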


The choice of definitions is guided by the chosen set of analytical questions and comparisons that the explanatory model should address. The use of the lattice-of-theories enables analysis of what the authors of specific concepts and definitions have explicitly or perhaps tacitly added with respect to more general concept definitions.

Transformation and Comparison. A transformation model is formulated through a shortest-path traversal in the lattice-of-theories between one conceptual model and another conceptual model. The transformation model consists of a list with all belief revisions encountered during the traversal. A transformation model is used to represent the distance between two concepts, and to analyze similarities and differences between them. Furthermore, the transformation model is used to generate insights based on the comparative questions.

Explanatory Model. The output of the method is an explanatory model, which consists of the concepts under analysis, their definitions, comparative questions, the populated lattice-of-theories, transformation models and an analysis of the transformations. The analysis addresses similarities, differences and distances between the two concepts, and a discussion of the views and intuitions held by communities of practice.

2.2 Outline of Method Steps

The method consists of the following functional steps:

Step 1: Define an upper level ontology as comparative base. Choose an upper level ontology that provides the top theory in the lattice of theories, i.e. the ontological and comparative base. All other conceptual models inserted into the lattice-of-theories can be expressed as belief revisions with this top theory as the comparative base. As will be described later, we chose the Basic Formal Ontology (BFO) for the comparison of the concepts of process and capability.

Step 2: Define questions that determine the desired types of comparisons. Define questions that determine and frame the comparisons that the method should address. The questions should be chosen so that the comparisons can provide interesting answers and insights. These questions and types of comparisons will guide the next step, where definitions are chosen as inputs to the method. When comparing process and capability we were guided by two questions, which led to the comparison being done on both a general and a specific level.

Step 3: Identify and translate concepts. Choose definitions of each concept under analysis (process and capability in our case). The choice of definitions is guided by the previous choice of comparative questions. Translate each identified definition into a corresponding conceptual model. This step may be skipped when starting with conceptual models of each concept, rather than textual definitions.


Insert the conceptual model into the lattice of theories in relation to the more general comparative base.

Step 4: Create a transformation model for each type of comparison. Create a transformation model for each comparison through a shortest-path traversal in the lattice-of-theories from one conceptual model to the other. Populate the transformation model with all belief revisions encountered during the traversal. This list of belief revisions provides the base for further analysis and explanations.

Step 5: Analyze transformation models and create the explanatory model. Analyze each type of comparison and the corresponding transformation model to generate insights. The analysis includes similarities, differences and distance between the concepts, and differences in intuitions held by communities of practice.

3 Application of the Method

In this section, we apply the method and generate two explanatory models, one for each question and type of comparison that we chose (A and B). For the purpose of readability and brevity, details of the developed conceptual models and the full list of belief revisions have been omitted from this paper.

3.1 Step 1: Define Upper Level Ontology as Comparative Base

As the first input to the method, the Basic Formal Ontology (BFO) [2] is chosen as the most general theory and comparative base for definitions of more specific concepts. The BFO ontology was chosen over more information-system-oriented ontologies such as Bunge-Wand-Weber [15], since it constitutes an upper-level ontology with a strong basis in the natural sciences and promises to bridge natural, social and technically oriented applications. Furthermore, BFO is well researched and specifically supports the creation of lower-level, domain-specific ontologies [16]. Figure 2 shows a partial model of the BFO, where a continuant refers to “an entity that persists, endures, or continues to exist through time while maintaining its identity”, while an occurrent refers to an entity that is bounded in time or space.

Fig. 2. Partial conceptual model for BFO


3.2 Step 2: Define Questions that Determine the Types of Comparisons

The second input to the method is the choice of comparative questions, which guide the following choice of definitions in step 3. In the research efforts leading up to this paper, a number of questions were formulated and explored. In this paper, the results of the exploration of the following two questions are reported.

A) What are the similarities and differences between the two concepts in their minimal and essential senses? This question leads to the comparison of process and capability on a general level. See Sect. 3.3.

B) What are the similarities and differences between commonly held senses of the two concepts? In essence, this question leads us, via a transformation model, to directly compare two common and specific definitions of process and capability. See Sect. 3.4.

3.3 Comparison A: Minimal and Essential Senses

Step 3: Identify and translate General Process. The BFO provides all concepts necessary for a definition of a General Process, which in its minimal and essential representation is an occurrence in time and place in which some continuants participate (Fig. 3).

Definition: Occurrent that has temporal proper parts and for some time t, p s-depends_on some material entity at t. _has-participant_ is an instance-level relation between a process, a continuant, and a temporal region at which the continuant participates in some way in the process. [Source: BFO]

Fig. 3. Conceptual model for General Process

Step 3: Identify and translate General Capability. Looking at an organization through capability lenses provides an important and useful perspective on organizations and other systems. An intuitive way to represent a capability, attributed to an organization, is that the organization is able to, has the power to, or has the potentiality to achieve some results. Intuitively, people, machines and computers in the organization can do some work in a process (participate in the process) with some other entities such as information, documents, tools, material, and culture, which ultimately leads to some result.

In this paper, we apply a general capability pattern that bridges multiple domains and applications [17, 18] and [19]. According to this, a capability can be represented as a pattern with 6 elements, where some source entities of an organization, system, or person can lead-to, achieve, accomplish, or generate some result(s). This general definition of the concept of Capability was chosen since it provides a definition that can be ontologically grounded in BFO. Furthermore, it promises to provide a general base for the construction and reformulation of more specific variants of the concept of Capability, as shown in the papers [18] and [19]. Furthermore, this pattern lends itself to integration with adjacent concepts such as ‘process’. We thus identify that a general capability is defined according to the following pattern, whose elements are described below:

Definition: “A ⟨…⟩ of ⟨…⟩ to ⟨…⟩ by/with ⟨…⟩ through ⟨…⟩”

• possessor, is the portion of reality to which the capability is dependent on, attributed to, owned by, accessed by, or part of. Examples: organization, person, machine, enterprise, or system.
• possibility, is a possibility for something to come into existence by some source, through a lead-to mechanism. Examples: probability, or disposition.
• source, are the input factors of a capability, which participate in a thematic ‘source’ role [8]. Examples: things, assets, facilities, resources, people, knowledge, processes, machines, culture, or learning.
• result, is the to-part, which participates in a determinant thematic ‘product’ role; the accomplishment, the achievement, effect, consequence. Examples: activity is done, delivered goods, performed service, or fulfilment of objectives.
• lead-to, is the way source(s) can lead-to the result. Examples: natural process, prescribed or described work, causality, or mathematical formula.
• substantiality, is the strength of the lead-to mechanism and source factors. Examples: capacity of sources, or demonstrated achievement of results.

We have chosen a straightforward interpretation that does not, for example, incorporate a modal logic extension to BFO to cater for the Possibility modality (Fig. 4).

Step 4: Create a transformation model - General. In this section we present a shortened description of the transformation from the General Process to the General Capability conceptual model. The complete transformation contains the full set of belief revisions needed in order to go from General Process to General Capability; we report here on a subset of those. The transformation, via belief revisions, revealed a number of differences and similarities:


Fig. 4. Conceptual model for General Capability

• The General Process, which is a bfo:process and a happening in time and space, is revised into a bfo:continuant. This revision indicates an essential distinction between the concepts of General Process and General Capability.
• Two essential General Capability characteristics with corresponding relations are added as expansions: the Possibility and Substantiality entities are added, together with the corresponding is_possible and has_substantiality relationships.
• Two (thematic) roles and relations, source and result, and the Result entity are added as expansions.
• A possessor relation is added as an expansion, indicating that a capability is dependent on something else, while a process exists on its own.
• A lead-to relation is added as an expansion.
  – The lead-to construct of the General Capability is very general and references something that changes sources into results. The General Process can be viewed as one way to represent changes of sources into results. As such, the pattern “source-role lead-to result-role” can be viewed as analogous to a construct such as “input-role process output-role”.

Step 5: Analyze transformation models and create the explanatory model
The two general concepts share a minimal semantic base, exhibit distinguishing characteristics, and cannot be considered as equivalent in essence. Section 4 (Implications) provides a more in-depth analysis of the transformation, differences, commonalities, and explanations.

3.4 Comparison B: Commonly Held Senses

Step 3: Identify and translate Commonly used Specific Process
Within the domain of business management there exist many definitions of the concept of process. In many cases the term used is ‘Business Process’, which indicates an organizational scope [20]. We have chosen to analyze the definition by Hammer and Champy [21]. This definition is chosen as a reasonable representative from the process
community since it is well known, used, and widely referenced. Furthermore, it is small and straightforward to interpret.

Specific Process definition: “a collection of activities that takes one or more kinds of input and creates an output that is of value to the customer.”

The conceptual model of Specific Process (Fig. 5) is interpreted and derived from the definitional sentence itself. This may introduce a bias in the comparison due to possible omissions of relevant concepts, relations and propositions. However, we assess that the definition itself is sufficiently detailed for the comparison and analysis. It should be noted that starting with a conceptual model, rather than textual definitions, removes the interpretation step, and reduces interpretation biases.

Fig. 5. Conceptual model for Specific Process

In the interpretation, we add the following (grayed) parts to the General Process to arrive at a model for a specific process (Fig. 5): Activity as a Process part, a Customer and an Output that is valuable to the Customer, and the two (thematic) roles input and output.

Step 3: Commonly used Specific Capability
From Helfat and Peteraf's work on capabilities [22] we have chosen a Specific Capability definition that we assess to share common characteristics with other management-oriented definitions of organizational capabilities. This definition was chosen as a reasonable representative from the capability community since it is well known, used, and widely referenced.

Specific Capability definition: “the ability of an organization to perform a coordinated set of tasks, utilizing organizational resources, for the purpose of achieving a particular end result.” [22]

In the interpretation, we add Organization as the possessor (Fig. 6). Organizational Resources play the role of sources, and a particular End Result plays the role of result. A Task is a coordinated process that plays the role of lead-to and is performed by the doer Organization to achieve an End Result. We have interpreted Ability as an indication of a possibility that, in a general sense, something can be performed. The concept of Ability is problematic since the term is undefined in this definition of Capability, and it is often also used synonymously with Capability, leading to circular definitions.

Fig. 6. Conceptual model for Specific Capability

Step 4: Create a transformation model
In this section we present a shortened description of the transformation from the Specific Process to the Specific Capability conceptual models. The traversal reveals a number of differences and similarities. In addition to the transformation between General Process and General Capability, we find the following belief revisions.
• The Activity entity is removed as a contraction.
• The Organization entity is added as an expansion, and the general possessor relation is revised to refer to Organization.
• The Task and the lead-to relation are added as expansions.
  – The Task can be viewed as a concretization of the abstract and general lead-to part of capability.
• The relations perform and coordination are added as expansions.
• The Customer entity and the value-to relation are removed as contractions.
• The (thematic) roles input and output and the Output entity from Specific Process are removed as contractions, and the (thematic) roles source and result and the Organizational Resource and End Result entities from Specific Capability are added as expansions.
  – The Specific Process roles (input, output) can be viewed as analogous to the Specific Capability roles (source, result), and Output as analogous to the End Result. Here, the Organizational Resource that plays the source role is more specific than the Entity that plays the input role of the Specific Process.
  – When looking at the details, however, the entities playing the (thematic) roles [8] (Output, Result, End Result) are likely to be defined slightly differently.

Step 5: Analyze transformation models and create the explanatory model
The two concepts exhibit general and essential differences as well as several differences in detail. However, there exist several similarities and analogous patterns. The presence of the role pattern and the addition of Task to the Capability definitions can both contribute to a perception that the distance between the two concepts is short. Section 4 (Implications) provides a more in-depth analysis of the transformation, differences, commonalities, and explanations.

4 Implications

In this section we present an examination of key belief revisions. The belief revisions from the concept of capability to the concept of process reveal that each of them can be considered natural and intuitive to make tacitly by analysts, architects, and modelers who regularly work with processes. As such, they explain why process and capability may, on a surface level, be perceived as essentially the same thing [20]. However, the belief revisions between the concepts of process and capability point out differences, and offer a possibility to construct capability and process models that complement each other. The explanatory model constructed and described in this paper reveals that the semantic distance between the concepts of capability and process is closely related to four groups of belief revisions:
• The thematic roles played by some entities that participate in a process as input and output are analogous to the capability roles source and result. In process management and process modeling oriented practices [21], input and output roles are frequently used. The distinctions between them and the analogous roles source and result can conceptually and practically be considered small, not relevant, and easy for practitioners to ignore. If this distinction is considered not relevant, then the conclusion is that an analysis of roles may be performed similarly for both process and capability analysis.
• In a process, the step-wise transformation of inputs into outputs is revised into a capability's ‘sources lead-to results’. In the context of organizations, processes and activities can be viewed as the mechanisms by which outputs are created or converted. Here, a process or activity is performed by some doer, often people, ICT applications, or machines. In this context, it can be natural to tacitly assume that the capability ‘lead-to’ and the task are the same as the mechanisms of process or activity. We label capabilities whose lead-to is an (organizational) process as processual capabilities. These kinds of processual capabilities are frequently found in the management literature [22]. However, a structured process is just one way of describing a capability lead-to. Thus, based on the analysis, we conclude that a capability offers a richer and freer form of describing change – the lead-to. For example, the effect may be described as state changes, changes in mindset, and so on.
• The possibility modality of capability is added to process. This means that a capability is analogous to processes that can happen, not to processes that actually do or did happen. This distinction between can and do may seem trivial, but it emphasizes the difference in focus of process and capability analysis. The do focus of processes
naturally puts the focus on activities, while the can focus of capabilities puts the focus on the possibility of something that leads to some effect. In general, if distinctions such as those between ‘possible’, ‘described’, ‘prescribed’, ‘predicted’, ‘actual’, ‘desired’, and ‘intended’ are not kept in mind, then discussions can easily become confusing.
• A capability adds substantiality to process. A capability is comparable with a process that is performed with capacity. In an organizational setting, it can be easy to tacitly assume that all processes are performed with some degree of capacity. Thus, this difference can be easy to overlook. The effect of the difference is that when performing an analysis of capabilities, extra effort must be put into analyzing the substantiality in terms of the available resources needed for the capability. Typically, a capability does not exist if there are no machines, personnel, or other resources backing it.

For organizations that are process oriented or apply business process management, the question arises whether both concepts are needed. Based on the analyzed conceptual models, clear and distinct functions of each concept can be found. A capability view offers a performance, resource allocation (investment), and capacity view of processes, since the idea of capability uniquely includes possibility, substantiality (expressed, for example, as capacity or power), and lead-to (expressed, for example, as causality or inference) elements. Thus, the use of the capability concept adds value to a process analysis when the mentioned aspects are needed. A capability model cannot replace process models; it complements them by adding potentiality, capacity, resource allocation (investment), change (transformation), linkage, and causality reasoning. As such, it can become a valuable add-on to benefits realization management [23] and performance management [24].

The relative closeness between the concepts of capability and process introduces the question of transferal or projection. Have some meanings from the idea of process simply been copied, transferred, or projected into the idea of capability? A capability is a dependent entity that is grounded in underlying concepts for its existence. This introduces the problem and question of what knowledge is added by creating a capability model that is in effect a mirror image of a process model. For example, the key microeconomic aspect and function of “Setting the right price for a service or product” can give rise to the formulation of a business capability “Set Right Price”. In this case the question becomes: how much knowledge is added to an organization by the formulation of this capability? After all, knowledge about the underlying microeconomic concept of setting a price is commonly available, must be known before the formulation of the capability, and an organization is likely to have practices (or formal processes) in place to set the right price.

5 Conclusions

In this paper, we started with the observation that there exist varying perceptions of the similarities and differences between the two concepts of process and capability, and we asked whether a structured method can explain these similarities and differences. The presented
method, which compares lists of belief sentences, has made the comparisons and analysis possible. The resulting explanatory model, which consists of concepts, their conceptual models, and transformations, provides reasonable explanations of differences and similarities between the capability and process concepts. The presented method is useful for framework and theory builders (economic, business, analysis, process, enterprise architecture, process management), and in standardization efforts when trying to prevent slippage in argumentation during discussions, debates, and consensus making. The method provides a means to anchor argumentation since it breaks definitions down into smaller parts. Thus, no concepts, relations, or propositions can be considered in a discussion without being part of the conceptual models and the lists of belief revisions. The explanatory model revealed that there is a short semantic distance between the concepts of capability and process, which is closely related to four groups of belief revisions. In particular, capability adds the elements of possibility and substantiality as compared to process. The method and explanatory model are successful in the sense that the comparisons and analysis generated reasonable explanations for similarities and differences perceived among practitioners in communities of practice. Well-known, used, and widely referenced process and capability definitions are used in the explanatory model. However, the general method and the resulting transformation and explanation models are sensitive to the choice of analyzed definitions. A future research topic is to apply the method to a broader range of definitions, especially to pairs of definitions from a framework or theory where both coexist. A broader application of the method can shed light on the varying conceptions on a larger scale. Another research topic is to explore the practical consequences of treating the concepts of process and capability as essentially different, to assess their unique contributions to (economic, business, and organizational) analysis, design, and planning methods.

References

1. APQC: APQC's Process Classification Framework. https://www.apqc.org/process-performance-management/process-frameworks
2. Smith, B., Grenon, P.: Basic Formal Ontology (BFO). http://purl.obolibrary.org/obo/bfo
3. Javidan, M.: Core competence: what does it mean in practice? Long Range Plann. 31, 60–71 (1998)
4. Grabis, J., et al.: D5.3 The final version of Capability Driven Development methodology, pp. 1–267 (2016)
5. Dugan, L.: Business Architecture and BPM - Differentiation and Reconciliation. Business Architecture Guild (2014)
6. ISO/IEC, IEEE: ISO/IEC 42010:2011 Systems and software engineering — Architecture description (2011)
7. The Open Group: TOGAF Version 9.1 - The Open Group Architecture Framework (TOGAF) (2009)
8. Dowty, D.: Thematic proto-roles and argument selection. Language 67, 547–619 (1991)
9. Sowa, J.F.: Theories, Models, Reasoning, Language, and Truth. http://www.jfsowa.com/logic/theories.htm
10. Alchourrón, C.E., Gärdenfors, P., Makinson, D.: On the logic of theory change: partial meet contraction and revision functions. J. Symbolic Logic 50, 510–530 (1985)
11. Logic of Belief Revision. https://plato.stanford.edu/entries/logic-belief-revision/
12. Gruninger, M., Hahmann, T., Hashemi, A., Ong, D., Ozgovde, A.: Modular first-order ontologies via repositories. Appl. Ontol. 7, 169–209 (2012)
13. Object Management Group: Semantics of Business Vocabulary and Business Rules (SBVR) v1.3. https://www.omg.org/spec/SBVR/1.3
14. ISO/IEC: ISO/IEC 24707:2007 Common Logic (CL): a framework for a family of logic based languages (2007)
15. Wand, Y., Weber, R.: On the ontological expressiveness of information systems analysis and design grammars. J. Inf. Syst. 3, 217–237 (1993)
16. Smith, B., Ceusters, W.: Ontological realism: a methodology for coordinated evolution of scientific ontologies. Appl. Ontol. 5, 139–188 (2010)
17. Tell, A.W., Henkel, M., Perjons, E.: A method for situating capability viewpoints. In: Řepa, V., Bruckner, T. (eds.) BIR 2016. LNBIP, vol. 261, pp. 278–293. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-45321-7_20
18. Tell, A.W.: What capability is not. In: Johansson, B., Andersson, B., Holmberg, N. (eds.) BIR 2014. LNBIP, vol. 194, pp. 128–142. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-11370-8_10
19. Tell, A.W.: Designing Situated Capability Viewpoints: adapting the general concept of capability to work practices (2018)
20. Harmon, P.: Harmon on BPM: Processes and Capabilities. https://www.bptrends.com/processes-and-capabilities/
21. Hammer, M., Champy, J.: Reengineering the Corporation. Harper Collins, New York (1993)
22. Helfat, C.E., Peteraf, M.A.: The dynamic resource-based view: capability lifecycles. Strateg. Manag. J. 24, 997–1010 (2003)
23. Serra, C.E.M., Kunc, M.: Benefits Realisation Management and its influence on project success and on the execution of business strategies. Int. J. Project Manage. 33, 53–66 (2015)
24. Smither, J.W., London, M. (eds.): Performance Management. Jossey-Bass, New York (2008)

Value Creation and Value Management

Designing Causal Inference Systems for Value-Based Spare Parts Pricing
An ADR Study at MAN Energy Solutions

Tiemo Thiess1(B) and Oliver Müller2

1 IT University of Copenhagen, Rued Langgards vej 7, 2300 Copenhagen, Denmark
[email protected]
2 Paderborn University, Warburger Str. 100, 33098 Paderborn, Germany
[email protected]

Abstract. In the wake of servitization and increased aftersales competition, original equipment manufacturers (OEMs) are beginning to change their pricing strategies from traditional cost-based to value-based pricing. As value-based pricing is much more individualized and data-driven, it becomes increasingly important to validate one's pricing hypotheses by estimating the causal effects of pricing interventions. Randomized controlled trials (RCTs) are conceptually the best method for making such causal inferences. However, RCTs are complicated, expensive, and often not feasible. MAN Energy Solutions was facing such a challenge. In reaction to this, we conducted an action design research (ADR) study in which we designed and implemented a novel causal inference system for value-based spare parts pricing. Based on this, we formalize design principles for the broader class of such systems. They emphasize the need for pre-aggregation when dealing with lumpy aftersales data, for scalability when having to run numerous analyses on heterogeneous spare parts portfolios, and for unaffectedness conditions that help to avoid spillover effects caused by often interdependent spare parts purchases. They also encourage analysts to take pre-intervention predictability into account when interpreting causal effects, to incorporate a manipulated treatment variable into the causal inference model, and to present the system output in interactive user interfaces to aid understanding and acceptance.

Keywords: Causal inference · Value-based pricing · Action design research · Spare parts · Industrial marketing

1 Introduction

Original equipment manufacturers (OEMs) have, in recent years, started to focus on the usage phase of their products instead of focusing on the product development phase only [1]. In the product usage phase, they can generate earnings via spare parts sales and through-life engineering services like maintenance, repair, and overhaul [2]. This requires OEMs to develop new business models and customer-centric processes, a transformation that is enabled by new technology and data-driven approaches [3–6]. However, due to intense competition from third-party companies that copy and sell non-original
spare parts at low prices, the potential aftersales gains for OEMs are threatened [7]. Even though pricing has the highest impact on earnings before interest and taxes (EBIT) [8], OEMs struggle to realize its potential, due to difficulties arising from vast portfolios of often thousands of different spare parts. In reaction to this, they employ undifferentiated cost-based pricing strategies [7]. Instead, research suggests that OEMs should shift from cost-based to value-based pricing strategies [8–11]. In a value-based pricing approach, one sets prices based on the value that materials provide to the customers, e.g., expressed in customers' willingness to pay. This approach incentivizes OEMs to innovate and develop their products according to their customers' needs and for maximal customer utility [7]. Nevertheless, implementing value-based pricing increases the complexity of the pricing function substantially, as one cannot use one-size-fits-all solutions. Instead, one needs to approach pricing in a much more individualized way, which involves developing numerous, often data-driven [11–13], pricing approaches based on various assumptions and hypotheses. However, testing such hypotheses requires more than uncontrolled before-and-after analyses [14] or simple correlational approaches [15]. Instead, researchers generally regard randomized controlled trials (RCTs) as the best method for making such causal inferences [16, 17]. However, in practice, RCTs are complicated, expensive, and often not feasible [17, 18]. Based on this, we formulate our main research question: How to design causal inference systems that support value-based spare parts pricing decisions? Here we also follow calls for more research on applications of artificial intelligence (AI) systems in B2B marketing contexts in general [19] and B2B pricing strategies in particular [20]. To answer the research question, we conducted an action design research (ADR) study [21] at MAN Energy Solutions, one of the largest OEMs in the maritime industry and a world market leader for large-bore diesel engines. MAN Energy Solutions had recently started an initiative for implementing more value-based pricing strategies [8]. As a result, prices for more than 30,000 spare parts at one of the company's headquarters had been either decreased or increased. Due to this, the company was facing the challenge of assessing and quantifying whether the interventions were successful and whether the hypotheses behind the different value-based pricing initiatives could be validated. Historically, the company had used uncontrolled before-and-after studies in which one compares an outcome (e.g., sales volume) for a treated unit (a material for which the price was changed) before and after an intervention (price change). While such approaches can be indicative, they are not suitable for estimating causal treatment effects [14]. We identified this relevant field problem as a research opportunity to design and implement a novel system for value-based spare parts pricing support that is based on the state of the art of causal inference theory and addresses aftersales-specific challenges. Moreover, we go beyond the situated implementation of a system [22] and abstract some of the principles that underlie our design to the broader class of such systems. Thus, we follow the dual mission of information systems (IS) research to create utility for practitioners while contributing to the scientific body of knowledge [23]. We proceed as follows.
The next section provides the background on the causal inference approaches that informed our initial system design. Then, we discuss our overall research method. We then describe the system that we designed and implemented at MAN Energy Solutions. After that, we present and discuss a set of design principles. The final section discusses implications for research and practice, notes limitations, and concludes the paper.

2 Causal Inference on Observational Data

Causal inferences from observational data are challenging to make, as one has to eliminate spurious correlations and rely heavily on the analyst's subject-matter knowledge when specifying causal models [15]. Randomized controlled trials (RCTs) avoid such pitfalls when set up ideally [16] and are often called the gold standard of causal inference [17]. In an RCT, one randomly assigns subjects to different groups so that each group, on average, is more or less similar and, thus, comparable. Then, one manipulates one variable of interest (the treatment or intervention) while keeping everything else equal. This way, one can be confident that the treatment was the only cause of changes in the subjects. In such a situation, one can calculate treatment effects by taking the difference in means of the outcome variables for the treated and untreated control or placebo groups (e.g., [16]). However, setting up RCTs is complex and often not feasible [24], as it is based on strict assumptions and requires random assignment to make inferences about how a treatment affects a population of subjects [16]. In industry, however, one is usually more interested in how the treated subjects affected the firm (the experimenter) [18]. Donald Rubin introduced the potential outcome framework to causal inference [25]. In this framework, one tries to compare what happened to a subject after receiving treatment (the actually observed outcomes) with what would have happened had the subject not received treatment (the counterfactual outcomes). Inspired by this, researchers have developed quasi-experimental methods [26]. Such methods are based on observational data but construct control groups from the data to estimate counterfactual outcomes. One of the most common approaches here is the difference-in-differences (DID) design [27, 28]. DID designs treat time-series data of the observed outcomes for treatment and control groups as cross-sectional data and model them via standard ordinary least squares (OLS) regression by adding a dummy variable for the pre-intervention period (zero for all records) and the post-intervention period (one for treatment records and zero for controls). This way, one can estimate average treatment effects by comparing the difference between pre- and post-period outcomes of the treatment groups with the difference between pre- and post-period outcomes of the control groups. The idea here is that the control group represents the counterfactual situation that would have occurred had the treatment group not received treatment. While much more robust than uncontrolled before-and-after studies [14], DID designs have some underlying assumptions that often do not fit reality [29]. Most of all, they usually only consider single points in time before and after the intervention (dummy variable) without considering how effects develop throughout several post-intervention periods [30]. Moreover, they assume independent and identically distributed (iid) random variables, which neglects the autocorrelation structure of time-series data and leads to biased estimates [31]. Newer approaches in the form of synthetic control methods (SCM) are more robust to biases. In such methods, one combines several potential control groups into one synthetic
control group that, in the post-intervention period, represents the counterfactual outcome one would have expected had the treatment not occurred [32, 33]. In the most common synthetic control approach, as suggested by Abadie, Diamond, and Hainmueller [33] and Abadie and Gardeazabal [32], one constructs the synthetic control as a weighted average of all potential controls (with weights adding up to 1) that minimizes the difference between pre-intervention treatment and control matching variables. This approach has limitations, as it requires access to explainable matching variables (e.g., age or gender of subjects) and does not allow for non-convex optimization approaches in determining the weighted averages of control groups [30]. According to Hal Varian, the chief economist at Google, one can alternatively define the estimation of counterfactuals as a prediction problem [18, 34]: “[i]n this case, the counterfactual is the [prediction] of the outcome for the subject constructed using data from before the experiment. To implement this approach, one would normally build a model using time series methods such as trend, seasonal effects, autocorrelation […], and so on” (p. 7312). Brodersen et al. [30], Varian's colleagues at Google, developed a related approach. While not strictly necessary, they encourage analysts to include potential control groups (subjects that are similar to the treated subjects but unaffected by the intervention). In general, however, in their method, the only aspects that matter when choosing variables to estimate counterfactuals are pre-intervention correlation with the outcome variable of the treated subject and that the variables were not directly affected by the intervention. Example variables could be a similar but unaffected material, seasonality, or an index of general economic development.
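For illustration, the two designs discussed above can be summarized in standard notation (ours, not the authors'). The two-period DID design corresponds to the regression

y_{it} = \beta_0 + \beta_1\,\mathrm{Treat}_i + \beta_2\,\mathrm{Post}_t + \beta_3\,(\mathrm{Treat}_i \times \mathrm{Post}_t) + \varepsilon_{it}

where Treat_i marks treated units, Post_t marks post-intervention periods, \beta_3 estimates the average treatment effect, and the iid assumption on \varepsilon_{it} is exactly what clashes with autocorrelated time-series data. The prediction-based alternative instead estimates pointwise effects as

\hat{\tau}_t = y_t - \hat{y}_t^{(0)}, \quad \text{with } \hat{y}_t^{(0)} = \hat{f}(x_t)

where \hat{f} is fitted on pre-intervention data only and x_t contains covariates that are not affected by the intervention.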

3 The Action Design Research Methodology

We followed an action design research (ADR) process that was inspired by Sein et al. [21]. ADR is a design research (DR) “method [that] reflects the premise that IT artifacts are ensembles shaped by the organizational context during development and use. The method conceptualizes the research process as containing the inseparable and inherently interwoven activities of building the IT artifact, intervening in the organization, and evaluating it concurrently” (p. 37). This is different from traditional DR approaches, such as the one proposed by Hevner et al. [35], which explicitly separates artifact design and evaluation. ADR consists of four main stages: “problem formulation,” “building, intervention, and evaluation” (BIE), “reflection and learning,” and “formalization of learning” [21]. Following the method, in the problem formulation stage, we identified and conceptualized the research opportunity and formulated our research question for the class of value-based spare parts pricing support systems (see Sect. 1). Furthermore, we conceptualized and informed our initial solution design with a review of B2B marketing and causal inference literature (see Sects. 1 and 2). Also, we set up roles and responsibilities in the ADR team, which, at its core, consisted of Oliver Müller as a researcher, Tiemo Thiess as both a researcher and a developer, and MAN Energy Solutions' pricing manager for the aftersales business as an end-user and key informant and evaluator of the system. We assured long-term organizational commitment with a formal research collaboration

agreement that was part of a larger ADR program that one of the researchers conducted at the company for his Ph.D. studies. In the BIE stage, we built several prototypes before implementing the eventual system. Throughout this process, we continuously evaluated the artifacts. As our main criterion, we evaluated them in terms of their effectiveness, e.g., did the system help to solve the field problem? Also, we performed simple architectural analyses and white box tests (did the system execute without errors in the “technical infrastructure of the business environment?”) [35]. Moreover, we performed functional black-box tests, e.g., did the model predict the pre-intervention data satisfyingly well? Moreover, we evaluated the artifacts based on end-user feedback from demonstrations during status updates and stakeholder presentations (see Table 1). In the reflection, learning, and formalization stages, we formalized key learnings and design decisions as design principles (Sect. 5.1). To inform theorizing, design, evaluation, and understanding of the, we collected empirically rich observation notes of our main encounters at MAN Energy solutions (see Table 1). The data collection and analysis approach is an instance of design ethnography, a form of ethnography that relaxes strict objectivity assumptions and instead focusses on studying the whole design situation in which artifacts, stakeholders, and the researchers itself dynamically interact with each other [36]. Also, we had access to many of the companies’ systems, databases, and documents that further informed our study. Table 1. Study-related encounters at MAN Energy Solutions Meeting Type

Participants

#

h

Total (h)

Status update

Pricing manager, researchers

7

1

7

Stakeholder presentation

Diverse, including middle management

2

1

2

Development sprint

Researchers

14

4

56

4 Design and Implementation of a Causal Inference System for Value-Based Spare Parts Pricing at MAN Energy Solutions

After we conceptualized our initial solution class as causal inference systems, we developed different prototypes throughout several iterations of building, intervention, and evaluation. The explicit goal was to design and implement a system that helps to evaluate the success of value-based pricing initiatives at MAN Energy Solutions. In particular, the system should make it possible to visualize the effects of value-based price changes on materials and contain additional material-specific data to aid learning, test hypotheses, and support future pricing decisions. Moreover, the system should estimate the effects of value-based price changes for different key performance indicators, such as sales volume, sales revenue, and sales conversion rate.

Some of the early prototypes involved DID-related methods; however, we quickly noticed their limitations when observing seemingly unrealistic estimates. Justified by
the theoretical literature (Sect. 2), we quickly focused on prediction-based counterfactual estimation methods, as described by Varian [18]. We tried and compared many approaches to estimate the counterfactual based on their ability to predict the pre-intervention data, and on plausibility checks. Such checks, again, were based on our and our informant's subject-matter knowledge. Among the approaches that we tried were generalized least squares (GLS), autoregressive integrated moving average (ARIMA), OLS, and Croston's method for intermittent demand [37, 38]. Eventually, however, an approach based on a Bayesian structural time-series (BSTS) model [30, 39] consisting of a local-level trend and a regression component performed best.

To increase the predictability of the often irregular and “lumpy” aftersales data [40], we tried a variety of different grouping characteristics. Grouping the treated materials based on their relative unit-price level (between 1 and 10) and the relative price-change level (between −10 and +10) was the most plausible and best-performing approach. Also, we aggregated materials with unchanged prices into potential control groups based on their relative unit-price level and a variable that indicates which regional headquarters is mainly responsible for selling them.

To predict counterfactual outcomes of the situation that one would have expected had the price not changed, we first had to process the raw transactional data and transform it into monthly time-series data. In the BSTS model, we include a local-level trend component for the outcomes (e.g., average sales volume) of the treated unit (a repriced material group). Moreover, for the regression component, we include the following independent variables: 1) a yearly seasonality term, 2) a monthly seasonality term, 3) the days of the month (e.g., 28 in February), 4) Fourier terms (complex seasonality; see [41]), 5) a variable of the average monthly unit price for the repriced material group that, for the post-intervention periods, we fixed at the last pre-intervention price, and 6) the potential material control groups. As we fit the model on many variables, and in particular on many potential control groups, we include an automatic variable selection method. As described by Scott and Varian [42] and Brodersen et al. [30], we place spike-and-slab priors on the regression coefficients to affect the probability that the algorithm includes a variable in the model. This way, the algorithm can fully exclude uncorrelated variables.

In simplified terms, the algorithm first fits a model or function F(X) = Y on the pre-intervention data (ignoring the local-level trend component); then, the system inserts the observed values of the independent variables (e.g., X1, X2, X3) in the post-intervention period to predict the counterfactual (Y0). Then, the system subtracts the predicted counterfactual outcomes in the post-intervention period from the observed outcomes (Y1) in the post-intervention period to calculate treatment effects (see Fig. 1 for an illustration). To assure that none of the control variables is affected by the intervention on the treated subject (a material group), we include an “unaffectedness condition” based on the engine family. In the system execution process (Fig. 2), we first group the materials on the above-described characteristics and engine family.
Once the system starts to loop through each material group, it first filters out all control variables of the same engine family before further aggregating the materials into their eventual groups.
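The following sketch illustrates this prediction-based logic in simplified form. It is not the authors' implementation: it substitutes an ordinary least squares regression for the BSTS model with spike-and-slab priors, and the column names (y for the monthly outcome, price for the average unit price, plus control-group columns) are hypothetical.

```python
import pandas as pd
import statsmodels.api as sm

def estimate_effect(df: pd.DataFrame, intervention: str):
    """df: monthly series for one repriced material group, indexed by month.
    Assumed columns: 'y' (outcome, e.g. sales volume), 'price' (avg. monthly
    unit price), and control columns unaffected by the price change."""
    pre = df.loc[:intervention].iloc[:-1]      # months before the price change
    post = df.loc[intervention:]               # months from the price change on

    X_pre = sm.add_constant(pre.drop(columns="y"))
    model = sm.OLS(pre["y"], X_pre).fit()      # stand-in for the BSTS model

    # DP5-style treatment simulation: fix the price at its last pre-intervention
    # value so the model predicts what would have happened without the change.
    X_post = post.drop(columns="y").copy()
    X_post["price"] = pre["price"].iloc[-1]
    X_post = sm.add_constant(X_post, has_constant="add")

    counterfactual = model.predict(X_post)     # predicted counterfactual (Y0)
    effects = post["y"] - counterfactual       # pointwise treatment effects
    return effects, effects.sum()              # e.g. cumulative effect on volume
```

In the actual system, this estimation step sits inside the grouping and filtering loop described above, so that only control columns outside the treated group's engine family enter the model.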

Fig. 1. Counterfactual prediction approach

Also, before the system fits the model on all pre-intervention data, it performs time-series cross-validation to estimate average mean absolute prediction errors that indicate the pre-intervention predictability of a given repriced material group. Finally, the system presents the analysis outputs in an interactive report. The snapshot in Fig. 3 shows the sum of the actual outcomes (e.g., sales volume) for the repriced material groups (red) that the user selects. It also shows the sum of the predicted counterfactual outcomes for these repriced material groups (graphite) and the sum of the total treatment effects (e.g., sales volume; grey). The diagrams show the sums for the full post-intervention period (Y2019) and for each quarter separately. Via the sheets, one can choose the KPI of interest (e.g., sales volume, sales revenue, or conversion rate) and assess additional information about the materials and their respective groups.

Fig. 2. The system execution process

The system was well received by MAN Energy Solutions, and based on it, they could validate many of the hypotheses behind their recent value-based pricing decisions and actions. As a result, top management decided to roll the value-based pricing initiative, which at this point had only been conducted for the product range of one regional headquarters, out to other regional headquarters and their product ranges as well.

Fig. 3. Interactive report with adjustable filters (blurred for confidentiality reasons) (Color figure online)

5 Discussion of Design Principles

5.1 DP1: Pre-aggregation – Analysts Should Pre-aggregate Lumpy Data to Improve Its Predictability

In the spare parts business in general and at MAN in particular, one often has to deal with large portfolios of materials with lumpy demand patterns. The reasons for this are 1) a relatively small number of customers for a particular material, 2) a heterogeneous customer base, e.g., a few large and many small customers, 3) infrequent purchases, e.g., due to engine life-cycle dependent spare parts, and 4) variable requests, e.g., large orders in case of breakdowns or major overhauls followed by small orders without a continuous pattern (see [40, 43]). Research suggests data aggregation as a way to improve the validity and predictability of statistical models when dealing with lumpy demand data [40, 43, 44]. According to Zotteri [44], when one wants to implement predictive models in such a situation, “the problem of the aggregation level of data […] is much more complex than the simple design or selection of an appropriate algorithm, and it involves the choice of the relevant
pieces of information, the design of information systems, the control of data quality, and the definition of managerial processes” (p. 7). Similarly, we tried many different pre-aggregation characteristics and levels before we eventually settled on aggregating the data in terms of its temporal structure, e.g., aggregating daily to monthly data, and in terms of pre-defined characteristics such as the relative unit-price level and price-change level, as the best-performing and most plausible approach.

5.2 DP2: Scalability – Analysts Should Use Robust Algorithms that Rely on Few Assumptions Only and Include Global Explainability Features to Enable Controlled Execution at Scale

At MAN Energy Solutions, we had to estimate counterfactuals and calculate treatment effects for numerous material groups. Even though we reduced complexity by pre-aggregating the data into broader groups, we still had to conduct so many analyses that a thorough visual modeling approach was infeasible. Moreover, fitting models with many covariates is challenging because the amount of data needed to estimate model parameters accurately grows exponentially with each additional variable (“the curse of dimensionality”) [45]. This problem becomes even more severe when working with time-series data, as one often has only about a hundred records (monthly data) or fewer (lumpy data) per unit for model fitting. We addressed the challenge by choosing a BSTS model. This modeling approach generally shows good performance on many different datasets and involves, with spike-and-slab priors, an automatic variable selection method [39, 42]. Moreover, in our BSTS model, we chose a local-level trend component, which performs well in many cases [30]. Furthermore, we did not include seasonality as a state component of the BSTS model. Still, we included seasonality terms as covariates in the model, so that the variable selection method can include them whenever they help to explain variance in the outcome variable of a repriced material group and exclude them when they do not. This approach is more flexible and requires fewer assumptions about the unobserved state space (data generating process) than pre-selecting a seasonality state component for all repriced material groups. Also, we struggled to fully understand the behavior of individual models, as a visual exploration of all fitted models for all repriced material groups was simply not possible. Because of that, we implemented explainability features to help us better understand and improve our models. In particular, for each model we displayed the strongest predictor together with its inclusion probability and calculated pre-intervention prediction errors. Research on global explainability methods justifies our approach [3, 46, 47].

5.3 DP3: Unaffectedness – Analysts Should Define Unaffectedness Conditions Based on Subject-Matter Knowledge and Causal Diagrams and Filter Model Covariates Based on Them to Avoid Spillover Effects

The most important assumption in synthetic control based causal inference is that the control groups are not directly affected by the treatment [30, 32, 33]. To assure that
one does not violate this assumption, one could simply choose all non-treated material groups as potential controls. In aftersales and at MAN Energy Solutions, however, this approach can lead to biased estimates, as spare parts orders can be large and consist of several combinations of materials that are frequently bought together. The result is that a price change in one of the related materials can affect the demand for the other material, which is equivalent to a spillover effect. We addressed this challenge by systematically mapping all potential causes of spillover effects using our and our informant's subject-matter knowledge as well as causal diagram-like representations. We justify our approach by referring to Pearl [48] and Hernán et al. [15], who suggest the use of both subject-matter knowledge and causal diagrams for causal reasoning and modeling. Based on such an approach, we defined the engine family of a material as an appropriate unaffectedness condition and filtered potential control groups based on it. We made this decision based on the insight that it is unlikely that customers frequently buy materials of different engine families together, because they usually order spare parts for a particular ship, which typically has only one or two main engines of the same family installed.

5.4 DP4: Pre-intervention Predictability – Analysts Should Use Cross-validation and Evaluate Treatment Effects in Light of the Pre-intervention Predictability to Draw More Truthful Conclusions

According to Brodersen et al. [30], the main assumptions that causal impact analyses need to fulfill are 1) the unaffectedness of the controls by the treatment and 2) a relatively strong pre-intervention correlation of the covariates with the pre-intervention outcome of the treated unit. The first assumption is essential to avoid biased estimates due to spillover effects, and the second one is important to assure a good model fit and, thus, a valid estimation of the counterfactual. Many researchers have convincingly shown that goodness of fit is not sufficient in analyses that require predictive estimates [49–51]. Moreover, some time series are simply easier to predict than others, such as time series that have regular seasonality patterns and stable trends. At MAN Energy Solutions, we had to work with lumpy time series that, despite the increased data quality due to pre-aggregation, still did not fulfill such ideal properties. In reaction to this, we assessed the pre-intervention predictability in terms of the mean absolute percentage error (MAPE) for the outcome data of each repriced material group by performing time-series cross-validation [52]. We then included a condition that marks the mean absolute percentage treatment effect of a treated material group as significant only if it is larger than the pre-intervention prediction MAPE. By doing so, we avoid displaying treatment effects that are only due to variation in pre-intervention predictability.
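As a minimal sketch of this check (simplified relative to the cross-validation procedure cited by the authors [52]; all names are hypothetical), a rolling-origin evaluation on the pre-intervention months might look as follows:

```python
import numpy as np
import pandas as pd

def pre_intervention_mape(pre: pd.DataFrame, fit_predict, n_splits: int = 6) -> float:
    """Rolling-origin cross-validation on the pre-intervention series.
    fit_predict(train, test_X) must return predictions for the held-out month."""
    errors = []
    for holdout in range(1, n_splits + 1):
        cut = len(pre) - holdout
        train, test = pre.iloc[:cut], pre.iloc[cut:cut + 1]   # one-step-ahead split
        pred = fit_predict(train, test.drop(columns="y"))
        actual = test["y"].to_numpy()
        errors.append(np.mean(np.abs((actual - pred) / actual)))  # assumes non-zero actuals
    return float(np.mean(errors) * 100)  # average MAPE in percent

def is_significant(effect_pct: float, mape: float) -> bool:
    # DP4: report an effect as significant only if it exceeds the group's
    # pre-intervention prediction error.
    return abs(effect_pct) > mape
```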

5.5 DP5: Treatment Simulation – Analysts Should Add the Treatment Variable to the Model and Fix Its Post-intervention Values at Its Last Pre-intervention Value to Strengthen the Counterfactual Prediction

At the root of causal inference is the potential outcome framework by Donald Rubin [53], which suggests estimating causal treatment effects as the difference between the observed and the (counterfactual) potential outcome. All of the causal inference methods discussed above apply counterfactual reasoning, as they try to simulate a world in which the unobservable potential outcome was observed. So they try to answer the question: “What would the outcome have been had the treatment not occurred?” [54, 55]. In our case, the treatment of interest was a price change at a particular point in time. Nevertheless, it was likely not the first price change for a given material. In our model, we explain historical variations in the outcome that were caused by prior treatments by incorporating a continuous treatment variable (the monthly average unit price) as a predictor directly into the model. For the post-intervention prediction, however, we fix its values at the last pre-intervention unit price. By doing so, we account for the effects of prior interventions on the outcome time series. Moreover, as we insert simulated (fixed) values for the treatment variable, we strengthen the counterfactual prediction of the model by using it directly as a simulation engine.

5.6 DP6: Interactive Visualization – Analysts Should Create Interactive Reports Instead of Static Presentations to Aid Understanding and Acceptance

During our ADR project, we reasoned that it could be difficult for users to comprehend the inner workings of our analysis approach. Also, the pricing manager, who was the key end-user of the system, requested the possibility to explore the data interactively. Research in information visualization suggests that interactive representations of data increase the usability of systems and aid learning and understanding for their users [56]. Research in technology acceptance, on the other hand, suggests that an increased understanding increases the acceptance of a decision support system [46, 47, 57]. Motivated by these findings, we designed an interactive user interface in which users can explore the causal impact of price changes on different KPIs, e.g., by adjusting the information that the diagrams represent via slicers for the unit-price level or the price-change level of materials. Here, whenever one adjusts a slicer, the visualization adapts immediately.

6 Discussion and Conclusions

In this study, we contribute to information systems and industrial marketing by answering our research question with theory about how to design and implement causal inference systems for value-based spare parts pricing support [58]. Our study provides theoretical and empirical evidence on how causal inference approaches can solve the relevant field problem of estimating the causal effects of pricing interventions not only on subjects in a laboratory setting but on the experimenter (the firm) in its natural context [18]. Moreover, we show in a particular industry application how artificial intelligence (AI)
related technologies can foster servitization processes towards more value-based and customer-centric sales and marketing approaches [4]. While our system incorporates an approach for counterfactual prediction and treatment effect calculation that is comparable to the one introduced by Brodersen et al. [30], our approach is specially adapted to the aftersales context and to value-based pricing effect estimation problems (see DP1, DP2, and DP3). Moreover, our approach is an industrial application that required us to design and implement a socio-technical information system around the causal inference approach, one that is fully integrated into the IT architecture at MAN Energy Solutions and includes a data processing pipeline and a user interface (see DP1 and DP6). Brodersen et al., on the other hand, introduce a general-purpose method for causal impact analysis and demonstrate its utility in a laboratory setting on empirical data from a digital marketing campaign. Furthermore, we improve on the method by Brodersen et al. by adding measures of pre-intervention predictability (DP4) and by adding a treatment variable (unit price) to the model that helps to explain the variation in the pre-intervention period and strengthens the counterfactual prediction by fixing it at its last pre-intervention value for the post-intervention period (DP5). In the future, we want to further experiment with pre-aggregation levels to find a way that balances model robustness with more fine-grained applicability of results in organizational decision-making processes. Furthermore, we want to investigate how time-series cross-validation-based hyperparameter tuning and prediction combinations affect system performance and treatment effect validity.

References

1. Sundin, E.: Life-cycle perspectives of product/service-systems: in design theory. In: Sakao, T., Lindahl, M. (eds.) Introduction to Product/Service-System Design. Springer, London (2009). https://doi.org/10.1007/978-1-84882-909-1_2
2. Cohen, M.A., Agrawal, N., Agrawal, V.: Winning in the aftermarket. Harv. Bus. Rev. 84, 129–138 (2006)
3. Thiess, T., Müller, O., Tonelli, L.: Design principles for explainable sales win-propensity prediction systems. In: WI2020 Zentrale Tracks, pp. 326–340 (2020)
4. Huang, M.H., Rust, R.T.: Artificial intelligence in service. J. Serv. Res. 21, 155–172 (2018)
5. Rust, R.T., Huang, M.H.: The service revolution and the transformation of marketing science. Mark. Sci. 33, 206–221 (2014)
6. Thiess, T., Müller, O.: Towards design principles for data-driven decision making - an action design research project in the maritime industry. In: 26th European Conference on Information Systems, Beyond Digitization – Facets of Socio-Technical Change. ECIS 2018 (2018)
7. Gallagher, T., Mitchke, M.D., Rogers, M.C.: Profiting from spare parts. McKinsey Q. 2, 1–4 (2005)
8. Hinterhuber, A.: Towards value-based pricing—an integrative framework for decision making. Ind. Mark. Manag. 33, 765–778 (2004)
9. Hinterhuber, A.: Value delivery and value-based pricing in industrial markets. Adv. Bus. Mark. Purch. 14, 381–448 (2008)
10. Hinterhuber, A., Liozu, S.M.: Is innovation in pricing your next source of competitive advantage? Bus. Horiz. 57, 413–423 (2014)
11. Wickboldt, C., Kliewer, N.: Value based pricing meets data science: a concept for automated spare part valuation (2018)
12. Cullbrand, M., Levén, L.: Spare parts pricing – setting the right prices for sustainable profit at Atlet (2012)
13. Andersson, J., Bengtsson, J.: Spare parts pricing – pre-study for a pricing strategy at Pon (2013)
14. Goodacre, S.: Uncontrolled before-after studies: discouraged by Cochrane and the EMJ. Emerg. Med. J. 32, 507–508 (2015)
15. Hernán, M.A., Hernández-Diaz, S., Werler, M.M., Mitchell, A.A.: Causal knowledge as a prerequisite for confounding evaluation: an application to birth defects epidemiology. Am. J. Epidemiol. 155, 176–184 (2002)
16. Cochrane, A.L., et al.: Effectiveness and efficiency: random reflections on health services. Nuffield Provincial Hospitals Trust, London (1972)
17. Cartwright, N.: Are RCTs the gold standard? Biosocieties 2, 11–20 (2007)
18. Varian, H.R.: Causal inference in economics and marketing. Proc. Natl. Acad. Sci. U.S.A. 113, 7310–7315 (2016)
19. Mora Cortez, R., Johnston, W.J.: The future of B2B marketing theory: a historical and prospective analysis. Ind. Mark. Manag. 66, 90–102 (2017)
20. Martínez-López, F.J., Casillas, J.: Artificial intelligence-based systems applied in industrial marketing: An historical overview, current and future insights. Ind. Mark. Manag. 42, 489–495 (2013)
21. Sein, M.K., Henfridsson, O., Purao, S., Rossi, M., Lindgren, R.: Action design research. MIS Q. 35, 37–56 (2011)
22. Gregor, S., Hevner, A.R.: Positioning and presenting design science research for maximum impact. MIS Q. 37, 337–355 (2013)
23. Benbasat, I., Zmud, R.W.: The identity crisis within the IS discipline: defining and communicating the discipline's core properties. MIS Q. Manag. Inf. Syst. 27, 183–194 (2003)
24. Schulz, K.F., Chalmers, I., Hayes, R.J., Altman, D.G.: Empirical evidence of bias: dimensions of methodological quality associated with estimates of treatment effects in controlled trials. JAMA J. Am. Med. Assoc. 273, 408–412 (1995)
25. Rubin, D.B.: Estimating causal effects of treatments in randomized and nonrandomized studies. J. Educ. Psychol. 66, 688–701 (1974)
26. Cook, T.D., Campbell, D.T., Day, A.: Quasi-Experimentation: Design & Analysis Issues for Field Settings. Houghton Mifflin, Boston (1979)
27. Ashenfelter, O., Card, D.: Using the longitudinal structure of earnings to estimate the effect of training programs (1985)
28. Card, D.: The impact of the Mariel boatlift on the Miami labor market. Ind. Labor Relat. Rev. 43, 245 (1990)
29. Abadie, A.: Semiparametric difference-in-differences estimators. Rev. Econ. Stud. 72, 1–19 (2005)
30. Brodersen, K.H., Gallusser, F., Koehler, J., Remy, N., Scott, S.L.: Inferring causal impact using Bayesian structural time-series models. Ann. Appl. Stat. 9, 247–274 (2015)
31. Bertrand, M., Duflo, E., Mullainathan, S.: How much should we trust differences-in-differences estimates? Q. J. Econ. 119, 249–275 (2004)
32. Abadie, A., Gardeazabal, J.: The economic costs of conflict: a case study of the Basque Country. Am. Econ. Rev. 93, 113–132 (2003)
33. Abadie, A., Diamond, A., Hainmueller, J.: Synthetic control methods for comparative case studies: estimating the effect of California's tobacco control program. J. Am. Stat. Assoc. 105, 493–505 (2010)
34. Varian, H.R.: Big data: new tricks for econometrics. J. Econ. Perspect. 28, 3–28 (2014)
35. Hevner, A.R., March, S.T., Park, J., Ram, S.: Design science in information systems research. MIS Q. 28, 75–105 (2004)
36. Baskerville, R.L., Myers, M.D.: Design ethnography in information systems. Inf. Syst. J. 25, 23–46 (2015)
37. Croston, J.D.: Forecasting and stock control for intermittent demands. Oper. Res. Q. 23, 289–303 (1972). https://doi.org/10.1057/jors.1972.50
38. Harvey, A.C., Amemiya, T.: Advanced Econometrics. Harvard University Press, Cambridge (1987)
39. Scott, S.L., Varian, H.R.: Predicting the present with Bayesian structural time series. Int. J. Math. Model. Num. Optim. 5, 4–23 (2014)
40. Bartezzaghi, E., Verganti, R., Zotteri, G.: Simulation framework for forecasting uncertain lumpy demand. Int. J. Prod. Econ. 59, 499–510 (1999)
41. Hyndman, R.J., Athanasopoulos, G.: Forecasting: Principles and Practice. OTexts, Australia (2018)
42. Scott, S.L., Varian, H.: Bayesian variable selection for nowcasting economic time series (2013)
43. Bartezzaghi, E., Kalchschmidt, M.: The impact of aggregation level on lumpy demand management. In: Altay, N., Litteral, L. (eds.) Service Parts Management. Springer, London (2011). https://doi.org/10.1007/978-0-85729-039-7_4
44. Zotteri, G., Kalchschmidt, M.: A model for selecting the appropriate level of aggregation in forecasting processes. Int. J. Prod. Econ. 108, 74–83 (2007)
45. Bellman, R.E.: Adaptive Control Processes: A Guided Tour. Princeton University Press, Princeton (2015)
46. Gregor, S., Benbasat, I.: Explanations from intelligent systems: theoretical foundations and implications for practice. MIS Q. Manag. Inf. Syst. 23, 497–530 (1999)
47. Martens, D., Provost, F.: Explaining data-driven document classifications. MIS Q. Manag. Inf. Syst. 38, 73–99 (2014)
48. Pearl, J.: Causal diagrams for empirical research. Biometrika 82, 669 (1995)
49. Tashman, L.J.: Out-of-sample tests of forecasting accuracy: an analysis and review. Int. J. Forecast. 16, 437–450 (2000)
50. Fildes, R., Makridakis, S.: The impact of empirical accuracy studies on time series analysis and forecasting. Int. Stat. Rev. Int. Stat. 63, 289–308 (1995)
51. Makridakis, S., et al.: The accuracy of extrapolation (time series) methods: results of a forecasting competition. J. Forecast. 1, 111–153 (1982)
52. Bergmeir, C., Hyndman, R.J., Koo, B.: A note on the validity of cross-validation for evaluating autoregressive time series prediction. Comput. Stat. Data Anal. 120, 70–83 (2018)
53. Rubin, D.B.: Causal inference using potential outcomes: design, modeling, decisions. J. Am. Stat. Assoc. 100, 322–331 (2005)
54. Pearl, J.: Probabilities of causation: three counterfactual interpretations and their identification. Synthese 121, 93–149 (1999). https://doi.org/10.1023/A:1005233831499
55. Morgan, S.L., Winship, C.: Counterfactuals and Causal Inference. Cambridge University Press, Cambridge (2015)
56. Liu, S., Cui, W., Wu, Y., Liu, M.: A survey on information visualization: recent advances and challenges. Vis. Comput. 30(12), 1373–1393 (2014). https://doi.org/10.1007/s00371-013-0892-3
57. Kayande, U., De Bruyn, A., Lilien, G.L., Rangaswamy, A., van Bruggen, G.H.: How incorporating feedback mechanisms in a DSS affects DSS evaluations. Inf. Syst. Res. 20, 527–546 (2009)
58. Gregor, S., Jones, D.: The anatomy of a design theory. J. Assoc. Inf. Syst. 8, 312–335 (2007)

Organizational Change Toward IT-Supported Personal Advisory in Incumbent Banks

Maik Dehnert
Chair of Business Informatics and Digitalization, University of Potsdam, Potsdam, Germany
[email protected]

Abstract. Due to changing customer behavior in digitalization, banks are urged to change their traditional value creation in order to improve interaction with customers. New digital technologies such as core banking solutions change organizational structures to provide organizational and individual affordances in IT-supported personal advisory. Based on adaptive structuration theory and on qualitative data from 24 German banks, we identify first, second and third order issues of organizational change in value creation, which are connected with a set of affordances and constraints as the outcomes for customer interaction.

Keywords: Digital workplace · Retail banking · Advisory · Adaptive structuration theory · Affordances · Organizational change

1 Introduction

The way customers interact with banks has changed radically in recent years [1]. Fintechs have greatly raised expectations for user interaction [2]. Decreasing switching costs and increasing transparency entail the risk of losing customers. Hence, banks face manifold challenges to maintain the relationship with customers in the future [3]. Given their historically grown IT and organization, incumbent banks began to transform work processes and organizational structures to meet the increasing digital expectations. However, the necessary realization of customer-oriented processes is challenging for incumbents, since dysfunctional forms of organization, which are also reflected in the IT system landscape, often contradict a successful digital transformation [4]. Whereas the value creation dimension consists of processes and structures in a bank organization, the actual intersubjective customer advisory takes place in customer interaction. In the context of moving from traditional pen-and-paper advisory towards digital forms, the interplay between these two dimensions seems to become a crucial aspect for profitability and customer satisfaction [5]. Currently, there are change initiatives in many banks with a particular focus on customer advisory [6]. These banks focus on IT-supported personal customer advisory at the service-sales interface [7], especially in traditional branch-based settings. Surprisingly, there is little research on the role of organizational change for incumbent banks [8–10], and none with regard to digital transformation. Against this background, we ask: How does organizational change affect customer interaction in banks, such as affordances and constraints in customer advisory? The aim of this study is to examine the interplay between value creation and customer interaction in digital transformation. Based on adaptive structuration theory as the theoretical lens, we identify several first, second and third order issues regarding the prerequisites and consequences of a digital transformation initiative in an incumbent retail banking group. The outcome of the paper is a set of IT-induced affordances and constraints for personal advisory in banking. The structure of the paper is as follows: Firstly, we describe the theoretical background and present the methodology used for data collection and analysis. Secondly, we present the results of the data analysis. Finally, we point out implications for future research and practice.

2 Background

2.1 Digital Transformation in Banking

Digital transformation affects the banking industry in three dimensions: value creation, value proposition and customer interaction [11]. The value creation model (VCM) includes the impact on a company’s core processes to create products and services, including the necessary support processes. Traditional banks are, for instance, working on making their processes increasingly paperless. The value proposition model (VPM) includes the impact on a company’s products, services and revenue models. Traditional banks, for instance, introduce digital products like personal finance management and offer new digital tariff models. The customer interaction model (CIM) includes the cross-channel and holistic design of the customer relationship with both digital and non-digital ways of customer interaction and modern forms of data analysis. These developments revolve around digital channels such as mobile apps and online branches. Especially new ways of customer interaction are becoming a crucial part of digital bank strategies [12]. This includes human-material service practices like IT-supported customer advisory. In this study, we focus on the interplay of the VCM and the CIM (see Fig. 1).

Fig. 1. Digital transformation dimensions and the locus of this study [11]

The stream of banking research on value creation has dealt with structural characteristics of incumbent banks [13] and efficiency measures of financial service processes [14], and has highlighted specific barriers to digital bank processes [15]. In the area of value proposition, scholars have identified novel types of digital products in financial services, such as finance, payment and insurance [16]. Regarding customer interaction, digital channels are an important research stream [17–19].

2.2 Organizational Change

The dynamics of digital transformation are well understood regarding business models, products and markets. However, the dynamics of the underlying organizational change from the inside to the outside are less well understood. There is a vast amount of literature on organizational change in the management discipline (e.g., [20–22]); however, little research comes from the business informatics or information systems discipline (e.g., [23–25]). Markus [25] has coined the term ‘technochange’ for technology-driven organizational change, whereas Besson and Rowe [23] provided a transdisciplinary review of IS-enabled organizational transformations. Management research on organizational change involves organizational change processes, their outcomes and implications for change recipients (for an in-depth review, see [26]). From the management perspective, organizational change aims at “the planned change of the organizational work setting for the purpose of enhancing individual development and improving organizational performance” [27]. Van de Ven and Poole [28] have systemized four patterns of development and change in organizations, which they call ‘change motors’: life-cycle, teleology, dialectic, and evolution. These four motors represent different sequences of change events and operate at different organizational levels. Of particular interest is the interplay of these four motors in complex change phenomena such as digital transformation. In cases where change is planned [29], it occurs episodically, discontinuously and intentionally; organizations such as banks may set up goal-oriented, teleological change programs to move from one equilibrium to another [30], changing their known organizational structures and routines. In doing so, banks also have to take into account the life-cycles of their IT systems and the possible dialectic tensions between different organizational entities, such as employees and managers, in implementing the change. An important target of change initiatives is digital work design, shifting work from traditional to digital forms [31]. In digital work, human capabilities are augmented in a way that enhances creativity, problem solving and learning. In this paper, we identify several issues in organizational change for personal IT-supported advisory in banks. Following the conceptualization of digital transformation as a complex sociotechnical organizational transformation process, a solid theoretical foundation is necessary for the purpose of analysis [23]. We take both a teleological (i.e. social constructive) and a dialectical (i.e. conflicting) view on organizational change [28]. We draw on adaptive structuration theory (AST) for the analysis. AST is a theory specifically intended for analyzing the impacts of advanced information technology use in organizations, which makes it useful in the context of studies on digital transformation [32–34]. AST describes the creation and recreation of a social system through an interplay of material technologies and social human interactions, such as in IT-driven technochange projects [35].
A study on sales personnel found that the success of planned change interventions largely depends on identifying and appreciating the heterogeneity of the individual traits of employees that share meaning with the change [36]. A major aspect of AST in this regard is the ‘spirit’ that reflects the long-term goals of an IT-induced change project. AST thus helps to examine the interplay between the intended way of IT system usage and the actual application by the users [37]. On this basis, we identify several issues regarding organizational structures, relationships among them, and the appropriation of these structures by customer advisors in digital transformation. As relational concepts, affordances and constraints facilitate the scholarly understanding that what one individual or organization with particular capabilities and purposes can or cannot do with a technology may be very different from what a different individual or organization can do with the same technology [38–40]. This resembles Hammer’s view [41] that stronger organizational capabilities lead to stronger change enablers and higher performance. To analyze this relationship qualitatively, we especially draw on the notion by Zammuto et al. [38], who stress that affordances emerge from the intersection of IT and organizational systems and are thus a potential outcome of the structural features and spirit, whereas constraints can be an adverse effect of change.

2.3 Research Setting

IT-supported personal advisory in banking provides the research setting for this study. In particular, we analyze the implications of a core banking system renewal in a large German banking group. We study the organizational changes of several independent banks that introduced a new, tailored IT-supported personal advisory solution from the group’s IT service provider. The new cooperative IT solution was built to support the advisor in traditional service encounters [42]. With the new core banking software, the banks introduced joint service encounters and a collaborative customer interaction setting [43], and the sales frontend followed a co-creation approach [44–46]. Moreover, the software introduced new features, such as shared multimedia content to be played during a customer appointment. With the new banking software, new standardized guidelines have also been introduced. The IT-supported personalized advisory tools aim to provide new affordances for the specific user group of customer advisors. These are intended to simplify and improve outcomes such as customer experience by enhancing overall communication [46] (see Fig. 2).

[Figure content omitted: labeled elements are advisory goals, organizational structures, information technology, outcomes (affordances), and customer interaction.]

Fig. 2. IT-supported personal advisory as the research setting

2.4 Methodology

To tackle our research question, we follow a qualitative case study research approach [47]. Our study sample consists of 24 German banks that belong to the same umbrella organization and share several structural similarities. The corresponding data was collected in the summer of 2018 in two steps. Firstly, we conducted in-depth telephone interviews with 28 bank executives. The interviews lasted 45 to 90 min. We used semi-structured questions about a) the banks’ current business situation, b) the banks’ current projects in digital transformation and people responsibilities, c) the banks’ implementation and usage of digital technologies across the three digital transformation dimensions, d) employee and change measures, and e) the adaptations of organizational features. Secondly, we collected surveys from senior managers responsible for digital transformation and from 160 customer advisors working for these banks. Each manager answered questions on measures of digital transformation, such as changes of processes and structures, values and norms across the organization, and the impact of the adaptation of new IT tools on customer interaction. The focus of the employee survey was on the influence of the new core banking user interface on customer interaction. We asked employees quantitative questions on 5-point Likert scales (the mean values are indicated in brackets) as well as open-ended questions. The employees had to a) describe changes in customer advisory and b) assess the impact on their job routines. The responses were matched with the managers of each participating bank.

Subsequently, the collected data was analyzed qualitatively [48]. We followed the analytical open, axial and selective coding techniques of Grounded Theory by Strauss and Corbin [49, 50]. The aim of open coding was to identify relevant features and dimensions in the data; this included the thematic classification of the collected data into subject areas. Axial coding is the process of connecting emerging categories to their sub-categories; thus, the subject areas were classified into overarching theoretical entities. We synthesized our findings across several orders of issues or themes [51, 52]. The analysis of systems and actors comprised the identification of spirit (i.e. transformation goals), organizational structures and influencing relationships, the appropriation of the structures, and the resulting affordances or constraints. In selective coding, we further analyzed the relationships between value creation and customer interaction.
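To make the matching and aggregation step concrete, the following minimal sketch shows how 5-point Likert responses from advisors could be averaged per bank and joined with the matched manager survey. It is purely illustrative: the column names, the toy values and the use of pandas are our assumptions and do not reproduce the authors’ actual survey instrument or analysis pipeline.

```python
# Illustrative sketch only - hypothetical survey items, not the study's real data.
import pandas as pd

# Hypothetical advisor responses on 5-point Likert scales (1 = low, 5 = high).
advisors = pd.DataFrame({
    "bank_id": [1, 1, 2, 2, 2],
    "prefers_digital_work": [4, 3, 5, 2, 3],
    "digital_skill_need": [4, 3, 4, 4, 3],
})

# Hypothetical manager assessments, one responsible manager per bank.
managers = pd.DataFrame({
    "bank_id": [1, 2],
    "process_digitalization": [3, 3],
})

# Mean advisor rating per bank, rounded to two decimals as reported in the text.
advisor_means = advisors.groupby("bank_id", as_index=False).mean().round(2)

# Match the advisor-level means with the manager of each participating bank.
matched = advisor_means.merge(managers, on="bank_id")
print(matched)
```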

3 Findings

In this section, we present our findings. Table 1 gives an initial overview of the organizational change dimensions from the perspective of AST. Changing customer behavior was the starting point for the banks to change their personal advisory setting. Customers have limited time for contact with the bank, demanding temporal and local flexibility and usability along the lines of Fintechs. Customer advisors, as the actors, have to cope with changing customer expectations and multi-brand loyalty. An integral objective of the change initiatives was the banks’ spirit, which represents the general intent of a company’s digital transformation and thus incorporates purpose and goals as well as norms and values [53]. Notably, the banks wanted to get rid of their self-concept as legacy monoliths. The introduction of digital technologies such as the core banking solution changed the banks’ organizational structures such as sales or service processes and job roles. These new structures also affect the social interactions of the employees. Appropriation describes the individuals’ realization of the organizational spirit and the new structures, which forms new behavioral rules and norms. As an employee actively decides how organizational changes are used (e.g., directly using the new structures or avoiding them), he or she may appropriate IT more or less faithfully. A bank employee may apply new behavioral rules and resources in the daily working context more or less congruently with the intended spirit of the organization. Hence, organizations may use appropriation support to improve employees’ appropriation, such as employee workshops or trainings. These measures aim to embed the changing spirit in the social subsystem (e.g., new skills, attitudes or rewards). This process reshapes the organizational structures and may affect the outcomes of organizational change. Finally, the feedback process of reciprocal causation supposes that new organizational structures will emerge in a company or group as organizations react to employee appropriation with ongoing changes of their organizational structures.

Table 1. Organizational change dimensions (cf. [37])

Characteristic | Instantiation
Context | Advisory: changing organizational environment (e.g., Fintech competitors) and customer behavior (e.g., multi-brand loyalty)
Actors | Customer advisors along with their working groups and organizational leaders
Spirit | Digital transformation goals (e.g., “remain a traditional bank but to become more digital”), sticking with traditional logics
Technologies | IT-supported, co-created advisory based on a new core banking software
Organizational structures | New corresponding processes for sales and services
Appropriation | How faithfully advisors appropriate technologies and perceive constraints in service processes, depending on change self-efficacy and acceptance of the newly created organizational structures
Appropriation support | Scaffolds such as workshops and trainings, not applied consistently to all related structures and processes and thus regarded as “superficial”
Reciprocal causation | Little impact on advisors to change social interaction patterns so that new organizational structures may emerge

3.1 Open and Axial Coding

In the following, we identify first, second, and third order issues of organizational change by means of open and axial coding.
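Before presenting the issues, a small illustrative sketch shows one way such a coding scheme could be represented. The axial category labels below mirror the subheadings of this section, while the exemplary open codes, the data structure and its names are hypothetical simplifications rather than the study’s actual code book.

```python
# Illustrative sketch of a code book grouping exemplary open codes under the
# axial categories reported in this section; not the study's actual coding scheme.
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Category:
    name: str                                             # axial category label
    open_codes: List[str] = field(default_factory=list)   # exemplary open codes

coding_scheme: Dict[str, List[Category]] = {
    "first order": [
        Category("Digital infrastructure", ["new core banking software", "third-party channels"]),
        Category("Standardization", ["standardized advisory process", "new job roles"]),
        Category("Change self-efficacy of employees", ["trainings", "acceptance problems"]),
    ],
    "second order": [
        Category("Changing customer behavior", ["multi-brand loyalty", "convenience expectations"]),
        Category("Individualized customer interaction", ["lack of data integration"]),
        Category("Shift of necessary skills and routines", ["more back-office time"]),
    ],
    "third order": [
        Category("Commitment to change", ["workarounds", "uneven tablet usage"]),
        Category("Spirit", ["remain traditional but become more digital"]),
    ],
}

# Example: list the axial categories identified for each order of issue.
for order, categories in coding_scheme.items():
    print(order, "->", [c.name for c in categories])
```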

First Order Issues

This category comprises concrete issues regarding structures, technology, training, and support of organizational change, or issues with resource access and expertise [51].

Digital Infrastructure

The banks induced several changes regarding their historically grown IT infrastructures, such as the introduction of a new core banking software. The banks provided additional technical interfaces to their customers (e.g., payment and identity services) and integrated third-party channels (e.g., personal assistants by Apple or Google) as well as web chats and feedback forms. Moreover, the banks built interfaces for collecting customer data (e.g., payment and identity services), automating customer data acquisition (e.g., self-services) and standardizing data across the organization (e.g., target customer profiles). However, accessing, sharing and using customer data anytime from anywhere (e.g., gathering information before the advisory session) was not possible, as the integration of data across channels and systems (i.e. hybrid customer interaction) had not been realized yet. Hence, the degree of digitalization of the processes was assessed by the managers as mediocre (3.00). In particular, data-driven customer interaction was evaluated as low (2.29). The degree of digital customer interaction, such as channels, was estimated to be high (4.00). In particular, advisors evaluated the processes in the back office (2.67), such as support processes (2.75), to show strong potential for digital innovation.

Standardization

The banking processes changed considerably with the new banking software, as customer and advisor go through the standardized advisory process together. Accordingly, new support processes have been introduced, such as transaction handling, advisory and after-sales services. The banks started to standardize their internal and external processes. Customer contact centers, for instance, were appointed for special types of customers, such as online customers. After the introduction of these new service platforms, customer advisors have obtained new roles or responsibilities, such as for certain channels. Hence, there was also a shift of skill demands across job roles (e.g., new advisory templates, event-driven, rule-based advisory). This led to more centralized organizational operations (e.g., interactive service platforms). Digital teams were formed to push the intended changes throughout the organization. Their actual work, however, did on average not follow clear guidelines (2.50), as there was often no job description either (2.54).

Change Self-Efficacy of Employees

The banks used scaffolds such as employee trainings and pushed digital change across the organization via digital teams that supported traditional departments. In many of the banks, however, these change initiatives were not applied consistently. As such, acceptance problems occurred on the employee side. In particular, change self-efficacy [54], as an employee’s belief in his or her inner ability to achieve goals, may influence how faithfully employees appropriate new processual guidelines. The advisors slightly preferred a digital way of working (e.g., with iPad or PC) to pen and paper (3.42). Since many employees of the banks have learned their profession in times of paper-based processes, from the managers’ point of view “habits must be broken.”

Second Order Issues

This category shows issues that are the result of contextual effects. As a result, paradoxes in infrastructure or skills might become more tangible. This includes cultural influences such as customer behavior, which broadens the context of change [51].

Changing Customer Behavior

In the eyes of advisors, customers are overstrained by the large number of offerings in banking and strive for convenient services. This affects advisors directly in their daily work processes, as several advisors claimed: “Since it has become much easier to compare individual products with each other in digitalization, customers are less familiar with the products our bank offers.” Hence, customer advisors have to cope with customer expectations and multi-brand loyalty. As several advisors admitted, they had to adapt work routines, especially regarding the more complex preparation of personal appointments with customers: “What’s more, virtually everyone now offers current accounts and loans, and customers simply lose track.” Many advisors mentioned that this would lead to an increasingly difficult and more time-consuming customer sales and service process. In this regard, the new cooperative advisory setting did in fact simplify and accelerate their decision-making: “In the past, it was much more difficult to search for the data for analysis and control; today they can already be called up and viewed using a simple button.” Nonetheless, the degree of customer proximity was assessed as mediocre by the advisors (2.86).

Individualized Customer Interaction

The other side of the story was the lack of integration of processes, data, and channels, which was described as leading to long-winded customer processes. Many advisors complained that too little was being done by the managers to respond adequately to the new customer requirements. Especially for the employees who continued to think ahead, the changes simply did not go far enough. Many advisors revealed that, despite the standardization of processes, the consulting time usually depends on the complexity of the customer request. Many advisors claimed that standardized processes may help to get information on customer preferences, but there is less and less time to find out more about a customer’s personal matters: “The individuality in customer meetings is lost, since everything is recorded on the PC and no personal points can be included.” The quantitative ratings indicate that the actual knowledge about customers is based mostly on master data, but only little on transaction data. For example, the advisors rated their knowledge of customers prior to the consulting appointment as rather high with regard to the family situation (3.43) and professional situation (3.62), but stated that they knew less about the customer’s leisure time and hobbies (2.72) or future plans (2.85). The same advisors also claimed that the simplified decision-making systems did not help in complex decision-making situations with well-informed customers; rather, they would have a paradoxical effect and require a more context-sensitive approach: “Solutions are often not thought in the routines of our daily work.”

Shift of Necessary Skills and Routines

Many advisors described changes in the preparation and follow-up of appointments, as the complexity of advisory topics has increased. The employees rated the need for digital skills in advisory as above average (3.49), which suggests that a greater portion of the
advisors appear to be aware of the necessity to change their way of working. Almost all advisors admitted that an increasing amount of time has to be invested into back office tasks, such as regulatory fulfilments after sales. It became clear that many advisors especially struggled with daily professional challenges that have not been resolved by IT: “We have to reckon with many more questions, which of course also means a longer preparation time for the conversation.” With changing customer behavior, the need for in-depth personal discussions with customers also increased: “The well-informed customer wants to go into details relatively quickly; here it is even more important to put the customer benefit in the foreground.” The advisors described the need to develop more into “a financial strategic advisor.” At the same time, many advisors argued that this would be impeded by the banks’ efficiency thinking, leading to a decline of advisory competencies and sociability: “What we do no longer appears very unique, but rather dull, as if everyone can do the job.”

Third Order Issues

This category shows broader issues with multiple meanings in structuration. This includes deeply rooted attitudes that affect problem solving in the widest context [51].

Commitment to Change

The appropriation of the IT changes was a two-edged sword for the banks. On the one hand, many advisors stated that they actually liked “the visualizations provided by the new software.” On the other hand, a number of employees were not convinced by the new solution; some even used workarounds, as the new process was not developed “from an employee perspective” and “many processes would still require a lot of human intervention.” For example, iPads could only be used for an advisory appointment after a lengthy and complicated registration procedure and were therefore not used across the board. Our data also shows that personal characteristics influence the commitment to change [55] (i.e. the willingness to transform), such as being open to trying something new: “Some employees are already using the tablets very actively in their counselling, while others are not yet using them too much.” Another separation was observable regarding employees who showed low commitment or even resistance to change. This became evident for those customer advisors who “will not be dragged into the new processes.” The managers evaluated the digital affinity of employees as mediocre (3.00). One reason could be that the transformation initiatives led to the introduction of new roles (e.g., project leaders; 3.63) and to the restructuring of organizational units (3.88); however, employees at the operational level have rarely been involved in important decisions (2.25).

Spirit

An integral part of change initiatives is spirit, which represents the general intent of digital transformation and thus incorporates purpose and goals as well as norms and values. In this regard, the traditional institutional logics in banking had a twofold impact: an enhancing one by maintaining a trustworthy partner, but also a limiting one by sticking to the old (“remain a traditional bank but becoming more digital”). This was reflected by different viewpoints on the necessity of paper-based processes, such as in the data center. Moreover, digital teams had been deployed, but the collaboration with professional departments had only started, and for the banks it was not clear how much influence this already had on changing the established structures. At the time of our study, an upstream of employee-level digital initiatives had in fact not considerably affected the banks yet.

Table 2. Organizational changes, affordances and constraints

Organizational changes | Affordances | Constraints (issue)
Providing new advisory software and hardware | Facilitating communication, making better product offerings, increasing intimacy and trust | Lack of employee appropriation (commitment to change, spirit)
Providing standardized and simplified decision-making functionalities | Facilitating communication, making better product offerings | Lack of personalized interaction (changing customer behavior, individualization)
Enhancing digital customer interaction channels (e.g., direct mails, video banking, WhatsApp) | Facilitating communication, increasing intimacy and trust | Lack of data and channel integration, process and media breaks and distorted or isolated work practices (digital infrastructure)
Providing interfaces for collecting further customer data (e.g., new payment and identity services) | Facilitating communication, making better product offerings | Regulations and data privacy restrictions (spirit)
Accessing, sharing and using customer information anytime from anywhere (e.g., gathering information before the advisory session) | Facilitating communication, making better product offerings, increasing intimacy and trust | Lack of data and channel integration (digital infrastructure, shift of necessary skills and routines)
Integrating third-party channels (e.g., Google, Apple, Facebook, Amazon) | Facilitating communication | Lack of profound data access (digital infrastructure)
Automating customer data acquisition (e.g., self-services) | Facilitating communication | Challenges with pulled interaction (individualization)
Capturing and archiving digital data about customers across channels (e.g., releasing data silos) | Making better product offerings | Lack of process and data integration, media breaks (digital infrastructure)
Coordinating customer data across channels and systems (i.e. hybrid customer interaction) | Facilitating communication, making better product offerings | Lack of channel integration (digital infrastructure)
Standardizing data across the organization (e.g., target customer profiles) | Increasing transparency and traceability | Regulations, data privacy restrictions (spirit)
Centralizing organizational operations (e.g., customer contact centers, interactive service platform) | Optimizing work efficiency | Risk of decline in competence and sociability (standardization, change self-efficacy)
Shifting work across roles (e.g., advisory templates, event-driven, rule-based advisory) | Optimizing work efficiency | Overspecialization in the target working groups (standardization, commitment to change)
Allocating customers on a value base (i.e. alternating allocations for standard customers, fixed allocations for premium customers) | Optimizing work efficiency | Overstraining increase in the number of customers to manage per advisor (standardization, commitment to change)

Quite the contrary, many of the transformational change initiatives were not applied consistently, but were often regarded by the employees as superficial: “Those steps referred to as digitalization are rather beauty actions with screens instead of posters in windows.” Hence, many employees even perceived these initiatives as badly executed. Even the managers came to the conclusion that products and services are not yet thought of digitally (2.33). Hence, in the eyes of the managers, the focus of project work is more on improving existing products with proven structures (3.50) than on bringing entirely new products to the market (2.13). The agility of the organization was therefore judged to be rather low (2.54), as the focus in the allocation of resources was also rather on the traditional business. Regulation was mentioned to be a higher priority (4.08), after all.

3.2 Selective Coding

In this section, we connect our prior findings with the identified outcomes of organizational change by means of selective coding. In line with AST, we found that the spirit and the organizational structures generate different forms of social interaction through the introduction of a new technology. Spirit suggests that organizations have to define clear goals toward which organizational change strives. We conclude that digital advisory technologies changed organizational structures in the banks and that only to some extent did new organizational structures emerge when these technologies were applied. We also found that appropriation occurred only partially in the banks, depending on the actual organizational structures and appropriation support. This was a major constraining factor for organizational change.

Table 2 summarizes the identified relationships between organizational changes and individual affordances and constraints. It indicates that many of the positively intended change measures also had an opposing, constraining impact on the advisors. This carves out manifold dialectic tensions that inhibit the natural progression of the change projects in the banks. These tensions were identified as first, second and third order issues in the technical and social subsystems of the banks and basically revolved around the spirit of the bank: “remaining traditional, but becoming digital”. This motif points to the fact that the mode of change corresponds with a digital improvement rather than a reengineering approach, which would be necessary to overcome the identified obstacles.
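As a complement to the qualitative discussion, the following hypothetical sketch shows one way the relationships summarized in Table 2 could be encoded for further analysis, for example to group the observed constraints by their underlying issue. The tuples paraphrase an excerpt of Table 2; the chosen representation and all identifiers are our assumptions and not part of the study.

```python
# Hypothetical encoding of an excerpt of Table 2: each tuple links an organizational
# change to its affordances, the observed constraint, and the underlying issue(s).
from collections import defaultdict

relations = [
    ("New advisory software and hardware",
     ["facilitating communication", "better product offerings", "intimacy and trust"],
     "lack of employee appropriation", ["commitment to change", "spirit"]),
    ("Standardized, simplified decision-making functionalities",
     ["facilitating communication", "better product offerings"],
     "lack of personalized interaction", ["changing customer behavior", "individualization"]),
    ("Centralized organizational operations",
     ["optimizing work efficiency"],
     "risk of decline in competence and sociability", ["standardization", "change self-efficacy"]),
]

# Group the observed constraints by underlying issue to see which issues
# (e.g., digital infrastructure, spirit) account for most of the tensions.
constraints_by_issue = defaultdict(list)
for change, affordances, constraint, issues in relations:
    for issue in issues:
        constraints_by_issue[issue].append((change, constraint))

for issue, pairs in sorted(constraints_by_issue.items()):
    print(issue, "->", pairs)
```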

4 Conclusion

In this paper, we introduced an approach for the analysis of organizational change in the digital transformation of banks. From the researcher’s point of view, our goal was to describe the interplay between the entities of value creation and customer interaction in incumbent retail banks. Therefore, we investigated affordances and constraints in an IT-supported personal customer advisory setting. The analysis revealed two major problems of organizational change in banks: Firstly, changes of organizational structures in the VCM do not follow a re-engineering approach and are thus only barely appropriated. Secondly, new organizational structures do not emerge consistently, and not driven by the employees at the operational level. These limitations constrain the possible outcomes of organizational change in the CIM. We identified several dialectic tensions, such as the antitheses of traditional and digital, standardization and individualization, or professional and technical requirements, that must be managed (i.e. balanced) in order to improve appropriation and structuration. The resolution of these tensions in the organizational culture of banks (i.e. spirit) is likely a decisive factor for achieving credibility and commitment among those employees who show at least a fundamental willingness to transform.

Future research could especially take the identified dialectics of incumbent organizations into account and develop measures on how to resolve them. This could also advance research on organizational ambidexterity. One limiting factor of the present study is that it is based on survey data and not on observational data. Hence, an in-depth analysis of social interactions would be beneficial in the future. For a more detailed analysis of the behavior of bank employees, work shadowing could be an appropriate method [56]. Future research could also extend the human-centered advisory setting to omnichannel use cases with both human- and non-human-centric customer interfaces (e.g., chatbots or robo advisors). Future research might further extend our approach to other contexts, such as health service providers (e.g., IT-supported personal patient-doctor encounters). A comparison between heterogeneous firms and industries would also be beneficial. From the practitioner’s point of view, our case study analysis provides an indicative picture of the relationships between the different change entities within retail banks on their pathway towards digitalization. Our results show that individual affordances for customer interaction are limited when the digital transformation of the value creation model has not yet been completed.

References

1. Pousttchi, K., Dehnert, M.: Exploring the digitalization impact on consumer decision-making in retail banking. Electron. Markets 28(3), 265–286 (2018). https://doi.org/10.1007/s12525-017-0283-0

2. Alt, R., Puschmann, T.: Digitalisierung der Finanzindustrie. Springer, Heidelberg (2016). https://doi.org/10.1007/978-3-662-50542-7_6 3. Alt, R., Beck, R., Smits, M.T.: FinTech and the transformation of the financial industry. Electron. Markets 28(3), 235–243 (2018). https://doi.org/10.1007/s12525-018-0310-9 4. Pousttchi, K.: Digital transformation. In: Gronau, N., Becker, J., Sinz, E.J., Suhl, L., Leimeister, J.M. (eds.) Encyclopedia of Business Informatics. Potsdam (2017) 5. Setia, P., Venkatesh, V., Joglekar, S.: Leveraging digital technologies. How information quality leads to localized capabilities and customer service performance. MIS Q. 37, 565–590 (2013) 6. Skinner, C.: Digital Bank Strategies to Launch or Become a Digital Bank. Marshall Cavendish, Singapore (2014) 7. Rapp, A.A., Bachrach, D.G., Flaherty, K.E., Hughes, D.E., Sharma, A., Voorhees, C.M.: The role of the sales-service interface and ambidexterity in the evolving organization. A multilevel research agenda. J. Serv. Res. 20, 59–75 (2017) 8. Wright, G., Donaldson, B.: Sales information systems in the UK financial services industry. An analysis of sophistication of use and perceived barriers to adoption. Int. J. Inf. Manage. 22, 405–419 (2002) 9. Lyons, R.K., Chatman, J.A., Joyce, C.K.: Innovation in services. Corporate culture and investment banking. Calif. Manage. Rev. 50, 174–191 (2007) 10. Bennett, H., Durkin, M.G.: Developing relationship-led cultures – a case study in retail banking. Int. J. Bank Mark. 20, 200–211 (2002) 11. Pousttchi, K., Gleiß, A., Buzzi, B., Kohlhagen, M.: Technology impact types for digital transformation. In: Proceedings Conference on Business Informatics (2019) 12. Sebastian, I., Ross, J., Beath, C., Mocker, M., Moloney, K., Fonstad, N.: How big old companies navigate digital transformation. MIS Q. Executive 16, 197–213 (2017) 13. Zhu, K., Kraemer, K., Xu, S., Dedrick, J.: Information technology payoff in e-business environments. An international perspective on value creation of e-business in the financial services industry. J. Manage. Inf. Syst. 21, 17–54 (2004) 14. Frei, F.X., Harker, P.T.: Measuring the efficiency of service delivery processes. J. Serv. Res. 1, 300–312 (1999) 15. Graupner, E., Maedche, A.: Process digitisation in retail banking. An empirical examination of process virtualization theory. Int. J. Electron. Bus. 12, 364–379 (2015) 16. Gomber, P., Koch, J.-A., Siering, M.: Digital Finance and FinTech Current research and future research directions. J. Bus. Econ. 87, 537–580 (2017). https://doi.org/10.1007/s11573-0170852-x 17. Cortiñas, M., Chocarro, R., Villanueva, M.L.: Understanding multi-channel banking customers. J. Bus. Res. 63, 1215–1221 (2010) 18. Geng, D., Abhishek, V., Li, B.: When the bank comes to you: branch network and customer multi-channel banking behavior. In: ICIS Proceedings (2015) 19. Honka, E., Hortaçsu, A., Vitorino, M.A.: Advertising, consumer awareness, and choice. Evidence from the U.S. banking industry. RAND J. Econ. 48, 611–646 (2017) 20. Robertson, P.J., Roberts, D.R., Porras, J.I.: Dynamics of planned organizational change. Assessing empirical support for a theoretical model. Acad. Manage. J. 36, 619–634 (1993) 21. Cummings, T.G., Worley, C.G.: Organization Development & Change. Cengage Learning, Stamford (2015) 22. Jones, G.R.: Organizational Theory, Design, and Change. Pearson, Boston (2013) 23. Besson, P., Rowe, F.: Strategizing information systems-enabled organizational transformation. A transdisciplinary review and new directions. J. Strateg. 
Inf. Syst. 21, 103–124 (2012) 24. Henfridsson, O., Mathiassen, L., Svahn, F.: Managing technological change in the digital age. The role of architectural frames. J. Inf. Technol. 29, 27–43 (2014) 25. Markus, M.L.: Technochange management. Using IT to drive organizational change. J. Inf. Technol. 19, 4–20 (2004)

26. Oreg, S., Berson, Y.: Leaders’ impact on organizational change. Bridging theoretical and methodological chasms. Acad. Manage. Ann. 13, 272–307 (2019) 27. Porras, J.L., Robertson, P.J.: Organisational development: theory, practice and research. In: Dunnette, M.D., Hough, L.M. (eds.) Handbook of Industrial and Organizational Psychology, pp. 719–822. Consulting Psychologists Press (1992) 28. van de Ven, A.H., Poole, M.S.: Explaining development and change in organizations. Acad. Manag. Rev. 20, 510 (1995) 29. Hinsen, S., Jöhnk, J., Urbach, N.: Disentangling the concept and role of continuous change for IS research. In: ICIS Proceedings (2019) 30. Lyytinen, K., Newman, M.: Explaining information systems change. A punctuated sociotechnical change model. Eur. J. Inf. Syst. 17, 589–613 (2008) 31. Richter, A., Heinrich, P., Stocker, A., Schwabe, G.: Digital work design. Bus. Inf. Syst. Eng. 60(3), 259–264 (2018). https://doi.org/10.1007/s12599-018-0534-4 32. DeSanctis, G., Poole, M.S.: Capturing the complexity in advanced technology use. Adaptive structuration theory. Organ. Sci. 5, 121–147 (1994) 33. Poole, M.S., DeSanctis, G.: Structuration theory in information systems research. Methods and controversies. In: Whitman, M.E., Woszczynski, A.B. (eds.) The Handbook of Information Systems Research, pp. 206–249. Idea Group Pub, Hershey, PA (2004) 34. Jones, K.: Giddens’s structuration theory and information systems research. MIS Q. 32, 127 (2008) 35. Hartl, E., Hess, T.: IT projects in digital transformation: a socio-technical journey towards technochange. In: ECIS Proceedings (2019) 36. Ahearne, M., Lam, S.K., Mathieu, J.E., Bolander, W.: Why are some salespersons better at adapting to organizational change? J. Market. 74, 65–79 (2009) 37. Bostrom, R.P., Gupta, S., Thomas, D.: A meta-theory for understanding information systems within sociotechnical systems. J. Manage. Inf. Syst. 26, 17–48 (2009) 38. Zammuto, R.F., Griffith, T.L., Majchrzak, A., Dougherty, D.J., Faraj, S.: Information technology and the changing fabric of organization. Organ. Sci. 18, 749–762 (2007) 39. Markus, M.L., Silver, M.: A foundation for the study of IT effects. A new look at DeSanctis and Poole’s concepts of structural features and spirit. J. Assoc. Inf. Syst. 9, 609–632 (2008) 40. Leonardi, P.M.: When flexible routines meet flexible technologies. Affordance, constraint, and the imbrication of human and material agencies. MIS Q. 35, 147 (2011) 41. Hammer, M.: The process audit. Harvard Bus. Rev. 85, 111 (2007) 42. Dolata, M., Schwabe, G.: Tuning in to more interactivity – learning from IT support for advisory service encounters. I-Com 16, 23–33 (2017) 43. Schmidt-Rauch, S., Nussbaumer, P.: Putting value co-creation into practice. A case for advisory support. In: ECIS Proceedings (2011) 44. Ballantyne, D., Varey, R.J.: Creating value-in-use through marketing interaction. The exchange logic of relating, communicating and knowing. Market. Theory 6, 335–348 (2016) 45. Vargo, S.L., Lusch, R.F.: Evolving to a new dominant logic for marketing. J. Market. 68, 1–17 (2004) 46. Grace, A., Finnegan, P., Butler, T.: Service co-creation with the customer. The role of information systems. In: 16th European Conference on Information Systems, ECIS Proceedings (2008) 47. Yin, R.K.: Case Study Research and Applications. Design and Methods. Sage, Los Angeles, London, New Dehli, Singapore, Washington DC, Melbourne (2018) 48. Miles, M.B., Huberman, A.M., Saldaña, J.: Qualitative Data Analysis. A Methods Sourcebook. 
Sage, Los Angeles, London, New Delhi, Singapore, Washington DC (2014) 49. Strauss, A.L., Corbin, J.M.: Basics of Qualitative Research. Grounded Theory Procedures and Techniques. Sage, Newbury Park (1990)

50. Matavire, R., Brown, I.: Profiling grounded theory approaches in information systems research. Eur. J. Inf. Syst. 22, 119–129 (2013) 51. Star, S.L., Ruhleder, K.: Steps toward an ecology of infrastructure. Design and access for large information spaces. Inf. Syst. Res. 7, 111–134 (1996) 52. Gioia, D.A., Corley, K.G., Hamilton, A.L.: Seeking Qualitative Rigor in Inductive Research. Organ. Res. Methods 16, 15–31 (2013) 53. Giddens, A.: The Constitution of Society. Outline of the Theory of Structuration. Polity Press, Cambridge (1984) 54. Stouten, J., Rousseau, D.M., de Cremer, D.: Successful organizational change. Integrating the management practice and scholarly literatures. Acad. Manage. Ann. 12, 752–788 (2018) 55. Fugate, M., Prussia, G.E., Kinicki, A.J.: Managing employee withdrawal during organizational change. J. Manag. 38, 890–914 (2012) 56. McDonald, S.: Studying actions in context. A qualitative shadowing method for organizational research. Qual. Res. 5, 455–473 (2016)

Author Index

Brink, Henning 19, 82
Ćukušić, Maja 143
Dehnert, Maik 205
Gaidels, Edgars 128
Góralski, Patrick 35
Grabis, Jānis 101
Gross, Christina 3
Haidabrus, Bohdan 101
Henkel, Martin 174
Jadrić, Mario 143
Jegermane, Marina 67
Kiopa, Daiga 67
Kirikova, Marite 67, 128
Lackes, Richard 3
Mijač, Tea 143
Miltina, Zane 67
Minkēviča, Vineta 101
Mouratidis, Haralambos 53
Müller, Oliver 191
Packmohr, Sven 19, 82
Pincuka, Marina 67
Popovs, Rolands 101
Reiz, Achim 111
Sandkuhl, Kurt 111
Siepermann, Markus 3
Stasko, Arnis 67
Stirna, Janis 53
Tell, Anders W. 174
Thiess, Tiemo 191
Vencovský, Filip 159
Vogelsang, Kristin 19, 82
Wichmann, Johannes 35
Wißotzki, Matthias 35
Zdravkovic, Jelena 53