Empirical Studies on the Development of Executable Business Processes [1st ed. 2019] 978-3-030-17665-5, 978-3-030-17666-2

This book collects essential research on the practical application of executable business process modeling in real-world settings.


English Pages XXVI, 223 [236] Year 2019


Table of contents :
Front Matter ....Pages i-xxvi
Front Matter ....Pages 1-1
Empirical Research in Executable Process Models (Daniel Lübke, Cesare Pautasso)....Pages 3-12
A Template for Categorizing Business Processes in Empirical Research (Daniel Lübke, Ana Ivanchikj, Cesare Pautasso)....Pages 13-29
Front Matter ....Pages 31-31
Effectively and Efficiently Implementing Complex Business Processes: A Case Study (Volker Stiehl, Marcus Danei, Juliet Elliott, Matthias Heiler, Torsten Kerwien)....Pages 33-57
Analysis of Data-Flow Complexity and Architectural Implications (Daniel Lübke, Tobias Unger, Daniel Wutke)....Pages 59-81
Front Matter ....Pages 83-83
Requirements Comprehension Using BPMN: An Empirical Study (Olga Lucero Vega-Márquez, Jaime Chavarriaga, Mario Linares-Vásquez, Mario Sánchez)....Pages 85-111
Developing Process Execution Support for High-Tech Manufacturing Processes (Irene Vanderfeesten, Jonnro Erasmus, Konstantinos Traganos, Panagiotis Bouklis, Anastasia Garbi, George Boultadakis et al.)....Pages 113-142
Developing a Platform for Supporting Clinical Pathways (Kathrin Kirchner, Nico Herzberg)....Pages 143-164
Front Matter ....Pages 165-165
IT-Centric Process Automation: Study About the Performance of BPMN 2.0 Engines (Vincenzo Ferme, Ana Ivanchikj, Cesare Pautasso, Marigianna Skouradaki, Frank Leymann)....Pages 167-197
Effectiveness of Combinatorial Test Design with Executable Business Processes (Daniel Lübke, Joel Greenyer, David Vatlin)....Pages 199-223


Daniel Lübke · Cesare Pautasso Editors

Empirical Studies on the Development of Executable Business Processes


Editors

Daniel Lübke
Fachgebiet Software Engineering
Leibniz Universität Hannover
Hannover, Germany

Cesare Pautasso
Faculty of Informatics
Università della Svizzera italiana (USI)
Lugano, Switzerland

ISBN 978-3-030-17665-5
ISBN 978-3-030-17666-2 (eBook)
https://doi.org/10.1007/978-3-030-17666-2

© Springer Nature Switzerland AG 2019

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG. The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

To my family and my friends. Without them none of this would be possible. I am very glad to be surrounded by people who are that nice and support me every day. Thank you!
– Daniel

To my Hope for the future
– Cesare

Foreword

Executable business processes. At first sight, the topic may sound trivial to some people used to modeling business processes on a regular basis, yet I feel it is not. Getting from a graphical model to software (or a configuration thereof) that is able to automatically orchestrate other software components, such as web services or generic web APIs, or to coordinate people performing different business activities, requires not only familiarity with the chosen modeling notation but also intimate knowledge of the target runtime environment and of the model’s execution semantics. Let’s be frank: while it is relatively easy to draw some form of understandable process model on paper, for example, using Business Process Model and Notation (BPMN), how many of us have the necessary knowledge and skills to also deploy that model on a given business process engine and to successfully run it at the first attempt? I don’t. At least, not any longer.

I first got in touch with business process modeling and execution during my PhD in the early 2000s, when I worked as a teaching assistant for a course on workgroup and workflow management systems at Politecnico di Milano. I have adopted model- and process-driven paradigms in my research since, e.g., to enable end users to model their own process logics or to specify complex crowdsourcing processes. I used them in research papers written together with other PhD students and colleagues, collecting lots of comments and criticism from reviewers. As a senior PC member and PC chair of the International Conference on Business Process Management (BPM), I then had the chance to look behind the curtain and to review (and criticize) the work of others myself—an activity that also allowed me to get to know and appreciate the editors of this volume. Both Cesare and Daniel are known for their sensibility toward well-designed business processes and concrete practices. No frills. This concreteness is evident in this volume.
In fact, it brings together contributions that all provide some form of empirical perspective on or evidence of state-of-the-art business process management challenges. The first part of the volume looks into architectural aspects and covers the implementation of process-driven applications and the analysis of how data-flow logics are supported. The second part proposes a set of case studies and experiments on the suitability of process modeling notations for the collection of requirements and on the implementation of process support in both manufacturing and services industries—two sectors with contrasting business process requirements, ranging from very fine grained and repetitive to coarse grained and case based. The third and last part concerns quality: quality in terms of engine performance and process model correctness. The volume is the result of the joint work of authors from all over the world, and it is itself evidence of the significance and widespread acknowledgment of the problem of correct process execution and the need for concrete and repeatable results.

As researchers, teachers, and practitioners, how many times have we seen process models containing gateway nodes testing conditions without any prior activity producing the necessary data for the evaluation of the condition? How many times have we seen models where it was impossible to understand who was supposed to execute a given activity, sometimes not even being able to tell whether the executor was a software or a human agent? And how many times was it evident that a given model, although formally correct, would not meet expected running times? Well, this volume will not teach you how to model your business processes or how to do so better. For this, lots of other books already exist. This volume provides you with a reasoned snapshot of empirical evidence showing that your impressions are right and of actionable results that you may want to know to prevent your concerns from coming true. That is, this volume shows you what other people like you actually learned in their projects, case studies, and experiments and how they solved their problems, in practice.

I am confident that, as a reader, you will find in this volume both practical hints and research stimuli, just like I did, and that you will appreciate the thoughtful selection of content as well as the meticulous work by the contributing authors.

Florian Daniel
Associate Professor, Politecnico di Milano, Milano, Italy
January 2019

Preface

Thank you for your interest in the topic of empirical research in the domain of executable business processes! We want to take you on an interesting tour of how technologies in this domain can be applied in practice, what obstacles and benefits projects actually encounter, and how they can overcome the former and achieve the latter.

Executable business processes are one of the success stories of model-driven engineering (MDE) at the intersection of software engineering (SE) and business process management (BPM). On the one hand, an executable model is formally and precisely defined so that a computer is able to interpret and execute it. On the other hand, models are visualized so that humans can describe, document, and optimize business processes at a higher level of abstraction than with traditional textual programming languages. While these important research areas have long been separated from each other, this book is an attempt at cross-fertilization, driven by the understanding that business processes are the software running today’s digital organizations and that achieving a precise representation of such processes, so that they can be reliably executed, requires adopting rigorous software engineering practices.

With the rising importance of digitalization and of fully automated or at least software-supported end-to-end business processes, we expect interest in executable business processes to rise and software technology supporting business processes to become ever more important in organizations across all domains and all sizes. While research into executable business processes has been ongoing for a few decades, as witnessed, for example, by the significant efforts put into applying process technology within service-oriented architectures, our focus in this book is on empirical research.
We wanted to compile an up-to-date snapshot featuring empirical case studies in order to assess and give visibility to examples of the practical impact of BPM within industry.

To lay some groundwork before starting on this book, we realized that empirical studies were hard to compare and that their designs lacked consistently collected metadata about the analyzed processes, since such metadata is missing from plain process model collections. Therefore, our first step toward this book was the development of a template that allows an easy overview of the business process models used in a publication and gives researchers a template for collecting general metadata. You will find this template used in every chapter of this book where it adds value to the chapter. A description of the contents of this template and how it was derived can be found in Chap. 2 of this book.

In the second half of 2017, we advertised a call looking for chapters that investigate questions of interest to both academia (e.g., identifying challenges for which no solution exists, new insights into how existing approaches are really used) and industry (e.g., guidelines for using certain technologies, guidelines for modeling understandable executable processes). Our open call was answered with proposals from many interested potential contributors, spanning both industry and academia, from which we selected, based on their relevance and quality, the chapters in the book you are currently reading.

As a result, the book collects valuable real-world experience on the development and practical usage of executable business processes in software architectures, e.g., model-driven solutions that are built with languages such as BPEL or BPMN for the support and automation of digital business processes. This experience was acquired within different application domains (e.g., healthcare, high-tech manufacturing, software development), and it covers most phases of the software engineering life cycle (from requirements analysis to testing). We are also grateful to our chapter authors for explicitly featuring insights and takeaway messages directed to practitioners as well as to researchers.

Daniel Lübke, Hannover, Germany
Cesare Pautasso, Lugano, Switzerland
January 2019

How to Read This Book

Besides the background chapters found in Part I, this book presents research results and industry experience on a variety of topics related to executable business processes. Part II is concerned with architectural implications: what do we need to think about when implementing executable business process solutions? In Part III, two case studies and one experiment are presented. The case studies deal with how to successfully implement executable business processes in different domains, while the experiment is concerned with analyzing the effect of complementing use cases with BPMN process models. The two chapters of Part IV are concerned with extra-functional quality attributes (i.e., performance benchmarking and testability) of solutions implemented with executable BPM.

You can read the chapters in any order, skipping ahead and back as you like. All chapters close with takeaways for both researchers and practitioners. Researchers can find open challenges and new ideas for their research, while practitioners can read how to apply in their projects the valuable insights shared by the authors.


Book Chapters Overview

Part I. Background

Chapter 1, Empirical Research in Executable Process Models. Perhaps one of the reasons BPM research concentrates on analytical modeling of business processes is that BPMN is fully standardized in this regard and modeling tools support the notation very well. In this book, we focus instead on empirical research in executable process models. This requires a complete and precise specification of process models, which graduate from “PowerPoint slide” into an executable artifact running inside a workflow engine in the Cloud. In this chapter, we introduce fundamental background concepts: we define executable business processes, discuss empirical research methods suitable for business process management, present different architectural options for process execution, and close with a brief history leading toward executable BPMN.

Chapter 2, A Template for Categorizing Business Processes in Empirical Research. Empirical research is becoming increasingly important for understanding the practical uses of and problems with business process technology in the field. However, no standardization exists on how to report observations and findings. This sometimes leads to research outcomes that report partial or incomplete data, and it makes published results of replicated studies on different data sets hard to compare. In order to help the research community improve reporting on business process models and collections and their characteristics, this chapter defines a modular template with the aim of standardizing reports, which could also facilitate the creation of shared business process repositories to foster further empirical research in the future. The template has been positively evaluated by representatives from both BPM research and industry, and the survey feedback has been incorporated into the template. We have applied the template to describe a real-world executable WS-BPEL process collection, measured from a static and dynamic perspective.


Part II. Solution Architecture

Chapter 3, Effectively and Efficiently Implementing Complex Business Processes: A Case Study. The implementation of business processes was neglected for many years in research. It seemed that hard coding was the only appropriate solution for business process implementations. As a consequence, classical literature about business process management (BPM) focused mainly on the management aspects of BPM and less on an effective and efficient implementation methodology. This has changed significantly since the advent of BPMN 2.0 (Business Process Model and Notation) in early 2011. BPMN is a graphical notation for modeling business processes in an easy-to-understand manner. Because the BPMN standard was designed with process execution in mind, it allows for a new way of implementing business processes, on which the process-driven approach (PDA) is based. This approach has been applied in a huge project at SAP SE since 2015, comprising more than 200 business-critical processes. In order to give an impression of the power of the process-driven approach in really complex business process implementation scenarios, this chapter explains the basics of the process-driven approach and shares experiences made during the execution of the project.

Chapter 4, Analysis of Data-Flow Complexity and Architectural Implications. Service orchestrations are frequently used to assemble software components along business processes. Despite much research and many empirical studies into the use of control-flow structures of these specialized languages, like BPEL and BPMN 2.0, no empirical evaluation of data-flow structures and languages, like XPath, XSLT, and XQuery, has been made yet. This chapter presents a case study on the use of data transformation languages in industry projects in different companies and across different domains, showing that data flow is an important and complex property of such orchestrations. The results also show that proprietary extensions are used frequently and that designs favor the use of modules, which allows for reusing and testing code. This case study is a starting point for further research into the data-flow dimension of service orchestrations and gives insights into practical problems on which future standards and theories can build.

Part III. Case Studies and Experiments

Chapter 5, Requirements Comprehension Using BPMN: An Empirical Study. The Business Process Model and Notation (BPMN) has become the de facto standard for process modeling. Currently, BPMN models can be (1) analyzed or simulated using specialized tools, (2) executed using business process management systems (BPMSs), or (3) used for requirements elicitation. Although there are many studies comparing BPMN to other modeling techniques for analyzing and executing processes, there are few showing the suitability of BPMN models as a source for requirements comprehension in projects where process-aware software is built without using BPMSs. This chapter presents a study aimed at comparing the comprehension of software requirements regarding a business process using either BPMN or traditional techniques, such as use cases. In our study, we analyzed the responses of 120 undergraduate and graduate students regarding the requirements comprehension achieved when using only BPMN models, only use cases, or both. The results do not show a significant impact of the artifacts on the comprehension level. However, when understanding a requirement involves a sequence of activities, using BPMN shows better results in comprehension time.

Chapter 6, Developing Process Execution Support for High-Tech Manufacturing Processes. This chapter describes the development of an information system to control the execution of high-tech manufacturing processes from the business process level, based on executable process models. The development is described from process analysis to requirements elicitation to the definition of executable business processes, for three pilot cases in our recent HORSE project. The HORSE project aims to develop technologies for smart factories, making end-to-end high-tech manufacturing processes, in which robots and humans collaborate, more flexible, more efficient, and more effective at producing small batches of customized products. This is done through the use of the Internet of Things, Industry 4.0, collaborative robot technology, dynamic manufacturing process management, and flexible task allocation between robots and humans. The result is a manufacturing process management system (MPMS) that orchestrates the manufacturing process across work cells and production lines and operates based on executable business process models defined in BPMN.

Chapter 7, Developing a Platform for Supporting Clinical Pathways. Hospitals face high pressure to be profitable with decreasing funds in a stressed healthcare sector. This situation calls for process management and process intelligence methods in their daily work. However, traditional process intelligence systems work with logs of execution data generated by workflow engines controlling the execution of a process. The nature of treatment processes, though, requires doctors to work with a high degree of freedom of action, rendering workflow engines unusable in this context. In this chapter, we describe a process intelligence approach to developing a platform for clinical pathways in hospitals without using workflow engines. Our approach is explained using a case in liver transplantation but is generalizable to other clinical pathways as well.

Part IV. Quality

Chapter 8, IT-Centric Process Automation: Study About the Performance of BPMN 2.0 Engines. Workflow management systems (WfMSs) are broadly used in enterprises to design, deploy, execute, monitor, and analyze automated business processes. Current state-of-the-art WfMSs have evolved into platforms delivering complex service-oriented applications that need to satisfy enterprise-grade performance requirements. With the ever-growing number of WfMSs available on the market, companies must choose which product is optimal for their requirements and business models. The factors that WfMS vendors use to differentiate their products are mainly related to functionality and integration with other systems and frameworks. They usually do not differentiate their systems in terms of performance in handling the workload they are subject to or in terms of hardware resource consumption. A recent trend has seen WfMSs deployed in environments where performance in handling the workload really matters, because they are subject to handling millions of workflow instances per day, as does efficiency in terms of resource consumption, e.g., if they are deployed in the Cloud. Benchmarking is an established practice for comparing alternative products, which helps to drive the continuous improvement of technology by setting a clear target in measuring and assessing its performance. For WfMSs in particular, there is not yet a standard accepted benchmark, even though standard workflow modeling and execution languages such as BPMN 2.0 have recently appeared. In this chapter, we present the challenges of establishing the first standard benchmark for assessing and comparing the performance of WfMSs in a way that is compliant with the main requirements of a benchmark: portability, scalability, simplicity, vendor neutrality, repeatability, efficiency, representativeness, relevance, accessibility, and affordability. A possible solution is also discussed, together with a use case of micro-benchmarking open-source production WfMSs. The use case demonstrates the relevance of benchmarking the performance of WfMSs by showing relevant differences in terms of performance and resource consumption among the benchmarked WfMSs.

Chapter 9, Effectiveness of Combinatorial Test Design with Executable Business Processes.
Executable business processes contain complex business rules, control flow, and data transformations, which makes designing good tests difficult and, in current practice, requires extensive expert knowledge. In order to reduce the time and errors in manual test design, we investigated using automatic combinatorial test design (CTD) instead. CTD is a test selection method that aims at covering all interactions of a few input parameters. For this investigation, we integrated CTD algorithms with an existing framework that combines equivalence class partitioning with automatic BPELUnit test generation. Based on several industrial cases, we evaluated the effectiveness and efficiency of test suites selected via CTD algorithms against those selected by an expert and random tests. The experiments show that CTD tests are not more efficient than tests designed by experts, but that they are a sufficiently effective automatic alternative.

Acknowledgements

We are grateful to all of our authors for their efforts in contributing to the book, their time dedicated to the cross-review and revision of their chapters, and most of all their patience with the long publication process. We would also like to thank our external reviewers very much for their constructive feedback. Without them, such a book would not have been possible. Therefore, we would like to offer warm thanks to (in alphabetical order):

• Dieter Burger
• Peter Fasler
• Tammo van Lessen
• Kai Niklas
• Kurt Schneider
• Barbara Ulrich

Contents

Part I  Introduction and Background

1  Empirical Research in Executable Process Models
   Daniel Lübke and Cesare Pautasso ... 3
   1.1  Executable Business Processes ... 3
   1.2  Empirical Research in Business Process Management ... 4
   1.3  Architectures for Process Execution ... 6
   1.4  Executable BPMN ... 9
   References ... 10

2  A Template for Categorizing Business Processes in Empirical Research
   Daniel Lübke, Ana Ivanchikj, and Cesare Pautasso ... 13
   2.1  Introduction ... 13
   2.2  Motivation ... 14
   2.3  Template ... 15
        2.3.1  Metadata Template ... 16
        2.3.2  BPEL Element and Activity Count Template ... 18
        2.3.3  BPEL Extensions Template ... 18
        2.3.4  Process Runtime Performance Template ... 18
   2.4  Validation ... 19
        2.4.1  Survey with Researchers and Industry Experts ... 19
        2.4.2  Case Study with Industry Processes ... 22
   2.5  Related Work ... 26
   2.6  Conclusions and Future Work ... 27
   References ... 28

xx

Part II

Contents

Solution Architecture

3 Effectively and Efficiently Implementing Complex Business Processes: A Case Study.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . Volker Stiehl, Marcus Danei, Juliet Elliott, Matthias Heiler, and Torsten Kerwien 3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 3.2 The Process-Driven Approach.. . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 3.2.1 Definition of a Process-Driven Application . . . . . . . . . . . . . . . . . . . 3.2.2 Process-Driven Collaboration (BizDevs) ... . . . . . . . . . . . . . . . . . . . 3.2.3 Process-Driven Thinking . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 3.2.4 Process-Driven Methodology . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 3.2.5 Process-Driven Architecture and Process-Driven Development . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 3.2.6 Process-Driven Technologies . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 3.3 Implementation Project at SAP Language Services Using the Process-Driven Approach . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 3.4 Conclusions and Outlook .. . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 3.4.1 Conclusions for Researchers and Practitioners .. . . . . . . . . . . . . . . 3.4.2 Outlook .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 4 Analysis of Data-Flow Complexity and Architectural Implications. . . . Daniel Lübke, Tobias Unger, and Daniel Wutke 4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . 
4.2 Related Work
    4.2.1 Earlier Studies
    4.2.2 Theory
4.3 Business Process Execution Language (BPEL)
4.4 Case Study Design
    4.4.1 Research Questions
    4.4.2 Case and Subject Selection
    4.4.3 Data Collection and Analysis Procedure
    4.4.4 Validity Procedure
4.5 Results
    4.5.1 Metrics
    4.5.2 Interpretation
    4.5.3 Threats to Validity
4.6 Conclusions and Future Work
    4.6.1 Conclusions for Researchers
    4.6.2 Conclusions for Practitioners
    4.6.3 Outlook and Future Work
References



Contents

Part III  Case Studies and Experiments

5 Requirements Comprehension Using BPMN: An Empirical Study
  Olga Lucero Vega-Márquez, Jaime Chavarriaga, Mario Linares-Vásquez, and Mario Sánchez
  5.1 Introduction
  5.2 Related Work
  5.3 Empirical Study
      5.3.1 Design
      5.3.2 Participants Distribution
      5.3.3 Analysis Method
  5.4 Results and Discussion
      5.4.1 Exploratory Data Analysis
      5.4.2 Hypothesis Significant Testing
      5.4.3 Qualitative Analysis
  5.5 Threats to Validity
  5.6 Conclusions and Outlook
  References
6 Developing Process Execution Support for High-Tech Manufacturing Processes
  Irene Vanderfeesten, Jonnro Erasmus, Konstantinos Traganos, Panagiotis Bouklis, Anastasia Garbi, George Boultadakis, Remco Dijkman, and Paul Grefen
  6.1 Introduction
  6.2 Approach
  6.3 Case Study Analysis and Requirements Elicitation
      6.3.1 Case Study 1
      6.3.2 General Requirements Framework
  6.4 Architecture of the HORSE System
  6.5 Executable Process Models
      6.5.1 Executable Processes for Case Study 1
      6.5.2 Method to Develop Executable Process Models
  6.6 Evaluation
  6.7 Conclusions and Outlook
      6.7.1 Conclusions for Researchers
      6.7.2 Takeaways for Practitioners
      6.7.3 Outlook
  References
7 Developing a Platform for Supporting Clinical Pathways
  Kathrin Kirchner and Nico Herzberg
  7.1 Introduction
  7.2 Case Description and Pathway Modeling
  7.3 Related Work






  7.4 Application of the Methodology
      7.4.1 Design Process Event Monitoring Points
      7.4.2 Connect PEMPs to Data Sources
      7.4.3 Collect Monitoring Data
      7.4.4 Monitor and Analyze Processes
  7.5 Evaluation
      7.5.1 Technical Evaluation
      7.5.2 User Experience Evaluation
  7.6 Conclusions and Outlook
      7.6.1 Conclusions for Researchers
      7.6.2 Takeaways for Practitioners
      7.6.3 Outlook
  References

Part IV  Quality

8 IT-Centric Process Automation: Study About the Performance of BPMN 2.0 Engines
  Vincenzo Ferme, Ana Ivanchikj, Cesare Pautasso, Marigianna Skouradaki, and Frank Leymann
  8.1 Introduction
  8.2 Challenges and State of the Art
  8.3 BPMN 2.0 WfMS Performance Benchmarking Methodology
      8.3.1 BPMN 2.0 Representative Workload Mixes
      8.3.2 Benchmark Execution and Results
  8.4 WfMS Micro-Benchmarking: A Use Case
      8.4.1 Workload Definition
      8.4.2 Environment Setup
      8.4.3 Metrics
      8.4.4 Experiment Results
  8.5 Lessons Learned and Conclusion
  References
9 Effectiveness of Combinatorial Test Design with Executable Business Processes
  Daniel Lübke, Joel Greenyer, and David Vatlin
  9.1 Introduction
  9.2 Preliminaries
  9.3 Related Work
  9.4 Experiment Design
      9.4.1 Research Questions
      9.4.2 Case Selection
      9.4.3 Data Collection Procedure
      9.4.4 Analysis Procedure




  9.5 Results
      9.5.1 Measurements
      9.5.2 Interpretation
      9.5.3 Evaluation of Validity
  9.6 Conclusion and Future Work
      9.6.1 Conclusions
      9.6.2 Future Work
  References



Contributors

Panagiotis Bouklis  European Dynamics, Athens, Greece
George Boultadakis  European Dynamics, Athens, Greece
Jaime Chavarriaga  Universidad de los Andes, Bogotá, Colombia
Marcus Danei  SAP SE, Walldorf, Germany
Remco Dijkman  School of Industrial Engineering, Eindhoven University of Technology, Eindhoven, Netherlands
Juliet Elliott  SAP SE, Walldorf, Germany
Jonnro Erasmus  School of Industrial Engineering, Eindhoven University of Technology, Eindhoven, Netherlands
Vincenzo Ferme  Software Institute, Faculty of Informatics, USI, Lugano, Switzerland
Anastasia Garbi  European Dynamics, Athens, Greece
Joel Greenyer  Leibniz Universität Hannover, Fachgebiet Software Engineering, Hannover, Germany
Paul Grefen  School of Industrial Engineering, Eindhoven University of Technology, Eindhoven, Netherlands
Matthias Heiler  SAP SE, Walldorf, Germany
Nico Herzberg  SAP SE, Dresden, Germany
Ana Ivanchikj  Software Institute, Faculty of Informatics, USI, Lugano, Switzerland
Torsten Kerwien  itelligence AG, Bielefeld, Germany
Kathrin Kirchner  Technical University of Denmark, Kgs. Lyngby, Denmark




Frank Leymann  Institute of Architecture of Application Systems (IAAS), University of Stuttgart, Stuttgart, Germany
Mario Linares-Vásquez  Universidad de los Andes, Bogotá, Colombia
Daniel Lübke  Leibniz Universität Hannover, Fachgebiet Software Engineering, Hannover, Germany
Cesare Pautasso  Software Institute, Faculty of Informatics, USI, Lugano, Switzerland
Mario Sánchez  Universidad de los Andes, Bogotá, Colombia
Marigianna Skouradaki  Institute of Architecture of Application Systems (IAAS), University of Stuttgart, Stuttgart, Germany
Volker Stiehl  Faculty of Electrical Engineering and Computer Science, Technische Hochschule Ingolstadt (THI), Ingolstadt, Germany
Konstantinos Traganos  School of Industrial Engineering, Eindhoven University of Technology, Eindhoven, Netherlands
Tobias Unger  Opitz Consulting Deutschland GmbH, Nordrhein-Westfalen, Germany
Irene Vanderfeesten  School of Industrial Engineering, Eindhoven University of Technology, Eindhoven, Netherlands
David Vatlin  Leibniz Universität Hannover, Fachgebiet Software Engineering, Hannover, Germany
Olga Lucero Vega-Márquez  Universidad de los Andes, Bogotá, Colombia; Universidad de los Llanos, Villavicencio, Colombia
Daniel Wutke  W&W Informatik GmbH, Ludwigsburg, Germany

Part I  Introduction and Background

Chapter 1
Empirical Research in Executable Process Models
Daniel Lübke and Cesare Pautasso

Abstract Perhaps one of the reasons BPM research concentrates on analytical modeling of business processes is that BPMN is fully standardized in this regard and modeling tools support the notation very well. In this book, we focus instead on empirical research on executable process models. This requires a complete and precise specification of process models, which graduate from "PowerPoint slide" into an executable artifact running inside a workflow engine in the Cloud. In this chapter, we introduce fundamental background concepts: we define executable business processes, discuss empirical research methods suitable for business process management, present different architectural options for process execution, and close with a brief history leading toward executable BPMN.

1.1 Executable Business Processes

Modeling executable business processes requires domain knowledge from business process management (BPM) combined with software engineering (SE) skills. Executable models are at the foundation of model-driven engineering (MDE), where running software systems are generated from formally specified, sufficiently detailed, and precisely defined representations of processes [14]. These models specify the behavior of software compositions, both in terms of the control flow and the data flow connecting different types of tasks, which, when successfully executed together, achieve a given goal [20]. Is executable process modeling a refined form of visual programming? Or why are traditional developers skeptical when approaching

D. Lübke ()
Leibniz Universität Hannover, Fachgebiet Software Engineering, Hannover, Germany
e-mail: [email protected]

C. Pautasso
Software Institute, Faculty of Informatics, USI, Lugano, Switzerland
e-mail: [email protected]

© Springer Nature Switzerland AG 2019
D. Lübke, C. Pautasso (eds.), Empirical Studies on the Development of Executable Business Processes, https://doi.org/10.1007/978-3-030-17666-2_1


BPMN modeling tools and execution engines? And should business analysts, thanks to executable process models, be the ones driving the integration of large and complex IT information systems? These and many other questions have been investigated [16, 24, 26], sometimes independently and sometimes in an interdisciplinary way, across software engineering and business process management. For example, in the area of quality assurance, the BPM community has focused largely on model verification [34, 36], while SE has developed its own set of model checking methods and a wealth of further resources concerned with testing (e.g., Myers [22]) and test coverage metrics (e.g., Malaiya et al. [21]). This has also led to combining techniques from both domains for executable processes, e.g., by Foster et al. [7]. Another overlapping activity is the elicitation of requirements, as SE puts it, or process analysis, as BPM names it. Interestingly, a whole workshop series called REBPM (https://www.rebpm.org/) is concerned with the application of methodologies of each domain in this intersection. Considering the empirical results we have collected in this book, the two disciplines can still learn much from one another.

In this chapter, we provide an overview of the main topics covered by this book: Empirical Research in Business Process Management (Sect. 1.2), Architectures for Process Execution (Sect. 1.3), and Executable BPMN (Sect. 1.4).

1.2 Empirical Research in Business Process Management

Empirical research tries to find and explain effects in real life by observing the application of technologies (or of other objects in a research domain). While computer science was very theory- and technology-driven in its beginnings, the research area concerned with software engineering has built up a tremendous amount of empirical studies, especially over recent years. The Journal of Empirical Software Engineering is one of the premier venues in which to publish and is highly regarded in the academic world. The same is also true for BPM, which is conducting intensive research into the application of defined methodologies, notations, and tools. However, empirical research is limited by the degree of access to primary sources. In this regard, SE benefits from the availability of open-source platforms and projects [41], while empirical, and especially experimental, BPM research is mostly concerned with analytical, i.e., non-executable, process modeling across industry and public administration [9]. While the theoretical foundations of computer science and BPM are very valuable and necessary, in the end, going into the "wild," i.e., trying things out and gathering feedback from and in practice, is the only way to improve our understanding of the application of novel technologies. Many software, BPM, and digitalization projects are not as successful as we want them to be. If we only look at developers' resistance to adopting workflow languages, we can already see the huge need for empirical research.

1 Empirical Research in Executable Process Models

5

There are three main research designs researchers can choose from and combine in order to answer their research questions:

• Experiments try to control the environment as much as possible and induce deliberate, controlled effects. Usually two groups are involved, which differ only in the aspect to be understood. Examples of experiments include studying modeling practices [18], notations [32], visual metaphors [39], or layout strategies (e.g., in UML [33] and BPMN [5]). However, because the environment in an experiment needs to be controlled, it is usually only possible to study the impact of small changes. The replication of whole (industrial) projects which differ in only some aspects is an impossible task. For example, it is impossible as a researcher to conduct an experiment that compares two different architectural styles: a realistic, industrial software solution requires many professional developers for several months, and because an experiment needs a large data set in order to derive statistical conclusions, such an experiment would need to fund a large number of project teams for a nontrivial time span. More information about experiments can be found in [40].
• Case studies are somewhat complementary to experiments: researchers analyze real projects, try to gather data, and interview people. While they can thus do research in the most realistic environment possible, in real projects they lose control over the environment and the influencing variables. Therefore, usually many case studies need to be conducted, or the findings of case studies need to be validated by other research methods in order to strengthen them. Nevertheless, case studies are very valuable for being done in real environments which cannot be simulated in experiments. Further information about case studies can be found in [31].
• Surveys are the third option for empirical inquiries.
By letting people answer questionnaires, one can gather insights, get to know the target population, and even find very interesting problems to solve (e.g., [10, 30]). However, surveys and focus groups can be affected by sampling bias and are limited in what can be achieved with them. They are usually a very good starting point for finding or validating research questions and hypotheses, which can then be followed up with case studies or experiments.

Another empirical research stream, which is becoming more popular, is repository mining. A whole conference, the IEEE International Working Conference on Mining Software Repositories, is concerned with mining existing software repositories (http://www.msrconf.org/). However, it is not clear whether open repositories, e.g., those found on GitHub, necessarily resemble or can be considered representative of typical industry code repositories [13]. But it is not only public repositories that face the risk of lacking generalizability, i.e., the risk that the observed effects do not hold true or the achieved results cannot be applied within other domains; all empirical research methods share this limitation, even if the research was done perfectly [6]. Researchers need to be aware of typical industry practices and constraints in order to judge what can and what cannot be generalized into other contexts. In the domain of executable business processes, there can be different generalizability questions, e.g.: Does notation understandability generalize between different stakeholder groups? Do findings generalize to different business process modeling languages? Are results applicable to other BPM tool suites? This list can continue, and every research project must be careful not to be overconfident and, as a result, wrongly overstate the generality of its results. Good empirical reports therefore state their threats to validity, in which possible errors in the research design and limitations of their sources with regard to the data and its interpretation are discussed [23].

Empirical research makes it possible to gain insight into how business users and IT developers apply certain technologies (languages, notations, modeling tools, execution engines), which is important because SE and BPM are neither purely technical nor purely theoretical disciplines. Instead, they also consider the broader sociotechnical context in which projects are carried out. Human nature, and the degree of understanding, mastery, and experience with technologies, is likely to influence a project outcome. For example, the theoretical expressiveness of a business process modeling language, e.g., evaluated by using workflow patterns [29], is only half the equation when comparing different languages. The other half is how comprehensible they are to process modelers and process readers. The latter question has been the subject of ongoing research, most of which is compiled by Figl [5].

Statistical methods are required to evaluate effects in empirical studies. However, this book will not introduce hypothesis testing, correlation, and other statistical concepts. There is a wide range of literature available on this topic; one practical introduction is written by Crawley [3].
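Although this book leaves the statistics to the literature, a tiny sketch illustrates the kind of test such experiments rely on. The comprehension scores below are invented purely for illustration; the permutation test itself is a standard, distribution-free technique for comparing two experimental groups:

```python
import random

def permutation_test(a, b, rounds=10_000, seed=42):
    """Two-sided permutation test for a difference in means.

    Returns the p-value: the fraction of random relabelings of the
    pooled data whose absolute mean difference is at least as large
    as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(rounds):
        rng.shuffle(pooled)
        perm_a, perm_b = pooled[:len(a)], pooled[len(a):]
        diff = abs(sum(perm_a) / len(perm_a) - sum(perm_b) / len(perm_b))
        if diff >= observed:
            hits += 1
    return hits / rounds

# Hypothetical comprehension scores (0-100) for two groups of subjects,
# e.g., one working with BPMN diagrams and one with textual requirements.
group_bpmn = [72, 85, 64, 90, 78, 81, 69, 88]
group_text = [58, 66, 71, 52, 60, 75, 63, 57]
p = permutation_test(group_bpmn, group_text)
print(f"p-value: {p:.4f}")  # a small p-value suggests a real group difference
```

The same reasoning underlies the t-tests and non-parametric tests reported in the experiment chapters of this book; the permutation variant is shown here only because it needs no distributional assumptions.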

1.3 Architectures for Process Execution

Business processes together with software are part of any organization that claims to be digital or to be undergoing a digital transformation. However, the alignment between the two is a critical property, which makes it possible to successfully support processes with software [19]. While in general software can support parts of business processes in isolation, an organization is more efficient if the software is integrated along the process flow, i.e., data is exchanged between systems so that (human) tasks can access and process it from all relevant software systems. Integration architecture is concerned with how systems communicate with each other and which technologies should be used within an organization [2]. Choices range (including older standards) from CORBA, SOAP, and REST to different messaging protocols [28]. These technologies build the technical foundation over which business content is exchanged between heterogeneous, autonomous, and distributed software systems, which can be internal (owned by the same organization) or external (owned by somebody else).

The two main architectural choices for organizing the control flow and data flow between systems are orchestration and choreography [1]. Orchestrations are centralized, hub-and-spoke, or star-like architectures in which a central orchestrator calls other systems (often via their service interfaces or APIs) and

Fig. 1.1 Orchestration: process models drive the behavior of the process orchestrator used to implement a composite service out of three services

waits for their answers, which in turn trigger the next actions until the process is completed (Fig. 1.1). Orchestrations can be programmed with standard executable business process languages like BPEL and BPMN. Such a programming-in-the-large implementation has the advantage of a clear mapping of analytical business processes to executable ones and thus an easy way to measure key performance indicators when analyzing the behavior of running process instances. Alternatively, orchestrations can be manually implemented using a general-purpose programming language, but this would require a significant effort to map the original business-level flow into code and then, at run-time, to reverse engineer (or mine) low-level event logs back into process models.

With choreographies, all systems exchange messages directly as they make their (service) calls in a peer-to-peer-like manner without going through a centralized orchestration hub (Fig. 1.2). The advantage of this approach is that there is no central point of failure and the orchestrator does not become a scalability bottleneck. The rise of event-based architectures nowadays can be seen as a form of choreography in which every service consumes and produces events to be asynchronously processed by other services. Thanks to events, systems are even more decoupled


Fig. 1.2 Choreography: direct, peer-to-peer, event-driven service interaction

because a system emitting an event does not know which system or systems might respond to it [27]. This improved decoupling comes at the risk of losing oversight of how a business process is executed by which system and of which system is involved in which process, especially if messages do not carry along any form of correlation identifier.

As opposed to hard-coding assumptions about a business process into the architecture of an integrated information system (as in the choreography approach), having an explicit description of the orchestration process has several advantages:

1. A process model represents the high-level procedural aspect of a business process. It can be used as documentation for business analysts and internal auditors while retaining a formal, executable semantics that can be automatically enforced by a workflow engine.
2. Explicit workflow models make it possible to directly track the progress of running processes and to perform off-line analysis of the executions, which can provide useful feedback for improving the performance of the processes and contribute to the efficiency of an organization.
3. In addition to information and data, business processes, i.e., the connection of tasks in a value chain, are also valuable assets of an organization. Thus, similar to database management systems, which are normally used for the safekeeping and management of an organization's data, process management tools should be used to model, analyze, and execute its business processes.
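To make the contrast between the two styles concrete, the following sketch implements the same three-step process twice: once as a central orchestration driven by an explicit process model, and once as a choreography over a minimal in-process event bus. The service names and payloads are made up, and plain Python functions stand in for remote services and a workflow engine:

```python
# Three stub services; in practice these would be remote APIs.
def check_credit(order):  return {**order, "credit_ok": True}
def reserve_stock(order): return {**order, "reserved": True}
def send_invoice(order):  return {**order, "invoiced": True}

# --- Orchestration: a central component drives an explicit model. ---
PROCESS_MODEL = [check_credit, reserve_stock, send_invoice]

def orchestrate(order):
    """Call each service in turn and wait for its answer; progress is
    directly observable from the position in the process model."""
    for step in PROCESS_MODEL:
        order = step(order)
    return order

# --- Choreography: services react to events on a shared bus; no single
#     component knows the end-to-end flow. ---
subscribers = {}

def on(event, handler):
    subscribers.setdefault(event, []).append(handler)

def emit(event, payload):
    for handler in subscribers.get(event, []):
        handler(payload)

results = []
on("order.placed",   lambda o: emit("credit.checked", check_credit(o)))
on("credit.checked", lambda o: emit("stock.reserved", reserve_stock(o)))
on("stock.reserved", lambda o: results.append(send_invoice(o)))

emit("order.placed", {"id": 1})
assert orchestrate({"id": 1}) == results[0]  # same outcome, different wiring
```

Note how the end-to-end flow is visible in `PROCESS_MODEL` but, in the choreography, can only be reconstructed by tracing events across all subscriptions, which is exactly the oversight problem discussed above.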


1.4 Executable BPMN

Process modeling languages go beyond classical architectural description languages: in addition to providing information on which components need to be assembled, they explicitly describe the behavior of the composition. To do so, processes use representations (e.g., control and data flow graphs [26], Petri nets [35, 37], XML document trees [15]) which are meant to be independent of the specific integration technology used to access individual services, thus abstracting away technical details which need to be provided if the model is to become executable. With some standards, e.g., the Web Services Business Process Execution Language (WS-BPEL), this independence was lost, making the process model (represented in XML) tightly coupled with the service invocation technology (also XML based). While initially this was expected to make it easier and cheaper to connect processes with services [25], as service APIs evolved toward more lightweight messaging formats such as JSON/YAML, the language needed to be heavily extended [17].

In this book, we include chapters featuring processes modeled using both BPEL and BPMN. These are both very rich and sophisticated languages [42], of which this book can only introduce the features relevant for readers to follow each chapter's research questions and findings. For interested readers looking for a gentle introduction to each of these modeling languages, we recommend [38] for BPEL and [8] for BPMN.

Following several standardization attempts for representing executable workflows, such as XPDL and BPML, the Web Services Business Process Execution Language (WS-BPEL [12]) was probably the first standard language emphasizing the fully executable aspect of the process models represented in it. It was the result of merging Microsoft's structured XLANG (the standard born out of the BizTalk integration tool) and IBM's unstructured WSFL (derived from the FlowMark engine).
However, the BPEL standard lacked a graphical notation, as it was intended purely for technical users: modelers were expected to describe processes using the XML syntax. A visual syntax was provided instead by BPMN 1.0, frequently used for the analytical models at the business level. These visual processes would be manually or semiautomatically translated to BPEL for execution [29]. BPMN 1.0 (standardized in May 2004), which defined only the visual syntax of the notation, evolved into BPMN 2.0 (released in January 2011 [11]). This was significant progress, with the inclusion of a token-based executable semantics as well as an XML serialization. The former would lead to a new generation of process engines which could directly (or indirectly, via model translation) execute BPMN; the latter would ensure the portability of models across graphical modeling tools. Other execution attributes, e.g., which service to use for a service task, which code to execute in a script task, and how to configure a human task, are specified as attributes of the corresponding BPMN model elements, which are not visible in BPMN's graphical notation. While BPMN 2.0 defines, e.g., how to specify which service to call in a service task, it does not define a standard technology mapping. While BPEL clearly uses WSDL descriptions for defining the offered


and consumed services, BPMN 2.0 does not directly support any particular integration technology. Vendors have to define their own mapping to WSDL/SOAP, HTTP/REST, and/or Java/JavaScript code and other options for providing the implementation of tasks as they see fit. While the BPMN 2.0 specification specifically mentions WS-HumanTask, most vendors have opted to use other standards for human task management. Consequently, vendors and their products differ considerably in this regard. Some offer SOAP-based BPMN execution environments, which allow easy integration into large SOA-based enterprise architectures using services as building blocks for processes. Other vendors provide the opportunity for tasks to call into arbitrary Java code, which can then do anything the developers want it to, thus fulfilling the old vision of combining programming in the large (with executable process models) with programming in the small (with tasks implemented using traditional programming languages) [4].
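The split between the standardized BPMN namespace and vendor-specific execution attributes can be seen in the serialization itself. The sketch below parses a minimal (made-up) process: the BPMN 2.0 namespace is the one from the OMG specification, while `camunda:class` is one real-world vendor extension (Camunda's binding of a service task to a Java class), used here only as an example; the process content and delegate class name are hypothetical:

```python
import xml.etree.ElementTree as ET

BPMN = "http://www.omg.org/spec/BPMN/20100524/MODEL"
# Vendor extension namespace (Camunda's, as one real-world example).
CAMUNDA = "http://camunda.org/schema/1.0/bpmn"

# A minimal process: one service task whose implementation binding
# (a hypothetical Java delegate class) lives only in an attribute.
doc = f"""
<definitions xmlns="{BPMN}" xmlns:camunda="{CAMUNDA}" id="defs1">
  <process id="invoice" isExecutable="true">
    <serviceTask id="task1" name="Send Invoice"
                 camunda:class="org.example.SendInvoiceDelegate"/>
  </process>
</definitions>
"""

root = ET.fromstring(doc)
task = root.find(f".//{{{BPMN}}}serviceTask")
# The execution attribute is invisible in the diagram but drives the engine:
print(task.get(f"{{{CAMUNDA}}}class"))  # org.example.SendInvoiceDelegate
```

Because only the elements in the BPMN namespace are standardized, this file renders identically in any BPMN tool, while the `camunda:class` binding is meaningful only to one vendor's engine, which is precisely the portability gap described above.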

References

1. G. Alonso, F. Casati, H. Kuno, V. Machiraju, Web Services: Concepts, Architectures and Applications (Springer, Berlin, 2004)
2. C. Bussler, B2B Integration: Concepts and Architecture (Springer, Berlin, 2003)
3. M.J. Crawley, Statistics: An Introduction Using R (Wiley, Hoboken, 2014)
4. F. DeRemer, H.H. Kron, Programming-in-the-large versus programming-in-the-small. IEEE Trans. Softw. Eng. SE-2(2), 80–86 (1976)
5. K. Figl, Comprehension of procedural visual business process models. Bus. Inf. Syst. Eng. 59(1), 41–67 (2017)
6. R. Fisher, Statistical methods and scientific induction. J. R. Stat. Soc. Ser. B Methodol. 17, 69–78 (1955)
7. H. Foster, S. Uchitel, J. Magee, J. Kramer, LTSA-WS: a tool for model-based verification of web service compositions and choreography, in Proceedings of the 28th International Conference on Software Engineering, ICSE '06 (ACM, New York, 2006), pp. 771–774
8. J. Freund, B. Rücker, Real-Life BPMN: With Introductions to CMMN and DMN (CreateSpace, Scotts Valley, 2016)
9. C. Houy, P. Fettke, P. Loos, Empirical research in business process management – analysis of an emerging field of research. Bus. Process. Manag. J. 16(4), 619–661 (2010)
10. A. Ivanchikj, C. Pautasso, S. Schreier, Visual modeling of RESTful conversations with RESTalk. Softw. Syst. Model. 17(3), 1031–1051 (2018)
11. D. Jordan, J. Evdemon, Business Process Model and Notation (BPMN) Version 2.0 (Object Management Group, Inc., Needham, 2011). http://www.omg.org/spec/BPMN/2.0/
12. D. Jordan, J. Evdemon et al., Web Services Business Process Execution Language (WS-BPEL) Version 2.0, OASIS Standard, Burlington, April 2007, pp. 1–264
13. E. Kalliamvakou, G. Gousios, K. Blincoe, L. Singer, D.M. German, D. Damian, The promises and perils of mining GitHub, in Proceedings of the 11th Working Conference on Mining Software Repositories, MSR 2014 (ACM, New York, 2014), pp. 92–101
14. S. Kent, Model driven engineering, in International Conference on Integrated Formal Methods (Springer, Basel, 2002), pp. 286–298
15. R.K.L. Ko, S.S.G. Lee, E.W. Lee, Business process management (BPM) standards: a survey. Bus. Process. Manag. J. 15(5), 744–791 (2009)
16. J. Koehler, R. Hauser, J. Küster, K. Ryndina, J. Vanhatalo, M. Wahler, The role of visual modeling and model transformations in business-driven development. Electron. Notes Theor. Comput. Sci. 211, 5–15 (2008)
17. O. Kopp, K. Görlach, D. Karastoyanova, F. Leymann, M. Reiter, D. Schumm, M. Sonntag, S. Strauch, T. Unger, M. Wieland et al., A classification of BPEL extensions. J. Syst. Integr. 2(4), 3–28 (2011)
18. M. Kunze, A. Luebbe, M. Weidlich, M. Weske, Towards understanding process modeling – the case of the BPM academic initiative, in Business Process Model and Notation (BPMN 2011), ed. by R. Dijkman, J. Hofstetter, J. Koehler, vol. 95 (Springer, Berlin, 2011), pp. 44–58
19. F. Leymann, Managing business processes via workflow technology, in Proceedings of the 27th International Conference on Very Large Data Bases (VLDB 2001) (2001), p. 729
20. F. Leymann, D. Roller, M.-T. Schmidt, Web services and business process management. IBM Syst. J. 41(2), 198–211 (2002)
21. Y.K. Malaiya, M.N. Li, J.M. Bieman, R. Karcich, Software reliability growth with test coverage. IEEE Trans. Reliab. 51(4), 420–426 (2002)
22. G.J. Myers, The Art of Software Testing (Wiley, Hoboken, 1979)
23. A.J. Onwuegbuzie, N.L. Leech, Validity and qualitative research: an oxymoron? Qual. Quant. 41(2), 233–249 (2007)
24. C. Ouyang, M. Dumas, W.M.P. van der Aalst, A.H.M. ter Hofstede, J. Mendling, From business process models to process-oriented software systems. ACM Trans. Softw. Eng. Methodol. 19(1), 2 (2009)
25. J. Pasley, How BPEL and SOA are changing web services development. IEEE Internet Comput. 9(3), 60–67 (2005)
26. C. Pautasso, G. Alonso, Visual composition of web services, in Proceedings of the 2003 IEEE Symposium on Human Centric Computing Languages and Environments (VL/HCC 2003) (IEEE, Piscataway, 2003), pp. 92–99
27. C. Pautasso, E. Wilde, Why is the web loosely coupled? A multi-faceted metric for service design, in Proceedings of the 18th International Conference on World Wide Web (ACM, New York, 2009), pp. 911–920
28. C. Pautasso, O. Zimmermann, The web as a software connector. IEEE Softw. 35(1), 93–98 (2018)
29. J. Recker, J. Mendling, On the translation between BPMN and BPEL: conceptual mismatch between process modeling languages, in Proceedings of Workshops and Doctoral Consortium of the 18th International Conference on Advanced Information Systems Engineering (CAiSE) (2006), pp. 521–532
30. H.A. Reijers, J. Mendling, A study into the factors that influence the understandability of business process models. IEEE Trans. Syst. Man Cybern. Part A Syst. Humans 41(3), 449–462 (2011)
31. P. Runeson, M. Höst, A. Rainer, B. Regnell, Case Study Research in Software Engineering – Guidelines and Examples (Wiley, Hoboken, 2012)
32. K. Sarshar, P. Loos, Comparing the control-flow of EPC and Petri net from the end-user perspective, in Business Process Management, ed. by W.M.P. van der Aalst, B. Benatallah, F. Casati, F. Curbera (Springer, Berlin, 2005), pp. 434–439
33. H. Störrle, On the impact of layout quality to understanding UML diagrams: diagram type and expertise, in Proceedings of the 2012 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC) (IEEE, Piscataway, 2012), pp. 49–56
34. W.M.P. van der Aalst, Verification of workflow nets, in International Conference on Application and Theory of Petri Nets (Springer, Berlin, 1997), pp. 407–426
35. W.M.P. van der Aalst, The application of Petri nets to workflow management. J. Circuits Syst. Comput. 8(1), 21–66 (1998)
36. W.M.P. van der Aalst, Formalization and verification of event-driven process chains. Inf. Softw. Technol. 41(10), 639–650 (1999)
37. W.M.P. van der Aalst et al., Three good reasons for using a Petri-net-based workflow management system, in Proceedings of the International Working Conference on Information and Process Integration in Enterprises (IPIC'6), Citeseer (1996), pp. 179–201
38. S. Weerawarana, F. Curbera, F. Leymann, T. Storey, D.F. Ferguson, Web Services Platform Architecture: SOAP, WSDL, WS-Policy, WS-Addressing, WS-BPEL, WS-Reliable Messaging, and More (Prentice Hall, Upper Saddle River, 2005)
39. R. Wettel, M. Lanza, R. Robbes, Software systems as cities: a controlled experiment, in 33rd International Conference on Software Engineering (ICSE 2011) (IEEE, Piscataway, 2011), pp. 551–560
40. C. Wohlin, P. Runeson, M. Höst, M.C. Ohlsson, B. Regnell, A. Wesslén, Experimentation in Software Engineering (Springer, Berlin, 2012)
41. L. Zhang, J.-H. Tian, J. Jiang, Y.-J. Liu, M.-Y. Pu, T. Yue, Empirical research in software engineering – a literature survey. J. Comput. Sci. Technol. 33(5), 876–899 (2018)
42. M. zur Muehlen, J. Recker, How much language is enough? Theoretical and practical use of the business process modeling notation, in Proceedings of the 20th International Conference on Advanced Information Systems Engineering (CAiSE 2008) (Springer, Berlin, 2008), pp. 465–479

Chapter 2

A Template for Categorizing Business Processes in Empirical Research

Daniel Lübke, Ana Ivanchikj, and Cesare Pautasso

Abstract Empirical research is becoming increasingly important for understanding the practical uses of and problems with business process technology in the field. However, no standard exists for how to report observations and findings. This sometimes leads to research outcomes which report partial or incomplete data, and it makes published results of replicated studies on different data sets hard to compare. In order to help the research community improve reporting on business process models and collections and their characteristics, this chapter defines a modular template aimed at standardizing such reports, which could also facilitate the creation of shared business process repositories to foster further empirical research in the future. The template has been positively evaluated by representatives from both BPM research and industry, and the survey feedback has been incorporated into the template. We have applied the template to describe a real-world executable WS-BPEL process collection, measured from a static and dynamic perspective.

2.1 Introduction

Empirical research in the field of business process management follows the increasingly wide adoption of business process modeling practices and business process execution technologies [9, 18]. The validation of theoretical research, the transfer between academia and industry, and the quest for new research perspectives are all supported by empirical research, e.g., experiments, case studies, and surveys.

This chapter was originally published as part of the BPM Forum 2017 [12]. D. Lübke () FG Software Engineering, Leibniz Universität Hannover, Hannover, Germany e-mail: [email protected] A. Ivanchikj · C. Pautasso Software Institute, Faculty of Informatics, USI, Lugano, Switzerland e-mail: [email protected]; [email protected] © Springer Nature Switzerland AG 2019 D. Lübke, C. Pautasso (eds.), Empirical Studies on the Development of Executable Business Processes, https://doi.org/10.1007/978-3-030-17666-2_2


The goal of empirical research is to find repeatable results, i.e., observations that can be replicated, thus providing results that can be combined and built upon. The more data points are available, the higher the significance of a study. One way to increase the number of data points is to perform meta-studies that combine results from multiple researchers (e.g., [15]). While this is common in other disciplines, such as ecology or medicine, business process-related data is usually not published in a comparable or reusable way. Additionally, access to industry data is often restricted due to confidentiality requirements. Thus, publication of data sets must be done in an aggregated and/or anonymized manner. To improve the reporting of empirical research concerning business processes, we propose a template that can be used to characterize processes in terms of their metadata and (if applicable) their static and dynamic properties, without revealing confidential details. For example, business process models are used for different modeling purposes such as discussion, analysis, simulation, or execution. Processes are modeled using different languages (e.g., BPMN, BPEL, EPC). Process models also vary in terms of their size and structural complexity, which can be determined depending on the actual modeling language used to represent them. The goal of the proposed template is to (a) give readers the opportunity to "get a feeling" of a process (collection) and (b) allow researchers to build on top of existing research by ensuring the presence of metadata with well-defined semantics. Since, to the best of our knowledge, no such classification exists, in this chapter we make an initial top-down proposal, intended as a starting point for extending and refining the template together with the research community.
In order to improve the reporting of research related to business process model collections (e.g., [6, 21] as a starting point), we propose a set of metadata described in tabular form. The metadata template can be extended with other tables. For such extensions, we initially propose static metrics for BPEL processes and some dynamic metrics, although further extensions for other modeling languages are welcome. We validate the metadata template with a survey gathering the feedback of academic and industry professionals. Additionally, we apply the template in an industry case study to describe a large process collection. The remainder of this chapter is structured as follows: In Sect. 2.2, we motivate the need for such a template, which we describe in Sect. 2.3. Section 2.4 describes how we validated the template with a survey and a case study. Section 2.5 presents related work before concluding in Sect. 2.6.

2.2 Motivation

Models describing business processes contain sensitive information, making it difficult for companies to reveal how they use standard languages and tools and rendering it challenging for empirical researchers to further improve the state of the art. As one of our survey respondents emphasized, much of the "research stops at the toy example level." It is possible to anonymize process models, at the cost of obscuring what the processes do and hiding their purposes and sources. Anonymized processes retain their entire control and data flow structure (which remains available for static analysis) while losing important metadata (which limits the types of analyses that can be performed). For example, Hertis and Juric published a large study with a set of over 1000 BPEL processes [8]. However, they stated that they "were unable to classify the processes into application domains since plain BPEL processes do not contain required information." This shows that researchers, when collecting processes, must also collect the associated metadata. Thus, whether a complete or an anonymized process model is present, it is necessary to accompany it with a given set of metadata. The metadata has to be carefully selected and placed in a template to ensure that readers and other researchers can get an overall understanding of the discussed processes. Such a template needs to support the following goals:

1. Help researchers collect data about processes that is relevant to others.
2. Help researchers publish meaningful results by knowing which properties of the business processes can be anonymized and which should not.
3. Help researchers report the important properties of business processes in their publications, so that their audience has sufficient details to evaluate the quality of the reported research.
4. Foster empirical research about business processes so that a body of knowledge can be accumulated based upon multiple, comparable works.
5. Enable meta-studies that combine, aggregate, and detect trends over existing and future empirical research about the practical use of business processes.

2.3 Template

Business process models can be created in many languages and can serve many purposes. Thus, it makes sense to report only values that have actually been measured in the specific usage context and are related to the conducted research. The templates are defined in a tabular format with a key/value presentation in order to allow quick digestion and comparison of reports. We understand that research publications need to present their results in a compact form. When space does not allow the tabular format, the tabular templates can be published together with the data, e.g., in technical reports and research data repositories. The template we propose is built in a modular fashion. It consists of a required metadata template that describes general, technology-independent properties of the process. The metadata part can be extended by standardized templates for reporting different properties that have been analyzed. Researchers should reuse existing templates as much as possible in order to provide results that can be compared to previous works. For instance, in this chapter, two additional templates for executable BPEL processes are presented. The list of static and dynamic metrics proposed in the additional templates is not exhaustive and can be extended depending on the research needs. BPEL was chosen for convenience, as the case study in Sect. 2.4.2 uses BPEL processes. Support for other languages can easily be defined in additional templates.

2.3.1 Metadata Template

The metadata template, as shown in Table 2.1, is the only required part. It is designed to be applicable to any process model regardless of the modeling language used. This template contains the basic information necessary to obtain a general understanding of a process model and the most important properties that can be of interest to filter and classify such a process model. Its content has been updated with the feedback received during the survey described in Sect. 2.4.1.

Table 2.1 The metadata template for describing business process models

Process name            Name or anonymous identifier of the process
Version                 Process version (if available)
Domain                  Business domain of the process
Geography               Location of the processes
Time                    Period of data collection
Boundaries              Cross-organizational/intraorganizational/within department
Relationship            Calls another/is being called/no call/event triggered
Scope                   Business scope: core, auxiliary, or technical scope
Process model purpose   Descriptive/simulation/execution
People involvement      None/partly/no automation
Process language        For example, WS-BPEL 2.0/EPC/BPMN1/BPMN2/...
Execution engine        Engine used for running the process model if the model is executable
Model maturity          Illustrative/reference/prototypical/reviewed/productive/retired

The following is a more detailed description of the categories and the classes included in the table:

Process name: The process name as used in the organization. If the real name cannot be published, this field can be anonymized by providing an ID that can be used to reference the process from the text.
Version: If available, the name can be augmented with process versioning metadata.
Domain: The business domain which this process is taken from. Existing ontologies like [7] can be used.
Geography: The geographical location where the process is used.
Time: The time period the process data refers to.
Boundaries: The organizational scope of the process: cross-organizational for processes that span multiple legal entities, intraorganizational for processes that are conducted within one legal entity but across different departments/units in it, and within department for processes that are narrowed to a single organizational unit within one legal entity.
Relationship: The structural dependencies of the process with other processes: calls another, is being called, no call, and event triggered.
Scope: The process model can have a horizontal, business scope or a technical scope. In the business scope, we can distinguish between end-to-end processes for fully end-to-end descriptions like order to cash and auxiliary processes for processes that do not contribute directly to the business purpose. Processes can instead have a purely technical scope, e.g., an event handling process that propagates permissions in the infrastructure.
Process model purpose: The purpose of a process model can be description, simulation, or execution. A descriptive process model captures a business point of view; it is more abstract in order to facilitate discussion and analysis among stakeholders and also to prescribe how operations are carried out in an organization. A simulative process model contains further details regarding resources, costs, duration, frequency, etc. An executable process model contains sufficient details to enable the automation of the process. Because a model can serve multiple purposes, this field is a list; the main purpose should be the first item.
People involvement: Classification of how much manual/human work is to be done, ranging from none (fully automated) through partly to no automation (people involvement in each task).
Process language: The process language used to create the process model. If a standard process language, such as BPEL, BPMN, etc., has been extended, that should be specified in the metadata.
Execution engine: The execution engine(s) used to run the process model (if executable), including the exact version, if available.
Model maturity: Illustrative for models which are not intended for industry use but showcase certain modeling situations for educational purposes; reference for generic models which prescribe best practices and are used as a starting point for creating other types of models; prototypical for models that are under discussion or are technical prototypes; reviewed for models that have been reviewed but are not yet in productive use; productive for models that are used productively in a real-world organization, with or without systems to enact them automatically; and retired for models which had been productive previously but have been replaced with other models.

The metadata template is the main template that describes process characteristics regardless of the context and used technologies. In order to report details, additional templates should be used, which often need to be language specific. Within this chapter, we define additional templates that describe different viewpoints of business processes, especially for those modeled in executable WS-BPEL.
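To illustrate how such a record could be captured programmatically, the following Python sketch defines a data structure whose fields follow Table 2.1; the class and its rows() helper are our own illustration, not part of the template definition. The example instance reproduces the Terravis process reported in Table 2.2.

```python
from dataclasses import dataclass, fields

@dataclass
class ProcessMetadata:
    """One record of the metadata template; field names follow Table 2.1."""
    process_name: str
    version: str
    domain: str
    geography: str
    time: str
    boundaries: str
    relationship: str
    scope: str
    process_model_purpose: str
    people_involvement: str
    process_language: str
    execution_engine: str
    model_maturity: str

    def rows(self):
        """Render the record as the template's key/value rows."""
        return [(f.name.replace("_", " ").capitalize(), getattr(self, f.name))
                for f in fields(self)]

# Example instance taken from the Terravis case study (Table 2.2).
terravis = ProcessMetadata(
    process_name="Transfer register mortgage certificate to trustee",
    version="26.0",
    domain="Land registry transactions",
    geography="Switzerland",
    time="08-30-2016",
    boundaries="Cross-organizational",
    relationship="Calls another/is being called",
    scope="Core",
    process_model_purpose="Executable",
    people_involvement="None",
    process_language="WS-BPEL 2.0 plus vendor extensions",
    execution_engine="Informatica ActiveVOS 9.2",
    model_maturity="Productive",
)
for key, value in terravis.rows():
    print(f"{key:25} {value}")
```

Keeping the record machine-readable in this way is one path toward the shared process repositories envisioned in the introduction, since records from different studies can be merged and filtered automatically.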


2.3.2 BPEL Element and Activity Count Template

Among the interesting properties of processes are the various "size" metrics, with "size" being defined by Mendling [14] as "often related to the number of nodes N of the process model." Since every process language provides different ways to express the nodes and arcs defining the control flow, such a template must be process language specific. Thus, in this chapter, we define the template for measuring the size of BPEL processes using activity and element counts, since BPEL is used in the case study presented in Sect. 2.4.2. The template for reporting BPEL element counts is shown in the case study in Table 2.3. The values are merely the counts of the different BPEL constructs as defined by the WS-BPEL 2.0 standard [10]. In addition, the total count of basic activities and structured activities is given because these are often used to judge the size of a process model. In the literature, they are also called number of activities (NOA) and number of activities complex (NOAC) [5]. In addition to activities, this table also contains the number of links, the number of different sub-activity constructs (e.g., pick branches, if branches), and the number of partner links (service partners). To distinguish between the different BPEL constructs, basic activities are marked with a (B), and structured activities are marked with an (S) in Table 2.3.
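The element counting itself is straightforward to automate. The following sketch, assuming the WS-BPEL 2.0 executable-process namespace and the basic/structured activity split described above (the toy process is invented), derives the NOA and NOAC size metrics from a BPEL document.

```python
import xml.etree.ElementTree as ET
from collections import Counter

BPEL_NS = "{http://docs.oasis-open.org/wsbpel/2.0/process/executable}"

# Activity classification following WS-BPEL 2.0: (B) basic, (S) structured.
BASIC = {"receive", "reply", "invoke", "assign", "throw", "rethrow", "wait",
         "empty", "exit", "validate", "compensate", "compensateScope",
         "extensionActivity"}
STRUCTURED = {"sequence", "if", "while", "repeatUntil", "forEach", "pick",
              "flow", "scope"}

def element_counts(bpel_xml):
    """Count BPEL elements and derive the size metrics: NOA counts basic
    activities; NOAC additionally counts structured activities, following
    the chapter's usage of the two metrics."""
    counts = Counter()
    for el in ET.fromstring(bpel_xml).iter():
        if el.tag.startswith(BPEL_NS):
            counts[el.tag[len(BPEL_NS):]] += 1
    noa = sum(counts[a] for a in BASIC)
    noac = noa + sum(counts[a] for a in STRUCTURED)
    return counts, noa, noac

# Invented toy process for illustration.
toy = """<process xmlns="http://docs.oasis-open.org/wsbpel/2.0/process/executable">
  <sequence>
    <receive name="start"/>
    <assign name="prepare"/>
    <invoke name="callService"/>
    <reply name="answer"/>
  </sequence>
</process>"""

counts, noa, noac = element_counts(toy)
print(noa, noac)  # 4 basic activities, 5 including the sequence
```

The per-construct counts in `counts` correspond to the individual rows of the element count template, while NOA and NOAC fill its summary rows.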

2.3.3 BPEL Extensions Template

Although BPEL is a standardized language, it offers support for extensions. These extension points are used to extend the BPEL standard, e.g., by the standardized extension BPEL4People to support human tasks, or by vendors to offer unique features that distinguish their products from their competitors'. BPEL defines a general facility to register extensions globally, an extension activity that can contain activities not defined in the core standard, and the possibility to use additional query and expression languages referenced by a nonstandard URI. In contrast to [16], we think that the use of extensions is common. The case study has also shown high use of both vendor-specific and standardized extensions. When reporting on BPEL processes, researchers can use the template shown in the case study in Table 2.4, which contains all extensions declared in the BPEL process and the extension activities used, together with their activity counts.

2.3.4 Process Runtime Performance Template

For executable processes, it becomes possible to report their runtime performance. While a large number of metrics have been proposed (e.g., [19]), for space reasons, in this chapter, we propose to focus on reporting the number of process instances and their duration. These metrics can be described for each process instance or aggregated among multiple instances. Counting the total number of process instances for a given process model gives an idea of its usage frequency relative to other process models. Capturing the performance of individual process instances amounts to measuring their execution time (T(finish) − T(start)). Since the execution time of every individual process instance is usually not of interest, we suggest giving statistical information about the distribution of the process instance duration over all process instances of a given process model, as shown in Table 2.5.
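Computing such a distribution from instance timestamps is simple to automate. A sketch follows; the input format (a list of start/finish pairs, e.g., extracted from an engine's audit log) and the sample values are assumptions for illustration.

```python
from datetime import datetime
from statistics import mean, quantiles

def duration_stats(instances):
    """Aggregate instance durations (in seconds) for the runtime
    performance template: instance count plus distribution statistics."""
    durations = sorted((finish - start).total_seconds()
                       for start, finish in instances)
    q1, med, q3 = quantiles(durations, n=4)  # quartiles of the distribution
    return {
        "instances": len(durations),
        "min": durations[0],
        "q1": q1,
        "median": med,
        "mean": mean(durations),
        "q3": q3,
        "max": durations[-1],
    }

# Invented sample log: (start, finish) per process instance.
log = [
    (datetime(2016, 8, 30, 9, 0, 0), datetime(2016, 8, 30, 9, 0, 30)),
    (datetime(2016, 8, 30, 9, 5, 0), datetime(2016, 8, 30, 9, 6, 0)),
    (datetime(2016, 8, 30, 9, 10, 0), datetime(2016, 8, 30, 9, 12, 0)),
]
print(duration_stats(log))
```

Reporting quartiles rather than the mean alone is advisable here, since process instance durations are often heavily skewed by long-running outliers.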

2.4 Validation

To validate the usefulness of the proposed templates, we combine an exploratory survey with researchers and industry experts (Sect. 2.4.1) and a case study of real-world BPEL business processes (Sect. 2.4.2).

2.4.1 Survey with Researchers and Industry Experts

To validate whether the proposed template fulfills the goals presented in Sect. 2.2, we have conducted an exploratory survey [20, Chap. 2].1 The intention of this survey was not statistical inference of the results, but rather getting a deeper understanding of the surveyed field. We targeted an audience from both academia and industry, i.e., both producers and consumers of empirical research. Thus, we used different social media channels and private connections to disseminate the survey.

Survey Design We organized the questions in five sections: Background, Metadata Template, Template Remarks, Template Extensions, and Empirical Research in BPM. While the Background questions were mandatory to enable further classification in the analysis of the results, the remaining questions were optional to incentivize greater survey participation. In the Metadata Template section, we showed the metadata presented in Table 2.1 and asked the respondents to rate the importance of each of the proposed metadata classes. In the Template Remarks section, we focused on the perceived need for standardized reporting and asked for suggestions on the appropriateness and completeness of the proposed process classification and metadata. In the Template Extensions section, we inquired about the relevance of reporting structure and performance metrics at the process level, as well as about the usefulness of using the metadata and metrics for describing entire collections of process models. Last but not least, in the Empirical Research in BPM section, we asked for personal opinions on the state of empirical BPM research.

Survey Sample Since we were not aiming at inferring statistical conclusions from the conducted survey, we closed the survey as soon as we considered the obtained feedback sufficient for improving the proposed templates. This has resulted in 24 respondents with diversified backgrounds. To obtain more insights into respondents' professional background, they could select multiple options between experience in academia (further divided into IT or business process management) and in industry (further divided into IT or business). While most of the respondents, i.e., 46%, have experience only in academia, 21% have experience only in industry and 33% in both academia and industry. Most of them, i.e., 88%, have an IT background (16 respondents in academia and 12 in industry), and 63% have been dealing with the business perspective of process management (12 respondents in academia and 3 in industry). Respondents participate in different phases of the business process life cycle and/or simply conduct empirical research in BPM. When asked what type of experience they have with business processes, the majority, i.e., 83%, marked analyzing, while 79% marked defining, 75% implementing, and 29% researching. These results could already indicate some lack of empirical research in this area. All the respondents have more than 1 year of experience in working with business processes, with 50% having up to 5 years and another 33% over 10 years of experience. Figure 2.1 shows the years of experience vs. the business process life cycle experience of the survey participants. It is noticeable that people with longer experience have been more exposed to different phases of the business process life cycle.

Fig. 2.1 Survey respondents: years of experience vs. expertise in business process areas

Survey Results We presented the metadata and process classifications of Sect. 2.3.1 to the respondents, which in addition included the Modeling Tool category that we have since removed from the table as per respondents' feedback. We asked them to evaluate each proposed category on a scale of 1 (not important) to 5 (very important). By average score, Process Model Purpose was considered the most important with 4.38 points, followed by People Involvement with 4.13 points. As mentioned previously, the Modeling Tool was considered the least valuable with 3.17 points, together with the Execution Engine with 3.38 points. Indeed, in an ideal world, where the standards are correctly implemented, these two categories would not add to the understanding of the process model. In Fig. 2.2, we stratify the importance rating of each proposed category per sector (industry, academia, or both). It is interesting to notice that, even if those having experience only in industry allocate less importance to the metadata on average, similar importance trends are evident between the different sectors. If stratified per years of experience, the highest ratings are provided by respondents with 1–2 years of experience, followed by those with over 10 years of experience. Encouraging ratings were also obtained on the helpfulness of the standardized reporting approach for "getting a feeling" about the studied process (4.08 points on average) and for comparing different empirical reports (4.29 points on average). Based on the feedback on missing metadata, we have added the Version, Geography, Time, and Relationship categories to Table 2.1 as well as the Reference and Retired classes in the Model Maturity category.

Fig. 2.2 Process metadata template validation (mean importance)

In the next section of the survey, we focused on the extended tables presented in Sects. 2.3.2 and 2.3.4. On the same scale of 1–5, the respondents found the presentation of structure metrics and performance metrics sufficiently relevant, with average points of 3.40 and 3.57, respectively. We were curious to see whether priorities and interests change when using the metadata and extended data presented in Sect. 2.3 on a collection of business processes. Thus, we asked respondents to rate them. While at the process level, as mentioned earlier, Process Model Purpose and People Involvement were considered the most important, at the collection level the Aggregated Structure Metrics (4.11 points) and the Domain (3.84 points) were considered the most important.

1 The questionnaire is available at http://benchflow.inf.usi.ch/bpm-forum-2017/.
As at the process level, at the collection level the least important remained the Modeling Tool (3.11 points) and the Execution Engine (2.68 points). As for the individual processes, also for the collections the responses followed similar trends among the different sectoral experiences (academia, industry, or both), as evident from Fig. 2.3, with industry always providing lower average scores than academia, while people with experience in both sectors tend to have opinions more aligned with academia. The greatest differences in opinion between industry and academia refer to the Model Maturity and Process Name, where academia's average importance rating is around 4, while industry's importance rating is around 3 at the process level and 2 at the collection level. Significant differences in opinion are also noticeable at the collection level regarding the importance of the Structure Metrics, which is rated at 2.5 by industry, 3.9 by academia, and 4.9 by respondents with experience in both sectors. However, when aggregating the importance ratings of all proposed metadata and extended data categories, the opinions are relatively positive, with an average of 3.77 out of 5 points for data at the process level and 3.53 out of 5 points for data at the collection level.

Fig. 2.3 Process collection metadata template validation (mean importance)

We asked for additional properties that respondents would like to have in the template. Two recommendations were made: the connectedness of the model and a link to a process map. However, connectedness is hard to define without requiring a special modeling language, and without standardized process maps, we think that such links are not helpful. Last but not least, when asked whether they consider the existing empirical research in business process management (surveys, experiments, case studies) sufficient, out of the 16 respondents to this question, only 4 answered positively.

2.4.2 Case Study with Industry Processes

We use the Terravis project as a case study for reporting process metadata and metrics in a standardized fashion. Terravis [2] is a large-scale process integration project in Switzerland that coordinates business processes concerning mortgages between land registries, banks, notaries, and other parties. In contrast to previous reporting of metrics [11], we apply here the template and all its extensions as defined in this chapter. The research questions addressed by this case study are the following:
• Can the template be applied without problems? In particular, are all category values clearly defined and applicable?
• Can all categories be measured? Which measurements can be automated?
• Is the categorization in the metadata template beneficial when evaluating the process metrics?

2 A Template for Categorizing Business Processes in Empirical Research


Table 2.2 The metadata template for a Terravis process

Process name            Transfer register mortgage certificate to trustee
Version                 26.0
Domain                  Land registry transactions
Geography               Switzerland
Time                    08-30-2016
Boundaries              Cross-organizational
Relationship            Calls another/is being called
Scope                   Core
Process model purpose   Executable
People involvement      None
Process language        WS-BPEL 2.0 plus vendor extensions
Execution engine        Informatica ActiveVOS 9.2
Model maturity          Productive
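A sketch of how the metadata template can be represented and checked programmatically. The allowed class values below are taken from the categories used in this chapter (only a subset is shown), but the validation code itself is our illustration, not part of the authors' tooling.

```python
# Allowed classes for a subset of the metadata template categories (from Table 2.1).
ALLOWED = {
    "Boundaries": {"Cross-organizational", "Intraorganizational", "Within system"},
    "Scope": {"Core", "Technical", "Auxiliary"},
    "People involvement": {"None", "Partly"},
    "Model maturity": {"Productive", "Reference", "Retired"},
}

def validate(metadata):
    """Return the categories whose value is missing or not an allowed class."""
    return [cat for cat, allowed in ALLOWED.items()
            if metadata.get(cat) not in allowed]

# The Terravis process from Table 2.2, encoded as a plain dictionary.
terravis_process = {
    "Process name": "Transfer register mortgage certificate to trustee",
    "Boundaries": "Cross-organizational",
    "Scope": "Core",
    "People involvement": "None",
    "Model maturity": "Productive",
}
assert validate(terravis_process) == []  # all categories use allowed classes
```

Fixing the set of permitted classes per category in code is one way to ensure that reports from different studies remain comparable.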

The set contains 62 executable BPEL models that are executed on ActiveVOS 9.2. We acquired a total of 918 versions of the process models and information for 435,093 process instances executed in Switzerland in the period between 2012 and 2016. To apply the templates, we conducted the following steps:
• For each process, we assigned a value to each category of the general metadata template, automating the assignment where possible.
• We automatically measured the static metrics for the models.
• We validated the People Involvement assignment by cross-checking it against the count of human activities in the static metrics.
• We automatically collected the used BPEL extensions.
• We calculated the runtime metrics from the process logs.

In the first step, we manually classified each process as per our metadata template. In the People Involvement category, we initially chose to offer more fine-grained values (partly, mostly). However, it was impossible to find a meaningful and objective threshold for these values. Thus, we opted to offer only one intermediate value, i.e., partly. To showcase the application of the metadata template, the metadata of one process model is shown in Table 2.2.

Many static metrics, e.g., the static element counts [3, 13], have been proposed, and some tools have been developed for calculating them [1, 8]. However, to our knowledge, no working tool is freely available to calculate element counts and extract extension information from BPEL process models. Thus, we have built an open-source implementation2 to automatically calculate the data for the BPEL element and activity count template (Tables 2.3 and 2.4).

2 Available at https://github.com/dluebke/bpelstats.
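A minimal sketch of how such element counts can be computed from a BPEL file using only the standard library. This is not the bpelstats implementation; the basic/structured classification follows the split used in Table 2.3, and the sample process below is invented for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# WS-BPEL 2.0 activity element names, grouped as in Table 2.3.
BASIC = {"assign", "empty", "invoke", "receive", "reply", "throw", "rethrow",
         "exit", "wait", "validate", "compensate", "compensateScope"}
STRUCTURED = {"sequence", "flow", "if", "pick", "while", "repeatUntil",
              "forEach", "scope"}

def bpel_counts(bpel_xml):
    """Count BPEL elements by local name and derive basic/structured totals."""
    counts = Counter(el.tag.split("}")[-1] for el in ET.fromstring(bpel_xml).iter())
    basic = sum(n for tag, n in counts.items() if tag in BASIC)
    structured = sum(n for tag, n in counts.items() if tag in STRUCTURED)
    return counts, basic, structured

sample = """<process xmlns="http://docs.oasis-open.org/wsbpel/2.0/process/executable">
  <sequence>
    <receive/><assign/><invoke/><reply/>
  </sequence>
</process>"""
counts, basic, structured = bpel_counts(sample)
print(basic, structured)  # 4 basic activities, 1 structured activity
```

Extracting the extension namespaces for Table 2.4 would work analogously, by collecting the namespace URIs declared on the root element instead of counting tags.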


Table 2.3 BPEL element and activity counts for a Terravis process

Transfer register mortgage certificate to trustee (version 26.0)
BPEL element           Count   BPEL element        Count
Assign (B)             79      OnAlarm (pick)      0
Catch                  4       OnAlarm (handler)   0
CatchAll               2       OnMessage (pick)    6
Compensate (B)         0       OnEvent (handler)   0
Compensate scope       0       Partner link        15
Compensation handler   0       Pick (S)            3
Else                   13      Receive (B)         13
Else if                3       Repeat until (S)    0
Empty (B)              42      Reply (B)           18
Exit (B)               9       Rethrow (B)         0
Extension activity     1       Scope               74
Flow (S)               1       Sequence (S)        90
ForEach (S)            4       Throw (B)           0
If (S)                 13      Validate (B)        0
Invoke (B)             37      Wait (B)            0
Link                   2

Derived metrics
Basic activities (B)        198
Structured activities (S)   185

Table 2.4 BPEL extensions for a Terravis process

Extensions
  http://www.activebpel.org/2006/09/bpel/extension/activity
  http://www.activebpel.org/2009/06/bpel/extension/links
  http://www.activebpel.org/2006/09/bpel/extension/query_handling
  http://www.activebpel.org/2009/02/bpel/extension/ignorable
  http://www.omg.org/spec/BPMN/20100524/DI

Activities
  Type                 Count
  ActiveVOS continue   1
  Total                1

To calculate the runtime metrics, the process logs were extracted and processed automatically. However, not all executable processes were configured with persistence and logging enabled. Thus, for some models, we could not calculate any runtime metrics. Process instance runtime metrics are shown in Table 2.5.

After successfully applying the templates to all process models, an aggregation over the whole collection can be made. The results are shown in templated form in Table 2.6, with information on the percentage of models belonging to each class. If the categorization in the metadata template is meaningful, there should be no overlap between classes in the same category, and preferably each class should have some processes which pertain to it. We grouped the static metrics and process duration metrics of the latest version of every process model according

Table 2.5 Template for capturing runtime performance metrics of process instances

Transfer register mortgage certificate to trustee (version 26.0)
Number of process instances   13
Execution time (min)          00 h:00 min:01 s
Execution time (med)          02 h:33 min:00 s
Execution time (mean)         12 h:34 min:39 s
Execution time (max)          64 h:24 min:14 s
Execution time (total)        163 h:30 min:32 s
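The runtime metrics captured in Table 2.5 (minimum, median, mean, maximum, and total execution time over all instances of a process model) can be derived from instance start/end timestamps along the following lines. The log format shown here is an assumption for illustration; real engine logs differ per vendor.

```python
from datetime import datetime
from statistics import mean, median

def runtime_metrics(instances):
    """instances: list of (start, end) ISO-8601 timestamps for one process model."""
    durations = [
        (datetime.fromisoformat(end) - datetime.fromisoformat(start)).total_seconds()
        for start, end in instances
    ]
    return {
        "count": len(durations),
        "min_s": min(durations),
        "median_s": median(durations),
        "mean_s": mean(durations),
        "max_s": max(durations),
        "total_s": sum(durations),
    }

# Three hypothetical instances of one process model
log = [
    ("2016-08-30T08:00:00", "2016-08-30T08:00:01"),
    ("2016-08-30T09:00:00", "2016-08-30T11:33:00"),
    ("2016-08-30T10:00:00", "2016-08-31T02:00:00"),
]
print(runtime_metrics(log))
```

In practice, the main difficulty is not the arithmetic but the extraction step: as noted above, instances of processes without persistence and logging enabled simply never appear in the log.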

Table 2.6 Aggregated metadata for the Terravis process collection

Collection name         Terravis
Process count           62 models with 918 versions
Domain                  Land registry transactions
Geography               Switzerland
Time                    03-09-2012 – 08-30-2016
Boundaries              Cross-organizational 37%, intraorganizational 13%, within system 50%
Relationship            Is being called 31%, calls another 26%, is being called/calls another 8%, event triggered 24%, no call 11%
Scope                   Technical 52%, core 39%, auxiliary 10%
Process model purpose   Executable
People involvement      None 79%, partly 21%
Process language        WS-BPEL 2.0 plus vendor extensions
Execution engine        Informatica ActiveVOS 9.2
Model maturity          51 productive, 11 retired models; 51 productive, 867 retired model versions

to the different categories and their classes. The results are shown in Table 2.7. As can be seen, the distribution of the number of process models across the classes differs from the distribution of the number of activities. For example, only 37% of the process models describe cross-organizational processes, but they contain 71% of the activities. This means that on average the cross-organizational models are larger than those in the other classes of the Boundaries category, and the within-system processes are the smallest on average. The distribution of the number of process instances and the distribution of the accumulated process duration among all executed process instances also differ. Only 14% of the process instances are cross-organizational, but they account for 68% of the overall process time spent. This means that cross-organizational and intraorganizational processes on average take longer to complete than within-system processes. Also, technical process models have a very different distribution. The results support the classification categories because, along these values, the processes in this collection exhibit different characteristics.


Table 2.7 Distribution of Terravis process models and instances by category

                                  #Models   #Activities   #Instances(a)   #Duration
Total                             62        10,132        86,035          2,238,583 h
Boundaries
  Cross-organizational            37%       71%           14%             68%
  Intraorganizational             13%       19%           8%              32%
  Within system                   50%       10%           78%             0.1%
Relationship
  Is being called                 31%       22%           19%             71%
  Calls another                   26%       55%           62%             9%
  Is being called, calls another  8%        12%           2%              20%
  Event triggered                 24%       3%            15%             0%
  No call                         11%       9%            2%              1%
Scope
  Technical                       52%       10%           85%             0.2%
  Core                            39%       85%           13%             99%
  Auxiliary                       10%       5%            2%              1%
People involvement
  None                            79%       66%           86%             10%
  Partly                          21%       34%           14%             90%
Model maturity
  Production                      82%       84%           100%            96%
  Retired                         18%       16%           0.2%            4%

(a) Only for latest process model version
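The distributions in Table 2.7 weight each class of a category not only by the number of models but also by activities, instances, and accumulated duration. A sketch of such an aggregation is shown below; the field names and the sample records are our own illustration, not the real Terravis data.

```python
from collections import defaultdict

def distribution(processes, category, weight):
    """Percentage share of each class of `category`, weighted by `weight`."""
    totals = defaultdict(float)
    for p in processes:
        totals[p[category]] += p[weight]
    grand = sum(totals.values())
    return {cls: round(100 * w / grand, 1) for cls, w in totals.items()}

processes = [  # illustrative records, one per process model
    {"boundaries": "Cross-organizational", "activities": 700, "duration_h": 680},
    {"boundaries": "Intraorganizational",  "activities": 200, "duration_h": 319},
    {"boundaries": "Within system",        "activities": 100, "duration_h": 1},
]
print(distribution(processes, "boundaries", "activities"))
print(distribution(processes, "boundaries", "duration_h"))
```

Comparing the same category under different weights is exactly what reveals findings such as "within-system processes are numerous but consume almost no execution time."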

2.5 Related Work

The extensions to the metadata template (Sects. 2.3.2–2.3.4) are language specific, and their aim is to emphasize the need to include structure and performance metrics while not trying to be exhaustive in the list of metrics. Defining such metrics is out of the scope of this chapter and has already been addressed in existing work [4, 5, 14, 19]. The main goal of this chapter is standardizing the metadata on process level and/or collection level. Thus, the related work we survey in this section refers to the current availability and definition of such metadata.

The need to extract knowledge from business processes has been identified in the literature and has led to the creation of business process repositories. Yan et al. [21] propose a repository management model as a list of functionalities that can be provided by such repositories and survey which of them are offered by existing repositories. Since what they propose is a framework, they emphasize the need for metadata for indexing the processes but do not define which metadata should accompany each process. They have found that only 5 out of 16 repositories use a classification scheme based on part-whole and generalization-specialization relations. Vanhatalo et al. [17] built a repository for storing BPEL processes with the


related metadata, which in their usage scenario referred to the number of activities, degree of concurrency, execution duration, and correctness. Their flexible repository architecture could be used to store the templates proposed in this chapter. The MIT Process Handbook Project focuses on classifying process activities and on knowledge sharing.3 We focus instead on standardizing the reporting of such acquired knowledge. The BPM Academic Initiative [6] is a popular process repository offering an open process analysis platform aimed at fostering empirical research on multiple process collections. The metadata required when importing processes refers to the process title, the collection it belongs to, the process file format, and the modeling language. Even though the data to be stored is not restricted to these fields, no further standardization of the process classification is offered. In their survey on empirical research in BPM, Houy et al. [9] define a meta-perspective, a content-based perspective, and a methodological perspective for classifying the surveyed articles. Their content-based perspective refers to context (industry or public) and orientation (technological, organizational, or interorganizational). The standard metadata we propose can offer a richer classification for meta-studies like [9, 15] and more in-depth analysis performed using platforms like [6].

2.6 Conclusions and Future Work

Empirical research in BPM helps to close the feedback loop between theory and practice, enabling the shift from assumptions to facts and fostering real-world evaluation of so far untested theories. While process mining research has benefited from the availability of large event log collections, the same cannot be claimed for process model collections [6]. As process models clearly represent trade secrets for the companies using them productively, in this chapter, we have proposed a language-independent template for describing them by focusing on key properties (classification metadata, size, and instance duration) which are useful for empirical analysis by the academic research community without revealing proprietary information. The template has been validated with an exploratory survey among 24 experts from industry and academia, who commented positively on the choice of properties (no negative score was reported) and also made constructive suggestions that have already been incorporated into the template described in this chapter. We have also demonstrated the applicability of the template in an industrial case study by using it to report on the Terravis collection of 62 BPEL processes and a subset of their 435,093 process instances executed across multiple Swiss financial and governmental institutions in the period between 2012 and 2016.

While the metadata template presented in this chapter is language independent, the extensions concerning static metrics are BPEL specific. Therefore, we plan to

3 http://process.mit.edu/Info/Contents.asp.


work on similar templates for other modeling languages in the future. Additionally, we plan to collaborate with modeling tool vendors to enable the automated collection of the metadata described in this chapter. The long-term plan is to grow the number of available and well-classified process models for the empirical BPM community. One way to increase the number of classified processes is to auto-classify existing model collections. Future work will elaborate which properties can be inferred from existing data. Most of the respondents to our survey said that there is not enough empirical research in the field of BPM. We hope that more empirical research will be conducted and that the metadata presented in this chapter will help researchers to improve the classification of data collections and make them easier to compare and reuse across different publications.

Acknowledgements The authors would like to thank all of the participants in the survey for their time and valuable feedback.

References

1. E. Alemneh et al., A static analysis tool for BPEL source codes. Int. J. Comput. Sci. Mob. Comput. 3(2), 659–665 (2014)
2. W. Berli, D. Lübke, W. Möckli, Terravis – large scale business process integration between public and private partners, in Lecture Notes in Informatics (LNI), vol. P-232, ed. by E. Plödereder, L. Grunske, E. Schneider, D. Ull (Gesellschaft für Informatik e.V., Bonn, 2014), pp. 1075–1090
3. J. Cardoso, Complexity analysis of BPEL web processes. Softw. Process Improv. Pract. 12, 35–49 (2006)
4. J. Cardoso, Business process control-flow complexity: metric, evaluation, and validation. Int. J. Web Serv. Res. 5(2), 49–76 (2008)
5. J. Cardoso, J. Mendling, G. Neumann, H.A. Reijers, A discourse on complexity of process models, in International Conference on Business Process Management (Springer, Berlin, 2006), pp. 117–128
6. R.-H. Eid-Sabbagh, M. Kunze, A. Meyer, M. Weske, A platform for research on process model collections, in International Workshop on Business Process Modeling Notation (Springer, Berlin, 2012), pp. 8–22
7. Executive Office of the President – Office of Management and Budget, North American Industry Classification System (2017)
8. M. Hertis, M.B. Juric, An empirical analysis of business process execution language usage. IEEE Trans. Softw. Eng. 40(8), 738–757 (2014)
9. C. Houy, P. Fettke, P. Loos, Empirical research in business process management – analysis of an emerging field of research. Bus. Process Manag. J. 16(4), 619–661 (2010)
10. D. Jordan et al., Web Services Business Process Execution Language Version 2.0. OASIS (2007)
11. D. Lübke, Using metric time lines for identifying architecture shortcomings in process execution architectures, in 2015 IEEE/ACM 2nd International Workshop on Software Architecture and Metrics (SAM) (IEEE, Florence, 2015), pp. 55–58
12. D. Lübke, A. Ivanchikj, C. Pautasso, A template for categorizing business processes in empirical research, in Proceedings of the Business Process Management Forum (BPM 2017), vol. 297, ed. by J. Carmona, G. Engels, A. Kumar, LNBIP (Springer, Cham, 2017), pp. 36–52


13. C. Mao, Control and data complexity metrics for web service compositions, in Proceedings of the 10th International Conference on Quality Software (2010)
14. J. Mendling, Metrics for Process Models: Empirical Foundations of Verification, Error Prediction, and Guidelines for Correctness, 1st edn. (Springer, Berlin, 2008)
15. J. Mendling, Empirical studies in process model verification, in Transactions on Petri Nets and Other Models of Concurrency II (Springer, Berlin, 2009), pp. 208–224
16. M. Skouradaki, D. Roller, C. Pautasso, F. Leymann, "BPELanon": anonymizing BPEL processes, in ZEUS (Citeseer, 2014), pp. 1–7
17. J. Vanhatalo, J. Koehler, F. Leymann, Repository for business processes and arbitrary associated metadata, in Proceedings of the Demo Session of the 4th International Conference on Business Process Management (2006)
18. B. Weber, B. Mutschler, M. Reichert, Investigating the effort of using business process management technology: results from a controlled experiment. Sci. Comput. Program. 75(5), 292–310 (2010)
19. B. Wetzstein, S. Strauch, F. Leymann, Measuring performance metrics of WS-BPEL service compositions, in Proceedings of ICNS (2009), pp. 49–56
20. C. Wohlin, P. Runeson, M. Höst, M.C. Ohlsson, B. Regnell, A. Wesslén, Experimentation in Software Engineering (Springer, Berlin, 2012)
21. Z. Yan, R. Dijkman, P. Grefen, Business process model repositories – framework and survey. Inf. Softw. Technol. 54(4), 380–395 (2012)

Part II

Solution Architecture

Chapter 3

Effectively and Efficiently Implementing Complex Business Processes: A Case Study

Volker Stiehl, Marcus Danei, Juliet Elliott, Matthias Heiler, and Torsten Kerwien

Abstract The implementation of business processes was neglected for many years in research; it seemed that hard coding was the only appropriate solution for business process implementations. As a consequence, classical literature about business process management (BPM) focused mainly on the management aspects of BPM and less on an effective and efficient implementation methodology. This has changed significantly since the advent of BPMN 2.0 (Business Process Model and Notation) in early 2011. BPMN is a graphical notation for modeling business processes in an easy-to-understand manner. Because the BPMN standard was designed with process execution in mind, it allows for a new way of implementing business processes, on which the process-driven approach (PDA) is based. This approach has been applied in a huge project at SAP SE since 2015 comprising more than 200 business-critical processes. To give an impression of the power of the process-driven approach for really complex business process implementation scenarios, this chapter explains the basics of the approach and shares experiences gained during the execution of the project.

V. Stiehl () Faculty of Electrical Engineering and Computer Science, Technische Hochschule Ingolstadt (THI), Ingolstadt, Germany e-mail: [email protected] M. Danei · J. Elliott · M. Heiler SAP SE, Walldorf, Germany e-mail: [email protected]; [email protected]; [email protected] T. Kerwien itelligence AG, Bielefeld, Germany e-mail: [email protected] © Springer Nature Switzerland AG 2019 D. Lübke, C. Pautasso (eds.), Empirical Studies on the Development of Executable Business Processes, https://doi.org/10.1007/978-3-030-17666-2_3


3.1 Introduction

Business process management (BPM) in general has been explored over many years, covering a variety of topics such as strategic BPM, process organization, process planning, process controlling, process evaluation, risk management for processes, process performance analysis, process optimization, process mining, and change management when introducing BPM in organizations. These areas are well researched, and many improvements have been achieved for all of these topics over time. However, in the authors' view, one area in this whole process universe seems somewhat neglected by comparison: the model-driven implementation of complex business processes. There have been several standards in the past, like BPEL, which tried to address this topic. However, due to a range of issues, from missing standardized notations to an almost exclusively technical focus, companies were largely unable to use them to implement software meeting the needs of complex, real-life scenarios. Precisely this implementation of complex business processes is what companies need when they want to address differentiating business processes which, due to their uniqueness for a company, cannot be covered by standard processes delivered by standard software (e.g., SAP S/4HANA). But what options do we have at our disposal for implementing differentiating business processes? For many years, the only option seemed to be hard coding the processes using a development environment of choice, e.g., environments based on widespread programming languages such as Java/C#/JavaScript or proprietary environments like SAP's development environment based on the ABAP programming language. Experience has shown, though, that this approach has some weaknesses. Issues that companies have experienced include the following:
• Development speed and ease of maintenance
• Making changes is usually cumbersome
• Transparency in running or finished process instances is not innately given
• Changes in market conditions can require extensive recoding

Since differentiating business processes are a key factor in gaining or keeping a competitive advantage, finding more effective ways to address these issues could be vital to a company's success. Maintaining that competitive advantage is more crucial than ever before, given the ever-increasing pace of change brought on by the pressure to innovate as a result of global digitalization. If companies miss new trends, they might be out of business very soon. In this dynamic environment, the need to address the challenges arising out of the business/IT alignment problem becomes ever more acute: in most cases, process experts in operating departments work out the to-be business processes using graphical notations such as EPC (Event-Driven Process Chain) or BPMN (Business Process Model and Notation). As part of the software specification, process models are exchanged with the developers who have to implement the new solution based on the process models. Experience at several companies has shown, though, that changes made to the models during implementation mean the original


models are outdated by the time the implementation is complete, and they are rarely updated to reflect the reality of the implementation. This severely limits the usefulness of the models, and consequently they become "shelf-ware." This is highly frustrating considering the effort which has been spent on these models. In order to gain transparency into running/finished processes, companies often then invest in additional software for process mining and process analysis. Process mining, for example, helps to determine which paths the finished processes followed during their execution. It sounds illogical that additional software is necessary to derive a process model out of the logged data, although the processes were originally implemented using process models. Experience therefore shows that this approach has its shortcomings, and it is valid to search for alternative approaches which address all or at least some of the mentioned limitations—ideally without introducing new limitations at the same time.

With the introduction of BPMN, we now have new options at our disposal, especially with version 2.0 of the BPMN specification, because it explicitly contains execution semantics for business process engines which can now execute BPMN-based process models. The approach of developing business processes using a standardized notation and running the models on a process engine would seem to offer some potential for addressing the limitations mentioned. Model-based development is not in itself new, but the development of software based on models is viewed critically by experts because models get quite complicated and unmanageable when it comes to complex real-life business scenarios. One method of addressing this challenge is introduced in the book Process-Driven Applications with BPMN [4], published in 2014. It introduces a holistic approach for implementing complex real-life business processes based on BPMN models. The presented solution is named the "process-driven approach" (PDA) and describes precisely what needs to be done to successfully implement differentiating core business processes. The process-driven approach comprises the following:
• A collaboration model between business and IT called "BizDevs" to overcome the business/IT alignment problem (see Sect. 3.2.2)
• A new way of thinking about BPMN-based business process implementations (process-driven thinking; see Sect. 3.2.3)
• A new methodology for business process implementation projects (process-driven methodology; see Sect. 3.2.4)
• A specific software architecture recommendation for process-driven applications (process-driven architecture) and a suggested development approach (process-driven development; see Sect. 3.2.5)
• A recommendation for a technology stack which supports process-driven applications best (process-driven technology; see Sect. 3.2.6)

The feasibility of the approach in theory was proven in the book itself. Using a small example, the basic architectural and development details were explained. However, what remained open was the applicability of the approach to real-life scenarios.


The arguments in favor of the process-driven approach led SAP Language Services to decide to use this approach for a major project to fulfill their core business requirement: providing services for translations (in 40 languages) of a variety of items for SAP products, such as user interfaces, business reports, marketing materials, videos, and handbooks. The requirements for running the business processes supporting these services were so unique that no standard off-the-shelf translation management software could meet them. So the SAP Language Services team decided, after an intensive evaluation phase, to build their differentiating business processes following the PDA methodology. In this chapter, we will describe the project and summarize the experiences gained with the process-driven approach applied to a real-life scenario. It addresses in particular the following questions:
• Is it possible to use the model-based approach to build applications that fulfill complex, real-life business needs? What needs to be done to achieve this goal?
• Is it possible to preserve BPMN process models developed in operating departments following the BizDevs collaboration model during implementation?
• One of the main attributes of PDA is the separation of business process and technical artifacts. Is it possible to keep the obvious technical and business complexities under control if the process-driven approach is applied? What needs to be considered?
• Which benefits do companies gain by applying the process-driven approach, and which of the aforementioned shortcomings are addressed by it?
• Finally, how does the BizDevs collaboration model contribute to overcoming the business/IT alignment problem?

The remainder of this chapter is structured as follows. In Sect. 3.2, we briefly explain the ideas behind the process-driven approach. Section 3.3 describes the project at SAP SE in more detail, and Sect. 3.4 summarizes the results for researchers as well as for practitioners and gives an outlook on further research topics.

3.2 The Process-Driven Approach

The spark for the process-driven approach came from the release of the BPMN 2.0 specification in January 2011 [2]. For the first time, execution semantics were defined for a graphical process notation by a standards organization (OMG, the Object Management Group). This created a clear definition of how a process should behave if executed by a process engine that complies with the BPMN 2.0 standard. Software vendors immediately started implementing the new process modeling standard, providing process engines that execute BPMN process diagrams. This was a big step forward and laid the foundation on which process-driven applications could prosper. The question was: How can this idea of running BPMN-based models on an engine be transferred into real-life projects? The research carried out for the book


on process-driven applications established that several aspects are required, which must work hand in hand for it to be successful. Although it is impossible to repeat all the details of the process-driven approach described in the book, in the forthcoming paragraphs, we will summarize the main ideas. More details can be found in [4].

3.2.1 Definition of a Process-Driven Application

The definition of a process-driven application is as follows [4, p. 19]: Process-driven applications are business-oriented applications that support differentiating end-to-end business processes spanning functional, system, and organizational boundaries by reusing data and functionality from platforms and applications.

The definition already stresses the importance of business requirements and process logic as the main driver for all decisions that need to be made while developing the application. The process-driven application is the result of applying the process-driven approach. If we take a closer look at business processes, we can distinguish between standard business processes and unique, company-specific, differentiating business processes. Standard processes are well covered by standard products, and it makes little sense for companies to implement these themselves. However, companies cannot do much to differentiate themselves from the competition by using standard processes, so the next question we have to answer is this: How can companies quickly and sustainably build, run, and monitor differentiating business processes? This is exactly where the process-driven approach fits into the picture by providing an effective and efficient implementation methodology. Key criteria for a process-driven application are independence from the IT landscape and process flexibility with regard to changing market conditions and competition. These criteria are mainly supported by the process-driven architecture, which will be discussed in Sect. 3.2.5.

3.2.2 Process-Driven Collaboration (BizDevs)

The main idea behind process-driven collaboration is overcoming the alignment problem between business and IT, with both sharing common responsibility for one BPMN model right from the beginning of a project. The traditional development process was very much dictated by business folks handing over software specifications which had to be implemented by their IT colleagues. Because of the potential misunderstandings caused by software specifications formulated mainly in prose, the results of the implementations rarely fulfilled the original requirements immediately. The typical ping-pong game between business and IT started, consisting of implementation (by developers) and review phases (by


business colleagues) until the final result was eventually reached. This “procedure” is time-consuming, error prone, and highly frustrating for both parties. The process-driven approach targets those shortcomings, changing the collaboration between business and IT by stipulating that a well-defined notation (BPMN) must be used to depict the process logic precisely. In addition, the modeling of the business processes is done together right from the start of an implementation project. Both sides enter into a partnership of equals. Because both sides work together on one BPMN model, chances are very high that the implementation immediately fits the expectations, and that increases development productivity. This raises the question of whether a roundtrip of one BPMN model between the business and IT teams is possible or not. However, BPMN as the common language between business and IT allows work on new levels. The new collaboration model is based on collaborative work on one BPMN model, which is then executed, as it is, by a BPMN engine. The BizDevs collaboration model simply does not permit changes to the BPMN process model just to make it executable. Although this may sound challenging to achieve in practice, the goal can be reached if organizations are willing to follow the new collaboration model, where both sides are equally responsible for one BPMN model and where the focus is on the preservation of this model throughout the transition to execution. This preservation of one BPMN model is also supported by the process-driven architecture which will be discussed in more detail in Sect. 3.2.5. The responsibility for the executed processes is now extended to the business side, so there can be no more finger-pointing between the two camps. For this new kind of collaboration, the term BizDevs has been coined to describe the collaboration between business and development. 
The term is influenced by "DevOps," which describes the collaboration between development and operations. BizDevs means that business people become an integral part of the process development cycle—a new accountability that they have to get used to.

In addition to defining how the process should ideally run, it is also important from the start to define exceptions: what should happen if an expected outcome is not reached? For example, if it is critical that a process participant responds within a certain time frame, what should happen if they fail to do so? Another example is a technical error that prevents the process from moving on. Here again, the value of business-IT collaboration becomes obvious. Without this collaboration model, the implementation of BPMN-based process models becomes questionable at best. Hence, BizDevs is an indispensable prerequisite for successful PDA implementations.

3.2.3 Process-Driven Thinking

BPMN is not just another modeling notation for business processes. Unfortunately, many authors of books about business process management see it this way: they describe BPMN merely alongside other modeling notations, reducing the comparison between them to the different shapes each notation supports. Process-driven

3 Effectively and Efficiently Implementing Complex Business Processes


thinking uses the full shape set of the BPMN palette. For a thorough understanding of BPMN, it is crucial to consider the semantics of all shapes in the palette in order to apply them correctly in process models that can then be correctly interpreted by BPMN process engines at runtime. BPMN process engines, in the end, implement the semantics described in the BPMN specification. Thinking in "process engines" is therefore a new challenge for modelers, business people, and developers, who have to design for execution right from the start. Process models need a new level of precision, as engines require detailed information to make models executable. Because of this precision, there is no room left for misunderstandings or ambiguities. To reach this level of precision, modeling guidelines such as the ones described in Bruce Silver's book [3] are highly recommended. Together, modeling guidelines and the awareness that process engines rely on precise process models result in high-quality models that can be understood from the diagrams alone.

Another aspect of process-driven thinking makes the business processes the center of gravity: every decision made during a project's lifetime starts from the business requirements. This principle is also at the heart of the next section, the process-driven methodology.
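To make the required precision concrete, the fragment below sketches what "designing for execution" means at the model level. It is an illustrative, invented BPMN 2.0 serialization (not taken from the project described in this chapter): the isExecutable flag, the typed tasks, and the explicit sequence flows leave an engine no room for interpretation.

```xml
<bpmn:definitions xmlns:bpmn="http://www.omg.org/spec/BPMN/20100524/MODEL"
                  targetNamespace="http://example.org/illustration">
  <!-- isExecutable="true" marks the model as intended for an engine -->
  <bpmn:process id="OrderHandling" isExecutable="true">
    <bpmn:startEvent id="OrderReceived"/>
    <bpmn:sequenceFlow id="f1" sourceRef="OrderReceived" targetRef="CheckOrder"/>
    <!-- typed tasks tell the engine how each step is performed -->
    <bpmn:serviceTask id="CheckOrder" name="Check order"/>
    <bpmn:sequenceFlow id="f2" sourceRef="CheckOrder" targetRef="ApproveOrder"/>
    <bpmn:userTask id="ApproveOrder" name="Approve order"/>
    <bpmn:sequenceFlow id="f3" sourceRef="ApproveOrder" targetRef="OrderDone"/>
    <bpmn:endEvent id="OrderDone"/>
  </bpmn:process>
</bpmn:definitions>
```

A purely descriptive model could omit the executable flag and leave the task types unspecified; an engine, in contrast, cannot tolerate such ambiguity, which is why the modeling guidelines mentioned above matter.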

3.2.4 Process-Driven Methodology

Because of the importance of the business processes, one central question is: How should a process-driven project be started? Should we start with an analysis of the current (process) situation and derive the to-be processes from there (bottom-up)? Or should we start with the new to-be processes right away, without considering the current situation at all (top-down)? The answer for the process-driven approach is clear: the second option. The problem with the bottom-up approach is that you will most probably spend a lot of time and money on documenting processes that you already know don't work satisfactorily, for very little benefit. If you try to improve the current process, you are working on symptoms, not on an overall process improvement that takes advantage of the latest technology options. Starting with the to-be processes gives you the freedom to innovate, both in terms of the business logic itself and in harnessing technical innovations.

One core rule of the process-driven methodology is not to let yourself be restricted by the current process implementation or by other technical or organizational constraints, such as the existing IT landscape, external systems, partners, suppliers, or customers. The key question for decision-making in the process-driven methodology is always: What does the business logic require? From this point of view, it is possible to derive the business objects (e.g., a purchase order, an account, an employee) and their properties, the required services and their interfaces, user interfaces, decision rules, events, process steps, etc.—everything that is necessary to make a business process model executable. Applying the process-driven methodology sounds easy at first, but is sometimes hard to follow

because people tend to think about their IT landscapes and the restrictions they imply. The clear recommendation is not to think too much about IT landscapes and systems, because they are changing anyway, especially in times like these, when the trend toward cloud-based systems is growing and mergers and acquisitions happen, both contributing to an even more fragmented IT landscape. You simply cannot afford to depend on such a brittle foundation. It is better to abstract from specific systems and stay independent of them. Following this approach allows much shorter time-to-market cycles from concept to implementation. Remember that it is one of the major goals of a process-driven application to be as independent as possible from a company's IT landscape, and the process-driven methodology contributes to that goal. It is further strengthened by the process-driven architecture, which will be discussed next.

3.2.5 Process-Driven Architecture and Process-Driven Development

In order to fulfill the promises of independence and flexibility for the process-driven application as described in Sects. 3.2.2 and 3.2.3, as well as the promise of preserving a BPMN model throughout the transition from the original model to execution, an architectural blueprint is required: the process-driven architecture. An architectural blueprint is needed because the use of BPMN alone ensures neither a successful development project nor a sound architecture for the resulting applications. The problems known from conventional programming apply to BPMN-based developments as well and can best be explained using an example taken from [4, pp. 67–74].

Compare Fig. 3.1, the result of the traditional approach, with Fig. 3.2, which uses the process-driven architecture. You can see from the model how the core processes that are so critical to the success of the company (the upper process in Fig. 3.2) are not obscured by technical details, because these are put into a separate layer: the service contract implementation layer (SCIL). The valuable business process stays intact and, most importantly, remains under the control of the business department. However, the process can be easily adapted for use in other regions; you simply need to adapt the service contract implementation layer (the lower BPMN model in Fig. 3.2). Of course, this adaptation does involve some effort, but applying this approach will be of benefit in the long term, as it releases you from the complex web of connections between back-end systems. Business processes (e.g., the upper BPMN model in Fig. 3.2) and technical processes (e.g., the lower BPMN model in Fig. 3.2) can be developed and modified independently, but remain connected by the service contract (e.g., the message flow between the two BPMN models).
This architecture also helps you to keep the business processes in their original form as conceived by the business departments. The key question in an implementation project will be to determine exactly which activities belong in which layer, in other words, which


Fig. 3.1 Order process after processing by developer


Fig. 3.2 Order process after separation of layers

activities are really part of the core differentiating process and which are supporting activities. Essentially, this determination will be made each time through business-IT collaboration, with the business side in the lead. Key criteria are as follows:
• Does the business see this activity as critical to the business process? Is it necessary, and does it add value from a business perspective?
• Can process participants easily understand the activity?

The basic idea presented above was refined, resulting in the reference architecture for process-driven applications depicted in Fig. 3.3. The PDA layer comprises the business processes and everything needed to make them executable, e.g., local persistency for the business objects the processes work on, the user interfaces for BPMN user tasks, business rules for BPMN business rule tasks, events, etc. Communication with the outside world is handled by the service contract layer. The interfaces described there (the fields and the data types needed for the technical implementation) consider only the needs from the view of the business processes and are defined in both directions: from the business processes to the external world and vice versa. The data types being used for the interface descriptions


Fig. 3.3 Reference architecture for process-driven applications

are identical to the ones used within the business process itself, avoiding mappings between different data types. The service contract layer is an abstraction from the specific back-end systems and shields the process-driven application from the IT landscape with its proprietary data types, interfaces, and technologies. A process-driven application never connects to one of the back-end systems directly. This is a typical pitfall in many BPMN models: they contain direct connections to back-end systems, and therefore a change in the system landscape necessitates a change in the process models. The abstraction of the business model from the IT landscape is lost, and adapting the model to new requirements becomes unnecessarily complicated.

The actual implementation for each service contract is gathered in the service contract implementation layer (SCIL), which naturally looks different for each IT landscape the business processes should run on. It takes over the integration part of a process-driven application. As can be seen from Fig. 3.3, the SCIL differentiates between stateful and stateless integration. Stateful integration means the handling of wait states during integration. This is, for example, the case if an aggregation of several messages is necessary before finally sending the collection to a target system, e.g., a combined bank transfer. Stateful integration still relies on a harmonized data type system as it is used in the business process and the service

contract. However, stateful integration is not necessary for every service requirement from the business process. That's why a third option is shown on the right of Fig. 3.3. The individual data type systems used in the diverse applications are only relevant when connecting to specific back-end systems. Hence, mapping between the harmonized data type system used so far and the proprietary data type systems of the back-end systems is only necessary in the stateless integration part of the SCIL. Routing and mapping are therefore the main tasks of the SCIL's stateless integration part. The SCIL layer in Fig. 3.3 depicts three implementation alternatives for the integration:

1. On the left: Stateful integration is handled by a BPMN process, and the stateless integration is covered by specialized integration software. More and more companies are using BPMN for integration purposes as well, especially for stateful integration, which can nicely be modeled using BPMN. Transparency is again the key argument in favor of using BPMN for stateful integrations, because the BPMN process engines collect all data needed for monitoring the integration processes during runtime. This significantly simplifies operations. However, the usage of BPMN for stateful integrations is only recommended if the engines fulfill the performance requirements. For stateful integrations with millions of messages in short time periods (high-frequency scenarios), the recommendation is to use specialized integration software, leading to the alternative in the middle of Fig. 3.3.
2. In the middle: Both parts, stateful and stateless integration, are handled by specialized integration software. This is recommended for high-performance scenarios where an optimized integration engine is capable of managing the load.
3. On the right: As outlined above, a stateful integration part is not necessary in all cases.
A simple transfer of a message (including routing) to the right target system(s) and mapping between data types is all that is required in this scenario. This is best covered by specialized integration software. It is not recommended to use BPMN engines for these use cases, as BPMN software is optimized for executing business processes, not for integration. Even though vendors of BPMN engines claim to integrate with many systems directly out of BPMN processes, it is definitely not recommended to use this functionality. BPMN engines cannot connect to as many systems as specialized integration software can, and they are also not optimized for executing complex mappings between data types. Leave those tasks to optimized integration software.

We can conclude that a process-driven architecture relies on the "separation of concerns" principle, allowing for a maximum of parallelism during development, which increases development efficiency. This is the key principle of process-driven development. We gain process flexibility because we can easily adapt the BPMN process models in the PDA layer to changing market conditions and new competitors, as the BPMN models are not polluted with technical integration flows. Hence, they are less complex and easier to maintain. The adaptability to changing IT landscapes is ensured by the service contract together with the service contract implementation layer. If there is a change in interfaces or systems, this can be

adjusted locally using the specialized integration software. And finally, we preserved the BPMN model during development—exactly what we wanted to achieve.
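As an illustration of the separation just described, the sketch below models, in plain Python and with entirely hypothetical names and message formats, how a harmonized data type defined by the service contract shields the business process from back-end specifics, while the stateless SCIL part performs routing and mapping:

```python
from dataclasses import dataclass

# Harmonized data type shared by the business process and the service contract.
@dataclass
class TranslationOrder:
    order_id: str
    source_language: str
    target_language: str
    document_uri: str

# Proprietary back-end formats (in reality dictated by the external systems).
def to_legacy_erp(order: TranslationOrder) -> dict:
    return {"ORD_ID": order.order_id,
            "LANG_PAIR": f"{order.source_language}-{order.target_language}",
            "DOC": order.document_uri}

def to_cloud_tms(order: TranslationOrder) -> dict:
    return {"id": order.order_id,
            "source": order.source_language,
            "target": order.target_language,
            "file": order.document_uri}

# Stateless SCIL part: routing (pick a target system) plus mapping
# (convert the harmonized type into that system's proprietary format).
def scil_stateless(order: TranslationOrder) -> tuple:
    if order.document_uri.endswith(".docx"):   # invented routing rule
        return "cloud_tms", to_cloud_tms(order)
    return "legacy_erp", to_legacy_erp(order)

target, payload = scil_stateless(TranslationOrder("4711", "de", "ja", "manual.xml"))
print(target, payload["LANG_PAIR"])  # legacy_erp de-ja
```

Note that the business process only ever sees TranslationOrder; swapping the ERP for another system changes just the mapping function, not the process model.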

3.2.6 Process-Driven Technologies

In order to implement a full-fledged process-driven application, it is recommended to use the following technologies:

1. BPMN engines for the execution of the business processes as well as the stateful integration part of the SCIL. As outlined in Sect. 3.2.5, the usage of BPMN engines for integration purposes is only recommended if the performance requirements for handling the message volume are met.
2. Business rules engines, or decision management systems (DMSs) as they are also known, to complement process engines. BPMN engines concentrate on the execution of process logic, whereas a DMS executes decision logic and/or calculations.
3. An enterprise service bus for integrations, both stateful (e.g., aggregator pattern) and stateless (e.g., routing/mapping).
4. Although not discussed in detail in this chapter, event stream processing (ESP) software is recommended for new IoT (Internet of Things) scenarios with a multitude of sensors sending signals about, e.g., temperature, pressure, or humidity, which need to be filtered and analyzed for business-relevant information. The ESP solution is responsible for signaling business-critical events to the business processes. ESP systems are typically not directly connected with BPMN-based processes in the PDA layer of Fig. 3.3; instead, they send the business events to the SCIL, which is then in charge of handing them over to the responsible processes. This is mentioned here for the sake of completeness—while not directly relevant to this case study, it gives an indication of potential further use cases for process-driven applications.

This setup ensures a very flexible environment for process-driven applications, one that is prepared for fast adaptation to changing conditions for a long time to come.
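The division of labor in item 2 above—process logic in the engine, decision logic in a DMS—can be sketched as a tiny first-hit decision table. All rules, attribute names, and thresholds here are invented for illustration; a real DMS would typically evaluate DMN decision tables instead:

```python
# First-hit decision table, evaluated outside the process model as a DMS
# would do it: the first rule whose condition matches determines the outcome.
def decide_review_level(word_count: int, legally_binding: bool) -> str:
    rules = [
        (lambda wc, legal: legal,       "expert-review"),   # legal texts always reviewed
        (lambda wc, legal: wc > 50000,  "sample-review"),   # large jobs: spot checks
        (lambda wc, legal: True,        "machine-only"),    # default rule
    ]
    for condition, outcome in rules:
        if condition(word_count, legally_binding):
            return outcome

print(decide_review_level(1000, True))    # expert-review
print(decide_review_level(80000, False))  # sample-review
```

Keeping such rules out of the BPMN model means the business can change thresholds without redeploying the process.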
In this section, we've covered the basic ideas of the process-driven approach and proposed theoretical answers to the following questions raised at the beginning of this chapter:
• Is it possible to preserve BPMN process models developed in operating departments following the BizDevs collaboration model during implementation?
• Is it possible to keep the obvious technical and business complexities under control if the process-driven approach is applied? What needs to be considered?
• Which benefits do companies gain by applying the process-driven approach, and which of the aforementioned shortcomings are addressed by it?


3.3 Implementation Project at SAP Language Services Using the Process-Driven Approach

In Sect. 3.2, we described the main ideas behind the process-driven approach in some detail because it is the foundation for the ongoing project at SAP Language Services. All the aspects discussed in Sect. 3.2 were applied in full during this project. We will now take a closer look at the situation at SAP Language Services before the project and at how it was improved using the process-driven approach. The remaining part of this section is based on an article by Matthias Heiler, which was first published in the January-February-March 2016 issue of SAPinsider [1]. It was updated with the latest numbers and slightly enhanced.

The SAP Language Services (SLS) department provides translation services (in 40 languages) for a variety of items for SAP products, such as user interfaces, business reports, marketing materials, videos, handbooks, and documentation. SAP Language Services collaborates with several translation agencies across the world and coordinates more than 2800 native speakers in order to achieve high-quality translations, even taking into account the local culture of the respective country for which a translation is needed. To give an impression of the volume that needs to be translated: in 2016, more than 700 million words were translated (one Harry Potter book contains roughly one million words).

The business requirements for running the processes supporting these services were so unique that no standard off-the-shelf translation management software could deliver what SAP needed. Two main aspects make this process so unique and differentiating: Firstly, the ability to simultaneously ship localized versions of software products and features in a high number of languages is a key competitive advantage for SAP.
Secondly, in order to meet this goal and maintain that advantage over time and in changing market conditions, SAP Language Services has developed a range of approaches and processes that are fairly unique in the localization industry. After an intensive evaluation phase, the SAP Language Services team therefore decided to build their differentiating business processes following the PDA methodology and to run them using an SAP product called SAP Process Orchestration.

The first step was to design an overall framework for the business processes. There were several factors to consider in building this framework. The services the SLS team has to deliver depend on specific translation scenarios. The process inputs can vary widely, as can the requirements on the process outputs, and the process must be able to produce the required result from these different inputs. There can also be variations within one project. For example, very high linguistic quality is required for the Japanese version, so in this case machine translation will need to be reviewed by language experts. The Italian version, however, is only needed for test purposes; the quality need not be perfect, but it is needed much sooner, so here just using machine translation without expert review would meet the goal better. The source text might be in German, so for Japanese it would make sense to produce an English version first and then translate into Japanese from English, as this will considerably lower the cost of translation into Japanese

(German-Japanese translators are much rarer and consequently significantly more expensive). In addition, every translation project must consider different types of text sources (there is a wide variety of formats and system types that need to be processed, such as typical software file formats, ABAP systems, Microsoft Office files, video files, etc.) as well as different types of texts, such as user interfaces for software, marketing materials, texts for internal communication, and even official financial statements. As a result, each translation process must accommodate those requirements through variants in its execution—this was a key influencing factor in the design of the business process framework. In addition, the primary goals of the project included achieving a high degree of automation to improve operational efficiency and allowing for flexible adjustments of the services to accommodate new or changed requirements.

This difficult constellation, consisting of a multitude of different translation requirements, an overly complex IT landscape with several hundred systems to be integrated, and inefficient process implementations with many redundant manual tasks, caused the valuable and highly skilled people at SAP Language Services to spend the majority of their time just keeping the processes alive and running ("keeping the lights on"). Their capacity was obviously not available for innovations (compare Fig. 3.4, left side). Therefore, one of the key goals of the project is to relieve the team from time-consuming, inefficient, redundant tasks and give them more room for business innovations in new language technologies such as neural and statistical machine translation and other natural language processing technologies (compare Fig. 3.4, right side). Optimizing the IT landscape by consolidating systems, high reuse of linguistic assets, and, last but not least, process automation are the main measures that support this goal.
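The derivation of execution variants from business requirements, as in the Japanese/Italian example above, can be sketched in a few lines. The function and step names are invented for illustration and do not reflect the actual SLS implementation:

```python
# Toy variant planner: the same translation process runs with different
# execution variants depending on quality and language requirements.
def plan_translation(target: str, quality: str, source: str = "de") -> list:
    steps = []
    # Pivot via English when direct de->ja translators are rare and expensive.
    if source == "de" and target == "ja":
        steps.append("translate de->en")
        source = "en"
    steps.append(f"machine-translate {source}->{target}")
    if quality == "high":
        steps.append("expert review")  # linguistic quality gate
    return steps

print(plan_translation("ja", "high"))   # pivot, machine translation, expert review
print(plan_translation("it", "draft"))  # machine translation only, for testing
```

In the real framework, such variant decisions are of course part of the modeled processes themselves rather than hard-coded branching logic.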
During many workshops, a list of requirements for the new solution was collaboratively worked out. It consisted of the following three main items:
• Best practices, which have been established over many years through cooperation with partners and customers, have to be considered in the new application as


Fig. 3.4 Key goal of the SLS project: freeing capacity for business innovations


well. It requires the right balance between standardization and flexibility without compromising high service levels.
• Agile and sustainable process adaptations must be possible (e.g., adding new translation technologies to a process), even on short notice.
• Process documentation must always be up-to-date and must not be a separate step in a project's life cycle: process documentation must correspond 1:1 to the running processes. The goal is, on the one hand, to minimize effort and, on the other hand, to facilitate the exchange of best practices.

With so much complexity, easily understandable process models were essential to the project. Hence, it was decided to implement the business processes using the process-driven approach, including the BizDevs collaboration model. Figure 3.5 shows, for example, the result of the collaboration between business and IT in a BPMN model that could be created using any business-friendly BPMN modeler, and Fig. 3.6 shows the resulting executable BPMN model in SAP Process Orchestration. Note that the two models are identical—exactly what we wanted to achieve using the process-driven approach. The process-driven approach is generic and independent of specific tools and environments, so it works with any BPMN-based modeling tool.

The PDA methodology, with its uncluttered and collaborative approach to process modeling, is an ideal fit for the SAP Language Services project, enabling business users, BPMN specialists, developers, and user interface designers to discuss process logic, business functionality, user interfaces, and services very precisely. The project took the approach of educating business users both in BPMN 2.0 and in the PDA methodology. As a result, the business users quickly became BPMN specialists in their own right, capable of using the full BPMN palette. The only restrictions on the shape set used were those imposed by the process engine itself, where the palette was not completely implemented.
For those cases, the business users were able to use the implemented shapes to achieve the same outcome, but of course a full implementation would eliminate the need for such workarounds. As a result, the executable process is truly business driven; and thanks to the early, intensive involvement of key users, acceptance of the model is very high. Communication between the IT and business units is standardized through BPMN and is highly efficient, because it virtually eliminates the risk of misunderstandings and decreases the time between concept and implementation. In fact, although this project required a standardized approach to handling multiple different language technologies, the creation and implementation of 65 process models following the PDA approach was achieved within 9 months. Compared to traditional methodologies for implementing processes using programming languages such as ABAP and Java, the PDA approach yielded implementation time savings of roughly 75% (status in January 2016).

One additional time accelerator in the project was to start directly with the design of to-be processes instead of struggling with legacy as-is processes. In theory, it would have been possible, following the bottom-up approach, to first create process models reflecting the as-is processes and then to use those as the basis for improvement. The team rejected this approach, as it was seen as very

Fig. 3.5 The BPMN-based process model from the business perspective


Fig. 3.6 The executable BPMN model in SAP Process Orchestration is identical to the structure of the business process model which was defined before


effort-intensive with little to no benefit. Since the business experts were directly involved, they were already well aware of the shortcomings of the existing processes, even without well-defined models or in-depth analysis. The consensus was also that spending time and effort to create those models would have had a negative impact on their ability to define to-be processes, and that they would risk carrying over undesirable elements and patterns from the as-is processes for the sake of expediency. Therefore, the approach selected was to start with the to-be processes and then to use those as the basis for further optimizations.

The team also faced the fairly common difficulty of abstracting from the given infrastructure and translation tools. In the first iteration, while it was relatively easy to define generic processes that could be used for all translation types for aspects such as project management, it was significantly harder to do so for the parts dealing with the actual translation itself in all the different tools. In fact, for these processes, the first iteration did not succeed in completely separating the business process from the systems used. However, after gaining some experience in the practical application of the methodology, the team was able to achieve this goal, so that now changes can be made in either the business or the technical layer without impacting the other. For example, it is possible to add or replace translation tools without changing the business process. Where changes need to be made that impact both layers (e.g., a new business activity is added and requires a new system), these changes can be implemented in parallel, increasing development efficiency. The project is still ongoing and has evolved since then.
The latest numbers, after a total implementation time of 33 months, are impressive (December 2017): 206 complex, nontrivial business processes, 169 integration processes (SCIL implementations), and 126 user interfaces speak for themselves. Process execution times have also been noticeably reduced: for example, the execution time for the end-to-end order process for marketing materials was reduced to one-third of the original execution time.

As microservice architectures have become more common, SLS has also seen additional benefits of the PDA approach. On the one hand, it is easy and efficient to integrate new microservices into the process as they become available. On the other hand, the process-driven approach provides a highly effective framework for orchestrating diverse microservices to create business value in a range of different scenarios. Overall, SLS achieved the following:
• Improved efficiency in process execution
• Improved user experience
• Higher automation rate
• Increased flexibility and adaptability
• Increased transparency in operational business

Besides that, two more goals have been reached:

1. SLS took a major step toward active process management, where business and IT work closely together and can adjust their processes more quickly and consistently.
2. This in turn gives the team opportunities to expand the services offered and to develop and even commercialize business models relevant to the digital era.


Table 3.1 Aggregated metadata for the SLS process collection

Collection name          SLS
Process count            206 models, up to 3 versions/model
Domain                   Managing of translation projects
Geography                Worldwide
Time                     03-2015–12-2017 (ongoing)
Boundaries               Cross-organizational 26%, intraorganizational 24%, within department 50%
Relationship             Is being called/calls another 100%
Scope                    Core 206, technical 169
Process model purpose    Executable
People involvement       None 45%, partly 55%
Process language         BPMN 2.0
Execution engine         SAP Process Orchestration 7.5
Model maturity           206 productive

Table 3.1 gives an overview of the implemented processes using the process collection template for categorizing business processes described in Chap. 2. It should be noted that the process version count does not reflect the number or frequency of changes to the models. A new version is only created when this is technically necessary, e.g., in case of interface changes. Managing version compatibility can be a challenge; in addition to keeping the number of versions low, the team has developed mechanisms to automatically upgrade running process instances to the latest version on a new feature release (e.g., at key points, the process checks if it is running in the latest available version; if not, it cancels itself and restarts at the same process point in the latest version, with the same data).

In terms of assessing maintainability or sustainability, the number of process instances is perhaps the more telling figure. For business processes alone, there have been a total of almost three million instances, with on average around 16,500 instances running at any given time. The figures for technical processes are significantly higher. Application support is facilitated by a dedicated process that provides support staff with relevant error data in the form of a human task in case of technical or business errors. One team role has dedicated responsibility for operations/maintenance, mainly in terms of oversight of tickets and tasks; the remaining operations activities are carried out by all team roles on the fly—a BizDevOps model.

One further insight the team has gained is that it seems much easier to manage this kind of implementation project using agile methodologies. A new team was set up for the project, consisting of business and IT specialists from SAP Language Services, supplemented with PDA and BPM experts from consulting partner itelligence AG.
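The instance-upgrade mechanism described above can be sketched as follows, with invented names and a plain dictionary standing in for real engine-managed process state:

```python
LATEST_VERSION = 3  # invented; in reality determined by the deployed models

def checkpoint(instance: dict) -> dict:
    """Called at key points of a process; returns the (possibly migrated) instance."""
    if instance["version"] == LATEST_VERSION:
        return instance
    # Cancel the outdated instance and restart it in the latest version,
    # at the same process point and with the same data.
    return {"version": LATEST_VERSION,
            "step": instance["step"],
            "data": instance["data"]}

old = {"version": 2, "step": "expert-review", "data": {"order_id": "4711"}}
print(checkpoint(old))  # {'version': 3, 'step': 'expert-review', 'data': {'order_id': '4711'}}
```

The design choice is notable: instead of migrating engine-internal state in place, the old instance cancels itself and a fresh instance of the new model takes over at the equivalent step, which keeps the engine's versioning model simple.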
3 Effectively and Efficiently Implementing Complex Business Processes

53

Initially, a hybrid development management approach was selected; however, as the team has matured and gained experience, this has evolved over time to an adapted version of agile development methodologies. While some elements (e.g., teams of ten) are not practical for our purposes, clearly defined user stories, sprints, and feature-based deliveries have proven valuable. The team
has found it helpful to describe how they work in detail using a process model. Overall, requirements from stakeholders are continuously added to the concept backlog. After prioritization, they are grouped into user stories that form coherent units of business value. Part of the concept work includes carefully examining dependencies (both business and technical) between the user stories—failure to do this early on can block deliveries of features that are in themselves complete, but cannot be deployed separately from parallel developments.

In addition to creating process models and UI prototypes, the concept team (consisting of business and IT specialists) also prepares backlog items for implementation. Developers attach effort estimates to the backlog items, and these, together with the known dependencies, are the basis for development sprint planning. This approach has provided a great deal of flexibility in terms of delivering features as soon as they are ready, as well as providing greater transparency for all team members around the status. Delivery frequency is weekly, with larger updates reaching production on average approximately every 2 months.

In summary, therefore, we find that we have now been able to answer our five questions:
• Complex real-life scenarios can be completely covered using a model-based approach.
• BPMN models developed by business departments using the BizDevs collaboration model can be preserved 1:1 during implementation.
• The process-driven approach provides an effective methodology for mastering complexity, both business and technical.
• The benefits are listed above.
• The BizDevs collaboration model has proved to be a vital tool in addressing alignment issues, and the benefits proposed in theory are observed in practice.

3.4 Conclusions and Outlook

3.4.1 Conclusions for Researchers and Practitioners

This chapter of the book has outlined the fundamentals of the process-driven approach (PDA). As a result of applying the process-driven approach, you get process-driven applications. These are defined as business-oriented applications that support differentiating end-to-end business processes spanning functional, system, and organizational boundaries by reusing data and functionality from platforms and applications. We can summarize the key aspects of the process-driven approach as follows:
• Process-driven collaboration between business and IT (BizDevs)
• Process-driven thinking that considers shape semantics and the process engine while modeling

54

V. Stiehl et al.

• Process-driven methodology that develops process models top-down without considering restrictions—no analysis of the current process implementations
• Process-driven architecture including a reference architecture for process-driven applications
• Process-driven development that rigorously applies the "separation of concerns" principle to achieve a maximum of parallelism during development
• Process-driven technologies comprising a BPMN engine, business rules engine (or decision management system), integration software such as an ESB, and ESP software for scenarios relying on events

The approach is being applied in a complex project at SAP SE. SAP Language Services (SLS), part of the Globalization Services department at SAP, has to solve the challenge of standardizing their differentiating end-to-end language production processes while retaining broad flexibility to meet a wide range of changing requirements. So far, a total of 206 complex processes have been implemented within 33 months.
The advantages gained to date by the application of the process-driven approach for this project can be summarized as follows:
• Time
  – Shorter development time due to parallel independent development
  – Shorter innovation cycle and faster time to market
  – Shorter strategy-to-reality cycle
• Money
  – No additional documentation necessary (modeled process = documented process = executed process)
  – Cost benefits during development and maintenance
  – No need to buy additional software for process mining or business activity monitoring if the process engine collects comparable data and provides relevant analytical tooling (depends on the engine used)
• Higher-quality implementation output (more precise, gets it right the first time)
• Increased flexibility on both sides: business process flexibility and flexibility regarding the integration of various IT landscapes
• Increased implementation efficiency, as the first implementation immediately fits requirements due to early end-user involvement, resulting in increased acceptance
• Transparency
  – Increased transparency during process execution
  – Increased transparency by analyzing automatically collected process execution data (via BPMN execution engine)


• Ability to act: PDA offering the best-possible management support in driving a company's strategy

It is advisable to use the process-driven approach in the following cases:
• Alignment of business and implementation requirements in a single BPMN model is important (only one common BPMN model for both sides, business and IT).
• Independence from the system landscape is critical for the resulting application.
• More than one system needs to be integrated.
• The system landscape on which the processes of the solution must run is not stable.
• The solution is complex and justifies the effort involved.
• The solution will provide a competitive advantage.
• The processes in the solution are expected to change frequently.
• The processes in the solution will be used in other organizational units, areas, regions, or other subsidiaries or companies.

However, if none of these statements apply to a development project, it is certainly worthwhile to consider alternatives. The application of the process-driven approach has proven (at least for the SLS project) the following:
• Real-life, complex business processes can be completely modeled and executed using a graphical notation (BPMN).
• BPMN-modeled business processes can really be executed as they were initially planned by the business (preservation of the business BPMN model during implementation).
• Business and technical complexities can be controlled using the right methodology and just one notation (BPMN).
• The BizDevs collaboration model achieves unprecedented efficiency and eliminates misunderstandings. (Thinking in process engines executing business processes forces a new level of precision, as it requires all details to be made explicit. As a result, companies understand much better how they really work.)
• The BizDevs collaboration model requires a thorough understanding of the complete BPMN shape set on both sides—both business and IT.
Experience at SLS has shown that while this does require a learning effort especially on the business side, this investment more than pays off in terms of the results. The process-driven project has been invaluable in providing practical experience of the kind of lifelong learning that is fundamental to success in the digital era.

3.4.2 Outlook

The process-driven approach is still in its early stages. However, the results achieved in a first really complex real-life project are more than promising. It seems as if implementation efficiency can be significantly increased compared to common programming approaches. Additionally, the process-driven approach is not only a solution for the first implementation. Due to its modular design, it also helps to reduce the maintenance effort after going into production. The transparency gained during process execution and after finalization is a further key argument in favor of the approach. Certainly, the results have to be confirmed in more projects of this complexity, and both aspects need to be analyzed in more detail: the initial development effort/efficiency and the maintenance effort/efficiency.

Besides the mentioned aspects, which are worth more research effort, the following list summarizes some ideas for further research questions:
• How suitable are current BPMN engines and their development environments for the development of applications following the process-driven approach?
• What does the ideal development environment for the process-driven approach look like?
• Which additional development guidelines can be given to PDA developers?
• Can the promises of the process-driven approach be confirmed by further projects?
• Can the BizDevs collaboration model be further detailed?
• Are BPMN choreography diagrams useful in the process-driven approach?
• Can BPMN collaboration diagrams be utilized to explicitly visualize the vertical process collaboration between the layers?
• What are the influences of the latest IT trends (e.g., in-memory DBs, big data, cloud computing, mobile, Internet of Things, machine learning, NoSQL DBs) on the process-driven approach?
• How can the extensibility of process-driven applications be achieved (e.g., by extension points that are also applied if a new version of a process-driven application is shipped by a vendor)?
• Which prerequisites must be fulfilled for a roundtrip of BPMN models between business-oriented modeling environments and developer-oriented IDEs?
• How can customizing of process-driven applications be achieved?
• The process-driven approach involves a learning effort on the part of project team members that is representative of the type of lifelong learning needed to succeed in the digital era. How can organizations best harness the experience from process-driven projects as they seek to establish a culture and methodology of lifelong learning that fits their unique situation and needs?
• Successful process-driven projects result in substantial efficiency gains for an organization, creating room for further innovation and new business models. New business models will then likely require new process-driven projects to implement them, which the organization is now equipped to do. How can organizations structure these innovation cycles for maximum benefit?
• How can the PDA approach best be combined with agile software development techniques?

Obviously, there is more to explore in the domain of the process-driven approach. We hope that the publication of the results gained by the complex SAP Language Services project and the application of the process-driven approach for differentiating business processes has provided interesting insights for researchers and practitioners alike and motivates them to invest more into this promising approach. It can be a starting point for a new wave of business process implementations, helping companies to prepare themselves for the digitalization era.

Chapter 4

Analysis of Data-Flow Complexity and Architectural Implications Daniel Lübke, Tobias Unger, and Daniel Wutke

Abstract Service orchestrations are frequently used to assemble software components along business processes. Despite much research and empirical studies into the use of control-flow structures of these specialized languages, like BPEL and BPMN2, no empirical evaluation of data-flow structures and languages, like XPath, XSLT, and XQuery, has been made yet. This paper presents a case study on the use of data transformation languages in industry projects in different companies and across different domains, thereby showing that data flow is an important and complex property of such orchestrations. The results also show that proprietary extensions are used frequently and that the design favors the use of modules, which allows for reusing and testing code. This case study is a starting point for further research into the data-flow dimension of service orchestrations and gives insights into practical problems that future standards and theories can rely on.

4.1 Introduction

The usage of analytical business processes is common in practice and has been the subject of many research projects. The logical next step, the execution of business process models, is nowadays catching up both in practical usage and as a research subject.

D. Lübke ()
FG Software Engineering, Leibniz Universität Hannover, Hannover, Germany
e-mail: [email protected]

T. Unger
Opitz Consulting Deutschland GmbH, Gummersbach, Germany

D. Wutke
W&W Informatik GmbH, Ludwigsburg, Germany
e-mail: [email protected]

© Springer Nature Switzerland AG 2019
D. Lübke, C. Pautasso (eds.), Empirical Studies on the Development of Executable Business Processes, https://doi.org/10.1007/978-3-030-17666-2_4


So far, most research has focused on the control flow of processes, e.g., the graph-based structures in BPMN [20] or the usage of activities in BPEL. For example, Hertis and Juric [7] and Lübke [9] analyzed control-flow dimensions of industrial BPEL processes. However, for executable processes, especially those that orchestrate multiple services, the data-flow dimension is also important: data needs to be transferred between different activities in the process and needs to be converted into a format consumable by the services being orchestrated.

So far, we know of no publications that deal with the implementation and the complexity of data flow in executable business processes and their relationship to the control flow. Without knowing the data-flow dimension, existing approaches to model, test, and verify business processes cannot judge whether and to what extent they must include the data-flow dimension. Also, there are no reliable sources for practitioners working in implementation projects to estimate implementation and testing effort with regard to the data flow.

In order to fill this gap, we conducted a case study of executable business processes implemented in BPEL, which is presented in this chapter. This study aims at providing metrics of data flow and comparing it to the control-flow dimension of processes collected from a number of industry projects. This is a first step toward better comprehending the challenges modelers and developers face when developing executable business processes.

Research into data flow has proved difficult because all vendors of BPEL engines provide proprietary implementations and extensions. Without knowing the exact causes, this can be a sign that the technologies provided by the BPEL standard are insufficient and/or that the data-flow implementation is an important development task that vendors chose to optimize in order to better sell their products.
The case study presented in this paper was conducted based on a collection of executable BPEL processes from three companies from different domains, ranging from processes for system-internal service integration to cross-organizational business processes. The analyzed process models target one of three different BPEL engines, are built using the respective vendor-specific modeling tool, and employ the vendor-specific BPEL extensions supported by the target platform.

This paper is structured according to the suggestion by Runeson et al. [16]: First, related work is discussed in Sect. 4.2 before BPEL, the modeling language of the process models used in the case study, is briefly introduced in Sect. 4.3. The case study design is outlined afterward in Sect. 4.4, and its results are presented in Sect. 4.5. The latter contains subsections detailing the metrics and their interpretation as well as possible threats to their validity. Finally, conclusions and possible future work are given in Sect. 4.6.


4.2 Related Work

4.2.1 Earlier Studies

There are not that many, but still some, empirical studies on the practical usage of BPEL and BPMN. Cardoso [4] tried to empirically validate process-flow metrics for BPEL processes with a complexity metric defined by him; however, no data-flow dimensions are discussed. zur Muehlen and Recker [20] did the first study into the practical usage of BPMN: they studied which visible BPMN elements were used by different stakeholder groups. Because the executable information, especially data input and output, is stored in non-visible attributes, the study does not contain any information about it. In addition, the analyzed process models are descriptive only.

Hertis and Juric [7] did a much larger study into metrics of BPEL processes; however, they collected process-flow-related data only, e.g., different activity counts and activity usage patterns. No data-flow metrics were described or gathered. Also, Lübke [9] analyzed timelines of static BPEL metrics in an industry project. These metrics were process flow related, and no insights about data flow could be taken from them. Thus, the data-flow dimensions of industry BPEL processes are not known.

Song et al. [18] conducted an empirical study on data-flow bugs in BPEL processes. However, the authors did not characterize the data-flow dimension itself but concentrated on three data-flow bug categories.

All in all, to the authors' knowledge, no empirical studies into the characterization of the data-flow dimensions of executable business processes in BPEL or BPMN2 have been made as of today.

4.2.2 Theory

One of the reasons why no empirical studies about the data-flow dimension of processes have been made might lie in the history of research into business processes: empirical research has mainly concentrated on analytical models. Even with the rise of standardized executable languages, namely, BPEL and BPMN2, research has mostly concentrated on the already existing properties of analytical models: process-flow complexity. As a result, not many publications about data flow are available, which in turn might explain the missing empirical evidence: if no theories are created that need to be verified, empirical research has no research questions to answer.

Cardoso [5] first raised the question of how to measure data-flow metrics. The metric that would measure the code complexity of data transformations is called "interface integration complexity" by him. However, the paper is only a position paper that concludes with the question "[h]ow to calculate the interface integration complexity of BPEL4WS containers." Parizi and Ghani [15] also raised the question of measurements for data-flow complexity. However, they only cite Cardoso's original question and offer no further theory or answers themselves.

Some related work is available from the GRID domain, in which BPEL processes have been used to orchestrate academic workflows. For example, Slominski [17] compared different approaches and GRID-specific challenges like handling large data sets and streaming data. However, in usual business application domains, data is not that large but structured in a more complex manner: data often needs to be converted between heterogeneous data models, and conversion frequently involves conditional logic to determine the attributes that need to be copied and possibly converted.

The importance of considering data flow in addition to control flow in the context of formal verification of BPEL processes has been recognized by Moser et al. [13] and Zheng et al. [19], where the authors describe algorithms for deriving the data flow of BPEL processes and incorporating it into formal process representations, such as Petri nets or automata.

A related area to service orchestrations is the design of service choreographies, in which services are not centrally orchestrated but instead call each other. Meyer et al. [12] present an approach that relies on a global data model that is mapped to the local data model of each service. The approach visualizes mappings by the use of UML diagrams and references existing standards like OWL and XPath but does not try to assess which data transformation technique would fit the approach and how much development effort this layer requires. Nikaj et al. [14] also present an approach to derive a REST service design from BPMN choreographies.
While the approach helps to identify resources and appropriate verbs, the data model is clearly marked as out of scope.

4.3 Business Process Execution Language (BPEL)

The Business Process Execution Language (BPEL) is a language for modeling executable business processes and is standardized by OASIS [8]. It is focused on orchestrating web services that are described by WSDL and XML Schema.

BPEL is defined by a set of activities that is split into basic activities and structured activities. Basic activities perform a function, e.g., calling a service (invoke), doing data transformations (assign), or waiting for an inbound message (receive). Structured activities contain other activities and define the control flow between them, e.g., executing one activity after another (sequence) or looping (forEach, repeatUntil, while). The structured activity flow allows graph-based modeling and parallel activity execution. All other activities are block based and may only be nested hierarchically.
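The split into basic and structured activities can be made operational by classifying element names in a process definition. A minimal sketch using Python's standard library follows; the two activity sets cover only the activities named above, not the full WS-BPEL vocabulary:

```python
# Classify BPEL activities as basic or structured by local element name.
# The activity sets are a subset (assumption); WS-BPEL defines more of each.
import xml.etree.ElementTree as ET
from collections import Counter

BASIC = {"invoke", "assign", "receive", "reply"}
STRUCTURED = {"sequence", "flow", "forEach", "repeatUntil", "while"}


def classify_activities(bpel_xml: str) -> Counter:
    """Count basic vs. structured activities in a BPEL document."""
    counts = Counter()
    for element in ET.fromstring(bpel_xml).iter():
        tag = element.tag.split("}")[-1]  # strip any XML namespace
        if tag in BASIC:
            counts["basic"] += 1
        elif tag in STRUCTURED:
            counts["structured"] += 1
    return counts


snippet = """
<process>
  <sequence>
    <receive variable="sayHelloRequest"/>
    <assign name="PrepareResponse"/>
    <reply variable="sayHelloResponse"/>
  </sequence>
</process>
"""
```

For the snippet above, this yields three basic activities and one structured activity.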


For handling XML data, BPEL mandates XPath and—via an XPath extension function—XSLT. XPath can be used for conditions (e.g., in an if or as a loop condition) or for copying data. Data copies are defined in copy elements inside an assign activity. A small code snippet copying data from a received message to a response message, for example, is implemented as follows (namespace prefixes have been omitted for clarity):

 1  ...
 2  <variable messageType="sayHello" name="sayHelloRequest"/>
 3  <variable messageType="sayHelloResponse"
 4            name="sayHelloResponse"/>
 5  <variable messageType="string" name="name"/>
 6  <sequence>
 7    <receive partnerLink="..." operation="sayHello"
 8             variable="sayHelloRequest"
 9             createInstance="yes"/>
10
11    ...
12
13    <assign name="PrepareResponse">
14      <copy>
15        <from>bpel:doXslTransform(
16                 'prepareSayHelloResponse.xsl',
17                 $sayHelloRequest.parameters)</from>
18        <to part="parameters"
19            variable="sayHelloResponse"/>
20      </copy>
21      <copy>
22        <from>$sayHelloRequest.parameters/lastName</from>
23        <to variable="name"/>
24      </copy>
25    </assign>
26    <reply partnerLink="..." operation="sayHello"
27           variable="sayHelloResponse"/>
28
29    ...
30
31  </sequence>
32  ...

A message is received by the receive activity (line 7) and copied to the variable sayHelloRequest. The variable sayHelloResponse is prepared by an assign activity (line 13). The assign has two copy blocks. The first copy (line 14) uses XSLT via BPEL's built-in doXslTransform XPath function and copies the result to the response. The second copy (line 21) simply copies the result of an XPath expression to an atomic variable. The reply (line 26) sends the newly created message to the caller.

Like most WS-* standards, BPEL is designed to be extensible: new query and expression languages besides XPath can be referenced by the use of URNs, and a placeholder extension activity can contain vendor-specific activities. Although BPEL has been superseded by the BPMN standard, it is still used, and many companies have large repositories of BPEL processes that contain lessons learned that apply not only to BPEL but to executable business processes and service orchestrations in general.
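Outside a BPEL engine, the effect of the two copies in the listing can be approximated with Python's standard library. Here `prepare_response` merely stands in for the `prepareSayHelloResponse.xsl` stylesheet, and the element names of the message payload are illustrative assumptions:

```python
# Approximation of the assign activity's two copies, using stdlib XML
# tools. prepare_response stands in for the XSLT stylesheet; the payload
# element names are assumptions for illustration.
import xml.etree.ElementTree as ET

# Assumed payload of the received sayHello message.
request = ET.fromstring(
    "<parameters>"
    "<firstName>Ada</firstName><lastName>Lovelace</lastName>"
    "</parameters>")


def prepare_response(parameters):
    # Stand-in for prepareSayHelloResponse.xsl: derive the response
    # document from the request payload.
    greeting = ET.Element("greeting")
    greeting.text = "Hello {} {}".format(parameters.findtext("firstName"),
                                         parameters.findtext("lastName"))
    return greeting


# First copy: transform the request into the response variable.
say_hello_response = prepare_response(request)

# Second copy: an XPath-style selection into an atomic variable.
name = request.findtext("lastName")
```

The two copies illustrate the two typical shapes of BPEL data flow: whole-message transformation versus extraction of a single atomic value.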


4.4 Case Study Design

4.4.1 Research Questions

We formulate our research goal according to the goal/question/metric (GQM) method [1]:

The purpose of this study is to characterize the implementation of data flow in BPEL processes from the point of view of a solution architect in the context of executable business process development projects.

We refined this overall research goal into the following questions:

RQ1: Which data-flow modeling choices are preferred on specific tools?
The BPEL standard itself supports XPath and XSLT (via an XPath extension function). However, BPEL is designed with many extensibility points. One of those extensions can be the use of other languages to formulate expressions and queries on XML data. For example, many BPEL engines support XQuery (e.g., Apache ODE and its derivatives, ActiveVOS, and Oracle BPM Suite), while others allow embedding Java code or offer custom XML data mappings (both options, e.g., in IBM WebSphere Process Server (WPS)). When several data-flow implementation choices are available, the question arises which ones are preferred by (or pushed upon) developers. We hypothesize (a) that developers prefer to use the proprietary extensions provided by the tools, which in general should be more prominently offered in the development tools and should be more powerful than the standard ones because otherwise the tool vendors would have had no incentive to implement them, and (b) that the most powerful options, XQuery and Java, are preferred over other implementation choices. We measure this by counting lines of code and expect XQuery and Java to have the largest amount of lines on ActiveVOS/Oracle and WebSphere Process Server, respectively.

RQ2: What amount of data flow is portable, i.e., standards compliant?
Because we expect the proprietary data-flow implementation choices to be preferred by the developers, our hypothesis is that no BPEL process is fully standards compliant with regard to its data-flow implementation. Because we expect XPath and XSLT to be used nevertheless in some spots (e.g., for formulating conditions and transforming XML data, where XSLT can excel in some circumstances), some portions of the data flow are expected to be standards compliant, i.e., use XPath and/or XSLT.
Because we expect most XML messages to be produced by non-standards-compliant code and we expect those to make up the bulk of data-flow implementation code, we hypothesize that less than 10% of the lines of data-flow code are implemented in one of the languages offered by the BPEL standard.

RQ3: Is the data flow in executable business processes larger than the process flow?
Because executable business processes possibly connect to many different systems exposing different services and business objects (BOs), we expect data transformations to be an integral and large part of a business process solution. As metrics for measuring complexity, we use the number of conditional branches (e.g., if and switch) and the number of iterations (e.g., for, while, and repeat-until loops). For measuring the size, we use the number of lines of code (data flow) and the number of basic activities (process flow). For all these metrics, we hypothesize that the data-flow dimension is larger than the process-flow dimension: (1) We estimate that more lines of code are needed than there are basic activities because the XML messages usually contain more than ten elements. (2) We also expect that there are more conditions and loops. The more statements exist (regardless of the abstraction level), the more conditions and loops are required to order them. Following from (1) and (2), we expect more data-flow conditions and iterations, although we doubt any direct relationship.

Originally, we planned to compare not only LOCs and counts but also the complexity of BPEL control-flow structures and the complexity of the data-flow implementations. However, while McCabe's cyclomatic complexity [11] can be easily applied to XQuery, no adaptation to BPEL or XSLT is available. Cardoso's complexity metric [3] has many weights in formulas and is not well defined for all graph-based processes (BPEL's flow activity). The weights forbid direct comparisons to McCabe's unweighted complexity metric. Therefore, we decided to use counts of iterations and conditions instead of a more sophisticated complexity metric.

RQ4: Are data-flow implementations mainly large but linear or mainly complex?
From an architecture perspective, an interesting question is where complexity is located. Thus, one important question is whether the code concerned with the data flow is not only large compared to the process flow but whether it is mainly linear, thus "easy" code, or whether it contains many control-flow structures. We expect the data-flow code to be simple because we expect most of the code to simply insert values into XML templates. Only at some points do we expect decisions for optional elements or loops for lists. However, we expect more conditions than loops. Therefore, we hypothesize that we have at most one condition per four lines of data transformation code and at most one iteration per five lines of code.

RQ5: What are possible factors for increased complexity of data flow?
From an architecture point of view, it is important to know and identify drivers of data-flow complexity to better plan and estimate implementation and testing. Because we think that data flow is mainly needed to prepare messages, we hypothesize that the number of message exchange activities (receive, reply, invoke, onMessage, onEvent) correlates linearly with the lines of data-flow code. If this correlation holds, it can be used on analytical models, which contain the message exchanges, to better judge the technical implementation later on. We also hypothesize that the data-flow complexity measured by counting conditions and iterations correlates linearly with the number of message exchange activities: the usage of conditions and iterations is probably dependent on the differences in the schemas being integrated but should behave the same within one project.
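The counts behind RQ3/RQ4 and the linearity check behind RQ5 can be sketched as follows. The pattern lists and the per-process numbers are illustrative assumptions, not the study's actual tooling or data:

```python
# Sketch of condition/iteration counting (RQ3/RQ4) and a Pearson
# correlation check (RQ5). Patterns and sample numbers are assumptions.
import re

# Assumed subset of XSLT/XQuery constructs counted as conditions and
# iterations; the study's own tooling may use different rules.
CONDITIONS = [r"<xsl:if\b", r"<xsl:when\b", r"\bif\s*\("]
ITERATIONS = [r"<xsl:for-each\b", r"\bfor\s+\$", r"\bwhile\s*\("]


def flow_counts(source: str) -> dict:
    """Count conditions and iterations in a transformation source."""
    tally = lambda patterns: sum(len(re.findall(p, source)) for p in patterns)
    return {"conditions": tally(CONDITIONS), "iterations": tally(ITERATIONS)}


xslt = """
<xsl:for-each select="items/item">
  <xsl:if test="@optional = 'false'"><xsl:value-of select="name"/></xsl:if>
</xsl:for-each>
"""


def pearson(xs, ys):
    """Pearson's r, used to test for a linear relationship (RQ5)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


# Toy per-process numbers (illustrative only, not the study's data):
message_exchanges = [2, 4, 5, 9]   # receive/reply/invoke/onMessage/onEvent
data_flow_loc = [40, 95, 110, 210]
r = pearson(message_exchanges, data_flow_loc)
```

An r close to 1 for real project data would support the hypothesized linear relationship between message exchanges and data-flow code size.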

4.4.2 Case and Subject Selection

For answering the outlined research questions, we conduct a case study on processes from three different companies. All processes are BPEL processes so that the choices and metrics are comparable and influences of product choices can be isolated from the modeling language. The processes target different BPEL engines (Informatica ActiveVOS 9.2, IBM WebSphere Process Server 7.1 and WebSphere Business Process Manager (BPM) 8.5, Oracle Business Process Management Suite 12c) and are modeled using the respective vendor-supplied modeling tool.

Informatica ActiveVOS is a BPEL engine which supports the full WS-BPEL 2.0 standard but also has proprietary extensions for modeling BPEL processes and visualizing them as BPMN. One of these extensions is the support of XQuery for expressions and queries, i.e., in all places where XPath is allowed. XQuery as a superset of XPath is more powerful and can be used to fully replace XSLT transformations.

IBM WebSphere Process Server (WPS) and its successor Business Process Manager (BPM) are workflow engines on top of a JEE application server. In addition to BPEL, IBM BPM also supports modeling and execution of processes modeled in BPMN. As this study focuses on BPEL processes, only the BPEL-specific aspects of BPM are discussed. Besides WS-BPEL 1.1 processes, WPS/BPM supports the execution of state machines and business rules. Service integration is performed via an integration solution (WebSphere ESB) that comes with the workflow engine. Regarding data flow, WPS/BPM supports standard XPath expressions as well as the vendor-specific business object maps, XML maps, and Java code embedded in the BPEL process model.

Oracle Business Process Management Suite 12c is a toolset and integration platform for development and execution of SOA-based applications. Among other components, the BPEL Process Manager supports the execution of BPEL 2.0 processes.
Oracle also provides a set of vendor-specific extensions like XQuery integration in XPath, a replay activity for restarting scopes, and human tasks. In the following, we present an overview of the processes used in the case study, following the categorization proposed in Chap. 2. The first project that is contained in our analysis is Terravis (see Table 4.1). Terravis is a process integration platform that allows to conduct cross-organizational processes between land registries, notaries, and banks [2]. The project uses the ActiveVOS BPEL engine, which is developed by Informatica. We analyzed a

4 Analysis of Data-Flow Complexity and Architectural Implications


Table 4.1 Aggregated metadata for the ActiveVOS process collection (Terravis, classification according to [10])

Collection name: Terravis
Process count: 86
Domain: Land register transactions
Geography: Switzerland
Time: 12-2017
Boundaries: Cross-organizational 23%, intraorganizational 17%, intra-system 60%
Relationship: Calls another 17%, calls another/is being called 24%, event triggered 16%, is being called 1%
Scope: Core 34%, technical 38%, auxiliary 28%
Process model purpose: Executable
People involvement: Mostly 17%, partly 3%, none 79%
Process language: WS-BPEL 2.0, BPEL4People, plus vendor extensions
Execution engine: ActiveVOS 9.2.x
Model maturity: Productive 100%

Table 4.2 Aggregated metadata for the WPS/BPM process collection

Collection name: Banking and Insurance
Process count: 75
Domain: Banking, insurance
Geography: Germany
Time: 05-2017
Boundaries: Cross-organizational 4%, intraorganizational 67%, within system 29%
Relationship: Calls another 12%, is being called 64%, is being called/calls another 5%, no call 19%
Scope: Core 13%, technical 87%
Process model purpose: Executable
People involvement: None 92%, partly 8%
Process language: WS-BPEL 1.1 plus vendor extensions
Execution engine: IBM WebSphere Process Server 7.1, IBM Business Process Manager 8.5
Model maturity: Illustrative 3%, productive 73%, retired 24%

snapshot of the repository taken in December 2017. We had to exclude processes from this project that use XQuery 2.0 features not supported by our analysis tool.

The second process collection contains processes from the banking and insurance domain (see Table 4.2), with a strong focus on technical integration processes. The processes use the IBM WebSphere Process Server and IBM Business Process Manager BPEL engines and integration solutions. The analyzed snapshot was taken in May 2017.


D. Lübke et al.

Table 4.3 Aggregated metadata for the Oracle SOA Suite process collection

Collection name: Wholesale and Retail Trade
Process count: 23
Domain: Commerce
Geography: Europe
Time: 2015
Boundaries: Cross-organizational 17%, intraorganizational 83%
Relationship: Calls another 96%, is being called 4%
Scope: Technical 100%
Process model purpose: Executable
People involvement: None 96%, partly 4%
Process language: WS-BPEL 2.0 plus vendor extensions
Execution engine: Oracle SOA Suite 12.1
Model maturity: Illustrative 4%, productive 96%

Table 4.4 Proprietary extensions for data-flow definition

Informatica ActiveVOS: XQuery in assign activities, import of XQuery modules for usage in XQuery statements embedded into the assign activities
Oracle BPM Suite: XQuery in XPath
IBM WebSphere Process Server/Business Process Manager: Java, business object (BO) maps, XML maps

The Oracle process collection shown in Table 4.3 is used to integrate retailers with their suppliers.

4.4.3 Data Collection and Analysis Procedure

In the first step, the data transformations of the collected processes were analyzed: the main goal was to identify the proprietary extensions used for defining the data flow. The extensions found are presented in Table 4.4.

The data collection for the static metrics itself was done with a custom static code analyzer named BPELStats, which had been developed under the umbrella of the BPELUnit project and is now developed as a standalone project. BPELStats was originally developed for gathering the metrics presented in [9] and is available as open source.1 Among other metrics, BPELStats can count BPEL activities by type and has been extended as part of this study to compute the number of occurrences, iterations,

1 https://github.com/dluebke/bpelstats.


conditions, and LOCs for XPath, XSLT, XQuery, BOMaps, XMLMaps,2 and Java code. The calculation is based on whole files: if a file is imported into the BPEL process, all functions, templates, etc. are counted toward the metrics, and no check is made whether a certain piece is actually called by the process or not. All of our extensions have been contributed to the BPELStats project, are freely available, and can be reviewed.

Clean checkouts of the process projects were done first. The total sample contains 184 executable BPEL processes. Where necessary, a full build was triggered before the analysis to copy all external dependencies (e.g., WSDLs, XML Schemas, reused XSLT and XQuery files) to their correct positions, enabling BPELStats to also follow and resolve imports of those files from the BPEL processes.
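BPELStats itself is a Java command-line tool; the counting idea can nevertheless be sketched in a few lines of Python. This is a simplified, hypothetical illustration, not BPELStats' actual logic: only the contents of <condition> and of <from>/<to> inside assigns are treated as data-flow occurrences, and each non-empty line of an expression counts toward the LOC.

```python
import xml.etree.ElementTree as ET

# Namespace of WS-BPEL 2.0 executable processes.
BPEL_NS = "{http://docs.oasis-open.org/wsbpel/2.0/process/executable}"

def dataflow_occurrences(bpel_source):
    """Count embedded data-flow code in a BPEL process (simplified sketch).

    Every non-empty <condition>, <from>, or <to> body is one occurrence;
    its non-empty text lines count toward the data-flow LOC.
    """
    root = ET.fromstring(bpel_source)
    occurrences, loc = 0, 0
    for tag in ("condition", "from", "to"):
        for node in root.iter(BPEL_NS + tag):
            text = (node.text or "").strip()
            if text:  # each embedded expression is one occurrence
                occurrences += 1
                loc += len(text.splitlines())
    return occurrences, loc

# A tiny, hypothetical process fragment with one condition and one copy.
process = """\
<process xmlns="http://docs.oasis-open.org/wsbpel/2.0/process/executable">
  <if><condition>$order.total &gt; 100</condition></if>
  <assign>
    <copy>
      <from>concat($customer/first, ' ', $customer/last)</from>
      <to>$invoice.name</to>
    </copy>
  </assign>
</process>"""

print(dataflow_occurrences(process))  # → (3, 3)
```

Note that, like BPELStats, such a counter sees only where expression code is embedded; it makes no attempt to decide whether the expression is ever evaluated at runtime.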

4.4.4 Validity Procedure

In order to ensure the correctness of the measurements, the BPELStats tool was tested with both unit tests and manual tests. The gathering routines were fully automated using shell scripts in order to eliminate human error. These shell scripts were also tested by all researchers in this study in order to show that the results are correctly computed. To allow other researchers to replicate this study, the scripts used have been made available.3 This also allows other researchers to check their correctness.

We tried to cover as many different BPEL toolsets as possible to strengthen external validity. With three completely different BPEL engines used in three large industrial projects at different organizations, we are confident that we can distinguish influences imposed by the tooling and the project from those that are inherent to the problem of service orchestrations and executable business processes.

4.5 Results

In this section, we present the plain results of our case study, mainly the metrics gathered as part of our measurements. The interpretation of these measurements is presented in the following section.

2 As XMLMaps are compiled to XSL transformations during build time, their metrics are calculated using the XSLT sublanguage parser and hence show up as XSLT metrics in the results with their occurrences being counted separately. 3 The files are accessible at http://www.daniel-luebke.de/files/bpm-dataflowcomplexity.tgz.


4.5.1 Metrics

To answer our research questions, we collected the required metrics. The first analysis computed the occurrences of data flow in the process collections and the lines of data-flow code. An occurrence is a place at which data-flow code is embedded into the process. If a process has five assign activities, there are at least five occurrences of data-flow mappings, depending on the number and type of copy statements. Each data-flow occurrence contains at least one line of code but can contain multiple ones. This means that the number of occurrences is equal to or less than the number of lines of code.

We aggregated the metrics as shown in Table 4.5 according to the different languages that we found in the process set. The language with the most occurrences in our process set is XPath. However, XQuery has more lines of code. Java is the third most often used language, followed by XSLT.

We aggregated this data further and clustered the languages into portable (standards compliant, i.e., XPath, XSLT) and non-portable (not mandated by the BPEL standard, i.e., XQuery, Java, BOMaps, and XMLMaps). The results are shown in Table 4.6: nearly half of the occurrences (46%) of data-flow code are portable, but only 8.96% of the lines of code are written in standard-mandated languages.

In the next step, we analyzed the relationship between the process flow and the data flow. Figure 4.1 depicts the relationship between the number of basic activities and the lines of data-flow code. The plots indicate a linear relationship between basic activities and data flow.
Therefore, we computed Pearson's linear correlation coefficient between these two dimensions, which is c = 0.8162 (Terravis), c = 0.9035 (Wholesale and Retail Trade), c = 0.6962 (Banking and Insurance), and

Table 4.5 Data-flow occurrences and LOCs by engine and implementation choice

Metric               ActiveVOS   Oracle BPM   IBM WPS/BPM
XPath occurrences    2419        2380         738
XPath LOCs           2865        2380         738
XSLT occurrences     4           0            0
XSLT LOCs            4437        0            0
XQuery occurrences   4173        108          0
XQuery LOCs          21,888      10,010       0
Java occurrences     0           0            2193
Java LOCs            0           0            13,298
BOMap occurrences    0           0            32
BOMap LOCs           0           0            3676
XMLMap occurrences   0           0            0
XMLMap LOCs          0           0            0
Total occurrences    6596        2488         2963
Total LOCs           29,190      12,390       17,712
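As a cross-check, the portable share of occurrences reported in Table 4.6 can be recomputed from the per-engine counts in Table 4.5 (a sketch; the classification of XPath and XSLT as portable follows the text):

```python
# Occurrence counts per engine, taken from Table 4.5
# (columns: ActiveVOS, Oracle BPM, IBM WPS/BPM).
occurrences = {
    "XPath":  [2419, 2380, 738],
    "XSLT":   [4, 0, 0],
    "XQuery": [4173, 108, 0],
    "Java":   [0, 0, 2193],
    "BOMap":  [0, 0, 32],
    "XMLMap": [0, 0, 0],
}
portable_langs = {"XPath", "XSLT"}  # mandated by the BPEL standard

portable = sum(sum(counts) for lang, counts in occurrences.items()
               if lang in portable_langs)
total = sum(sum(counts) for counts in occurrences.values())
print(f"portable occurrences: {portable / total:.0%}")  # → portable occurrences: 46%
```

The result matches the 46% of portable occurrences reported in Table 4.6; the LOC-based share cannot be reproduced from Table 4.5 alone, as the portability classification of individual files is not part of the aggregated data.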

Table 4.6 Percentages of used data-flow technologies

Portability    Percentage occurrences   Percentage (S)LOCs
Portable       46%                      8.96%
Not portable   54%                      91%

Fig. 4.1 Data-flow (S)LOC count and number of basic activities (scatter plots with regression lines for ActiveVOS, Oracle, and WPS; x-axis: basic activities, 0–500; y-axis: (S)LOC count, 0–5000)

c = 0.848 combined for the whole data set. Except for the Banking and Insurance data set, the relationship between the basic activities and the data-flow lines of code is linear. Therefore, we added regression lines to Fig. 4.1.

In order to judge whether the process-flow or the data-flow dimension is larger, we computed a two-sided, paired Wilcoxon test: for the Terravis data set p = 1.101 × 10^−10, for the Wholesale and Retail Trade data set p = 2.886 × 10^−5, and for the Banking and Insurance data set p = 5.947 × 10^−14. If all data sets are combined, the Wilcoxon test results in p = 1.946 × 10^−31. All computed p-values are much smaller than 0.01, which is a commonly accepted threshold for highly significant results.

In the next step, we drilled down into the nature of the control flow and data flow and computed the number of iterations and conditions within each language. We plotted the conditions of each dimension against each other in Fig. 4.2 and the iterations against each other in Fig. 4.3. For each project alone and all projects combined, we again computed Pearson's linear correlation coefficient for

Fig. 4.2 BPEL conditions and data-flow conditions (scatter plots for ActiveVOS, Oracle, and WPS; x-axis: data-flow conditions, 0–300; y-axis: BPEL conditions, 0–120)

both conditions (c_cond = {0.75, 0.5671, 0.6053, 0.6538}) and iterations (c_it = {0.591, 0.9503, −0.1554, 0.6444}). Because |c| < 0.75 holds for the correlation coefficients with two exceptions (conditions for Terravis and iterations for Wholesale and Retail Trade), our data cannot support a linear correlation, especially because the coefficients also differ in sign (c_it is negative for WPS). We also conducted a two-tailed, paired Wilcoxon test for differences between the process-flow and data-flow dimensions on this data: for the conditions of the three projects and all projects combined p_cond = (0.5215, 0.0001651, 0.6319, 0.07883), and for the iterations p_it = (4.997 × 10^−7, 0.0009128, 0.6768, 1.59 × 10^−8). While most p-values concerning the conditions satisfy p > 0.05, nearly all p-values concerning the iterations (except for the WPS project) satisfy p < 0.01.

These differences between the projects and BPEL engines led to another drill-down into the data. As shown in Fig. 4.4, we plotted the distribution of conditions and iterations in relation to the lines of data-flow code. We split this data by project and additionally combined the processes using XQuery from the Terravis and the Wholesale and Retail Trade process sets. The plot suggests that the number of conditions and iterations in the data flow is significantly different

Fig. 4.3 BPEL iterations and data-flow iterations (scatter plots for ActiveVOS, Oracle, and WPS; x-axis: data-flow iterations, 0–60; y-axis: BPEL iterations, 0–20)

between the data sets, even when they use the same language, as is the case for XQuery in Terravis and Wholesale and Retail Trade. The median values of conditions per line of data-flow code are below 0.15. Except for the Wholesale and Retail Trade data set, they are even below 0.10. All values for the iterations are below 0.075, and the median values are all below 0.025. The data indicates that there are more conditions per line of code than iterations.

To answer the last research question, we computed the number of message exchange activities in the processes and the correlation coefficient with the lines of data-flow code. The results are summarized in Table 4.7: for two of the three data sets, we get a correlation coefficient c with c > 0.75, which hints at a linear relationship. The exception is the WPS data set, which mainly uses Java. The mean of the message activities per line of data-flow code is between 0.026 and 0.054 in the data sets.
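The correlation analysis used throughout this section can be sketched in pure Python; the per-process pairs below are hypothetical examples, not taken from the study's data:

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's linear correlation coefficient between two paired samples."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)

# Hypothetical per-process measurements: (basic activities, data-flow (S)LOC).
basic_activities = [12, 40, 75, 130, 210, 340]
dataflow_sloc    = [90, 310, 600, 1000, 1700, 2600]

r = pearson_r(basic_activities, dataflow_sloc)
print(f"c = {r:.3f}")  # a value close to 1 indicates a strong linear relationship
```

In practice, a statistics package would be used (e.g., SciPy also reports a p-value alongside the coefficient and provides the paired Wilcoxon signed-rank test used above); the hand-rolled version only serves to make the computed quantity explicit.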

Fig. 4.4 Number of data-flow conditions and iterations per LOC (box plots of conditions and of iterations per line of data-flow code, grouped by XQuery ActiveVOS, XQuery Oracle, XQuery combined, XSLT, Java, and BOMap; y-axis: 0.00–0.30)

Table 4.7 Message exchanges per process collection and (S)LOC

Metric                      ActiveVOS   Oracle BPM   IBM WPS
Message activities          2238        316          750
(S)LOC                      41,100      12,390       17,712
Message activities/(S)LOC   0.054       0.026        0.042
Correlation coefficient     0.792       0.907        0.693
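The per-collection ratios in Table 4.7 follow directly from its first two rows; a quick check with the numbers as given in the table:

```python
# (message activities, data-flow (S)LOC) per collection, from Table 4.7.
collections = {
    "ActiveVOS":  (2238, 41_100),
    "Oracle BPM": (316, 12_390),
    "IBM WPS":    (750, 17_712),
}
for name, (messages, sloc) in collections.items():
    # Reproduces the "Message activities/(S)LOC" row of the table.
    print(f"{name}: {messages / sloc:.3f} message activities per (S)LOC")
```

Rounded to three decimals, this yields 0.054, 0.026, and 0.042, matching the table.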

4.5.2 Interpretation

4.5.2.1 RQ1: Which Data-Flow Modeling Choices Are Preferred on Specific Tools?

The results presented in Table 4.5 confirm hypothesis (a) as well as hypothesis (b). However, the result is most pronounced for the IBM WPS collection. One reason is that Java can be used almost anywhere in IBM Process Server. Furthermore, IBM's tooling supports the Java extensions very well. ActiveVOS and Oracle BPM do not support Java in conditions.

In the case of Oracle, the interpretation is less straightforward. Although XQuery is a popular option for data mapping, the numbers in Table 4.5 show that XPath is also very popular. One reason is that Oracle integrates XQuery as an XPath extension function. Whenever XQuery is used, XPath is used as well. This means that from a


BPEL perspective, a BPEL standard option (XPath) is always used, because Oracle decided to extend XPath. The fact that there are many more XQuery LOCs than XPath LOCs confirms that XQuery is the preferred option for data transformation. Furthermore, Oracle provides a modeling tool for XPath.

Compared to IBM and Oracle, ActiveVOS is the most standards-compliant engine. However, the disproportionately larger number of XQuery LOCs shows that the vendor extension XQuery is preferred over XPath.

From a skill perspective, we quite often observe that BPEL processes are modeled by developers. For them, XQuery might be easier to learn than XSLT; and because Java is a common language for developing enterprise software, a large number of those developers are probably already familiar with it, so they do not need to learn new languages like XPath or XSLT.

4.5.2.2 RQ2: What Amount of Data Flow Is Portable, i.e., Standards Compliant?

Table 4.6 shows that the amount of non-portable data-flow code is over 90%, i.e., porting a process would require a nearly complete reimplementation of the data flow. However, the occurrences of BPEL-compliant data-flow implementations and of implementations using vendor extensions are nearly equal. One reason is that ActiveVOS and Oracle BPEL only support XML-based implementations in conditions. In addition, developers try to keep the implementation of conditions simple, i.e., XPath is sufficient because complicated transformations are moved into assign activities. Ironically, the possibility to use Java in WPS also makes things easier for developers: Java provides a larger set of operators that allows conditions to be formulated more easily and compactly. For example, the exclusive or (XOR) is not supported in XPath but is in Java.

Another aspect of portability is that each vendor chooses a different implementation of an extension, even if the extension itself is provided by multiple vendors. For example, the ActiveVOS XQuery extension is not compatible with Oracle's extension, and IBM's Java activity is not compatible with Oracle's Java activity. Thus, portability would not increase either if the process collections were compared pairwise.

Our results also support the conclusion that there is something wrong with the standard. Maybe the standard fell victim to its extensibility, because every vendor used the extensibility to differentiate its product. Maybe the vendors also only approved the standard because of its extensibility, since portability was not really an issue for them.

For architects, this raises the question of which language is the best option to model data flow. It also raises many research questions: What is the best, most productive, easiest-to-maintain set of data transformation languages? How should a language be chosen in a concrete project setting?
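The XOR point can be made concrete outside of BPEL as well. The sketch below (hypothetical, for illustration only) verifies that the usual XPath 1.0 workaround, (a or b) and not(a and b), behaves like a native exclusive or on all truth-table rows:

```python
def xpath_style_xor(a, b):
    """Emulate XOR the way XPath 1.0 forces you to: (a or b) and not(a and b)."""
    return (a or b) and not (a and b)

# The emulation agrees with a native exclusive or in all four cases.
for a in (False, True):
    for b in (False, True):
        assert xpath_style_xor(a, b) == (a != b)
print("XOR emulation matches on all four cases")
```

In Java, the same condition is simply `a ^ b`, which is why conditions with such logic tend to be shorter and easier to read there.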


4.5.2.3 RQ3: Is the Data Flow in Executable Business Processes Larger than the Process Flow?

The highly significant p-values (all p < 0.01)