Advances and New Trends in Environmental Informatics: ICT for Sustainable Solutions [1st ed. 2020] 978-3-030-30861-2, 978-3-030-30862-9

This book is an outcome of the 33rd International Conference EnviroInfo 2019, held at the University of Kassel, Germany.


English Pages VIII, 178 [180] Year 2020


Progress in IS

Rüdiger Schaldach Karl-Heinz Simon Jens Weismüller Volker Wohlgemuth   Editors

Advances and New Trends in Environmental Informatics ICT for Sustainable Solutions

Progress in IS

“PROGRESS in IS” encompasses the various areas of Information Systems in theory and practice, presenting cutting-edge advances in the field. It is aimed especially at researchers, doctoral students, and advanced practitioners. The series features both research monographs that make substantial contributions to our state of knowledge and handbooks and other edited volumes, in which a team of experts is organized by one or more leading authorities to write individual chapters on various aspects of the topic. “PROGRESS in IS” is edited by a global team of leading IS experts. The editorial board expressly welcomes new members to this group. Individual volumes in this series are supported by a minimum of two members of the editorial board, and a code of conduct mandatory for all members of the board ensures the quality and cutting-edge nature of the titles published under this series.

More information about this series at http://www.springer.com/series/10440


Editors

Rüdiger Schaldach, CESR, University of Kassel, Kassel, Germany
Karl-Heinz Simon, CESR, University of Kassel, Kassel, Germany
Jens Weismüller, Leibniz Supercomputing Centre, Bavarian Academy of Sciences and Humanities, Munich, Germany
Volker Wohlgemuth, Department of Engineering - Technology and Life, HTW Berlin - University of Applied Sciences, Berlin, Germany

ISSN 2196-8705  ISSN 2196-8713 (electronic)
Progress in IS
ISBN 978-3-030-30861-2  ISBN 978-3-030-30862-9 (eBook)
https://doi.org/10.1007/978-3-030-30862-9

© Springer Nature Switzerland AG 2020

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG. The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland.

Preface

This book presents the main research results of the 33rd edition of the long-standing and established international and interdisciplinary conference series on environmental information and communication technologies (EnviroInfo 2019). The conference was held from 23 to 26 September 2019 at the University of Kassel. It was organized by the Center for Environmental Systems Research (CESR), under the patronage of the Technical Committee on Environmental Informatics of the Gesellschaft für Informatik e.V. (German Informatics Society—GI). Combining and shaping national and international activities in the field of applied informatics and environmental informatics, the EnviroInfo conference series aims at presenting and discussing the latest state-of-the-art developments in information and communication technology (ICT) and environment-related fields. A special focus of the conference was on the potential contributions of ICT technologies and tools to achieving the Sustainable Development Goals (SDGs) of the United Nations in the context of the 2030 Agenda and to supporting societal transformation processes. Accordingly, the articles in this book not only present innovative approaches and ICT solutions related to a wide range of SDG-relevant topics such as sustainable mobility, human health and the circular economy, but also address other questions that are central to environmental informatics research, including advanced methods of environmental modelling and machine learning. The editors would like to thank all the contributors to the conference and these conference proceedings. Special thanks also go to the members of the programme and organizing committees. In particular, we would like to thank the organizers of the GI Informatik 2019 conference, which took place as a parallel event, for their support in providing local logistics. Last but not least, a warm thank you to our sponsors who supported the conference.

Kassel, Germany
Kassel, Germany
Garching, Germany
Berlin, Germany
July 2019

Rüdiger Schaldach Karl-Heinz Simon Jens Weismüller Volker Wohlgemuth


Contents

Assessing the Sustainability of Software Products—A Method Comparison . . . 1
Javier Mancebo, Achim Guldner, Eva Kern, Philipp Kesseler, Sandro Kreten, Felix Garcia, Coral Calero and Stefan Naumann

Estimate of the Number of People Walking Home After Compliance with Metropolitan Tokyo Ordinance on Measures Concerning Stranded Persons . . . 17
Toshihiro Osaragi, Tokihiko Hamada and Maki Kishimoto

Gamification for Mobile Crowdsourcing Applications: An Example from Flood Protection . . . 37
Leon Todtenhausen and Frank Fuchs-Kittowski

MoPo Sane—Mobility Portal for Health Care Centers . . . 55
Benjamin Wagner vom Berg and Aina Andriamananony

Platform Sustainable Last-Mile-Logistics—One for ALL (14ALL) . . . 67
Benjamin Wagner vom Berg, Franziska Hanneken, Nico Reiß, Kristian Schopka, Nils Oetjen and Rick Hollmann

Scientific Partnership: A Pledge For a New Level of Collaboration Between Scientists and IT Specialists . . . 79
Jens Weismüller and Anton Frank

Emission-Based Routing Using the GraphHopper API and OpenStreetMap . . . 91
Martin Engelmann, Paul Schulze and Jochen Wittmann

Digitally Enabled Sharing and the Circular Economy: Towards a Framework for Sustainability Assessment . . . 105
Maria J. Pouri and Lorenz M. Hilty

Exploring the System Dynamics of Industrial Symbiosis (IS) with Machine Learning (ML) Techniques—A Framework for a Hybrid-Approach . . . 117
Anna Lütje, Martina Willenbacher, Martin Engelmann, Christian Kunisch and Volker Wohlgemuth

Graph-Grammars to Specify Dynamic Changes in Topology for Spatial-Temporal Processes . . . 131
Jochen Wittmann

Online Anomaly Detection in Microbiological Data Sets . . . 149
Leonie Hannig, Lukas Weise and Jochen Wittmann

Applying Life Cycle Assessment to Simulation-Based Decision Support: A Swedish Waste Collection Case Study . . . 165
Yu Liu, Anna Syberfeldt and Mattias Strand

Assessing the Sustainability of Software Products—A Method Comparison

Javier Mancebo, Achim Guldner, Eva Kern, Philipp Kesseler, Sandro Kreten, Felix Garcia, Coral Calero and Stefan Naumann

Abstract As part of Green IT, the field of green software engineering has seen a rise in interest over the past years. Several methods for assessing the energy efficiency of software were devised, which are partially based upon rather different approaches and partially come to similar conclusions. In this paper, we take an in-depth look at two methods for assessing the resource consumption that is induced by software. We describe the methods along a case study, where we measured five sorting algorithms and compared them in terms of similarities, differences and synergies. We show that even though the methods use different measurement approaches (intrusive vs. non-intrusive), the results are indeed comparable and combining the methods can improve the findings.

Keywords Green software · Sustainable software · Software energy consumption · Energy measurements

J. Mancebo · F. Garcia · C. Calero
Institute of Technology and Information Systems, University of Castilla-La Mancha Ciudad Real, Ciudad Real, Spain
e-mail: [email protected]

A. Guldner (B) · E. Kern · P. Kesseler · S. Kreten · S. Naumann
University of Applied Sciences Trier, Environmental Campus Birkenfeld, Birkenfeld, Germany
e-mail: [email protected]
URL: http://green-software-engineering.de/en

E. Kern
Leuphana University, Lueneburg, Germany
e-mail: [email protected]

© Springer Nature Switzerland AG 2020
R. Schaldach et al. (eds.), Advances and New Trends in Environmental Informatics, Progress in IS, https://doi.org/10.1007/978-3-030-30862-9_1

1 Introduction

In a study by Huawei Technologies on the total energy consumption of all consumers in 2017, it was predicted that the entire information and communications technology (ICT) sector will consume around 2,800 TWh of energy in 2025 at best. In the worst case, the consumption could be more than double that. By 2018, the energy consumption of ICT had already risen to 1,895 TWh, or around 9% of total global energy consumption [3]. In order to counteract this development, it is reasonable to review the energy and resource consumption of a wide variety of software, to provide ICT users and developers with recommendations regarding sustainable software applications. There are different approaches to energy measurement, which differ in several aspects. Thus, in this paper, we describe and compare two methods of measuring the resource consumption of software.

2 Related Work

It is possible to identify different tools and techniques for measuring or estimating software energy consumption. They can be classified into two approaches: (i) software-based approaches, which are easy to adopt and use, but give only a vague and global estimation of the power consumption of different components [12]; and (ii) hardware-based approaches, which are more difficult to implement but yield more accurate results, because they use physical energy meters connected directly to the hardware [13].

2.1 Software-Based Approaches

This type of method uses mathematical formulas to estimate the energy consumption of the component under test. One of the best-known tools for estimating consumption is Microsoft's Joulemeter.1 It is a software tool for estimating the energy consumption of the hardware resources used to execute a software application on a given PC. Joulemeter uses mathematical models to estimate the consumption on the basis of the information obtained from resources such as CPU usage, monitor brightness, etc. [10, 12]. Another tool that works in a similar way, estimating the energy consumption of all CPU-intensive processes, is Intel Power Gadget (IPG) 3.0 [16]. In [1], the authors propose a tool called TEEC (Tool to Estimate Energy Consumption) to estimate the power consumption of a given software at run time by taking into account CPU, memory, and hard disk power consumption. Green Tracker is a similar tool [2], which calculates the energy consumption of the software by estimating its CPU usage. Unlike the previously presented tools, Green Tracker requires an external device to register the energy consumption and CPU usage. Jalen [18] uses mathematical models to estimate the software consumption, dividing this amount by the hardware resources used. It was created to control the energy consumption of software applications, taking into account the granularity of the code. In an earlier work that originated at the Environmental Campus Birkenfeld, powerstat2 was used to estimate the energy consumption of a system and the results were compared to the results from a hardware-based approach [4].

1 cf. https://www.microsoft.com/en-us/research/project/joulemeter-computational-energy-measurement-and-optimization/ [2019-04-25].

2.2 Hardware-Based Approaches

Hardware-based approaches use a hardware device to measure the power consumption of a specific component or of the overall system. There are different hardware tools that provide accurate measurements of energy consumption, such as the WattsUpPro power meter.3 In [21], the authors indicate that this device makes it possible to assess the overall consumption of a computer, with a measurement error rate of only 1.5%. Another tool is PowerPack [9], which consists of several hardware components such as sensors or meters that allow direct measurement of power. It also includes software components that control the creation of power profiles and the synchronization of codes. GreenHPC [20] is a framework for measuring energy consumption in High Performance Computing (HPC) environments. It has three components: a sensor board, which is responsible for current sensing; a data acquisition board, which collects the sensor board data and the voltage from the power source; and a virtual instrument, which is responsible for data processing, visualization, and distributed clock synchronization. Rasid et al. [19] used a Raspberry Pi in their measurement approach, assessing the energy impact of algorithm execution using different programming languages (ARM assembly, C, Java). With their Software Energy Footprint Lab (SEFLab) [7], Hankel et al. [11] present a hardware-based approach that allows measuring the energy consumption of components of the motherboard. They refer to other hardware-based approaches for analyzing the power consumption of software [11], e.g. the PowerScope architecture by [8], addressing the energy consumption of mobile applications.

2 cf. https://github.com/ColinIanKing/powerstat [2019-04-19].
3 cf. https://www.wattsupmeters.com/secure/index.php [2019-04-24].

3 Methods

In the following, we describe the two methods that we compared. Method A was developed at the Institute of Technology and Information Systems at the University of Castilla-La Mancha, and Method B was developed at the Institute for Software Systems at the University of Applied Sciences Trier, Environmental Campus Birkenfeld. These methods have been selected for comparison as they both follow a hardware-based approach. Therefore, Method A as well as Method B allows us to obtain accurate measurements of the energy consumed by a software product while it is running. In addition to these similarities, there are also some differences between the two methods, and combining the results obtained by both measurement methods provides the most complete picture of the energy consumption.

3.1 Method A

FEETINGS [17] (Framework for Energy Efficiency Testing to Improve eNviromental Goals of the Software) is used to measure the energy consumed by a software product running on a PC, to collect the consumption data, and to support the subsequent visualization and interpretation of that information. This framework is divided into two main components, as shown in Fig. 1.

– The Energy Efficient Tester (EET) is the device that has been developed to measure the energy consumption of a set of hardware components used by a software product during its execution. EET is composed of different sensors that support the measurement of three different hardware elements: processor, hard disk, and graphics card. It also includes two sensors that provide both the total power consumption of the PC and the power consumption of the monitor connected to the equipment on which the software being evaluated is running. Once the measurements are completed, they are stored in a removable memory, so that they can be used for analysis.
– The Software Energy Assessment (SEA) is responsible for processing the data, the analysis, and the generation of an appropriate visualization of the data collected by EET. This part is currently under development.

In order to carry out the measurements, it is necessary to configure the computer (Device Under Test, DUT) to which EET is connected. The DUT should only include the applications necessary for its operation, such as the operating system, so that there are no other applications running in the background that affect the consumption measurements. In addition, to ensure the consistency of the energy consumption measurements, the measurements are repeated 30 times for each of the defined test cases, so that the distribution of the sample tends to be normal.

Fig. 1 FEETINGS overview

3.2 Method B

This assessment method was mainly developed in the course of two research projects:

– The measurement method was first developed by Dick et al. [5], in order to gain insight into the hardware and energy efficiency of software products. It was devised based upon ISO/IEC 14756, as described in [6].
– The method was updated in [15], which focused on the development of a criteria catalog4 to assess the sustainability of software on a broad level.

Fig. 2 Setup for measuring the energy consumption of software (cf. [5])

Figure 2 depicts the implemented measurement setup. It mainly consists of four parts:

– the System Under Test (SUT), which executes the software that is to be assessed and measures its own hardware usage,
– a Workload Generator (WG), which generates the load on the SUT by executing a scenario on it and logging the measurement timestamps,
– a Power Meter (PM), which measures the consumed energy of the SUT,5 and
– a central Data Aggregator and Evaluator (DAE), which is used to analyze the results of the measurements.

To compare the energy and resource consumption of two software products (e.g. word processors, browsers, etc.), we first devise a usage scenario with a duration of approximately 10 min that produces the same result for each software product from the same product group. Depending on the software to be assessed, we use automation software (e.g. WinAutomation,6 a bash script or a benchmark program) to ensure the scenario is executed in a reproducible and automated manner. We are currently also developing an Arduino-based mouse and keyboard emulator to execute inputs directly through USB on the SUT.

4 The resulting catalog can be accessed at http://green-software-engineering.de/en/kriterienkatalog [2019-04-16].
5 Currently, we use a Janitza UMG 604.
6 cf. https://www.winautomation.com/ [2019-04-16].


The SUT is then set up with all the software required for the execution of the software product. This includes the operating system, frameworks, packages, etc. Afterwards, we perform a baseline measurement without the software product. Then, the products are successively installed on the SUT and we measure the devised scenarios at least 30 times, to receive a normally distributed sample. To analyze the results with the DAE, we implemented an open source consumption analysis calculator (OSCAR).7 So far, it is only available in German, which is why we created a separate analysis script in R Markdown for the analysis of the case study presented in Sect. 4. It can be found, together with the measurement results for the algorithms, in the replication package for this paper.8
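The repeated-measurement step shared by both methods (at least 30 runs, so that the sample is approximately normally distributed) can be made concrete with a short sketch. The class name, the simulated values, and the aggregation helpers below are illustrative and not part of either method's tooling:

```java
import java.util.Random;

/** Illustrative sketch: aggregates n repeated energy measurements (in Wh)
 *  into a mean and a sample standard deviation, as both methods do with
 *  their 30 scenario repetitions. */
public class RunAggregator {

    static double mean(double[] runs) {
        double sum = 0;
        for (double r : runs) sum += r;
        return sum / runs.length;
    }

    static double sampleStdDev(double[] runs) {
        double m = mean(runs);
        double squares = 0;
        for (double r : runs) squares += (r - m) * (r - m);
        return Math.sqrt(squares / (runs.length - 1)); // sample (n-1) variant
    }

    public static void main(String[] args) {
        // Simulated: 30 scenario runs scattering around 18.4 Wh (hypothetical values).
        Random rnd = new Random(42);
        double[] runs = new double[30];
        for (int i = 0; i < runs.length; i++) {
            runs[i] = 18.4 + rnd.nextGaussian() * 0.2;
        }
        System.out.printf("mean = %.3f Wh, sd = %.3f Wh%n",
                mean(runs), sampleStdDev(runs));
    }
}
```

With a roughly normal sample of this size, mean and standard deviation are sufficient to compare products and to spot outlier runs.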

4 Case Study: Sorting Algorithms

To compare both systems, we measured the execution of a Java program with Oracle Java 8 on a Windows 10 system. Table 1 shows the specifications of both systems. The code was set up to run a loop 30 times (test runs). Each loop sorted an array of 50,000 random numbers from 1 to 1,000, using five sorting algorithms, namely bubble sort, cocktail sort, insertion sort, quicksort and mergesort. To increase the quality of the recorded data, the test runs should take between 5 and 10 min. Thus, the sorting algorithms were executed multiple times, so that the execution of every sorting algorithm takes approximately 2 min. Therefore,

Table 1 System specifications

| Component        | Research group 1 (Spain)                  | Research group 2 (Germany)       |
|------------------|-------------------------------------------|----------------------------------|
| Processor        | AMD Athlon 64 X2 Dual Core 5600+ 2.81 GHz | Intel Core 2 Duo E6750 2.66 GHz  |
| Memory           | 4 × 1 GB DDR2                             | 4 × 1 GB DDR2                    |
| Hard disk        | Seagate Barracuda 7200 500 GB             | 320 GB WD 3200YS-01PBG0          |
| Mainboard        | Asus M2N-SLI Deluxe                       | Intel Desktop Board DG33BU       |
| Graphics card    | Nvidia XfX 8600 GTS                       | Nvidia GeForce 8600 GT           |
| Power supply     | 350 W Aopen Z350-08Fc                     | 430 W Antec EarthWatts EA-430D   |
| Operating system | Windows 10 Enterprise                     | Windows 10 Pro                   |
| Java version     | Oracle Java 8u201                         | Oracle Java 8u201                |

7 cf. https://www.oscar.umwelt-campus.de (German only) [2019-04-18], source code available at https://gitlab.umwelt-campus.de/y.becker/oscar-public [2019-04-18].
8 cf. https://doi.org/10.5281/zenodo.3257517.

– Bubble Sort was executed 18 times (900,000 items sorted),
– Cocktail Sort was executed 30 times (1.5 million items sorted),
– Insertion Sort was executed 280 times (14 million items sorted),
– Quicksort was executed 20,000 times (1 billion items sorted) and
– Mergesort was executed 10,000 times (500 million items sorted).

Table 2 Measurement results of the test runs

| Result                                     | Method A   | Method B   |
|--------------------------------------------|------------|------------|
| Average power per second                   | 104.565 W  | 109.610 W  |
| Average scenario duration                  | 779.347 s  | 603.846 s  |
| Average energy consumption                 | 22.631 Wh  | 18.386 Wh  |
| Efficiency factor (sorted items per Joule) | 18,612.72  | 22,910.65  |

Between every sorting algorithm there was a break of 10 s, and after every loop run there was a break of 60 s. The pauses between the execution of the algorithms were added to allow the SUT to return to its idle state before starting the next task, in order to capture irregular patterns in the consumption, such as CPU ramp-up and RAM allocation. While the loop was running, two log files were generated with the power consumption data for further analysis. In those log files, the starting and ending timestamps of every test run and of every sorting algorithm loop were recorded.
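A condensed sketch of this measurement loop is shown below, scaled down so it runs in seconds rather than minutes. The class name, array size, run count, pause lengths, and console logging are placeholders; the original scenario used 50,000 items, 30 runs, all five algorithms, and wrote timestamps to log files:

```java
import java.util.Random;

/** Scaled-down sketch of the case-study scenario: repeatedly sort a random
 *  array and record start/end timestamps per algorithm run, as the original
 *  log files did. Sizes, counts and pauses are illustrative only. */
public class SortScenario {

    // One of the five measured algorithms; the others are analogous.
    static void bubbleSort(int[] a) {
        for (int i = 0; i < a.length - 1; i++)
            for (int j = 0; j < a.length - 1 - i; j++)
                if (a[j] > a[j + 1]) { int t = a[j]; a[j] = a[j + 1]; a[j + 1] = t; }
    }

    static int[] randomItems(int n, Random rnd) {
        int[] a = new int[n];
        for (int i = 0; i < n; i++) a[i] = 1 + rnd.nextInt(1000); // values 1..1,000
        return a;
    }

    static boolean isSorted(int[] a) {
        for (int i = 1; i < a.length; i++) if (a[i - 1] > a[i]) return false;
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        Random rnd = new Random();
        for (int run = 0; run < 3; run++) {            // original: 30 test runs
            long start = System.currentTimeMillis();   // logged start timestamp
            int[] items = randomItems(5_000, rnd);     // original: 50,000 items
            bubbleSort(items);                         // original: five algorithms
            long end = System.currentTimeMillis();     // logged end timestamp
            System.out.printf("run %d: bubble sort %d ms, sorted=%b%n",
                    run, end - start, isSorted(items));
            Thread.sleep(100);                         // original: 10 s / 60 s pauses
        }
    }
}
```

The recorded timestamps are what later allow the power-meter trace to be sliced per algorithm, as done in Tables 4 and 5.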

4.1 Results

This section presents the results obtained from the consumption measurements for the methods described in Sect. 3. Table 2 shows an overview of the results measured with both methods. The efficiency factor is calculated based upon the metrics proposed in [14], where, in this case, we use the number of sorted items (1.5156 × 10^9) as the "useful work done":

Energy efficiency = Useful work done / Used energy

As can be seen from Table 1, the execution of the algorithms was carried out on computers with different hardware specifications for Methods A and B. To be able to compare both measurements, it is necessary to check the consumption of the computers when they are in idle mode, running only the operating system (consumption baseline). Table 3 shows the results of the baseline measurements. The mean adjusted baseline energy consumption is calculated by dividing the average energy of the baseline by the average duration of the baseline and multiplying it by the average duration of the measurement:

Table 3 Measurement results of the baseline

| Result                                    | Method A  | Method B  |
|-------------------------------------------|-----------|-----------|
| Average power per second                  | 73.395 W  | 78.785 W  |
| Average scenario duration                 | 302.910 s | 599.999 s |
| Average energy consumption                | 6.176 Wh  | 13.133 Wh |
| Mean adjusted baseline energy consumption | 15.888 Wh | 13.217 Wh |

Table 4 Mean measurement results for the individual actions from Method A

| Action         | Mean power (W) | Energy consumed (Wh) | Efficiency factor (items/Joule) | Software induced energy consumption (Wh) |
|----------------|----------------|----------------------|---------------------------------|------------------------------------------|
| Bubble sort    | 96.743         | 3.427                | 72.95                           | 0.826                                    |
| Cocktail sort  | 97.497         | 3.425                | 121.64                          | 0.847                                    |
| Insertion sort | 98.679         | 5.903                | 658.76                          | 1.512                                    |
| Quick sort     | 97.692         | 2.374                | 116.987                         | 0.612                                    |
| Merge sort     | 114.075        | 3.299                | 42.106                          | 1.176                                    |

Table 5 Mean measurement results for the individual actions from Method B

| Action         | Mean power (W) | Energy consumed (Wh) | Efficiency factor (items/Joule) | Software induced energy consumption (Wh) |
|----------------|----------------|----------------------|---------------------------------|------------------------------------------|
| Bubble sort    | 109.689        | 3.357                | 74.47                           | 0.946                                    |
| Cocktail sort  | 110.322        | 3.473                | 119.97                          | 0.992                                    |
| Insertion sort | 116.709        | 4.554                | 853.95                          | 1.484                                    |
| Quick sort     | 108.469        | 2.419                | 114.834                         | 0.661                                    |
| Merge sort     | 111.860        | 3.404                | 40.802                          | 1.005                                    |

Mean adjusted baseline energy consumption = (E_Baseline × t_Testrun) / t_Baseline

Finally, the adjusted mean baseline energy is subtracted from the mean energy of the scenario measurements (the execution of the sorting algorithms), resulting in the additional energy needed to operate the software. Hence, the energy consumption induced by the execution of the sorting algorithms is 6.742 Wh for Method A and 5.169 Wh for Method B. In addition, both methods allow for subdividing the scenario to analyze specific parts. In this case, we divided the scenario according to the run time of the sorting algorithms to calculate their efficiency factors. Tables 4 and 5 show the results.
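The baseline adjustment, the subtraction, and the efficiency factor can all be reproduced directly from the reported totals. The class below is an illustrative recomputation, not part of either toolchain; the hard-coded numbers come from Tables 2 and 3, and small deviations from the published figures stem from rounding in the reported values:

```java
/** Recomputes the software-induced energy and the efficiency factor
 *  from the values reported in Tables 2 and 3 (illustrative only). */
public class ConsumptionAnalysis {

    // Adjusted baseline = E_baseline / t_baseline * t_testrun
    static double adjustedBaselineWh(double baselineWh, double baselineSec, double scenarioSec) {
        return baselineWh / baselineSec * scenarioSec;
    }

    // Induced energy = scenario energy minus the adjusted baseline
    static double inducedWh(double scenarioWh, double adjustedBaselineWh) {
        return scenarioWh - adjustedBaselineWh;
    }

    // Efficiency factor: sorted items per Joule (1 Wh = 3,600 J)
    static double itemsPerJoule(double sortedItems, double energyWh) {
        return sortedItems / (energyWh * 3600.0);
    }

    public static void main(String[] args) {
        double items = 1.5156e9; // total sorted items ("useful work done")

        // Method A (Tables 2 and 3)
        double adjA = adjustedBaselineWh(6.176, 302.910, 779.347);  // ≈ 15.89 Wh
        System.out.printf("A: induced %.3f Wh, efficiency %.0f items/J%n",
                inducedWh(22.631, adjA), itemsPerJoule(items, 22.631)); // ≈ 6.74 Wh

        // Method B (Tables 2 and 3)
        double adjB = adjustedBaselineWh(13.133, 599.999, 603.846); // ≈ 13.22 Wh
        System.out.printf("B: induced %.3f Wh, efficiency %.0f items/J%n",
                inducedWh(18.386, adjB), itemsPerJoule(items, 18.386)); // ≈ 5.17 Wh
    }
}
```

Running this reproduces the 6.742 Wh (Method A) and 5.169 Wh (Method B) induced-energy figures to within rounding error.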


Fig. 3 Comparison of the software induced energy consumption for both methods

Because of the differences in the hardware used in the approaches, the results cannot be compared directly. However, there is a correlation between the software induced energy consumption of both methods for each algorithm, as can be seen in Fig. 3. Furthermore, the data can also be interpreted. As shown in Table 4, the mean power of insertion sort is 98.679 W, with a software induced consumption of 1.512 Wh. If we compare this with the corresponding values for insertion sort from Table 5, it can be seen that the mean power there is higher, at 116.709 W, whereas the software induced consumption, at 1.484 Wh, is slightly lower than in Table 4. So there is a difference of +18.03 W between systems A and B considering the mean power, and a difference of −0.028 Wh between the software induced consumptions of A and B. If we assume continuous and permanent use of both systems, there is a difference of 18.03 Wh per hour between the two systems, which can be compensated by the difference in the software-induced energy consumption. If we now consider the multiple successive execution of insertion sort on both systems, it is noticeable that the difference in the mean power consumption between systems A and B would be compensated after about 644 test runs. For this reason, it can be assumed that even with less resource-efficient hardware, software induced power consumption can be reduced over time. This would require an amortized analysis for each system and algorithm.
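The 644-run break-even figure rests on treating one hour of continuous use at an 18.03 W power gap as 18.03 Wh of extra energy. A one-line check (the class and method are illustrative; the values come from Tables 4 and 5):

```java
/** Illustrative break-even check: a mean-power gap of x W costs x Wh per hour
 *  of continuous use; dividing by the per-run software-induced saving (Wh)
 *  gives the number of runs until the gap is compensated. */
public class BreakEven {

    static double breakEvenRuns(double meanPowerGapWatt, double perRunSavingWh) {
        return meanPowerGapWatt / perRunSavingWh; // W over one hour = Wh
    }

    public static void main(String[] args) {
        double gap    = 116.709 - 98.679; // insertion sort mean power, B minus A (W)
        double saving = 1.512 - 1.484;    // insertion sort induced energy, A minus B (Wh)
        System.out.printf("break-even after about %.0f runs%n",
                breakEvenRuns(gap, saving));
    }
}
```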


Table 6 Energy consumption per hardware component obtained by Method A

| HW component  | Mean power (W) | Energy consumption (Wh) |
|---------------|----------------|-------------------------|
| Processor     | 4.941          | 1.069                   |
| Hard disk     | 17.106         | 1.520                   |
| Graphics card | 1.154          | 0.249                   |

Fig. 4 Data from the measurement of the total energy consumption (Method B)

4.2 Remarks

Method A also allows us to obtain the power and energy consumed by individual hardware components of the computer while a software product is running. Table 6 shows the consumption data of the processor, the hard disk and the graphics card during the execution of the sorting algorithms. Method B includes the DAE for analyzing the measurements. It produces several statistical analyses and graphs, such as the power measurement data visualized in Fig. 4. Here, the gray lines show all measurements and the red line the average power for each second. Method B also includes the measurement and analysis of the hardware usage (CPU load [%], RAM usage [%], HDD activity (reading and writing) [kB], and network traffic (sending and receiving) [kB]). It comes as little surprise that the results show a correlation between the CPU load and the energy consumption. The software induced CPU load was 45.91%. RAM, HDD and network usage did not exceed the baseline measurements and thus cannot be ascribed to the scenario.


5 Comparison and Discussion

This section presents the similarities and differences that exist between Method A and Method B.

5.1 Approach Similarities

The main similarity between the two methods is that both follow a hardware-based approach for the energy consumption measurements. For this reason, both Method A and Method B allow us to obtain accurate measurements of the energy consumed by a software product while it is running. This is one of the most important advantages over other measurement tools that merely estimate consumption. Another similarity of the two methods is that both use a computer on which the software to be evaluated is installed and executed. In Method A, this computer is known as the Device Under Test (DUT), and in Method B it is called the System Under Test (SUT). To carry out the analysis of the consumption data recorded by the measurement hardware (EET in Method A; PM in Method B), the two methods use a software application that is capable of processing and analyzing the large amounts of data. In Method A, the software component that collects the data, analyzes it, and produces a suitable visualization of the results is called Software Energy Assessment (SEA). At the moment, SEA is under development. In Method B, this system is called the Data Aggregator and Evaluator (DAE), which uses an open source software tool to generate the analysis reports. In addition, Methods A and B both allow a baseline measurement to be performed without any software product running. This baseline is used to obtain the energy consumption induced by the software product. In this way, we can estimate the consumption of the software, excluding the consumption for just running the hardware. Finally, both methods allow analyzing the consumption of the individual actions (the different sorting algorithms in the case study) of the usage scenario.

5.2 Approach Differences

Even though the hardware of the SUT/DUT differs, the data recorded during the case study suggest that the measurements are comparable (as can be seen, e.g., in Fig. 3 and the detailed results in the replication package). EET, the device used in Method A, is composed of a set of internal sensors, connected to the power supply of the DUT, which measure the power consumed by individual hardware components of the DUT. It also includes two external sensors to measure the total energy consumption of the DUT and the

Assessing the Sustainability of Software Products—A Method Comparison

13

monitor. In contrast, Method B uses only an off-the-shelf power meter and combines the energy consumption measurements with software-based measurements of the hardware usage (CPU, RAM, etc.). Thus, the cost of building a usable measurement workstation differs between the methods: developing the measuring device of Method A costs approximately €100, while fitting a workstation with the requirements of Method B costs approximately €450. With regard to the time involved in building and configuring the measuring workstation, approximately five hours are required for Method A, while Method B can be configured in about one hour. One of the most significant differences is that Method A, in addition to measuring the energy of the whole system, allows us to obtain energy measurements for individual parts of the DUT. The hardware components that can be measured are the processor, the hard disk, and the graphics card; additionally, it allows evaluating the energy consumed by the monitor. Method B, on the other hand, provides information about the hardware utilization while running the scenario. Another point to emphasize is that Method B requires longer measurements, because the power meter already averages the values over one second, while Method A obtains raw data from the sensors approximately every 10 ms. Regarding the process of carrying out the measurements, Method B proves a bit more flexible, as it can measure the energy consumption of any device, including mobile and IoT devices. Because the reference system can easily be swapped out, it is also updated every year, allowing the consumption of software to be compared over time and across hardware systems. Nevertheless, if Method A is used only to assess the power consumption, it can, with some effort, be applied to computers where the power supply can be replaced.
Furthermore, Method B has different automation tools to ensure that the usage scenario is executed in a reproducible and automated manner.

5.3 Approach Synergies

Despite the difficulty of comparing two measurement methods with different hardware, using both methods allows us to corroborate that the measurements made by each are correct, since the results obtained in the case study are very similar. In addition, combining the results obtained by both measurement methods gives us more information on the energy consumption of the software analyzed; in this way, the results complement each other. For example, it becomes possible to analyze how the percentage of CPU use, extracted with Method B, compares to the specific energy consumption of the processor, measured by Method A.
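For instance, once both data streams are aligned on a common time base, the per-interval CPU-load series from Method B can be correlated with the per-interval processor-energy series from Method A. A minimal sketch with invented per-second values (not data from the case study):

```python
# Pearson correlation between two aligned measurement series.
# Both series below are invented for illustration.
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

cpu_load_pct = [10.0, 45.0, 46.0, 44.0, 12.0]  # Method B: CPU load per second
cpu_energy_j = [5.0, 14.0, 15.0, 14.5, 6.0]    # Method A: processor energy per second

r = pearson(cpu_load_pct, cpu_energy_j)        # close to 1 for these values
```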


6 Conclusion and Outlook

Software technology plays an important role in global energy consumption. When software is running, it causes energy consumption of the hardware on which it is executed. With this in mind, building tools that can measure energy consumption accurately becomes increasingly important: even a small change in the consumption of a software product can add up to a large difference given the number of times that software is executed. In this paper, two methods of consumption measurement with a hardware-based approach have been presented and compared. Both methods make it possible to obtain accurate measurements of the power consumption of software when it is executed on a particular computer. Moreover, each of the methods has its own characteristics: Method A, also called FEETINGS, provides information on the power consumption of some individual hardware components, while Method B records information about the hardware utilization. To compare the two methods, a Java program running different sorting algorithms was measured with each one. The results obtained from the case studies reveal that, although each method has different characteristics, the measurements of the total energy consumption of the DUT/SUT are very similar, which encourages us to think that both are correct. It also allows us to conclude that the methods are complementary, so that using both yields more information about the energy efficiency of the software. As future work, we will continue to perform more measurements of software energy consumption, with the aim of ensuring compatibility between the two methods. In addition, we will study how to perform joint measurements in order to obtain more detailed results on the energy consumption of software during execution. Furthermore, it may be interesting to extend the study with further measurement methods that follow the hardware-based approach.
Acknowledgements This work was, in part, supported by the Spanish Ministry of Economy, Industry and Competitiveness and European FEDER funds through the GINSENG-UCLM (TIN2015-70259-C2-1-R) project. It is also part of the SOS project (SBPLY/17/180501/000364), funded by the Department of Education, Culture and Sports of the Directorate General of Universities, Research and Innovation of the JCCM (Regional Government of the Autonomous Region of Castilla-La Mancha).


Estimate of the Number of People Walking Home After Compliance with Metropolitan Tokyo Ordinance on Measures Concerning Stranded Persons

Toshihiro Osaragi, Tokihiko Hamada and Maki Kishimoto

Abstract The Metropolitan Tokyo Ordinance on Measures Concerning Stranded Persons (hereafter, the Tokyo Metropolitan Government Ordinance (TMGO)) was enacted in April 2013 as part of efforts to encourage employers and schools to keep their employees, students, or young children in place, rather than letting them set out for home, when public transportation has been paralyzed by a major earthquake or other disaster. However, whether a person decides to return home depends on his or her individual attributes and other factors, such as the presence of family members at home, and it is uncertain whether the TMGO will actually be effective in reducing the number of people attempting to return home. Therefore, a survey on individual courses of action after a major earthquake was conducted. This data was employed to construct a probabilistic model for predicting whether an individual will "try to walk home", "try to return to, or stay in, his/her workplace or school", or "try to take some other action", based on the individual's attributes and the walking distances. This model was applied to person-trip survey data for the Tokyo metropolitan area to estimate the numbers of people walking home after a major earthquake, after which a quantitative prediction of the effectiveness of the TMGO on attempts to return home was made.

Keywords Ordinance covering measures for stranded persons · Questionnaire survey · Intention of returning home · Nested logit model

1 Introduction

Just after the Great East Japan Earthquake (the Pacific coast of Tohoku Earthquake, March 11, 2011), large crowds of workers left their workplaces and set out for home on foot, causing turmoil in the vicinity of railroad stations and on highways and other routes. When significant numbers of people in a city begin returning home all at once

T. Osaragi (B) · T. Hamada · M. Kishimoto
Tokyo Institute of Technology, 2-12-1 Ookayama, Meguro-ku, Tokyo 152-8550, Japan
e-mail: [email protected]
M. Kishimoto, e-mail: [email protected]
© Springer Nature Switzerland AG 2020
R. Schaldach et al. (eds.), Advances and New Trends in Environmental Informatics, Progress in IS, https://doi.org/10.1007/978-3-030-30862-9_2

17

18

T. Osaragi et al.

just after a major earthquake, not only do they expose themselves to the dangers of fires, building collapses, or crowd disasters; such crowds are also apt to obstruct high-priority vehicles on their way to fires and other emergencies. Accordingly, the Tokyo Metropolitan Government has released the Metropolitan Tokyo Ordinance on Measures Concerning Stranded Persons [1] (hereafter, the Tokyo Metropolitan Government Ordinance (TMGO), published April 1, 2013), which was created using the lessons learned during and in the aftermath of the Great East Japan Earthquake. A "stranded person" is a person who is unable to return home on the day a disaster occurs, due to the interruption of public transportation, traffic congestion, and other problems. The TMGO obligates employers, schools, and kindergartens to make efforts to keep their employees, students, or young children in place at times when public transportation has been paralyzed by a major earthquake, rather than allowing them to set out for home immediately (Fig. 1). They are instructed to stand by for 72 h in safe facilities, since these 72 h are critical for rescue and firefighting activities; the intent is to keep large numbers of people off the streets so that they do not interfere with rescue operations or firefighting activities in the aftermath of a large earthquake. However, a person's decision whether or not to set out for home will depend on numerous factors, including his/her commute attributes, access to information about the safety of family members, and the extent and types of damage to the city. Thus, in order to estimate the effectuality of the TMGO, i.e., the extent to which it will be followed or disregarded by people who insist on attempting to walk home, it is necessary to make quantitative predictions of people's specific actions in the event of a disaster, based on theoretical factors.

Fig. 1 An ordinance for the comprehensive promotion of measures concerning stranded people has been enacted by the TMG (2012)

Estimate of the Number of People Walking Home After Compliance …

19

For the present study, a survey was conducted of people who commute to work or educational institutions in central Tokyo, asking what they would do in the immediate aftermath of a major earthquake; that data was used to construct a probabilistic model predicting commuters' movements after a major earthquake. This model was used together with person-trip survey data (PT data) collected in 2008 in the Tokyo metropolitan area to estimate the number of people walking home after a major earthquake. The effectiveness of the TMGO was then examined by analyzing those results with respect to the time of the earthquake occurrence and the attributes of the individuals surveyed.

2 Survey on Individual Course of Action After Major Earthquake

2.1 Comparison with Previous Surveys and Research

Previous studies [2, 3] have presented observations based on surveys of people about their intentions with respect to going home after a major earthquake. The data obtained in those surveys have also been used in another study [4] to model the intentions of people in or traveling through Tokyo in a simulation of the two groups' actions after the occurrence of an earthquake directly below the capital. The above studies were quite advanced, predicting the numbers of stranded persons and the turmoil on highways and other routes immediately after the Great East Japan Earthquake. However, even though those surveys [2, 3] were created on the basis of written and photographic records of the immediate aftermath of the Great Hanshin-Awaji Earthquake, they were answered by people who had not yet personally experienced the confusion of such disasters, a fact that limits the trustworthiness of the obtained results. In contrast, the surveys by Nakabayashi [5, 6] and Hiroi et al. [7–11] were aimed at respondents who had actually gone through major earthquakes, because the data provided by such respondents is believed to be more credible than data from respondents with no such experience who were simply asked to imagine themselves surrounded by a damaged city and panic-stricken refugees. In the present study, a survey was conducted in which the respondents were residents of the Tokyo metropolitan area who had experienced the Great East Japan Earthquake. They were asked to assume an earthquake had occurred directly below the capital. The results obtained were then used to examine the intentions of typical people who would be in or traveling through Tokyo in such a situation.
In addition, we constructed a model of conditions before and after the TMGO showing homeward-bound human traffic flow, assuming that systems had been created in accordance with the TMGO to encourage people to remain in place. The efficacy of the TMGO in suppressing the number of people walking home was then examined.


2.2 Outline of Survey

The survey was conducted online [12, 13]. Figure 2a presents a summary of the survey. First, the respondents were asked about their intentions with respect to movements (where to go) after a major earthquake, assuming they were inside their building (work or school) when it struck (times assumed: 09:00, 14:00, and 18:00). In order to encourage respondents to visualize the scenario as specifically as possible, we selected 18:00 in the winter, which is well after sunset; results for that time must be interpreted for the appropriate post-sunset times when applying this model to the summer. Three conditions were assumed, combining whether the facility had instituted a system to encourage people to stay at work or school in accordance with the TMGO and the availability of information (Conditions (1), (2) or (3) in Fig. 2b). The respondents' intentions are shown in Fig. 2d. The respondents were also asked about their intentions should a major earthquake occur while they were commuting to work or school (Fig. 2c).

Fig. 2 Outline of survey


Fig. 3 The information with the most significant influence on respondents' decisions after a major earthquake

2.3 Information Affecting Decisions to Stay or Leave

The information that had the most significant influence on respondents' decisions after a major earthquake was found to relate to the presence or absence of family members at home. Figure 3 shows these results. About 80% of the respondents who lived with family members reported that "information about the safety of the family members" was the most influential factor for them. Meanwhile, respondents who did not live with family members said that the most influential information concerned the safety conditions in their home neighborhoods and along their intended return route.

2.4 Decrease in Frequency of Intention to Return Home with Distance and Time of Earthquake Occurrence

A person's intention to return home would be expected to vary significantly with the distance that must be covered on foot from the place where one would otherwise stay after a major earthquake. Respondents who had said they would "try to walk home" (PFh: the fraction of the population indicating the intention to return home), classified by their distance to home, were separated into those with or without family members at home, and then further divided according to the time of earthquake occurrence (Fig. 4, upper portion). The reader can see that PFh increased as the distance from home decreased and with earlier occurrence times. Additionally, this increase was seen in a higher fraction of respondents with family members at home than of those without. Next, we examined the variation in PFh with Conditions (1)–(3) as described in Sect. 2.2 using Fig. 2b (Fig. 4, lower portion). In situations where a system complying with the TMGO had been established, it was found that PFh could be reduced by about 20% among those living with family members. A further reduction was possible when the respondents could confirm that their family members were safe; in such conditions, PFh was reduced by about 30%. In contrast, although the


Fig. 4 Decrease in frequency of intention to return home with distance and time of earthquake occurrence

PFh of persons with no family members at home could be reduced by about the same amount as that of persons with family members through the use of a system complying with the TMGO, when information indicated that their home neighborhoods were safe, and once they learned that it would be safe to go home along their intended routes, PFh rose back to about the same level as if no system had been established. Thus, it is clear that even though the TMGO can be expected to have some effect in reducing the number of refugees attempting to walk home, its effectiveness might be impeded by the content of the available information.

3 Construction of Motion/Action Model and Predictions

3.1 Motion/Action Model

In a previous study [4], we attempted to describe PFh, which varies greatly with attributes and conditions, using a neural network model (perceptron). Here, people's actions after a major earthquake are instead described with a logit model, with the goal not only of describing PFh, but also of explaining the rationale for such decisions. First, we assume that commuters react in one of three broadly defined ways immediately after the earthquake strikes: (a) returning home, (b) continuing to or remaining


Fig. 5 Motion/Action model of commuters’ reaction immediately after the earthquake strikes

at work or school, or (c) setting out for somewhere else. These are considered "deliberate actions". Such decisions are then described using the logit model (the motion/action model), which has a good record of describing deliberate actions. There are two kinds of logit models: the multinomial logit model, which processes all the selections simultaneously in parallel, and the nested logit model, which assumes that selection behavior proceeds in a hierarchical manner. Here, we analyzed the survey data using the three models shown in Fig. 5 to determine which is most appropriate for the present study.
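The difference between the two model families can be sketched numerically. The utilities below are invented for illustration only; the hierarchy (deciding first whether to stay, then where to go) and the use of a log-sum coefficient follow the standard nested-logit formulation, not the paper's fitted model:

```python
# Multinomial vs. nested logit choice probabilities for three actions.
# Utilities U are invented; lam is an assumed log-sum coefficient.
from math import exp, log

U = {"home": 0.8, "stay": 1.2, "other": -0.5}

# Multinomial logit: all three alternatives compete simultaneously.
denom = sum(exp(u) for u in U.values())
p_mnl = {k: exp(u) / denom for k, u in U.items()}

# Nested logit: first stay vs. leave; if leaving, then home vs. other.
lam = 0.64                                   # log-sum coefficient, 0 < lam <= 1
logsum = log(exp(U["home"] / lam) + exp(U["other"] / lam))  # inclusive value
p_leave = exp(lam * logsum) / (exp(U["stay"]) + exp(lam * logsum))
p_home_given_leave = exp(U["home"] / lam) / (exp(U["home"] / lam) + exp(U["other"] / lam))
p_home = p_leave * p_home_given_leave        # joint probability of walking home
```

With lam = 1 the nested form collapses to the multinomial model; smaller lam strengthens the correlation between the alternatives inside the "leave" nest.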

3.2 Model Predictions

Model estimates were produced using the explanatory variables shown in Tables 1 and 2, which were expected to influence selection behavior. Model 3 was judged to be the most plausible since its estimated parameters, in addition to model compatibility and other considerations, were the most theoretically consistent (Table 1). For example, this result shows that when a major earthquake occurs while the respondent is at work or at school, he/she first determines whether or not to stay at work or at school.

Table 1 Estimation result of the motion/action model
(a) Fitness of models

           Likelihood ratio (adjusted for degrees of freedom)
           Level 1      Level 2      Hit rate (%)
Model 1    0.237        –            63.0
Model 2    0.243        0.162        63.0
Model 3    0.344        0.228        81.0


Table 2 Explanatory variables and estimated parameter values of the motion/action model
(b) Estimated parameters of Model 3

Behavior: Continuing to or remaining at work or school
  β1    Walking distance to work/school                   Real number (km)       −0.08    −42.07
  β2    Living with family member(s)                      Yes = 1, No = 0        −0.12     −5.93
  β3    Living with family member(s) 12 years or younger  Yes = 1, No = 0        −0.15     −5.16
  β4    Sex                                               Female = 1, Male = 0   −0.30    −15.50
  β5    Age class: 20 s                                                          −0.28     −7.39
  β6    Age class: 30 s                                                          −0.16     −3.51
  β7    Age class: 40 s                                                          −0.20     −4.42
  β8    Age class: 50 s or over                                                  −0.21     −4.48
  β9    Professional status: permanent employee                                   0.18      4.58
  β10   Professional status: temporary employee                                  −0.12     −3.07
  β11   At work/school at 14:00                                                  −0.27     −7.85
  β12   At work/school at 18:00                                                  −0.14     −3.93
  β13   Commuting to work/school                                                 −1.16    −35.12
  β14   Commuting home                                                           −1.27    −38.01
  β15   Instruction "do not return home immediately"                              0.32     14.30
  β16   Information affecting the return behavior                                 0.09      4.11

Behavior: Returning home
  β17   Walking distance home                             Real number (km)       −0.08    −69.78
  β18   Living with family member(s)                      Yes = 1, No = 0         0.62     20.20
  β19   Living with family member(s) 12 years or younger  Yes = 1, No = 0         0.40      9.43
  β20   Sex                                               Female = 1, Male = 0   −0.44    −16.20
  β21   Age class: 20 s                                                           0.33      6.17
  β22   Age class: 30 s                                                           0.60      9.82
  β23   Age class: 40 s                                                           0.88     14.16
  β24   Age class: 50 s or over                                                   0.65     10.28
  β25   Professional status: permanent employee                                    0.39      7.99
  β26   Professional status: temporary employee                                    0.17      3.23
  β27   At work/school at 14:00                                                   −0.60     −8.11
  β28   At work/school at 18:00                                                   −1.07    −14.23
  β29   Commuting to work/school                                                  −1.47    −24.61
  β30   Commuting home                                                            −1.54    −25.79
  β31   Instruction "do not return home immediately"                              −0.15     −4.65
  β32   Information affecting the return behavior                                 −0.05     −1.52

Behavior: Others
  β33   Other utility                                     Dummy variable         −2.65    −33.88
  λ     Composite variable                                Log-sum coefficient     0.64     47.24

(Columns: parameter, explanatory variable, note, estimate, t-value. β11–β14 and β27–β30 describe the present location and time at the occurrence of the earthquake; β15, β16, β31, and β32 describe the conditions at the workplace/school.)

Then, if he/she decides not to stay, he/she determines whether to leave for home or for another destination. Table 2 lists the parameter values estimated by Model 3, the best of the three models. The parameters "Walking distance home" and "Walking distance to work/school" show negative values, expressing the tendency of increasing difficulty in getting home, or to work or school, as walking distances increase. The parameters "Living with family member(s)" and "Living with family member(s) 12 years or younger" have positive values for (a) the action of leaving for home, indicating a high potential for people living with family members, especially with younger children, to attempt to walk home. Correspondingly, these parameters have negative values for (b) the action of walking to work or school (or staying there): the presence of family members reduced the potential of a person to go to or stay at work or school. These parameters also had negative values in cases involving some other action, showing that the respondents were relatively less likely to take this action compared to walking to work or school (or staying there) or leaving for home. The final likelihood ratios after adjusting the degrees of freedom were all in the range 0.228–0.344. These were not particularly high, but the hit rate was 81.0%, and the t-values of the estimated parameters were all high enough to indicate statistical significance.
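To illustrate how Table 2 is read, the sketch below evaluates the linear utility of the "returning home" alternative for one hypothetical commuter using only a subset of the estimates (β17, β18, β19, β20, β22, β25); the omitted variables are treated as zero here, so this illustrates the utility arithmetic rather than reproducing Model 3 in full:

```python
# Linear-in-parameters utility for "returning home", using a subset of the
# estimates from Table 2. The commuter profile is hypothetical.
beta = {
    "walk_dist_home": -0.08,   # β17, per km
    "family": 0.62,            # β18
    "family_child": 0.40,      # β19
    "female": -0.44,           # β20
    "age_30s": 0.60,           # β22
    "permanent": 0.39,         # β25
}

# Man in his thirties, permanent employee, family incl. a child, 10 km to home
x = {"walk_dist_home": 10.0, "family": 1, "family_child": 1,
     "female": 0, "age_30s": 1, "permanent": 1}

U_home = sum(beta[k] * x[k] for k in beta)
```

In the full model this utility would enter the nested-logit choice probability alongside the utilities of the other two alternatives.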


3.3 Structure of Decision-Making Discerned from Motion/Action Model

Figure 6 shows how PFh varied with the homeward walking distance, as estimated by the motion/action model, while also accounting for the influence of various other factors. First, examining the typical respondent in the left portion of Fig. 6 (Example: earthquake at 09:00; man in his thirties; permanent employee; family including a child 12 or under at home), the reader will note that the sample respondent was less and less likely to express the desire to return home in the order Condition (1) (no system complying with TMGO, no information), Condition (2) (system complying with TMGO, no information), and Condition (3) (system complying with TMGO, information available). In other words, the establishment of a system complying with the TMGO and the availability of information that the family is safe tend to suppress the desire to go home. The center portion of Fig. 6 shows a lower PFh if the earthquake occurs at 18:00 than at 09:00 or 14:00. It was assumed in this survey that a major electric power outage would occur, so this result probably reflects a reluctance to walk home in the dark. Turning to the right portion of Fig. 6, the reader can see how PFh varied with age for a typical woman with otherwise the same attributes. Thus, a respondent's decision on whether or not to return home varied significantly with attributes including age and sex, as well as the earthquake occurrence time, the existence of an established system for keeping employees at work/school, and information about the safety of family members.
Next, Fig. 7 shows the fraction returning home, namely, the probability that respondents intend to turn around and return home if an earthquake strikes while they are commuting to work or school (going trip), or to continue home if it strikes while they are commuting home (returning trip), as it varied with the walking distance from their current location to home (D1) and to work or school (D2). It also shows the ratio between the probability of deciding to continue to work or school and the probability of deciding to return to work or school, versus D1 and D2 (Example:

Fig. 6 Examples of the estimated fraction of the population indicating the intention to return home (PFh)


Fig. 7 Probability that respondents will turn around and return home if an earthquake strikes while they are commuting to work or school, or continue home if it strikes while they are commuting home


earthquake at 09:00; man in his thirties; permanent employee; family including a child 12 or under at home). Considering the fraction of respondents wishing to go home if the earthquake strikes while they are en route from home (Fig. 7, upper portion), under Condition (1) (no system complying with TMGO, no information), when D1 < 20 km this fraction was a comparatively high 80–100% of respondents. However, PFh decreased under Conditions (2) (system complying with TMGO, no information) and (3) (system complying with TMGO, information available). Thus, this model expresses how the creation of a system for restraining the entire population from attempting to return home, and the provision of information about the safety of family members, actually does suppress PFh. The opposite trend appears among people who intend to continue to work or school (2nd row in Fig. 7): the fraction of such respondents increased in the order Condition (1), Condition (2), and Condition (3). Next, we calculated the difference between the ratios while respondents were commuting to work or school (en route) and while they were commuting home (returning) in order to compare these cases (Fig. 7, lower 2 rows). First, examining the difference in PFh, the reader can see that this was actually negative for small values of D1 and D2, where the intention to return home was higher while commuting home (returning). However, a positive value resulted when D1 and D2 both had high values, and this value was high during (the early part of) the commute to work or school. This may be attributable to the respondents' considering the amount of time they have to get home before nightfall. Turning to the fraction who said they would continue their commute (Fig. 7, bottom row), this response was always positive in value, but it was more likely while the respondent was commuting to work or school (en route) than while commuting home (returning). This tendency was strongest when D1 was high and D2 was low, and weakened as D1 decreased or D2 increased. This indicates that people are more likely to continue their commute while en route to work/school than on their return trip if their homes are far away and they are still close to their work or school when an earthquake strikes.

4 Estimation of Numbers of People Classified by Behavior and Personal Situations at the Time of Earthquake

4.1 Personal Situations and Numbers of People in or Traveling Through Tokyo

Below, the personal situations of the respondents in or traveling through Tokyo immediately before the assumed time of earthquake are grouped into the following six classifications: (1) in their work or school building; (2) traveling from home to work or school (en route); (3) traveling home from work or school (returning); (4) temporarily outside their employer or school building; (5) away from home to shop or eat; or (6) at home. People in classification (6) are omitted from consideration here, as their actions have no direct relationship to activities involved in walking home.

The region used for PT data analysis was the Tokyo metropolitan area consisting of Tokyo and four neighboring prefectures (Kanagawa, Chiba, Saitama, and a part of Ibaraki). The spatiotemporal distribution of people in or traveling through Tokyo, under classifications (1)–(5) above, was estimated. The unit for tabulation in the PT data was a delineated space about 10 city blocks in size (small zone). Intersections were selected at random from the small zones enclosing the respondent's home and work or school location, and those intersections were designated as their home and work or school location, respectively. The respondent's location when the earthquake strikes was also designated as an intersection by the same method. Distances home and distances to work/school were calculated using road network distances. Since the actual locations of people on railways cannot be ascertained using the information in the PT data, we used the procedure outlined in Osaragi [14] to estimate the numbers and locations of people in transit on railroad lines, walking inside stations, and walking between stations.
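The designation procedure described above (random intersections within small zones, shortest-path distances over the road network) can be sketched as follows. The zone contents, road lengths and identifiers below are purely illustrative; the actual analysis uses the PT data and the full Tokyo road network.

```python
import heapq
import random

# Hypothetical toy road network: nodes are intersection IDs, edge weights
# are road lengths in km (illustrative values only).
ROADS = {
    "A": {"B": 1.2, "C": 2.0},
    "B": {"A": 1.2, "C": 0.8, "D": 3.1},
    "C": {"A": 2.0, "B": 0.8, "D": 1.5},
    "D": {"B": 3.1, "C": 1.5},
}

# Each "small zone" (about 10 city blocks) lists the intersections inside it.
ZONES = {"home_zone": ["A", "B"], "work_zone": ["C", "D"]}

def pick_intersection(zone, rng):
    """Designate a random intersection in the zone as the actual location."""
    return rng.choice(ZONES[zone])

def road_distance(origin, dest):
    """Shortest-path road network distance (Dijkstra's algorithm)."""
    dist = {origin: 0.0}
    queue = [(0.0, origin)]
    while queue:
        d, node = heapq.heappop(queue)
        if node == dest:
            return d
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for nbr, length in ROADS[node].items():
            nd = d + length
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(queue, (nd, nbr))
    return float("inf")

rng = random.Random(0)
home = pick_intersection("home_zone", rng)
work = pick_intersection("work_zone", rng)
distance_home_km = road_distance(home, work)
```

The same two-step pattern (randomize within the tabulation unit, then measure over the network) also yields the distance from the respondent's location at the time of the earthquake to home or to work/school.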

4.2 Numbers of People as Classified by Action

The population distribution data classified by personal situations and the motion/action model were employed to estimate the populations reacting in different ways to the occurrence of an earthquake (Condition (2): system complying with TMGO, no information). Predicted numbers are given by the motion/action model for occurrences at 09:00, 14:00 and 18:00; the reactions to an earthquake striking at other times during the day were estimated by interpolating between those values. No details about actions taken after an earthquake by individuals who are away from home to shop or eat (classification (5)) are available, as these were not collected in this survey. It was therefore assumed that they would make decisions following the same pattern as those in classification (4) (temporarily outside their work or school building). However, since people who are away from home to shop or eat are not generally in a position to take refuge at an office or educational institution, they have only two options, (a) return home or (c) set out for somewhere else, and the motion/action model was applied based on that assumption. Figure 8 shows the results of this analysis.

Fig. 8 Predicted numbers by the motion/action model according to occurrence time of earthquake

Now let us turn to an examination of the reactions of people in or traveling through Tokyo who elect to stay or leave in the wake of an earthquake striking during the daytime on a weekday. Classifications (1) through (5) are combined in the upper left chart of Fig. 8. Here, the reader can observe that over 10 million people are predicted to leave for home rather than stay at work or school. Examining further by personal situation, we find that although over half of those in classification (1) (in a work or school building) would stay where they are, a far greater fraction of those in classification (4) (temporarily outside their work or school building) would opt to go home rather than return to work or school. Turning next to classification (2) (traveling from home to work or school), which peaks around 08:00, many of these commuters would return home. However, this decision is dependent on the distance from the individual's present location to work/school at the time of the earthquake, and the reader can see that a certain number of commuters would continue to work or school. This trend also appears in classification (3) (traveling home from work or school), which peaks at about 19:00. It is quite likely that those who attempt to return to work or school will find themselves moving in the opposite direction to those walking home from central Tokyo toward the suburbs. They will also be in the precarious situation of forcing their way "upstream" against a much larger crowd. Additionally, those in classifications (2) and (3) (traveling from home to work or school, and vice versa) showed relatively higher fractions of people deciding to "try to take some other action", as the distances home or to work or school were often long. Furthermore, those in classification (5) (people who are away from home to shop or eat) are not generally in a position to take refuge, which makes it easier for them to decide to go home. If an earthquake strikes in the middle of the day, their number may reach close to 4 million. More detailed analyses will be needed in the future, as the survey did not directly address the intentions of this segment of the population, which has not been considered in the TMGO.
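The interpolation step used to cover occurrence times other than 09:00, 14:00 and 18:00 can be illustrated with a small sketch. The fractions below are invented for illustration, and the interpolation is assumed to be linear (the paper does not specify the interpolation scheme).

```python
# Hypothetical predicted fractions (e.g., of commuters heading home) at the
# three occurrence times for which the motion/action model gives predictions.
SURVEYED = {9.0: 0.62, 14.0: 0.48, 18.0: 0.71}

def interpolate_fraction(hour):
    """Linearly interpolate between the surveyed occurrence times;
    values outside the surveyed range are clamped to the endpoints."""
    times = sorted(SURVEYED)
    if hour <= times[0]:
        return SURVEYED[times[0]]
    if hour >= times[-1]:
        return SURVEYED[times[-1]]
    for t0, t1 in zip(times, times[1:]):
        if t0 <= hour <= t1:
            w = (hour - t0) / (t1 - t0)  # position within the interval
            return SURVEYED[t0] * (1 - w) + SURVEYED[t1] * w
```

For an earthquake at 11:30, for example, the estimate falls halfway between the 09:00 and 14:00 predictions.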

4.3 Reductions in the Number of People Walking Home by the System for Encouraging People to Stay at Work/School

The reduction in the number of people walking home due to the system for encouraging people to stay at work or school was estimated by calculating the numbers of persons walking home in the presence and in the absence of such a system. This calculation was carried out for various clock times. The results are given in Fig. 9. The reader can see that the number of people in classification (1) (in the work or school building) walking home at the peak period was reduced by about 20% (about 1,350,000). In contrast, those in classification (4) (temporarily outside their work or school building) were relatively unaffected by the existence of a system encouraging people to stay: their numbers were reduced by only about 10% (about 50,000). Reductions in classifications (2) (traveling from home to work or school) and (3) (traveling home from work or school) also remained low, at around 10% (about 250,000 and about 170,000, respectively). There was very little reduction, only about 2% (about 70,000), in the numbers of classification (5) (away from home to shop or eat), since they were not in a position to take refuge. Thus, the TMGO alone will probably not be sufficient to deal with people who are in or traveling through Tokyo to shop or eat, so this must be discussed in further planning. Shopping districts are congested with crowds fitting classification (5) on weekends and holidays, and strategies for their welfare must be discussed [15].

Fig. 9 Reductions in the number of people walking home by the system for encouraging people to stay at work/school
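The reduction figures above follow from a simple comparison of the estimates with and without the system. The sketch below uses hypothetical peak totals, chosen only so that the computed reductions match the reported values of about 20% (about 1,350,000) for classification (1) and about 10% (about 50,000) for classification (4).

```python
# Hypothetical peak-period counts of people walking home, without and with
# the system encouraging people to stay (labels as in Sect. 4.1).
WITHOUT_SYSTEM = {"(1) at work/school": 6_750_000, "(4) temporarily outside": 500_000}
WITH_SYSTEM = {"(1) at work/school": 5_400_000, "(4) temporarily outside": 450_000}

def reduction(cls):
    """Absolute and relative reduction achieved by the stay-put system."""
    before, after = WITHOUT_SYSTEM[cls], WITH_SYSTEM[cls]
    absolute = before - after
    return absolute, absolute / before
```

Evaluating the same difference at each clock time yields the curves of Fig. 9.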

4.4 Number of People Walking Home, Considered by Attribute

Figure 10 shows the calculated numbers of people walking home, classified by attributes (sex, professional status) and by walking distance home. First, comparing men and women (uppermost row in Fig. 10), we find that men have relatively long trips home. The lowest row in Fig. 10 shows the fraction by which the numbers of men and women were reduced by the presence of a system encouraging them to stay at work or school (reduced fraction). The reduced fraction tended to increase with the walking distance home. Next, considering professional status: although permanent employees had longer distances to travel home than temporary employees, there was no marked difference in the reduced fraction for these groups. Students showed reduced fractions that were several points higher than those of permanent or temporary employees. From the above results, we find that even where systems have been established to encourage people to stay at work or school, they appear to influence only a relatively small number of men and permanent employees to do so, and many may yet decide to walk a long distance home.

Fig. 10 Estimated numbers of people walking home, classified by attributes (sex, professional status) and by walking distance home

5 Summary and Conclusions

In the present study, a probabilistic model was constructed to predict where people in or traveling through Tokyo will go when public transportation is paralyzed by a major earthquake. The numbers of people walking home or deciding to stay at work were estimated, and the effectiveness of the TMGO in reducing the number of people attempting to return home was examined. It was found that the TMGO can be expected to be effective in reducing the number of those attempting to walk home among the group that was the main target of the ordinance, those who are in their work or school building. However, many of those who are outside their work or school buildings, or who are away from home to shop or eat, are expected to walk home, and the TMGO may have only a limited influence on their actions. The survey and the simulation results also show that if family members cannot confirm each other's safety, the situation becomes more serious and confused. It is, therefore, necessary to build efficient and effective countermeasures and technologies that enable family members to mutually confirm safety.

In this paper, we performed simulations by combining several models that describe various human behaviors in the aftermath of a large earthquake. However, there is no large-scale dataset that records human behavior in the aftermath of a major earthquake, so it is difficult to validate the results estimated by the proposed model at this moment. We consider this an important topic for future work. Currently, we are attempting a simulation that incorporates a motion/action model in order to analyze the movements of people walking home, and are examining the number and spatial distribution of temporary refuge centers and aid stations to support people walking home. We will verify the consistency between the motion/action model based on the activities of actual people who walked home after the Great East Japan Earthquake and the survey data used for the present study.

Acknowledgements This paper is part of the research outcomes funded by JSPS KAKENHI Grant Number JP17H00843. A portion of this paper was published in the Journal of Architectural Planning and Engineering (Architectural Institute of Japan), in an article entitled "Estimating the Number of People Returning Home on Foot under the Tokyo Metropolitan Government Ordinance" [16].

References

1. Tokyo Metropolitan Government: Tokyo Metropolitan Government Disaster Prevention Guide Book. https://www.bousai.metro.tokyo.lg.jp/_res/common/guidbook_pocketguide/2019guid_e.pdf. Accessed 21 June 2019
2. Tanaka, S., Osaragi, T.: Willingness of returning home when a big earthquake occurred; data from questionnaire. Summaries of Technical Papers of Annual Meeting, Architectural Institute of Japan, F-1, pp. 591–592 (2007)
3. Tanaka, S., Osaragi, T.: Modeling of decision making on returning home after devastating earthquakes. In: Papers and Proceedings of the Geographic Information Systems Association, vol. 16, pp. 183–186 (2007)
4. Osaragi, T.: Modeling a spatiotemporal distribution of stranded people returning home on foot in the aftermath of a large-scale earthquake. Nat. Hazards 68(3), 1385–1398. Springer, Dordrecht (2012)
5. Nakayashi, I.: Study for figuration of damage in a metropolis during an earthquake disaster: (4) Estimation of difficulty to arrive home from each office. Summaries of Technical Papers of Annual Meeting, Architectural Institute of Japan. Urban planning, building economics and housing problems, history and theory of architecture, pp. 361–362 (1985)
6. Nakabayashi, I.: Development of estimation method on obstructed homeward commuters after earthquake disaster. Compr. Urban Stud. (47), 35–75 (1992)


7. Hiroi, U., Yamada, T.: Questionnaire survey concerning stranded commuters in metropolitan area in the East Japan great earthquake: (Part 1) Stranded commuter's behavior after the earthquake. Summaries of Technical Papers of Annual Meeting, Architectural Institute of Japan, F-1, pp. 873–874 (2011)
8. Yamada, T., Hiroi, U.: Questionnaire survey concerning stranded commuters in metropolitan area in the East Japan great earthquake: (Part 2) Information for decision making. Summaries of Technical Papers of Annual Meeting, Architectural Institute of Japan, F-1, pp. 875–876 (2011)
9. Hiroi, U.: Questionnaire survey concerning stranded commuters in metropolitan area in the East Japan great earthquake. The fall national conference of the Operations Research Society of Japan, abstracts, pp. 14–15 (2011) (in Japanese)
10. Hiroi, U., et al.: Questionnaire survey concerning stranded commuters in metropolitan area in the East Japan great earthquake. J. Soc. Saf. Sci. 15, 343–353 (2011)
11. Sekiya, N., Hiroi, U.: The problem of stranded commuters in metropolitan area in the East Japan great earthquake. J. Jpn. Soc. Saf. Eng. 50(6), 495–500 (2011)
12. Hamada, T., Osaragi, T.: Questionnaire survey of Tokyo Metropolitan Government ordinance covering measures for stranded persons. Summaries of Technical Papers of Annual Meeting, Architectural Institute of Japan, E-1, pp. 675–676 (2013)
13. Hamada, T., Osaragi, T.: Questionnaire survey of Tokyo Metropolitan Government ordinance covering measures for stranded persons. In: Papers and Proceedings of the Geographic Information Systems Association (CD-ROM), vol. 22 (2013)
14. Osaragi, T.: Estimating spatio-temporal distribution of railroad users and its application to disaster prevention planning. In: 12th AGILE Conference on Geographic Information Science. Lecture Notes in Geoinformation and Cartography, Advances in GIScience, pp. 233–250. Springer (2009)
15. Hamada, T., Osaragi, T.: The number of people with difficulty in returning home in the aftermath of large earthquake on weekday and holiday. In: Papers and Proceedings of the Geographic Information Systems Association (CD-ROM), vol. 21 (2012)
16. Osaragi, T.: Estimating the number of people returning home on foot under the Tokyo Metropolitan Government ordinance. J. Arch. Plan. (Trans. AIJ) 81(721), 705–711 (2016)

Gamification for Mobile Crowdsourcing Applications: An Example from Flood Protection

Leon Todtenhausen and Frank Fuchs-Kittowski

Abstract In recent years, the application field of mobile crowdsourcing has developed and expanded. Volunteers take on small tasks with a mobile device in order to reach a larger goal or solve a problem. Mobile crowdsourcing applications exist in many different areas and are also used in environmental protection and disaster control. One of the biggest challenges of mobile crowdsourcing is to recruit volunteers for a project and keep their motivation high in the long run. Gamification is a concept that is often used to increase motivation: it uses game-typical elements in non-game contexts to address different motivators. The application possibilities of gamification are versatile, and various application areas have not yet been explored. In this paper, the motives of volunteers in mobile crowdsourcing projects are analyzed, and gamification is investigated as a possibility to increase motivation. Different game design elements are analyzed with regard to their motivational influence on the motives of mobile crowdsourcing. Finally, the findings are used to develop a concept for the gamification of a mobile crowdsourcing application.

Keywords Mobile crowdsourcing · Gamification · Motivation · Flood protection · Disaster management

1 Introduction

The rapidly advancing development and distribution of mobile devices has significantly shaped the spread of mobile crowdsourcing [1]. Mobile crowdsourcing is a participation concept that enables volunteers to get involved in solving a wide variety of tasks [2]. There are application potentials for mobile crowdsourcing in various areas. One of these application areas is disaster management, in which numerous mobile crowdsourcing applications have emerged in recent years [2].

L. Todtenhausen (B) · F. Fuchs-Kittowski
HTW, Berlin, Germany
e-mail: [email protected]
F. Fuchs-Kittowski
e-mail: [email protected]

© Springer Nature Switzerland AG 2020
R. Schaldach et al. (eds.), Advances and New Trends in Environmental Informatics, Progress in IS, https://doi.org/10.1007/978-3-030-30862-9_3


Since the concept of mobile crowdsourcing is strongly dependent on the acquisition and motivation of volunteers, volunteer management is a central task of it [3]. The aim is to attract volunteers, retain them in the project and keep them available to solve tasks [2]. One of the biggest challenges is the motivation of the volunteers [4]. New technologies require the discovery, research and development of new motivational approaches [4]. The concept of gamification has the potential to provide an incentive for participants and encourage commitment [5]. Gamification describes the integration of game elements into non-game contexts [6]. Whether in learning foreign languages, using fitness apps, or participating in bonus programs: in many areas of everyday life that originally have nothing to do with games, people already encounter playful or game-typical elements. Although gamification is already present in various areas, the potential is still huge, especially in the software area, and various areas of application for gamification have not yet been developed [7].

How exactly gamification needs to be applied to strengthen the motives of volunteers in mobile crowdsourcing in general, and in disaster management in particular, and to overcome motivational barriers must still be explored [5]. In particular, the effects of individual gamified elements on various motives of volunteers are of high relevance in order to be able to use them specifically to increase motivation in mobile crowdsourcing.

The aim of this contribution is to harness the potential of gamification to increase motivation in mobile crowdsourcing. For this purpose, certain game elements which address volunteers' motivation in mobile crowdsourcing projects are selected. This can encourage potential volunteers to join mobile crowdsourcing projects, remain part of the project and become active in relevant situations. In addition, a concept for the gamification of a mobile crowdsourcing application is presented, in which selected game mechanics are integrated into an existing application.

This paper is structured as follows: Firstly, the current state of research on the use of gamification to increase motivation in mobile crowdsourcing projects is presented in order to identify the need for research on the concepts presented (Chap. 2). Subsequently, central motivators of volunteers in mobile crowdsourcing projects, which should be addressed by gamification, are investigated and presented in detail (Chap. 3). Afterwards, motivational effects of various game design elements are analyzed and assigned to the motivators already presented (Chap. 4). Based on this assignment, the specific motivation barriers in flood protection are identified in Chap. 5, and the concept for the gamification of the mobile crowdsourcing application "INUNDO" is developed. Finally, the results of this paper are summarized and an outlook on further development opportunities is given (Chap. 6).

2 State of Research and Research Methodology

This chapter presents the current state of research on the use of gamification to increase motivation in mobile crowdsourcing projects. For the resulting research question, the research design (methodological approach) for the investigation is presented.

2.1 Background: Mobile Crowdsourcing and Gamification

Mobile crowdsourcing: Mobile crowdsourcing is a special type of crowdsourcing and refers to the use of mobile devices (smartphones, tablets, etc.) to collect data as well as to coordinate the volunteers involved in the data collection [8]. The tasks to be performed in mobile crowdsourcing consist mainly of generating measurement data and then sending the data via an existing infrastructure to third parties who evaluate and process the data [8]. There are already many different crowdsourcing applications, especially in nature conservation, environmental protection, and disaster management [8]. In disaster management, mobile crowdsourcing can be used to great effect both during and after a catastrophe [9, 10].

Volunteer management: The participants contribute the data voluntarily [8, 11]. Therefore, volunteer management is an important part of mobile crowdsourcing projects. Important phases of volunteer management are recruitment (also recruiting or acquisition), care and commitment, and activation for the execution of tasks or measurements in provided time windows [2]. These phases are referred to in the following as the acquisition phase, the commitment phase and the activation phase. The aim of the acquisition phase is the acquisition of as many volunteers as possible [2]. This can happen either on the volunteer's own initiative or at the request of the organizer [2]. A target group should be identified at the start of the project and enquiries should be matched with this target group [12]. During the commitment phase it is important to keep the motivation of the volunteers high so that they are always ready to spring into action if needed [2]. Good and regular communication between the organizer and the volunteers, as well as sufficient information about the importance of the volunteers' work, are conducive to this [2]. The mobilization of volunteers in the activation phase is one of the central tasks of mobile crowdsourcing platforms [2]. During this phase, volunteers are supposed to carry out certain tasks when it is important for the project [2].

Gamification: Gamification describes the use of game design elements in a non-game context [6]. The central idea of gamification is to "implement" building blocks from games in real-life situations in order to evoke specific behaviors in these situations [13]. Gamification, like "serious games", not only serves the entertainment of the user, but also aims at rule-based and target-oriented behavior of the user [14]. However, gamification differs from full-fledged "serious games" in that only game design elements are adopted or integrated; no entire game is developed for playing through learning content [6]. Since the potential of gamification is based on motivational support [15], gamification is used in many areas where people should be motivated to do something. The best researched application contexts of gamification are its use in education [14] and work [16]. Fields of application also include the environmental and sustainability sector, so-called Green Gamification [17]. Green gamification can be found in many areas of life: from applications that reward fuel-efficient driving behavior to applications that monitor recycling behavior in households [17]. Green gamification can cause behavioral changes, especially in the areas of resource consumption in households, passenger transport and waste disposal behavior [17].

Game design elements: Game design elements play a decisive role for gamification applications [14]. Game design elements do not represent necessary characteristics that define games, but specific features of certain games [6, 14]. These properties are transferred to non-game contexts through gamification. They can be used to influence the motivation and usage behavior of the player (in a game) or user (in a gamified application) [7]. The game design elements that are most often at the center of empirical studies [14] are: points, badges, leaderboards, levels and challenges.
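To make the interplay of these elements concrete, the following sketch (a hypothetical API, not taken from any of the cited applications) awards points for completed crowdsourcing tasks, grants a badge once a threshold is reached, and ranks volunteers on a leaderboard.

```python
from dataclasses import dataclass, field

@dataclass
class Volunteer:
    name: str
    points: int = 0
    badges: set = field(default_factory=set)

def award_report(volunteer, badge_threshold=100):
    """A completed task (e.g., a submitted flood report) earns points;
    crossing a point threshold earns a badge (names are illustrative)."""
    volunteer.points += 25
    if volunteer.points >= badge_threshold:
        volunteer.badges.add("Flood Reporter")

def leaderboard(volunteers):
    """Rank volunteers by points, highest first."""
    return sorted(volunteers, key=lambda v: v.points, reverse=True)

alice, bob = Volunteer("Alice"), Volunteer("Bob")
for _ in range(4):
    award_report(alice)   # four completed tasks
award_report(bob)         # one completed task
top = leaderboard([alice, bob])
```

Levels and challenges can be layered on the same state, for example by mapping point ranges to levels or by defining time-limited tasks with bonus points.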

2.2 Related Work: Use of Gamification in Mobile Crowdsourcing

While the use of gamification has already been investigated in various areas, there are only a few studies and concepts regarding its use in mobile crowdsourcing projects, especially for the motivation of volunteers. Rutten et al. [12] have already compiled a collection of motivators and barriers in mobile crowdsourcing. These motivators are assigned to extrinsic motivation, intrinsic motivation and amotivation. Some of these motivators, however, are very project-specific and cannot be generally transferred to other projects or applications of mobile crowdsourcing. Hossain [18] also examined factors that motivate people to participate in crowdsourcing projects. He divides the motivators into the categories "Financial", "Social" and "Organizational" and also mentions motivators from the collection of Rutten et al. As further important motivators he names self-determination, independence, self-fulfillment and pride as extrinsic motivators. Both Rutten et al. and Hossain deal with motivation and motivators of mobile crowdsourcing projects, but have no relation to gamification. Zeng et al. [5], on the other hand, have dealt with the use of gamification in crowdsourcing projects. They first considered a small number of motivators from crowdsourcing projects and then checked which of these motivators are related to gamification and can be influenced by it. However, only hypotheses are made, and a research design, which has not yet been carried out, has been created.

In order to have a more comprehensive overview of the possible motivators of mobile crowdsourcing, a further investigation is needed to collect additional motivators and demotivating factors from the literature. Furthermore, no assignment of game design elements to motivators of mobile crowdsourcing can be found in the literature. So there is still research to be done on which game design elements can be used in mobile crowdsourcing for the acquisition and motivation of volunteers.

2.3 Research Design (Methodological Approach)

As the acquisition and motivation of volunteers are key tasks in mobile crowdsourcing [3, 4], the potential of gamification to solve or alleviate this problem needs to be examined and, finally, suitable game design elements are to be determined. The motivators and circumstances which favor the participation of volunteers in mobile crowdsourcing projects are used as a basis for this. The comprehensive overview of various motivators in mobile crowdsourcing by Rutten et al. [12], which was compiled through literature research, serves as a starting point. These motivators are supplemented by further motivators from the scientific literature. Once the motivators to be strengthened and the factors to be weakened have been determined, possibilities are sought to influence them through gamification. The effects of various game design elements on the different types of motivation are collected through literature research. Subsequently, game design elements which have a positive influence on many motivators are selected. These game design elements are particularly suitable for integration into mobile crowdsourcing applications. Finally, a concept is created for the integration of selected game design elements into a mobile crowdsourcing application.

3 Analysis of Motivators to Be Activated in Mobile Crowdsourcing

The recruitment and activation of volunteers are essential problems in mobile crowdsourcing. Increasing the motivation of volunteers is one way to increase their productivity and participation [4]. In turn, general motivation can be increased by activating individual motivators [4]. Therefore, it was investigated which motivators of mobile crowdsourcing exist and which need to be activated. Table 1 shows the result of the literature research on the motivators of mobile crowdsourcing. The different motivators are arranged according to the source of motivation (from external to internal). Looking at the table, it clearly appears that most of the motivators for mobile crowdsourcing have an external source. In addition to the motivators to be activated, the table also shows demotivating factors that can lead to amotivation. These must be inhibited in order to keep the volunteers' motivation high in mobile crowdsourcing projects.

Table 1 Activating motivators in mobile crowdsourcing (arranged by source of motivation, from external to internal)

Amotivation (source: impersonal; mechanism: unintentional, incompetence and lack of control): Data not being used; Withholding of information; Low appreciation; Lack of confidence; Power gap between scientist and volunteer

Extrinsic motivation (source: external; mechanism: compliance, external rewards and punishments): Payment; Direct feedback; Self-promotion; Recognition of contribution; External obligations or norms; Gain political influence; Depth of involvement

Extrinsic motivation (source: somewhat external; mechanism: self-control, ego-involvement and internal rewards or punishments): Gaining reputation; I feel needed; Feeling control over scientific process; Complete work done by me; Self-determination; Pride; Trust

Extrinsic motivation (source: somewhat internal; mechanism: personal importance and conscious valuing): Feel responsible to do so; Feedback on group contribution; Community impact involvement; Use it to teach others

Extrinsic motivation (source: internal; mechanism: congruence, awareness and synthesis with goals of self): Learning new things; Discover new things; Exchange knowledge; Task autonomy

Intrinsic motivation (source: internal; mechanism: interest, enjoyment and inherent satisfaction): Fun; Pastime; It is amazing/beautiful; Self-fulfillment


Factors that can cause amotivation include the fact that data is not used, information is withheld or the volunteer's contribution receives little appreciation. A lack of trust or a power gap within the project can also demotivate the participants. Purely extrinsic motivators are payment, direct feedback, self-promotion and recognition of one's own contribution. In addition, external obligations and norms, the gain of political influence and the prospect of deep involvement in a community can also motivate volunteers externally. Somewhat external motivators are the increase in reputation, the feeling of being needed and the feeling of control over scientific processes. The prospect of acting independently, taking on a complete task, pride and trust are also motivators that mainly have an external source. Extrinsic motivators which also have a partly internal source include increasing one's own influence in society, feeling responsible, teaching others and getting feedback on one's own contribution in the group. There are also extrinsic motivators that have an internal source. They include the motivators to discover and learn new things, share knowledge and work independently. Motivators of intrinsic motivation, which have exclusively internal sources, are fun, pastime and self-fulfillment.

Among the collected motivators, motives and demotivating factors there are also some which are strongly or completely determined or influenced by the specific context and purpose of the crowdsourcing project and thus can hardly be reinforced by game design elements [5]. They were therefore not considered for gamification activation and are not included in this collection.
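For the subsequent assignment to game design elements, the taxonomy of Table 1 can be represented as a simple lookup structure. The abbreviated sketch below (hypothetical code; motivator names taken from Table 1) groups a few motivators by their source category and lets one query where a motivator sits on the external-to-internal spectrum.

```python
# Abbreviated taxonomy of motivators, ordered from external to internal
# sources (a subset of Table 1, for illustration only).
MOTIVATORS = {
    "amotivation": ["data not being used", "withholding of information", "low appreciation"],
    "external": ["payment", "direct feedback", "self-promotion", "recognition of contribution"],
    "somewhat external": ["gaining reputation", "self-determination", "pride", "trust"],
    "somewhat internal": ["feel responsible", "teach others", "feedback on group contribution"],
    "internal (extrinsic)": ["learning new things", "exchange knowledge", "task autonomy"],
    "internal (intrinsic)": ["fun", "pastime", "self-fulfillment"],
}

def sources_of(motivator):
    """All source categories under which a motivator appears."""
    return [src for src, items in MOTIVATORS.items() if motivator in items]
```

Mapping game design elements onto this structure (as attempted in Chap. 4) then amounts to annotating each motivator with the elements that influence it.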

4 Assignment of Game Design Elements and Motivators

This chapter examines the influence of game design elements on individual motivators of mobile crowdsourcing. This enables game design elements to be assigned to the various motivators of mobile crowdsourcing. In the following, the results of the review of technical literature concerning gamification and motivation are presented. The results show the impact of the five most commonly used game design elements [14] (points, badges, leaderboards, levels and challenges) as well as of other relevant game design elements. During the review, direct influences of the game design elements on the selected motivators of mobile crowdsourcing were considered. In addition, attention was paid to other game design elements which, beyond the five most frequently used elements, also have an effect on the motivators to be activated. Table 2 displays the motivational impacts of several game design elements. Influences and effects on different motivators of mobile crowdsourcing could be found for all investigated game design elements. Within the framework of the literature review, effects of individual game design elements were proven only for motivators of extrinsic motivation. Tinati et al. [19] define points in games and gamified applications as an element that serves to promote competition between individual players or users. According to Tinati et al., the aim of this competition is to gain recognition as a top player.

Table 2 Assignment of game design elements (GDE) and motivators. The columns of the original table group the motivators by source: amotivation (impersonal source), extrinsic motivation (external, somewhat external, somewhat internal and internal sources) and intrinsic motivation (internal source).

Points
• External: Direct feedback [13]
• Somewhat external: Gaining reputation [19]

Badges
• External: Direct feedback, recognition of contribution [13, 21]; self-promotion [13, 22]
• Somewhat external: Gaining reputation [20, 22, 23]; pride [23]
• Somewhat internal: Community impact involvement [13]; discover new things [24]

Leaderboard
• External: Direct feedback [25, 21, 13]; recognition of contribution [22]
• Somewhat external: Gaining reputation [26, 19, 15]
• Somewhat internal: Feedback on group contribution [25]

Level
• External: Direct feedback [22]
• Somewhat external: Gaining reputation [25, 22, 15]

Challenges
• Somewhat external: Self-determination [21]; whole work done by me [21]; I feel needed [13, p. 123]
• Internal: Task autonomy [13]

Teams
• Somewhat external: Depth of involvement [26, 13]
• Somewhat internal: Feedback on group contribution; exchange knowledge [15]

Virtual gifts
• Somewhat external: Gaining reputation [27]
• Somewhat internal: Community impact involvement [27]

Avatars
• Somewhat external: Self-determination [28, 13, 15, 24]


According to [20], points can also be used to encourage the user immediately. They make it possible to assess the user's behavior measurably and to give him permanent and immediate feedback [13, 20]. For the game design element badges, the most direct influences on motivators were found in the literature review. Sailer et al. [13] and van Roy and Zaman [21] describe a feedback function of badges in gamified applications. Badges show both the user himself and other users which achievements have been attained and which tasks have been accomplished [13]. Badges are also used to signal skills to outsiders [21]. Badges are, moreover, the digital counterpart to material status symbols in real life, which are often used to represent oneself [22]. In addition, badges trigger in their wearer confirmation and remembrance of past successes and thus address the motivator "pride" [23]. Furthermore, badges can strengthen the wearer's social influence if they are rare or difficult to earn [13, 23]. According to Richter et al. [24], players and users of gamified applications also enjoy discovering new things by earning new badges. Massung et al. [25] attribute a feedback function to the game design element leaderboards, as players can use leaderboards to see how their performance compares to that of the rest of the community. According to Koch et al. [26], the motivator of increasing one's own reputation is also reinforced by leaderboards. Players are often motivated to improve themselves and to distinguish themselves from others, which is significantly intensified by the direct comparison of players on a leaderboard [26]. In a study by Massung et al. [25], it was shown that leaderboards can also give feedback on one's own contribution in a group or community: leaderboards within a community show how high one's participation is compared to the participation of other members.
In video games and gamified applications, levels are used to show the player where he currently stands and to give him feedback about his actions [22]. Blohm and Leimeister [15] describe the use of levels to represent a certain status. Thus, levels are also used for status acquisition, which in turn leads to an increase in reputation. According to van Roy and Zaman [21], challenges strengthen the feeling of independence and self-determination, as players may be able to choose challenges and then solve them independently. Furthermore, by taking on a challenge, a complete task is taken on, and thus another motivator of mobile crowdsourcing is addressed. Challenges in gamified applications can also convey the feeling of being needed by experiencing social integration [14]. The choice between different challenges and the clearly defined goals of challenges also encourage independent work [13]. Koch et al. [26] describe the formation of groups and teams as a game design element that increases cooperation. The fact that the players or users are motivated to work together also increases the feeling of strong social integration [26]. According to Blohm and Leimeister [15], cooperation that is promoted by team building also promotes social exchange and the exchange of knowledge. Furthermore, working in a team gives users the opportunity to participate in solving big problems and thus makes them aware of their own contribution to the group [15]. In virtual worlds and games, virtual gifts are used to express solidarity and are one of the most widespread methods of socialization in games [22]. A study by Goode et al. [27] showed that virtual gifts in games are also used to enhance reputation and influence in society. Gifts increase the popularity of the player and thus his or her social influence. Another frequently used game design element is avatars. They represent the player as a digital figure in games or gamified applications [28]. In games or gamified applications in which avatars can be edited and individually adapted by the player, the feeling of self-determination is conveyed through freedom of choice and the promotion of creativity [14, 15]. No evidence was found for the promotion or mitigation of demotivation through specific individual game design elements. According to Proulx et al. [29], there are also no game design elements which cause a state of amotivation. Influences of individual game design elements on intrinsic motivation could likewise not be proven. However, there is evidence of positive effects of complete gamified applications on intrinsic motivation. Studies by Flatla et al. [30], Bowser et al. [31], Kavaliova et al. [32] and Chapman and Rich [33] have shown that the use of game-based applications is more fun for users and that the applications are therefore used more frequently. This finally leads to an increase in user performance and intrinsic motivation [34]. The results of this step paint a picture of the general motives of volunteers in mobile crowdsourcing projects. The gamification of a specific mobile crowdsourcing application, however, is an individual process; project-specific problems and goals should therefore always be considered.

5 Gamification of the Mobile Crowdsourcing Application INUNDO

This chapter uses the outcome of the previous chapters to gamify a mobile crowdsourcing application. The application to be gamified is the mobile crowdsourcing application INUNDO (Latin for 'to flood' or 'to inundate'), which was developed as part of the VGI4HWM research project. The aim of this project was to develop a flood forecasting system to support flood management in small catchment areas based on volunteered geographic information (VGI) data collected by volunteers. The collection of this data (water levels and precipitation) by the volunteers is carried out via the mobile smartphone application INUNDO [35]. The captured information (VGI), e.g. water level measurements acquired via image-based measurement methods, is automatically incorporated into the flood forecasting system in order to minimize uncertainties and to support situation monitoring during flood events [36]. A web application (dashboard) visualizes the received information and forecasts, thus providing appropriate situation awareness [36].


5.1 Identification of Motivational Barriers

Within the VGI4HWM project, various motivational barriers were identified in a field study that make it difficult to acquire, retain and activate volunteers. These barriers have to be overcome with the help of a gamification of the application INUNDO in order to keep motivation high in all phases of the project. Figure 1 shows the motivational barriers of the individual project phases.
• Acquisition phase: The motivation of potential volunteers can be reduced by a lack of communication, unclear objectives or non-existent payment [10, 38, 39].
• Binding phase: There is a risk that users will forget or even uninstall the mobile application due to a lack of important tasks [37]. In addition, users may stop participating in the project if they feel that their measurements are not being used or are not producing results [38, 39]. In the VGI4HWM project, this phase mainly concerns the period between two floods [37].
• Activation phase: If the user does not feel able to make measurements because he considers his knowledge and skills insufficient, he will not collect any data [12]. Measurement methods that are not adapted to the user group, for example because they are too technical or too demanding for beginners, are also an obstacle [39]. The user could also think that enough data has already been collected and that he no longer needs to act to support the project [37]. Another barrier is the extreme weather conditions during a disaster: users are likely to get wet when measuring precipitation or water levels, which may discourage them from measuring [37].

Fig. 1 Motivational barriers of the application INUNDO


Fig. 2 Assignment of game design elements and motivational barriers

5.2 Selection of Game Design Elements for Overcoming Motivational Barriers

In order to overcome the identified barriers, various game design elements were identified which could become part of a possible gamification of the application INUNDO. The game design elements were selected for their effect on specific motivators that counteract the barriers. Figure 2 shows the selected game design elements as well as their effects on motivators and the barriers to overcome. Points and badges were selected to act as currency substitutes against the barrier of non-existent payment. Similarly, badges serve as motivators to become active even in extreme weather situations through the appreciation of one's own contribution associated with them. The element of team building promotes the exchange of knowledge and integration into the project and thus counteracts a lack of communication and forgetting the application. Progress bars, levels, points and badges all provide direct feedback to the user. This can weaken or even overcome various motivational barriers. The user is shown, for example, that his work is important and that the data he transmits is actually used. In addition, the project objectives and the project progress can be clarified. The user also receives feedback on his qualifications and can better assess his abilities and the level of difficulty of the tasks to be performed.
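The assignment in Fig. 2 can be thought of as a simple lookup table from motivational barriers to the game design elements selected to counteract them. The following Python sketch encodes that idea; the dictionary keys and groupings are an illustrative paraphrase of the figure, not the project's actual configuration, and the function name is invented.

```python
# Illustrative encoding of the barrier-to-element assignment (Fig. 2).
# Keys and groupings paraphrase the figure; they are not project config.
BARRIER_COUNTERMEASURES = {
    "non-existent payment": ["points", "badges"],       # currency substitutes
    "extreme weather": ["badges"],                      # appreciation of contribution
    "lack of communication": ["teams"],                 # knowledge exchange, integration
    "forgetting the application": ["teams"],
    "unclear objectives": ["progress bars", "levels", "points", "badges"],
    "data seems unused": ["progress bars", "levels", "points", "badges"],
    "uncertain self-assessment of skills": ["progress bars", "levels", "points", "badges"],
}

def countermeasures(barrier: str) -> list[str]:
    """Look up which game design elements address a given barrier."""
    return BARRIER_COUNTERMEASURES.get(barrier, [])
```

Such a table keeps the mapping explicit and easy to revisit when a pilot shows that an element does not have the expected effect.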

5.3 Concept for the Integration of Game Design Elements

This section presents a concept for the integration of the selected game design elements into the application INUNDO. For this purpose, mockups were created which show views of the application in which the game design elements are used. Figure 3 shows a gamified view of the user profile. In addition to the general information, this view also displays information about the user's team membership, level and experience points (EP). In addition, the user can call up and view an overview of his earned badges.

Fig. 3 Gamified view of the user profile

In the context of this work, several badges were developed which are intended to motivate the user to use the application and make measurements. The badges in Fig. 3 can be earned by performing various tasks: measuring the water level at ten different locations, recruiting five friends, making 100 precipitation measurements, taking a measurement during a storm, and measuring on ten consecutive days. The badges that are based on the number of measurements made can also be awarded in different tiers (see Fig. 4). The low tiers allow the user to quickly experience a sense of achievement; the higher tiers keep the user motivated even after reaching a lower tier and represent a further goal.

Fig. 4 Badges

Figure 5 (left) illustrates how earning points for a measurement and obtaining a badge may look in the gamified application. After a measurement, the user receives direct feedback in the form of earned points. In addition, a progress bar tells him how many points he needs to reach the next level. In order to further motivate the user, he also receives direct feedback regarding the badge he is able to earn next. If the user earns a badge, the earned badge and the associated activity are displayed. In this way, both the user himself and other users can see which task has been performed in order to earn the specific badge. This contributes to a better assessment of the user's skills and competences.

Fig. 5 Leader board and rewards in the gamified app INUNDO

The experience points earned by the user serve as the criterion for creating the leader board. The leader board displays the users with the highest scores for each region (see Fig. 5, right). In addition to the experience points and the username, the user level and the number of measurements taken are also displayed. From this, other users can draw conclusions about the abilities of the users on the leader board. Leaderboards can also be created according to temporal and local criteria. For example, a leader board that only evaluates measurements from the last seven days can differ greatly from a leader board that is valid for a complete month. Users who have made many measurements in a short period of time, and therefore have earned many EPs, will see themselves in a higher position on a leader board valid for a short period. The associated increase in reputation and prestige in the community is then regarded as a reward for the effort shown. Experience points can also be used as an incentive in situations where it is important for users to take measurements: the number of experience points a user receives for a measurement can be varied to motivate the user. If a measurement call promises a high number of experience points, users are more motivated to take these measurements. A progress bar can also indicate how many measurements have already been taken and point out the importance of the measurement call (see Fig. 5).
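The experience-point, badge-tier and leaderboard mechanics described above can be sketched compactly in code. The following Python example is purely illustrative and not part of the INUNDO implementation; the class and function names, the EP values, and the tier thresholds are invented for this sketch. It shows badge tiers based on the number of measurements, higher EP rewards during measurement calls, and leaderboards filtered by region and time window (e.g. the last seven days).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical badge tiers over the number of measurements made
# (thresholds are illustrative, not taken from INUNDO).
BADGE_TIERS = [(10, "bronze"), (50, "silver"), (100, "gold")]

EP_PER_MEASUREMENT = 10        # base reward per measurement
EP_PER_CALL_MEASUREMENT = 30   # higher reward during an important measurement call

@dataclass
class Measurement:
    user: str
    region: str
    timestamp: datetime
    during_call: bool = False   # taken in response to a measurement call

def experience_points(measurements, user):
    """Sum a user's EPs; calls for urgently needed data pay more."""
    return sum(
        EP_PER_CALL_MEASUREMENT if m.during_call else EP_PER_MEASUREMENT
        for m in measurements if m.user == user
    )

def badge_tier(measurements, user):
    """Highest badge tier reached for the number of measurements made."""
    count = sum(1 for m in measurements if m.user == user)
    earned = [name for threshold, name in BADGE_TIERS if count >= threshold]
    return earned[-1] if earned else None

def leaderboard(measurements, region, days=None, now=None):
    """Rank the users of one region by EPs, optionally restricted to a
    time window (e.g. days=7 for a weekly board)."""
    now = now or datetime.now()
    pool = [m for m in measurements if m.region == region]
    if days is not None:
        pool = [m for m in pool if now - m.timestamp <= timedelta(days=days)]
    users = {m.user for m in pool}
    scores = {u: experience_points(pool, u) for u in users}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Keeping the leaderboard a pure function over the stored measurements makes it cheap to offer the regional, weekly and monthly variants side by side.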

6 Summary and Outlook

Through the analysis of motivators of mobile crowdsourcing, it was possible to determine which motives volunteers pursue when participating in mobile crowdsourcing projects. This insight formed an important basis for the further work steps. By filtering out project-specific motivators, the motivators to be activated by gamification could be identified. Another elementary work step was the assignment of game design elements and their effects to the motivators of mobile crowdsourcing. Based on the parallels between the motivators activated by the game design elements and the relevant motivators of mobile crowdsourcing, it was finally possible to select game design elements adapted to the motivators of mobile crowdsourcing. Since gamification must always be strongly adapted to the target group, and since general motivators of mobile crowdsourcing were considered in this work, this selection of game design elements represents only a recommendation and not a generally valid rule. Depending on the application area of the mobile crowdsourcing project, different game design elements are suitable for activating project-specific motivators. After identifying potential game design elements for the gamification of mobile crowdsourcing applications and developing a concept for the gamification of the application INUNDO, the next step is to look at possible continuations of the work. An obvious continuation is the implementation of the concept developed in this paper. A user survey would then be a possible measure to confirm or refute an increase in the number and motivation of users. To this end, users would have to be asked about their usage behavior and motivation before and after the gamification; the values determined would then be compared in order to detect a change.
If the gamification of the application proves successful and results in an increase in the number of users and their motivation, the concept could also be transferred to similar mobile crowdsourcing projects. The selection of game design elements assessed as optimal for use in mobile crowdsourcing would also have to be validated through implementation and user surveys. Since the success of gamification depends strongly on the user group, a compatibility check for the corresponding project is required before implementation. Only the application of gamification combined with user surveys can determine which game design elements actually have a positive or negative effect on the motivation of users of mobile crowdsourcing applications.


References

1. Shen, X.: Mobile crowdsourcing [Editor's note]. IEEE Netw. 29(3), 2–3 (2015)
2. Fuchs-Kittowski, F.: Mobiles Crowdsourcing zur Einbindung freiwilliger Helfer. In: Reuter, C. (ed.) Sicherheitskritische Mensch-Computer-Interaktion, pp. 551–572. Springer Fachmedien Wiesbaden, Wiesbaden (2018)
3. Wang, Y., Jia, X., Jin, Q., Ma, J. (eds.): Mobile Crowdsourcing: Architecture, Applications, and Challenges (2015)
4. Goncalves, J., Hosio, S., Rogstadius, J., Karapanos, E., Kostakos, V.: Motivating participation and improving quality of contribution in ubiquitous crowdsourcing. Comput. Netw. 90, 34–48 (2015)
5. Zeng, Z., Tang, J., Wang, T.: Motivation mechanism of gamification in crowdsourcing projects. Int. J. Crowd Sci. 1(1), 71–82 (2017)
6. Deterding, S., Dixon, D., Khaled, R., Nacke, L.: From game design elements to gamefulness. In: Proceedings of the 15th International Academic MindTrek Conference on Envisioning Future Media Environments, MindTrek '11, p. 9. ACM Press, New York (2011)
7. Stieglitz, S., Lattemann, C., Robra-Bissantz, S., Zarnekow, R., Brockmann, T.: Gamification: Using Game Elements in Serious Contexts. Springer International Publishing, Cham (2017)
8. Fuchs-Kittowski, F., Faust, D.: Architecture of mobile crowdsourcing systems. In: Baloian, N., Burstein, F., Ogata, H., Santoro, F., Zurita, G. (eds.) Collaboration and Technology, pp. 121–136. Lecture Notes in Computer Science. Springer International Publishing, Cham (2014)
9. Middelhoff, M., Widera, A., van den Berg, R., Hellingrath, B., Auferbauer, D., Havlik, D., et al.: Crowdsourcing and crowdtasking in crisis management: lessons learned from a field experiment simulating a flooding in the city of The Hague. In: 2016 3rd International Conference on Information and Communication Technologies for Disaster Management (ICT-DM), pp. 1–8. IEEE (2016)
10. Schimak, G., Havlik, D., Pielorz, J.: Crowdsourcing in crisis and disaster management—challenges and considerations. In: Idris, A.I. (ed.) Bone Research Protocols, pp. 56–70. Springer, New York (2019) [Methods in Molecular Biology]
11. Büro für Angewandte Hydrologie: VGI4HWM (2016). http://vgi4hwm.de/
12. Rutten, M., Minkman, E., van der Sanden, M.: How to get and keep citizens involved in mobile crowd sensing for water management? A review of key success factors and motivational aspects. WIREs Water 4(4), e1218 (2017)
13. Sailer, M., Hense, J., Mayr, S., Mandl, H.: How gamification motivates: an experimental study of the effects of specific game design elements on psychological need satisfaction. Comput. Hum. Behav. 69, 371–380 (2017)
14. Sailer, M.: Die Wirkung von Gamification auf Motivation und Leistung: Empirische Studien im Kontext manueller Arbeitsprozesse. Springer, Wiesbaden (2016)
15. Blohm, I., Leimeister, J.: Gamification. Wirtschaftsinf 55(4), 275–278 (2013)
16. Neeli, B.: Gamification in the enterprise: differences from consumer market, implications, and a method to manage them. In: Reiners, T., Wood, L.C. (eds.) Gamification in Education and Business, pp. 489–511. Springer International Publishing, Cham (2015)
17. Froelich, J.: Gamifying green: gamification and environmental sustainability. In: Walz, S.P., Lantz, F., Deterding, S., Andrews, L., Benson, B., Bogost, I., et al. (eds.) Gameful World: Approaches, Issues, Applications, pp. 563–596. MIT Press, Cambridge (2015)
18. Hossain, M.: Users' motivation to participate in online crowdsourcing platforms. In: 2012 International Conference on Innovation Management and Technology Research, pp. 310–315. IEEE (2012)
19. Tinati, R., Luczak-Roesch, M., Simperl, E., Hall, W.: An investigation of player motivations in Eyewire, a gamified citizen science project. Comput. Hum. Behav. 73, 527–540 (2017)
20. Sailer, M., Hense, J., Mandl, H., Klevers, M.: Psychological perspectives on motivation through gamification. Interact. Des. Arch. (19), 28–37 (2013). http://www.mifav.uniroma2.it/inevent/events/idea2010/doc/19_2.pdf


21. van Roy, R., Zaman, B.: Need-supporting gamification in education: an assessment of motivational effects over time. Comput. Educ. 127, 283–297 (2018)
22. Zichermann, G., Cunningham, C.: Gamification by Design: Implementing Game Mechanics in Web and Mobile Apps. O'Reilly, Sebastopol, CA (2011)
23. Antin, J., Churchill, E.: Badges in social media: a social psychological perspective. In: Proceedings of the 2011 Annual Conference Extended Abstracts on Human Factors in Computing Systems. ACM, New York (2011)
24. Richter, G., Raban, D., Rafaeli, S.: Studying gamification: the effect of rewards and incentives on motivation. In: Reiners, T., Wood, L.C. (eds.) Gamification in Education and Business, pp. 21–46. Springer International Publishing, Cham (2015)
25. Massung, E., Coyle, D., Cater, K., Jay, M., Preist, C.: Using crowdsourcing to support pro-environmental community activism. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI '13, p. 371. ACM Press, New York (2013)
26. Koch, M., Oertelt, S., Ott, F.: Gamification von Business Software—Steigerung von Motivation und Partizipation, 1. Aufl. Forschungsgruppe Kooperationssysteme, Univ. der Bundeswehr München, Neubiberg (2013) (Schriften zur soziotechnischen Integration; vol. 3)
27. Goode, S., Shailer, G., Wilson, M., Jankowski, J.: Gifting and status in virtual worlds. J. Manag. Inf. Syst. 31(2), 171–210 (2014)
28. Alsawaier, R.: The effect of gamification on motivation and engagement. Int. J. Inf. Learn. Tech. 35(1), 56–79 (2018)
29. Proulx, J.-N., Romero, M., Arnab, S.: Learning mechanics and game mechanics under the perspective of self-determination theory to foster motivation in digital game based learning. Simul. Gaming 48(1), 81–97 (2017)
30. Flatla, D., Gutwin, C., Nacke, L., Bateman, S., Mandryk, R.: Calibration games. In: Proceedings of the 24th Annual ACM Symposium on User Interface Software and Technology, UIST '11, p. 403. ACM Press, New York (2011)
31. Bowser, A., Hansen, D., He, Y., Boston, C., Reid, M., Gunnell, L., et al.: Using gamification to inspire new citizen science volunteers. In: Proceedings of the First International Conference on Gameful Design, Research, and Applications, Gamification '13, pp. 18–25. ACM Press, New York (2013)
32. Kavaliova, M., Virjee, F., Maehle, N., Kleppe, I.: Crowdsourcing innovation and product development: gamification as a motivational driver. Cogent Bus. Manag. 3(1), 91 (2016)
33. Chapman, J., Rich, P.: Does educational gamification improve students' motivation? If so, which game elements work best? J. Educ. Bus. 93(7), 315–322 (2018)
34. Landers, R., Bauer, K., Callan, R., Armstrong, M.: Psychological theory and the gamification of learning. In: Reiners, T., Wood, L.C. (eds.) Gamification in Education and Business, pp. 165–186. Springer International Publishing, Cham (2015)
35. Burkard, S., Fuchs-Kittowski, F., de Bhroithe, A.: Mobile crowd sensing of water level to improve flood forecasting in small drainage areas. In: Hřebíček, J., Denzer, R., Schimak, G., Pitner, T. (eds.) Environmental Software Systems. Computer Science for Environmental Protection, pp. 124–138. Springer International Publishing, Cham (2017) [IFIP Advances in Information and Communication Technology]
36. Burkard, S., Fuchs-Kittowski, F., Muller, R., Pfutzner, B.: Flood management platform for small catchments with crowd sourcing. In: 2018 5th International Conference on Information and Communication Technologies for Disaster Management (ICT-DM), pp. 1–8. IEEE (2018)
37. Richter, L.-K.: Maßnahmen zur Motivation freiwilliger Helfer in VGI-Anwendungen im Katastrophenschutz [Bachelorarbeit]. Hochschule für Technik und Wirtschaft, Berlin (2017)
38. Rotman, D., Preece, J., Hammock, J., Procita, K., Hansen, D., Parr, C., et al.: Dynamic changes in motivation in collaborative citizen-science projects. In: Proceedings of the ACM 2012 Conference on Computer Supported Cooperative Work, CSCW '12, p. 217. ACM Press, New York (2012)
39. Burkard, S.: Motivationale Barrieren des VGI4HWM-Projekts. Hochschule für Technik und Wirtschaft Berlin, 21 January 2019

MoPo Sane—Mobility Portal for Health Care Centers Benjamin Wagner vom Berg and Aina Andriamananony

Abstract In the era of advanced telemedicine (Projekt-Telepflege Homepage [1]), outpatient and inpatient stays in health care facilities will continue to be indispensable within the framework of health care and services of general interest. Essential facilities of the health services (clinics, medical specialists, etc.) are located in the middle-order and/or higher-order regional centers. The project "MoPo gesund—Mobilitätsportal für das Gesundheitswesen" (MoPo sane—mobility portal for health care centers), as an example of a smart mobility system, provides a solution for flexible and networked mobility in order to improve the accessibility of health centers, especially for rural areas. A web-based mobility portal will be used to collect the mobility needs of patients, visitors and employees with no or restricted individual mobility options (older people, young people without driving licenses, socially disadvantaged people) and connect them to sustainable mobility alternatives. The portal will operate on a clearly defined regional backdrop with "mobility endpoints" that can be precisely located geographically. The consideration of such system boundaries has significant advantages: providing the necessary mobility services would be difficult if the sources and destinations of the trips made by rural residents were weakly concentrated or widely dispersed. This project therefore concentrates the mobility endpoints on various health care facilities.

Keywords Sustainable mobility · Smart city · Mobility portal · Health care · Ride-sharing

B. Wagner vom Berg (B) · A. Andriamananony Cosmo UG, Butteldorf 10, 26931 Elsfleth, Germany e-mail: [email protected]; [email protected]; [email protected] A. Andriamananony e-mail: [email protected] © Springer Nature Switzerland AG 2020 R. Schaldach et al. (eds.), Advances and New Trends in Environmental Informatics, Progress in IS, https://doi.org/10.1007/978-3-030-30862-9_4

55

56

B. Wagner vom Berg and A. Andriamananony

1 Introduction

The proposed solution can generally be classified as a smart mobility solution (Fig. 1). Smart mobility solutions are not only digital solutions but also address sustainability in all three dimensions: social, economic, and environmental. This means they are environmentally and climate-friendly, e.g., by using electromobility or multimodal approaches to be largely emission-free [2]. They need to fulfill the mobility requirements of current and future generations in a fair way in terms of access and cost, e.g., by using sharing concepts like ride-sharing or car-sharing. In the economic dimension, they need to be resilient, and workers need fair payment and conditions (e.g., bus or taxi drivers). Digitalization is the main enabler that makes the solution possible. The MoPo project specifically addresses mobility to health care centers in a regional traffic system. Essential health care facilities in Germany are generally located in medium-sized and regional centers. The (digital) support of the reachability of these health facilities is the focus of this project. In the following, these facilities will be subsumed under the term "health care centers" and include, for example, hospitals, rehabilitation clinics, outpatient day clinics, specialist medical clinics, etc. The centralistic structures of modern health care systems impel citizens to be increasingly mobile, especially in rural areas. This also applies to the model region Oldenburg–Wesermarsch, which is the focus of the project. Medical specialist care in the district of Wesermarsch, e.g., is only available in the towns of Elsfleth, Berne and Lemwerder. Inpatient medical care within the Wesermarsch is available in the middle-order center Brake at the St. Bernhard Hospital.

Fig. 1 Conceptual portal with its main stakeholders and interfaces


More comprehensive medical care, especially in the form of special clinics, is only provided in the higher-order centers such as Oldenburg [3]. In addition, demographic trends (declining population numbers and an increase in the proportion of the population over 65 years old) will lead to an increased number of journeys for medical purposes. In the more rural regions of the Wesermarsch, such as Elsfleth (Oberzentrum Oldenburg), the proportion of people over 65 will more than double [4]. This means that not only longer journeys to the health facilities can be expected, but also more journeys for medical reasons.

2 Problem Description

2.1 Actual Situation

In 2016, there were approximately 700 million outpatient medical cases and 19.5 million inpatient stays in Germany [5]. This amounts to more than eight "trips" to a health facility per citizen per year. If it is furthermore assumed that most patients regularly receive visitors during their stay, a high number of further trips to health care facilities can be expected. At the same time, visitors and patients, especially from rural areas, are usually dependent on their own cars. In the Wesermarsch model region, there is "rather poor public transport development" for health-related trips [3]. Public transport is mainly provided for school transport. This means, by implication, that young people are not able to reach health centers on their own. Clinics employ more than 800 people on average, so there are corresponding mobility needs here as well. The hospital "Evangelisches Krankenhaus" in Oldenburg has 1,400 employees, which is above average. Although the hospital has its own parking spaces for employees, the situation is exacerbated by its central location in Oldenburg's city center. The situation is becoming particularly acute for the Evangelical Hospital due to the practical lack of visitor and patient parking places, leading to considerable search traffic in the area.
It can be clearly deduced from the above that for certain population groups, mobility related to the accessibility of health centers is very challenging:

• People in rural areas who have no access to a car, either because they have no driving license or no car,
• Children and adolescents without a driving license,
• Older people who, due to health restrictions, are no longer able or willing to drive a car,
• People with mobility impairments who cannot drive themselves due to, e.g., physical impairments,
• Health care employees commuting long distances to and from work on a regular basis and consequently exposed to additional financial and physical stress.


B. Wagner vom Berg and A. Andriamananony

The target groups addressed by this project within the framework of "Social Innovation" are, therefore:

• Low-income people,
• Children and adolescents with medical needs, but also as visitors, e.g., of (elderly) relatives in hospital,
• Older people with an increased need for care on the one hand and an increased need for visits on the other (e.g., to visit their partner in hospital),
• Heavily strained health care workers with relatively low incomes, e.g., in care professions.

2.2 Research Approach

Based on this starting point, the project's central objective is to noticeably improve the mobility of these target groups. This goes hand in hand with a reduction of single-occupancy trips and the use of sustainable means of transport. As a positive side effect, the load on the regional transport system will be reduced. To achieve this goal, a mobility portal tailored to the specific needs is to be designed and prototypically implemented. Furthermore, the development of permanent networks around the topic of patient mobility in the model region is in focus.

Within the project, a detailed conception of the portal and the corresponding business model will be developed first, primarily including the outlined network (see below) but also the intended use. Based on the concept, a prototype will be developed and tested in a pilot project with clinics and municipalities in the district of Wesermarsch. The mayors of Brake, Elsfleth, and Ovelgönne have committed their support and participation. For this purpose, the project focuses on relationships between cities and their surrounding countryside and exemplifies mobility needs from the Wesermarsch district to the health facilities in the district and to the facilities in the "Mittelzentren" Stadt Brake and Stadt Nordenham. A characteristic feature here: the patients of the Wesermarsch can choose between Oldenburg, Bremerhaven, and Bremen, so that not only purely practical mobility but also aspects of the perception and self-representation of the clinics play a role. Several institutions at the municipal level, mobility providers, and health care centers are part of the project. Some of these institutions were already partners in the project "ZMo—target group-oriented mobility chains in health care" [6], funded by the BMVI as part of the mFUND program. The key partners of the project applied for here were also the key partners in ZMo.
The Wesermarsch is also a very suitable model region because it faces various mobility challenges resulting from its topography, population structure and distribution, as well as the existing transport infrastructure [3]. It has therefore already been the subject of various research projects. These include the BMVI model project "Long-term security of supply and mobility in rural areas" and the project


“NEMo—Sustainable fulfillment of mobility needs in rural areas” funded by the Volkswagen Foundation. (Prof. Dr.-Ing. Wagner vom Berg was the project leader at the CvO University Oldenburg until his transfer to the Bremerhaven University of Applied Sciences in April 2017.) However, a mobility portal suitable for the specific target groups considered here, as well as a procurement system addressing the existing system boundaries of the health care system, are still missing and represent a real innovation at the national level.

In this context, the system boundary is defined as the framework of the specific mobility system, which results from starting and destination points, mobility requirements, regional transport infrastructure, mobility offers, etc. In a regional mobility system (e.g., a city), there is a multitude of target groups (or poorly specified target groups), a multitude of different mobility needs, no specific end points, and a variety of transport options. In the case of health centers, by contrast, an individual facility (e.g., a hospital) is considered; this means significantly better plannability and greater potential for efficiency with respect to the boundary parameters mentioned. For instance, passengers with the same destination can be identified for the organization of car pools.

The primary goal of a corresponding mobility portal has to be facilitating the arrival and departure of employees, patients, and visitors. In general, this also requires relieving the traffic situation on the way to and in the direct surroundings of the clinics. That is why the portal has to support travel by public transport first. It should be considered, however, that an adequate public transport connection is often not available, especially when traveling from rural areas (see above). Ridesharing, which must be supported by the portal, is an essential supplement.
Compared to individual journeys in one's own car, this also relieves the regional or local traffic situation and considerably reduces emissions of harmful substances and CO2.
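The organizational advantage of a fixed destination can be illustrated with a minimal ride-matching sketch. All names, data structures, and example values below are our own illustration, not part of the project's implementation:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class RideOffer:
    driver: str
    origin: str           # e.g. a municipality in the Wesermarsch
    departure: datetime   # planned departure time
    free_seats: int

def match_rides(offers, origin, desired_departure,
                tolerance=timedelta(minutes=30)):
    """The destination (the health center) is fixed by the system boundary,
    so matching only needs to compare origin and departure time."""
    return [o for o in offers
            if o.origin == origin
            and abs(o.departure - desired_departure) <= tolerance
            and o.free_seats > 0]

offers = [
    RideOffer("driver_a", "Brake", datetime(2020, 3, 2, 8, 0), 2),
    RideOffer("driver_b", "Elsfleth", datetime(2020, 3, 2, 8, 15), 1),
]
# A patient from Brake looking for a ride around 08:20 matches driver_a.
print(match_rides(offers, "Brake", datetime(2020, 3, 2, 8, 20)))
```

In an open regional system, matching would additionally have to compare destinations and route overlaps, which is exactly the complexity the system boundary removes.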

3 Degree of Innovation

3.1 Portal Characteristics

Ultimately, the project is innovative in several respects, as it has specific characteristics compared to other existing approaches:

(1) It addresses and improves the mobility needs of specific target groups in the health sector.
(2) It addresses a clearly defined regional backdrop with "mobility endpoints" that can be clearly located in geographical terms. The consideration of such system boundaries has significant advantages. If the origins and destinations of rural residents were poorly concentrated or scattered over a large area, it would be more difficult to provide the necessary mobility services. In this project, the mobility endpoints are focused on the respective health facility.


Intermediary services are correspondingly efficient, and the available services can be fully taken into account.
(3) From points 1 and 2, strong participation can be expected. This participation is decisive for the success of a mobility portal, since prosumer approaches such as ridesharing require a critical mass of participants to operate effectively.
(4) The clearly defined target group and the specific environment make it possible to market the platform effectively, for instance directly via the communication channels of the clinic, but also via the general practitioners and specialists concerned.

The portal will also consider other specifics of health centers in order to provide optimal mobility, for instance distinguishing sections in very large hospital areas (e.g., Klinikum Oldenburg with separate entrances for the general clinic, children's clinic, etc.).

3.2 Comparison to Existing Portals

These four characteristics are considered highly innovative improvements over existing mobility platforms and portals, resulting from practical experience and research. Many years of the authors' research experience in this field have shown that the lack of these properties is the reason why many existing approaches fail or are carried out with only moderate success. The approach also distinguishes itself from existing multimodal mobility portals or apps such as Qixxit from DB [7], or research projects such as the NEMo platform [8], in that it focuses on health centers (see point 2 for the advantages of system boundaries) and strives for a less complex but more robust and easy-to-use solution. As an example, Qixxit still has a relatively small number of users because its nationwide offering only includes large providers and thus only a small subset of the existing (regional) mobility offers. With ridesharing platforms such as flinc [9], on the other hand, the critical mass of users for a successful ride-sharing offer is not reached, especially in rural areas. The lack of visibility and difficulties in marketing the service can be identified as problems (see point 4).

This approach is new at both the regional and the national level. The regional importance is particularly significant in Lower Saxony because of its high number of rural regions. Nevertheless, the approach is also easily transferable at the national level.


4 Research Method and Conduct

The project follows a Design Thinking approach, which aims to bring together the most diverse experiences, opinions, and perspectives possible regarding a problem. The basic assumption of Design Thinking is that innovation arises at the intersection of three equally important factors: "people," "technology," and "economy." The approach adopted in the project therefore combines attractiveness (for both end-users and operators), feasibility (from a technical and organizational point of view), and economic viability. Building on an understanding of the existing needs and requirements of the target groups, prototypes will be implemented and evaluated at an early stage of the planned project through early creation and testing. The focus here lies more on collecting insights and experimental situations that can improve the planned prototypical offer than on elaborating all theoretically possible mobility portal offers in detail. In this process model variant of Design Thinking, the following phases, partly alternating, will be run through.

The approach essentially contributes to improving access to health services and safeguarding services of general interest for the aforementioned target groups. This is where the mobility portal presented here comes in; it aims to secure access to physical medical services in the future, insofar as these cannot be covered by virtual medical services (e.g., telemedicine). The architecture of the portal is shown in Fig. 2. The portal is a web page or an app through which suitable mobility offers can be found and used. In the minimum case, a start location and a time must be specified; on this basis, a ride-sharing opportunity, for example, will be identified.

Fig. 2 Portal architecture with its main stakeholders and interfaces


In addition to ride-sharing, public transport services and other mobility options can also be made available. A particular success factor compared with existing mobility platforms is the clearly defined deployment channels and the specific target groups for the platform's services. To make the mobility portal successful in supporting the above-mentioned target groups, the four characteristics must be consistently implemented (see Sect. 3.1). For this reason, the mobility portal should also enable communication through a call center, so that it can also be used by people without digital infrastructure, e.g., elderly people or people living in outlying areas with no internet access. In particular, the portal promotes the mobility of people without a car (older people, young people, the socially disadvantaged) and primarily provides mobility services based on public transport and ride-sharing. At the same time, it enables the inclusion of specific needs (e.g., taking a wheelchair along).

A typical use case can be illustrated as follows: cataract surgery is among the standard operations in Germany, with around 800,000 operations per year. When a patient is diagnosed with cataracts, the clouded natural lens is generally removed in an outpatient operation. Patients are not allowed to drive their own car on the day of the operation or in the following weeks (due to limited or non-existent insurance cover). This often leads to difficulties in reaching the respective (sometimes distant) specialist clinic. In addition, giving up one's own car, especially in rural regions, is often seen as having a negative effect on existing opportunities to guarantee services of general interest. Such a circumstance usually creates interest in third-party mobility services, and the mobility portal addresses this demand. Public transport represents an important alternative.
It is planned to integrate the VBN/ZVBN app (public transport provider) within the project. In this way, the entire public transport of the model region would be covered. The user interface will be preserved in order to reduce the effort and also to improve user-friendliness (no re-acclimatization for users already using the app). However, as shown above, public transport is not always available, especially in rural areas. Ride-sharing services are, therefore, an essential complement. Thanks to the clear system boundaries (see property 2, Sect. 3.1), their probability of success is high, since the destination is the same for all users (e.g., the hospital). At the same time, the involvement of "travelers" must be ensured and appropriate reward systems established [10]. A general willingness to participate, on the other hand, can be derived from the shared destination (see property 3, Sect. 3.1). In addition, measures for the safety of minors must be implemented (e.g., special identification of drivers); otherwise, the corresponding safety risks will create barriers to further participation. In this context, the prevailing data protection law must also be taken into account, specifically for mobility portals.

The ridesharing system will be based on the system of the Smartway company. Compared to other solutions, it offers very powerful algorithms and enables the integration of diverse parameters. An adaptation of the system is required within the framework of the project. The interfaces for connecting the Smartway and ZVBN services are designed to be open in such a way that switching


to other providers is easy. This enables both the transferability to another area and the integration of further mobility providers (e.g., taxi companies). As already mentioned, the users of the portal will mainly be patients, visitors, and employees of the clinics. While patients and visitors mostly travel to the clinics occasionally, employees usually travel on a regular basis. This circumstance has to be considered during the design phase: ridesharing services, for instance, are generally suitable for all three user groups, but for employees the formation of regular car pools must be supported. Furthermore, access to the portal should be made easier for employees, e.g., by linking it to the corresponding operational information systems, so as to increase participation here as well (see properties 3 and 4, Sect. 3.1). Through the involvement of employees, the project also makes a positive contribution to the topic of "the changing world of work." The above-mentioned adaptations thus improve mobility to health centers for the specific target groups, or even make it possible in the first place.

According to property 4 (see Sect. 3.1), clinics are not only the primary destination in this model; together with the medical profession, they are also an important distribution channel. This ensures that the mobility portal for the specific health center gains as much attention and awareness as possible from patients and visitors, e.g., through an appropriate announcement on the clinic's website, flyers, or digital information handed out on referral by the doctor. In addition to a website address, leaflets also contain a telephone number (see above).
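The open interface design described above can be sketched as a small provider abstraction. The class and method names are our own illustration; the real Smartway and VBN/ZVBN interfaces are not specified in the paper:

```python
from abc import ABC, abstractmethod
from typing import List

class MobilityProvider(ABC):
    """Provider-agnostic interface: the portal depends only on this class,
    so a ridesharing backend (e.g. Smartway) or a public transport backend
    (e.g. VBN/ZVBN) could be swapped in without touching the portal code."""

    @abstractmethod
    def find_offers(self, origin: str, departure_iso: str) -> List[dict]:
        ...

class DummyRidesharing(MobilityProvider):
    # Stand-in backend for illustration only; a real adapter would call
    # the respective provider's API behind this method.
    def find_offers(self, origin, departure_iso):
        return [{"mode": "ridesharing", "origin": origin,
                 "departure": departure_iso}]

def query_portal(providers: List[MobilityProvider],
                 origin: str, departure_iso: str) -> List[dict]:
    # The portal simply aggregates the offers of all connected providers.
    offers: List[dict] = []
    for provider in providers:
        offers.extend(provider.find_offers(origin, departure_iso))
    return offers

print(query_portal([DummyRidesharing()], "Brake", "2020-03-02T08:00"))
```

Adding a taxi company or replacing a provider then amounts to implementing one more adapter class, which is what makes the system transferable to other regions.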
Earlier research projects, such as the "IKT Plattform" project within the framework of the "Schaufenster Elektromobilität Niedersachsen," have shown that the operator question is central to transferring a portal or platform developed in a research context to the real world and establishing it on a long-term basis [2]. Therefore, a thorough analysis of possible business models and the development of a specific business model are also planned within the framework of the project.

5 Conclusion and Future Outlook

The portal proposed here is able to address the specific mobility needs of different target groups with respect to health care centers. In particular, the mobility demands of people with limited options in rural regions can be fulfilled. In the ecological dimension, a decrease in emissions can be expected because of sustainable mobility options like public transport and ride-sharing. The project is funded within the program "Soziale Innovation" of NBank and EFRE. A prototypical implementation and a pilot are planned in the Wesermarsch region.


The results of the project should continue to be used after the end of the project and can be economically exploited. To this end, the project addresses different work contents and results that can be used separately and can also be integrated:

(1) The evaluated mobility needs of specific target groups in the health sector and the derived requirements for a functioning mobility platform can be used to implement model development within the study region. It is also expected that the results will be easily transferable to other regional backdrops and scalable without major adjustments, as a multi-client capable system is to be designed.
(2) The idea and the technical implementation concept for the mediation of mobility services (to concentrated endpoints) can be transferred to other application contexts. It can be used for large events, among others, or in other sectors of general interest.
(3) The prototype developed and evaluated in the project may be easily adapted not only by the actors involved in the project but also by other actors (in the study region and beyond). It represents a strong basis for developing a marketable mobility portal as a follow-up to the project. The developed concepts will be made freely available.
(4) An essential success factor in the implementation of an economically viable mobility portal will be the derivation and establishment of suitable operator and business models. Various innovative supplier cooperations and operator models are conceivable and will be investigated in the course of the project, with close attention to their feasibility, adaptability, and transferability.
(5) The participation and involvement of various stakeholders in the project itself is a requirement for the success of the project.
The stakeholders provide important information on the optimal design of the offer and can be a starting point for achieving the critical mass of participants needed for successful operation; prosumer approaches like ridesharing in particular require such a critical mass. The forms and contents of cooperation established in the project could be continued in the addressed research region as well as transferred to other regions.
(6) COSMO and ecco will take up the experiences from the project in the data-centered mFund programme of the BMVI and incorporate them into planned projects. The mFund programme focuses on mobility-related data flows and services within the scope of Open Data. The planned data platform should, on the one hand, provide data for regional traffic management and, on the other hand, provide forecast and real-time data for the improvement of mobility services. The mobility portal conceived here could be an important data provider for corresponding data platform services (of course, considering all data protection requirements). COSMO and ecco already completed a preparatory feasibility study in 2017/2018.


References

1. Projekt-Telepflege Homepage. https://projekt-telepflege.de/. Last accessed 15 May 2019
2. Wagner vom Berg, B.: Konzeption eines Sustainability Customer Relationship Management (SusCRM) für Anbieter nachhaltiger Mobilität. Shaker Verlag, Aachen (2015)
3. Langfristige Sicherung von Versorgung und Mobilität im Landkreis Wesermarsch. Modellvorhaben des Bundesministeriums für Verkehr und digitale Infrastruktur (BMVI) (2018)
4. Kröcher, U., Barvels, E.: Bericht zur kleinräumigen Bevölkerungsprognose im Landkreis Wesermarsch. Im Auftrag des Landkreises Wesermarsch, FD 91 - Büro des Landrats, im Rahmen des BMVI-Modellvorhabens "Langfristige Sicherung von Versorgung und Mobilität in ländlichen Räumen", Oldenburg (2017)
5. Statista Homepage. https://de.statista.com/. Last accessed 15 May 2019
6. Wagner vom Berg, B., Uphoff, K., Gäbelein, T., Knies, J.: ZMo—Target group-oriented mobility chains in the health sector. In: EnviroInfo (2018)
7. Qixxit Homepage. https://www.qixxit.com/de/. Last accessed 15 May 2019
8. NEMo-Mobilität Homepage. https://nemo-mobilitaet.de/blog/de/start/. Last accessed 12 May 2019
9. Flinc Homepage. https://www.flinc.org/. Last accessed 13 May 2019
10. Wagner vom Berg, B., Cordts, M., Gäbelein, T., Uphoff, K., Sandau, A., Stamer, D., Marx Gómez, J.: Mobility 2020 - IKT-gestützte Transformation von Autohäusern zum regionalen Anbieter nachhaltiger Mobilität. In: Tagungsband der MKWI 2016, TU Ilmenau (2016)
11. Niedersächsische regionale Innovationsstrategie für intelligente Spezialisierung (RIS3), p. 3 (2014)

Platform Sustainable Last-Mile-Logistics—One for ALL (14ALL) Benjamin Wagner vom Berg, Franziska Hanneken, Nico Reiß, Kristian Schopka, Nils Oetjen and Rick Hollmann

Abstract This research examines the idea of developing a logistics service platform. The innovation lies in creating the platform through the cooperation of all necessary stakeholders, such as various logistics service providers, customers, and a regional logistics service provider. The aim is to reduce traffic flows within regional traffic systems, since only one regional logistics service provider is responsible for the inner-city delivery. In addition, deliveries are to be carried out with cargo bicycles to provide an environmentally and climate-friendly delivery service. With the help of a SusCRM approach, customers and service providers are to be sensitized to minimizing the environmental impact. The interplay of all these components should help to make inner-city delivery more sustainable.

Keywords Sustainability · SusCRM · Last mile logistics · Platform economy · Digital transformation

B. Wagner vom Berg (B) · F. Hanneken · N. Reiß · R. Hollmann Hochschule Bremerhaven, An der Karlstadt 8, 27568 Bremerhaven, Germany e-mail: [email protected] F. Hanneken e-mail: [email protected] N. Reiß e-mail: [email protected] R. Hollmann e-mail: [email protected] K. Schopka Rytle GmbH, Schwachhauser Ring 78, 28209 Bremen, Germany e-mail: [email protected] N. Oetjen Weser Eilboten GmbH, Am Grollhamm 4, 27574 Bremerhaven, Germany e-mail: [email protected] © Springer Nature Switzerland AG 2020 R. Schaldach et al. (eds.), Advances and New Trends in Environmental Informatics, Progress in IS, https://doi.org/10.1007/978-3-030-30862-9_5



B. Wagner vom Berg et al.

1 Introduction

An increasing volume of traffic, combined with limited resources and stricter environmental and climate protection requirements, is challenging traffic systems in almost all major cities. This applies to individual traffic as well as to urban commercial transport. In the medium term, it can be assumed that there will be urban areas where only emission-free vehicles are allowed to move.

The core subject of the research project, coordinated by the platform provider Rytle GmbH [1] and the University of Applied Sciences Bremerhaven [2], is the development and piloted application of a platform for the complete coverage of inner-city traffic with the help of sustainable electric vehicles within a hub structure. This includes the integration of all possible use cases on the last mile (package delivery, retail delivery, etc.) as well as the integration of all actors of the last mile on a single platform. The central players in this system are the system provider of the platform and a regional logistics service provider. The (international) platform operator provides all the necessary hardware and software resources, including sustainable transport systems, for mapping transport requirements and processes. Ideally, the regional logistics service provider, as the only transporter in a regional system (such as a city) using the platform, will perform all transport tasks on the last mile. This makes an efficient, and therefore sustainable, supply on the last mile possible.

This project focuses on the implementation of a smart logistics solution (Fig. 1) that combines sustainability and digitalization. The concept of the smart logistics solution integrates the three dimensions of sustainability. Sustainability includes not

Fig. 1 Smart logistics solutions


only ecological but also social and economic aspects, for example, in terms of ensuring fair employment and addressing potential economic dependencies and imbalances within the platform economy. Electromobility is a core technology in the ecological dimension to ensure environmentally and climate-friendly transport. These aspects are interrelated: the crowd workers employed through the platform receive fair and safe working conditions, and long-term employment can be fostered because a better planning base means crowd workers are only deployed in load peak situations. The transports and logistics processes handled via the platform are environmentally and climate-friendly. Furthermore, natural resources need to be distributed fairly between current and future generations. Digitalization is not the core of sustainable development, but sustainability sets the boundaries for digitalization.

The platform-based system proposed here enables the utilization of the transport system and coordinates and combines the supply and demand of logistics services within a city or region. The platform provides a function for bundling transport requirements, both regarding the products delivered and the dispatching or logistics companies (for example, UPS, GLS, DPD, DHL, etc.). This achieves maximum bundling effects and optimizes utilization. Each shipping address is bundled both with regard to the senders and the products. The regional transport system thus also experiences a fundamental relief regarding the flow of goods. This combination of providers and buyers of logistics services is common practice in nationwide logistics, for example, to reduce empty runs. However, such freight exchanges are only occasionally used in regional/inner-city logistics. In addition, the platform not only brings together buyers and suppliers but connects all types of last-mile actors, such as service technicians and logistic area providers (e.g., cities).
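The bundling idea described above can be sketched in a few lines: shipments from several nationwide carriers are grouped by destination address so that one regional tour serves each address only once. The shipment data and addresses are purely illustrative:

```python
from collections import defaultdict

# Illustrative example data (ours, not from the project): shipments from
# several nationwide carriers destined for addresses in the same city.
shipments = [
    {"carrier": "DHL", "address": "Buergermeister-Smidt-Str. 1", "item": "parcel"},
    {"carrier": "GLS", "address": "Buergermeister-Smidt-Str. 1", "item": "letter"},
    {"carrier": "DPD", "address": "Hafenstr. 5", "item": "parcel"},
]

# Group shipments by destination address, regardless of sender or carrier.
bundles = defaultdict(list)
for shipment in shipments:
    bundles[shipment["address"]].append(shipment)

# Two delivery stops instead of three single-carrier deliveries.
print(len(bundles))  # 2
```

This is the regional analogue of the freight exchanges common in nationwide logistics: the fewer distinct stops per tour, the greater the relief for the regional transport system.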
The platform provider in this project is the start-up Rytle, founded in 2017 as a joint venture between the "Maschinenfabrik Bernard Krone" (majority shareholder) and the consulting company "Orbitak." The Rytle system consists of a cargo bicycle (MOVR), a standardized roll container (BOX), and a swap container (HUB), which can accommodate nine boxes. Rytle already has software components that enable the system to be used effectively [1]. The role of the regional logistics service provider is taken by the project partner Weser Eilboten GmbH [3] (belonging to the Ditzen business group and the regional letter service provider CITIPOST Nordsee) from Bremerhaven. The Weser Eilboten (WEB), for example, is already working as a subcontractor for GLS. Within the project, various use cases are developed, implemented, and evaluated to assess the viability and universality of the system, for example, nationwide parcels, deliveries of regional retailers, etc. The pilot city for the project is the "Seestadt Bremerhaven," where the system is piloted and evaluated.

A project called "Sustainable Crowd Logistics (NaCl)" [4] is currently being carried out under the direction of the University of Applied Sciences Bremerhaven with the participation of Rytle and Weser Eilboten. The project is funded by EFRE funds under the program for "Angewandte Umweltforschung (AUF)" of the state of Bremen and will end in June 2020. NaCl focuses on the implementation of the crowd approach, the SusCRM approach, and other topics relevant to this project.


The regional logistics service provider can use both the platform itself and its own systems (e.g., for dispatching), which are either connected via an interface or run directly on the platform. The aim is to support all processes, including marketing, and to achieve maximum bundling effects in order to enable a sustainable and traffic-saving supply of goods in a city (or rural region). The platform provider's system offers ideal conditions for this because the hub structure saves space within the city, parking in the second row is unnecessary, and no driver's license is required to ride the cargo bikes. In addition, marketing at the b2b and b2c level is addressed with a sustainability-oriented CRM approach based on the existing Sustainability CRM (SusCRM) approach from the mobility domain [4]. All in all, the described research project can be summarized by the following research question: "How must a holistic and sustainable platform solution for handling the entire last-mile logistics be structured on a conceptual level to achieve improvements in the economic, environmental, and social dimensions?"

2 Fundamentals and Innovations

2.1 Fundamentals of the Platform

Rytle system: The logistics concept is based on four components: MOVR, BOX, HUB, and APP. The MOVR is a professional cargo bike with a retractable and exchangeable transport box (BOX) with the standard dimensions of a Euro pallet. Up to nine of these can be stored in a specially developed 10-foot standard interchangeable container (HUB), which functions as a mobile micro-depot. The main innovation of the logistics concept lies in the close networking of these hardware components via an intelligent cloud-based software solution (APP). The goal of Rytle's holistic logistics concept is to make the last mile environmentally friendly and quiet. The cargo bike moves the transport boxes by pedal power assisted by two electric wheel-hub motors in the rear wheels. It can transport loads of up to 180 kg and reaches speeds of 25 km/h. The APP offers integrated track-and-trace software, locking options for MOVR, BOX, and HUB, optimized route planning for last-mile service providers, and the option of networking crowd workers and end customers, as well as over 200 other use cases [1].

SusCRM: SusCRM is a CRM system that integrates sustainability into the customer relationship. On the one hand, the system is supposed to motivate the customer to make sustainable choices; on the other hand, it promotes the marketing of sustainable offers. Based on the collected data, customers receive specific offers that promote sustainable mobility. Environmental data enables the customer to compare options with regard to sustainability. Business intelligence tools are used to analyze the data. The interaction with the customer happens via an end-user application [5].
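The SusCRM idea of letting customers compare their options by environmental impact can be sketched as follows. The emission factors are rough illustrative values (g CO2 per km), not figures from the project or the SusCRM literature:

```python
# Minimal sketch of the SusCRM comparison idea: rank transport options
# for a given trip by their CO2 footprint. Emission factors (g CO2/km)
# are rough illustrative values, not figures from the project.
EMISSION_FACTORS = {
    "car_alone": 150,
    "ridesharing_2_persons": 75,  # emissions split between two riders
    "cargo_bike": 0,
}

def rank_by_emissions(distance_km: float) -> list:
    """Return (mode, kg CO2) pairs for a trip, most sustainable first."""
    return sorted(
        ((mode, g_per_km * distance_km / 1000)
         for mode, g_per_km in EMISSION_FACTORS.items()),
        key=lambda pair: pair[1],
    )

for mode, kg_co2 in rank_by_emissions(10):
    print(f"{mode}: {kg_co2:.2f} kg CO2")
```

In the full system, such a ranking would feed the specific offers and incentives presented to the customer through the end-user application.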

Platform Sustainable Last-Mile-Logistics—One …


Crowd logistics approach: The crowd logistics approach consists of outsourcing logistics activities to actors supported by a technical infrastructure. The aim is to generate an economic benefit for all actors involved [6]. In this project, the personnel structure of the regional logistics service provider is extended by crowd workers who are deployed at peak times. This reduces the workload and significantly improves the work situation.

Bundling: The bundling of parcel deliveries is a central consideration in this project. It includes the bundling of different products as well as the bundling of services for different customers. Furthermore, delivery and pick-up are bundled in one tour. As mentioned before, the cargo bike can transport loads of up to 180 kg, which corresponds to about 40 parcels per BOX; CEP services usually transport low-volume cargo with a weight of up to 30 kg per parcel. Due to bundling, a BOX will contain not only parcels but also other products such as letters, food, or medicines, so the number of products in a BOX can vary. A business case at the regional logistics service provider showed that by exploiting bundling effects and Rytle's sustainable logistics system, cost savings of 30% and time savings of 20% could be achieved. The cost reduction leads to an increase in the company's return on investment.
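The figures above can be combined in a back-of-the-envelope calculation. The average parcel weight used here is an assumption (chosen so that the 180 kg payload matches the roughly 40 parcels per BOX quoted above), and the baseline cost and time values are arbitrary illustrative units:

```python
# Toy bundling estimate based on the figures in the text.
# ASSUMPTION: an average parcel weight of 4.5 kg, chosen so that the
# 180 kg payload yields the ~40 parcels per BOX quoted above.
PAYLOAD_KG = 180            # maximum load of the MOVR cargo bike
AVG_PARCEL_KG = 4.5         # assumed average parcel weight

parcels_per_box = int(PAYLOAD_KG / AVG_PARCEL_KG)

# Reported savings from the business case: 30% cost and 20% time reduction.
baseline_cost = 100.0       # hypothetical cost per tour, arbitrary units
baseline_minutes = 300.0    # hypothetical time per tour, arbitrary units
bundled_cost = baseline_cost * (1 - 0.30)
bundled_minutes = baseline_minutes * (1 - 0.20)

print(parcels_per_box, bundled_cost, bundled_minutes)
```

Such a calculation only illustrates the orders of magnitude; the actual mix of letters, parcels, food and medicines in a BOX will shift the per-box count considerably.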

2.2 State of the Art

New offers from start-ups from the USA such as "Starship" [7] or "NURO" [8] are based, for example, on delivery drones deployed either in the air or on the ground. At CES 2019, Continental introduced a system for last-mile logistics that uses autonomous mobile hubs and package-carrying mobile robots [9]. Other solutions address peer-to-peer business models that connect customers and suppliers via a digital platform [10]. For two years now, Uber has been operating in the logistics sector with a cloud-based platform that receives freight orders, effectively creating a large logistics company. After its success in the US market, Uber Freight also wants to tackle the European market. An interesting service of the platform is "Uber Freight +", which rewards users for their participation; rewards can include discounts on tires or lower prices for vehicle maintenance [11]. Foodora is a German start-up based in Berlin and active in more than six countries. The platform-based delivery service covers food delivery for more than 9,000 restaurants; the food is picked up and delivered by a Foodora driver by bicycle. This simple concept has helped to reduce the number of delivery vehicles in many cities and to create a sustainable solution [12]. At the same time, these approaches are fraught with problems of employment and fair pay [13]. BRINGG is a last-mile platform for companies. The start-up has shown that the systematic platform approach can also work in the b2b area. Major retailers such as Walmart or the franchise McDonald's use BRINGG to manage their logistics and last-mile deliveries. The platform supports customers in operator allocation, route tracking, and system optimization. In addition, BRINGG services such as click-and-collect and crowd-based delivery are used by large companies [14].

The research project "KoMoDo", currently being carried out in Berlin, involves the cooperative use of micro-depots by the CEP sector for the sustainable use of cargo bikes. "Berliner Hafen- und Lagerhausgesellschaft mbH" has set up a logistics area with seven sea containers, each forming a stationary micro-depot. Five national logistics service providers participate in the test phase, and each uses a micro-depot as a distribution and assembly point. Each logistics service provider handles its deliveries with its own cargo bikes and drivers, so no bundling across logistics service providers occurs. Furthermore, there is no crowd logistics approach in "KoMoDo" [15].

The already mentioned project NaCl, in which all three partners are involved, can be regarded as substantial preliminary work for this project. Its central focus is the crowd logistics approach, which solves significant delivery problems resulting from a lack of human resources. Significant efficiency gains on the part of the logistics service provider are expected due to reduced personnel costs and better staff availability, especially at peak times. Staff shortages can be eliminated in the long term, and working conditions can also be improved: the core team can be converted into permanent employment because of its predictability, and peak loads can be covered, for example, by student employees. Topics such as bundling and SusCRM are also considered in the project but implemented only to a limited extent in terms of information technology. The crowd approach will be piloted in early 2020 with students from the University of Applied Sciences Bremerhaven [4].

2.3 Innovations

With the platform, central components of sustainable urban logistics are addressed at once in the ecological, economic, and social dimensions. The logistics system of the platform provider, with its sustainable characteristics and efficient lifting structure, is an innovation in itself. Information technology components that support bundling with regard to consignors and products, optimal capacity utilization, and dynamic tour and route planning will yield substantial additional efficiency gains. The crowd approach pursued here enables flexible but fair employment relationships.

The main innovation is the implementation of a central platform solution for last-mile logistics based on electromobility. The platform provides all technical resources to fulfill the transport requirements. At the same time, it matches supply and demand and coordinates all processes and transactions. It is not a pure b2b platform; end customers (b2c) are involved as well. The platform is designed around a single regional logistics service provider, which takes over all packages of national and regional customers and logistics service providers on the basis of a white-label solution. The resulting bundling prevents, for example, an address from being served by several CEP service providers on the same day, and unnecessary trips are minimized. Furthermore, optimal and efficient route optimization can be carried out.

Competing solutions may include components such as cargo bikes or electromobility. However, these approaches are usually used only within a single organization and not nationwide. Urban areas are still supplied by several CEP service providers, each using its own supply network, so synergy effects remain unexploited; multiple deliveries to the same address and insufficient capacity utilization are the result. To optimize the transport volume, bundling is done not only across various service providers on the last mile but also across various products, such as letters, parcels, food, or medicines.

Various near-shoring offerings are also intended to promote regional retail. In this case, the retail trade can be regarded as a large warehouse from which the customer can order regionally. The goods are delivered via the platform service provider, and same-day delivery is possible. As a result, the customer is motivated to shop regionally, and retailers have the opportunity to compete with the major online retailers. In general, the platform addresses central problems of the platform economy, such as the economic drain toward online trading platforms such as Amazon. Sustainability and fairness are therefore included as elementary objectives in the development of the business model. The regional logistics service provider plays a central role and receives the opportunity for sustainable business development. The SusCRM approach raises awareness of the sustainability of consumption and logistics at the b2b and b2c levels. The realized efficiency gains enable better working conditions and, at the same time, reduce the burden on the urban or regional transport system.
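The core bundling idea, collapsing consignments from several service providers into one stop per address, can be sketched in a few lines. The record layout and field names here are illustrative assumptions, not the project's actual data model:

```python
from collections import defaultdict

def bundle_by_address(consignments):
    """Group consignments from different service providers by delivery
    address, so that each address is visited only once per tour."""
    stops = defaultdict(list)
    for c in consignments:
        stops[c["address"]].append(c)
    return dict(stops)

# Three consignments from two providers collapse into two delivery stops.
stops = bundle_by_address([
    {"provider": "CEP-A", "address": "Main St 1", "item": "parcel"},
    {"provider": "CEP-B", "address": "Main St 1", "item": "letter"},
    {"provider": "CEP-A", "address": "Pier Rd 7", "item": "food"},
])
print(len(stops))  # two stops instead of three separate deliveries
```

In the real platform, this grouping step would feed into the dynamic tour and route planning rather than stand alone.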

3 Conceptual Model

The platform supports the interaction between the actors involved and coordinates the processes and orders. Figure 2 illustrates the relationships between the actors, which are divided into the groups customers, fulfillment, and third party. The customer group includes private customers, retailers, and nationwide logistics service providers (e.g., UPS, Hermes, DHL). The fulfillment group is the operational unit providing the last-mile delivery. The regional logistics service provider is the central, one-of-a-kind deliverer and is responsible for ensuring a high level of service and efficiency in delivery. The customer places his concrete transport requirements on the platform and is then referred to the regional logistics service provider. The crowd workers are either deployed by the regional logistics service provider or (in special cases, see Fig. 3, use case 5) commissioned directly by the platform. The last group is the third party: these are, e.g., (regional) service partners responsible for the maintenance of the vehicle fleet, but also interest groups such as cities that are interested in the platform and participate in or favor its operation.

Fig. 2 Relationship between the actors

The platform is illustrated by five concrete use cases (Fig. 3), which are developed within the framework of the project. By using different use cases, the universality of the platform can be evaluated, and maximum bundling effects are generated. The first use case covers nationwide parcel shipments, which are connected to the last-mile delivery; the processes between the logistics service providers (also in terms of data exchange) are mapped efficiently. The second use case relates to the efficient supply of rural areas. Regional retail represents the third use case: pickup and delivery are implemented efficiently and dynamically in one tour and across an entire fleet. The fourth use case handles private parcel shipments, covering the coordination of the orders and the b2c business relationship. The fifth use case deals with the efficient, platform-based use of crowd workers as an extension of the NaCl project; here, both deployment by the regional logistics service provider and direct commissioning by the platform are presented.

Fig. 3 Use cases

Fig. 4 Conceptual model

Figure 4 visualizes the conceptual model of the platform. The platform provider operates the digital platform with all its software and hardware components. The other players, as already mentioned, are customers, fulfillment, and third party. The hardware includes MOVRs, HUBs, and BOXes. The software components include a customer app, a disposition system for order processing, tour and route planning for realizing bundling effects and for optimal last-mile delivery, and a driver app for coordinating tours and deliverers. In addition to other business application services in the areas of transaction processing, workflow management, etc., a big data tool is used in the platform to generate added value; it supports tour optimization. Driver, cargo bike, and customer are networked with one another, and the customer also has the option of real-time tracking of his consignment. Certain services are provided by third parties.
To ensure that the cargo bikes and other vehicles are ready for use, they must be maintained regularly. The customer can use the services offered by the platform on a modular basis via an appropriate hardware and software subscription in a pay-per-use model. Crowd workers are provided as part of the human resources service. The acquisition service aims to win customers and to generate orders for the fulfillment group. Here, the SusCRM approach is applied and extended.

4 Approach

The scientific monitoring of the project is undertaken by the University of Applied Sciences Bremerhaven. In this context, the University is responsible for general methodological support and for elaborating a concrete implementation concept. Scientific support also includes dissemination and exchange with the relevant scientific community. The University furthermore acts as a coordinator between the individual partners, ensuring a smooth process within the project.

After the use cases were specified, the existing systems at the platform operator and the regional logistics service provider were examined with regard to their implementation on the platform. Afterward, the concrete business models for the platform operator and the regional logistics service provider were designed. The business model for Weser Eilboten was designed in the sense of a unique regional service provider based on Rytle's digital platform. New potential products and services were identified, and opportunities for customer acquisition were explored. This is where the elementary SusCRM approach is implemented. After the definition of the business models, the necessary software properties were derived from the requirements. The technical interfaces are identified with regard to customers, the regional logistics service provider, and third-party providers. The outcome is the software architecture and an extension of the software specifications. The further development of the software for the logistics system is carried out by Rytle based on their existing systems; the university is also providing significant development resources. The outcome of this is the software prototype. For the pilot phase, Rytle is providing hardware, software, hardware service, and induction training. The university is responsible for the acquisition and support of student drivers as well as for the scientific support during the pilot phase.
The pilot phase is carried out at Weser Eilboten with integration into regular business operations and is designed to run for three months. It is evaluated along different dimensions (e.g., software, personnel deployment) by all three partners from their respective perspectives. The goal is to be able to make a fundamental statement about the usability and feasibility of the platform and, if necessary, to identify optimization potentials. The evaluation results are an essential part of the final report.


5 Risks

Participation: Participation is key for a successful platform. From an economic point of view, participants benefit from a reduction in the delivery cost per unit served through bundling and thus achieve greater cost efficiency. Another reason for participation is that the companies involved can incorporate this sustainable concept into their marketing, since the concept generates improvements in ecology and mobility (SusCRM).

Labor law risks with crowd workers: In terms of employment law, many questions remain unanswered. The project therefore has to consider various issues of German labor law at an early stage and to focus on the interests of the crowd workers.

Service quality: Timely, high-quality delivery with such a system is demanding and must be tested intensively under pilot conditions. In order to avoid possible delivery failures or quality defects within the pilot project under real conditions, and possibly a loss of customers, a backup solution with corresponding material and personnel resources must be provided.

Complexity and scope: Complexity and scope are particularly demanding at the technical level in this project (e.g., dynamic route planning). Rytle already has pre-existing systems that can be used and, if necessary, adapted. Subcontracts were considered for purchasing special know-how.

Ownership and operator: In many projects of this kind (e.g., the ICT platform in "Schaufenster Elektromobilität"), the operator question was not clarified during exploitation, which resulted in non-utilization. Here, with Rytle, an operator is clearly identified from the beginning. Nevertheless, the business model is subject to many uncertainties for both the platform provider and the regional logistics service provider.
Legal aspects: The handling of legal aspects via the platform is a risk, as contracts are concluded between the regional logistics service provider and various players (retailers, national logistics service providers, private customers, and crowd workers). In addition, insurance must be taken out for all transport routes that arise and are handled via the platform.

Interfaces: The risks concerning the interfaces are the necessary homogeneity of the data as well as the amount of data to be transferred. The platform has interfaces to the third-party providers, to the customers, and to the regional logistics service provider. Customers who want to execute their deliveries via the platform use many different software systems to register their order data. The order data of all customers must therefore be transmitted to the platform in a specific format with a specific data structure to ensure correct data transmission.
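To illustrate the kind of format homogenization such an interface requires, the following sketch maps heterogeneous customer records onto one common order structure. The schema, field names, and validation rule are hypothetical illustrations, not the project's actual interface specification:

```python
from dataclasses import dataclass

@dataclass
class TransportOrder:
    """Hypothetical common order format for the platform interface."""
    order_id: str
    customer_type: str        # e.g. "private", "retailer", "national_lsp"
    pickup_address: str
    delivery_address: str
    weight_kg: float
    product_kind: str         # e.g. "parcel", "letter", "food", "medicine"

def normalize(raw: dict) -> TransportOrder:
    """Map one customer-specific record onto the common schema and
    validate it against the MOVR payload limit of 180 kg."""
    weight = float(raw["weight_kg"])
    if not 0 < weight <= 180:
        raise ValueError(f"invalid weight: {weight} kg")
    return TransportOrder(
        order_id=str(raw["id"]),
        customer_type=raw.get("customer_type", "private"),
        pickup_address=raw["pickup"],
        delivery_address=raw["delivery"],
        weight_kg=weight,
        product_kind=raw.get("kind", "parcel"),
    )

order = normalize({"id": 4711, "pickup": "HUB Bremerhaven",
                   "delivery": "Main St 1", "weight_kg": "2.5"})
```

A single normalization layer of this kind keeps the heterogeneity at the edges of the platform, so that disposition and tour planning only ever see one data structure.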


6 Future Outlook

A prototypical implementation and a pilot phase are planned within the named funding call. The aim of the pilot phase is to test the prototype in practice and to evaluate whether it can be implemented successfully. Different parameters have to be considered. As shown in the conceptual model, it must be guaranteed that all components (hardware, software, services) of the platform work together and can be used without errors. It is also important to find out whether the platform is accepted by the various logistics service providers and whether it can handle the planned requirements. The regional logistics service provider needs to be prepared for the additional staff required to manage the increasing volume of parcels. In addition, the interaction of the various actors must be tested and evaluated.

References

1. Rytle GmbH Website. https://rytle.de/. Last accessed 03 July 2019
2. Hochschule Bremerhaven Website. https://www.hs-bremerhaven.de/. Last accessed 03 July 2019
3. Wesereilboten. http://weser-eilboten.de/. Last accessed 03 July 2019
4. NaCl Website. https://www.hs-bremerhaven.de/nacl. Last accessed 29 Mar 2019
5. vom Berg, B.W.: Konzeption eines Sustainability Customer Relationship Management (SusCRM) für Anbieter nachhaltiger Mobilität. Shaker Verlag, Aachen (2015)
6. Mehmann, J., Frehe, V., Teuteberg, F.: Crowd-logistics—a literature review and a maturity model. In: Kersten, W., Blecker, T., Ringle, C. (eds.) Innovations and Strategies for Logistics and Supply Chains, pp. 118–145. epubli GmbH, Hamburg (2015)
7. Starship Website. https://www.starship.xyz/. Last accessed 03 July 2019
8. Nuro Website. https://nuro.ai/. Last accessed 03 July 2019
9. Continental corporation Website. https://www.continental-corporation.com/en/press/pressreleases/ces2019–157096. Last accessed 29 Mar 2019
10. Transmetrics Website Blog. https://transmetrics.eu/blog/logistics-of-the-future-best-last-miledelivery-startups/. Last accessed 29 Mar 2019
11. Uberfreight Website. https://www.uberfreight.com/. Last accessed 29 Mar 2019
12. Foodora Website. https://www.foodora.de/. Last accessed 29 Mar 2019
13. Carbone, V., Rouquet, A., Roussat, C.: The rise of crowd-logistics: a new way to co-create logistics value? J. Bus. Logist. 38(4), 238–252 (2017)
14. Bringg Website. https://www.bringg.com/. Last accessed 29 Mar 2019
15. KoMoDo Website. https://www.komodo.berlin/. Last accessed 25 June 2019

Scientific Partnership: A Pledge For a New Level of Collaboration Between Scientists and IT Specialists Jens Weismüller and Anton Frank

Abstract ICT technologies play an increasing role in almost every aspect of the environmental sciences. The adoption of new technologies, however, consumes an increasing amount of the researchers' time; time they could better spend on their actual research. Not adopting new technologies, on the other hand, will inevitably lead to biased research, since scientists will not know about all the possibilities and methods that modern technology makes available. This dilemma can only be resolved by close collaboration and scientific partnership between researchers and IT experts from, e.g., a local computing center. In contrast to traditional IT service provision, the IT experts have to understand the scientific problems and methods of the scientists in order to help them select suitable services. If none are available, they can then consider adapting existing services or developing new ones according to the actual needs of the scientists. In addition, the partnership contributes to good scientific practice, since the IT experts can ensure the reproducibility of the research by professionalizing the workflow and applying the FAIR data principles. We elaborate on this dilemma with examples from an IT center's perspective, and sketch a path towards unbiased research and the development of new IT services tailored to the scientific community.

Keywords eScience · Computational science · Partnership · Collaboration · IT services

J. Weismüller (B) · A. Frank
Leibniz Supercomputing Centre, Garching near Munich, Germany
e-mail: [email protected]
A. Frank, e-mail: [email protected]

© Springer Nature Switzerland AG 2020
R. Schaldach et al. (eds.), Advances and New Trends in Environmental Informatics, Progress in IS, https://doi.org/10.1007/978-3-030-30862-9_6

1 Introduction

In almost every current scientific process, and in particular in the environmental sciences, ICT technologies play a vital and ever-increasing role, and computer simulations have been widely accepted as the third pillar of research in addition to experiment and theory. In recent years, an ever-increasing amount of data has been produced, from modern experimental and observational methods as well as from simulations for the implementation and verification of mathematical models. Therefore, data analytics has shifted into the focus of many scientists. Although one could argue that the evaluation and analysis of data has been at the very center of the scientific process since humans first started to scientifically investigate their environment, some computer scientists now call data analytics the fourth paradigm alongside experiment, theory, and simulation [1].

All of these developments require an ever-increasing amount of computing power, data storage, and data-processing capabilities. Classically, IT centers have focused on providing these resources to the scientists on request, concentrating on the economical procurement, efficient operation, and provisioning of high-quality services based on very modern IT hardware. However, despite high-level training courses, ticket-based support systems, and educational measures, it is still mainly left to the scientists to put the tools to their best use. While science has come quite a long way since digitization started its triumph 50 years ago, the situation today has changed: the ease of using remote resources has increased dramatically, allowing users to access compute and storage systems all around the world. Thus, users can access many more systems than previously and switch between compute centers quite easily. However, this requires portable code and data, rendering predictable and reproducible codes and workflows more challenging to design and implement. Furthermore, Moore's law has held up for a remarkable amount of time, allowing high-end systems to increase their compute performance exponentially over many decades.
The price for this growth is an ever-increasing complexity and diversification of the hardware. Up to the turn of the millennium, increases in compute performance for high-end systems were achieved by increasing the clock frequency, with no significant impact on the programming model. When the peak of frequencies was reached, many-core architectures started to dominate, requiring shared-memory parallelization within the user codes. Today, many different hardware architectures are on the market, each designed for very specific tasks, and each requiring a highly specialized programming model: CPUs, GPUs, accelerators, hybrids, etc. For high-end HPC applications, this means that every few years, codes have to be adjusted and optimized for every new hardware generation. This consumes an ever-increasing share of the researcher's time, just to keep up with the complexity of modern hardware.

The development is quite similar for research data: new technologies enable scientists to generate an ever-increasing amount of data from various sources, such as automated experiments, Internet of Things devices, and smart sensors, among many others. In addition, the rapid increase in computing power enables the generation of ever more processed and derived data sets as well as simulation output. Overall, the amount of research data is growing exponentially [2], with no end in sight in the near future.


2 Knowledge and Know-How Biases

The exponential increase in research data leads to a dilemma: due to a lack of technical means to support discoverability and accessibility, typical researchers use only a portion of the data that would actually be relevant for their research. In other words, only data that are known to the researchers influence their work. This means that the researchers are not able to select data according to their scientific relevance and quality, and their research will be unintentionally biased. The more data are available, the bigger this bias in the resulting research tends to get. We will call this the knowledge bias [3].

On the other hand, a know-how bias results from the growth in the sector of computational tools and methods for data analysis and simulation; consider, for example, machine learning applications, natural language processing, or high performance computing. Significant training is necessary to apply these techniques, and thus, with a growing number of tools available, researchers concentrate on the comparatively few tools they are familiar with. The resulting bias in the scientific output tends to be worst for researchers who cannot resort to scientific or private service providers to bridge the knowledge gap [3].

3 The Issue with IT Services

Traditionally, IT centers offer a service portfolio and build all of their operations and services around this portfolio. For example, they may have a storage solution, a cloud computing service, an HPC cluster, and so on. Scientists, on the other hand, work differently: their work starts with a research idea, which may evolve, change, and eventually be complemented with a scientific workflow. The next step is to extract those parts of the workflow where IT is required and to figure out the necessary services. If scientists go through this evolution without the support of an IT expert, they will rely only on IT services that they are familiar with and implicitly bring a know-how bias into their scientific workflow. If, on the other hand, they involve an IT expert from the very beginning, right when they start with their idea, they might be able to develop a much more powerful workflow together.

Reference [4] describes a model called the "partnership initiative computational sciences", which is essentially a series of workshops between the LRZ supercomputing center and some of its customers. These workshops focus on "partnership, relationship management and approaches that aim at maximizing the flexibility and ability to innovate" [4]. The goal is to foster mutual understanding and to provide tailored support on top of the regular IT services provided by the center.


The Netherlands eScience Center pursues a similar approach: in [5], they describe their efforts in bridging the gap between IT centers and scientists through collaborative projects, tailored research software, and the coordination and management of events and workshops.

4 The IT Expert's Perspective

A typical situation in the computational sciences is the following: a computer scientist develops a new method, maybe a numerical scheme, an algorithm, or a data management solution. Once the method is mature, he starts looking for applications that could benefit from it. More often than not, he finds some application he can use to demonstrate the method, but nobody from the application side ever actually applies it, since the IT expert is mostly interested in his own research and does not have the time or the incentive to invest in accessibility for application scientists. By working together in a collaborative manner, both the IT expert and the application scientist could profit greatly: the application scientist would gain access to new methodologies, while the IT scientist would learn about problems that are actually relevant to the application and could then focus his research in these directions.

An example of this gap between IT specialists and applications is urgent computing: quite some effort has been put into the development and evaluation of operating modes for supercomputers that allow for urgent access [6–8]. While, for instance, natural disaster management could take considerable advantage of an urgent mode on an HPC system, disaster management authorities are often not interested in working together with a supercomputing center, but rather invest in their own high-availability hardware.

On the other hand, the collaborative approach has been working quite well in a different area, namely in mathematics and physics: physicists keep discovering new phenomena that have to be explained by new mathematical models. Mathematicians then develop the necessary theories, advancing their own field in the process. This only works because mathematicians and physicists have a solid understanding of each other's fields, backed up by centuries of institutionalized collaboration.
At the IT center, the focus of interest naturally concentrates on the technological developments in the ICT sector. In order not to miss an important development, and because of the volatility of the market, it is important to follow the ongoing developments closely. The provisioning of services is often very close to the services provided by industry. In many cases this works very well, since the services requested by the users match the industrial offerings. However, the existence of a service and its strong promotion by the IT industry does not necessarily mean that it is a perfect match for science. There are many examples where technologies have not been taken up by scientists to the extent that their hype status in industry would suggest. This was the case with cloud computing a few years ago, and currently we can see a similar notion with machine learning.

On the other hand, grid computing and its federation of computing resources had great potential for scientists, had it been possible to share not only data but also compute power or any other resource. However, due to its distributed, federated, non-centrally managed nature, it did not qualify for a business model. In contrast, cloud computing first started off in the IT industry with a sound business model, which could easily finance the infrastructure and did not need subsidizing. Looking at today's scientific collaboration processes, with worldwide distributed teams and a seamless use of data and resources, the grid would have been the infrastructure better suited to science. A very good example is the Large Hadron Collider Computing Grid, which is a real success story. The approach now pursued is what is called a federated cloud infrastructure: a commercial technology is turned into something that could potentially serve, to a certain extent, the needs of researchers. However, this means taking quite a detour until scientists can be provided with what their IT needs actually are, which can result in a substantial delay in scientific progress.

5 An Analogy from Medicine

We would like to introduce an analogy. Consider a patient with an uneasiness: not really sick, but not feeling well. Such a patient will most likely go to a pharmacy and ask for some drug which, from her/his point of view, will make the pain go away. Often the patient is only interested in fighting the symptoms by taking painkillers, hoping that time will heal the cause (which indeed is often the case). However, if this is not sufficient, she/he will go to see a physician, who will take an anamnesis, then perform an examination, and finally prescribe a treatment or a medication. The patient will then go to the local pharmacy to obtain the drug based on the physician's prescription. Transferring this example to the scientific process, the scientist who runs into IT barriers in research corresponds to the patient. What she/he will probably do as a first attempt is to experiment with some technology she/he has heard of, and perhaps go to an IT service provider to order some services believed to solve the problem. This, however, is comparable to a patient who first tries some drugs found on the shelf and, if none helps, goes directly to a pharmacy to get other, maybe stronger drugs (i.e., the scientist looking for bigger, more powerful machines). In this scenario, one obvious building block is missing: the physician. The scientist should first see an expert who understands the science as well as the IT technology, provide an anamnesis by describing the problem to the expert, and then have the expert examine the problem in detail. Only then should she/he go and order services known to help solve the problem. What works so well for diseases does not work at all for deficiencies in digital science. Why? It seems that IT support is comparable to medical treatment in


the 17th century. Many people know about medicinal herbs or natural ointments; others swear that prayers or spells help as well. A structured process of anamnesis, diagnosis and therapy, which today is taken for granted, did not yet exist. The most obvious deficit is the "physician", who is currently missing from the scientific process on many occasions. One may argue that there are many IT experts around who intensively support the scientists in using high-level IT resources. However, if we take a closer look, most of the time the scientist defines the "therapy", i.e., the expert supports whatever IT solution is requested. Normally, the IT expert does not analyze the IT problem that the scientist needs to handle, and is therefore normally not the one to propose the respective IT solution. What is generally accepted in the medical field seems to be a disqualifying criterion in scientific IT support. It should therefore be considered to redefine the role of the IT expert and his inclusion in the scientific process. This, however, requires a change of mindset among the scientists as well, and consequently a new level of cooperation and trust. In the following, we will present and discuss possible measures to reach this new level of collaboration.

6 A Path Towards Computationally Unbiased Research

When scientists have an idea, they will think about their science and how to advance their field in a new direction. Ideally, they will not think about technical feasibility, but about the new science they will be able to uncover. In practice, however, they will be subject to knowledge and know-how biases based on their previous experiences with IT systems, possibly unknowingly discarding ideas that could lead to a breakthrough because they implicitly assume that the idea will not be technically feasible. For example, a modeler might routinely run their familiar scientific code on their laptop to tackle all kinds of different questions in their field. They now have a new idea about some phenomenon that they would like to test in their model. However, this phenomenon can only be resolved at resolutions that are several orders of magnitude higher than what they can run on their laptop. They might have heard about HPC systems, but never really considered that these might be an option for them, often simply due to unfamiliarity with the matter. They will thus drop the idea and move on to other matters. Had they simply known about their local IT center and had a contact there, they might have contacted the experts in due time and developed a solution together with them. Our experience with the piCS workshops at the Leibniz Supercomputing Centre [4] supports this: typical feedback we received after them was "had we known about these possibilities earlier".


7 Supporting the Complete Workflow

Within a trusted partnership, the IT experts learn a lot about the research methods that the scientists apply. With this knowledge, they can then support the full workflow with appropriate IT methods. Within recent projects based on partnerships with hydrologists, for example, we have gathered experience with this approach. As described in [9], a complex workflow has to be executed for an accurate prediction of hydrological extreme events such as flooding and flash floods. First, a weather or climate model needs to be run to obtain information about the atmospheric situation (e.g. [10]). Next, the information has to be downscaled to the hydrologic scale to drive a hydrological model estimating the runoff (e.g. [11]). This in turn drives the hydrodynamic models (e.g. [12]), which once again need detailed surface information. Also, subsurface flow needs to be estimated and fed back into the hydrological and hydrodynamic models. Ideally, the hydrological, hydrodynamic and subsurface models are all coupled together to represent the complex interactions. Next, the fluvial flooding from the hydrodynamic model can be combined with the pluvial flooding from the hydrologically estimated runoff into a flood prediction or a statistical analysis of flood risks. Finally, the results can be used for an impact assessment, either by the authorities for an adequate response, or in terms of mid- to long-term civil protection and public information. Similarly, we support several projects in the atmospheric sciences, from the enormous amounts of data generated by remote sensing1 and climate models [10], to particle tracing for different trace gases, all the way to operational pollen information networks.2 Similar workflows also apply, e.g., to the solid Earth as well as to many other applications from various fields.
Usually, one does not have to apply the full workflow; a given subset is sufficient to answer the research question at hand or fulfil the operational task. This, however, leads to a huge variety of tools and methods, each tailored to an individual task. More often than not, scientists code their own models, specifically tailored to their particular problem. Due to its complexity, setting up one toolchain for the full workflow is probably unrealistic. However, a modular and adaptive system could help a lot of scientists and public authorities with their workflows, relieving them of the need to rewrite new code for very similar purposes. A first step in this process could be the creation of common interfaces. In meteorology, NetCDF3 has become the de facto standard for spatially and temporally distributed datasets. For hydrological models, however, each model implements its own format, requiring every scientist to write their own converters to enable the data flow between the models.

1 e.g. http://www.so2sat.eu/.
2 https://epin.lgl.bayern.de.
3 https://www.unidata.ucar.edu/software/netcdf/.


8 E-Infrastructures and Research Data Management

An important step in countering the knowledge and know-how biases has been the introduction of wide e-infrastructure solutions such as PRACE [13], GEANT [14] or EUDAT [15]. By offering services such as unified data and metadata repositories, large-scale networking solutions, unified access to large-scale systems and many others, they simplify the access to modern ICT technology significantly. However, there are downsides to the approach, since e-infrastructures can introduce their own biases: Once a researcher gets familiar with an existing e-infrastructure, he becomes inclined to stick with it and to disregard data or tools that are stored and hosted by other infrastructures. Thus, it is necessary that the infrastructure providers also work closely together, linking their infrastructures to each other, explaining to the scientists the limitations of their tools, and maybe even pointing them towards competing offers. Also, meta-infrastructures can play an important role, giving access to many other infrastructures via a single point of access. For example, the GeRDI infrastructure [16] aims to provide a single point of access to research data from various data repositories, enabling scientists to browse through various data and linking disciplines. Furthermore, the underlying idea behind wide digital infrastructures is to introduce a digital abstraction layer between the service or data provider and the user, rather than a human interface. While this seemingly counteracts the idea of personal contact within a trusted relationship, e-infrastructures are actually an important component of modern IT-based research, since they enable access to sources that would otherwise be practically inaccessible. However, their use has to be complemented by personal contact between scientists and their trusted IT partners, who are the ones that should have wide knowledge of existing tools and methods.

9 Science Managers, Embedded Librarians, Etc.

To build a trust relationship between scientists and IT experts, we have proposed a partnership model, which can be elaborated to the point that IT experts are partially or even fully integrated into the scientific team. Although this may seem a quite unusual approach, it has already been practiced in other areas of science support activities. In the field of science management, it is quite common to have non-scientific staff involved in a scientific team to provide the necessary link to administration or public relations. A less well known but no less interesting concept is the idea of so-called "embedded librarians" [17], resembling the notion of embedded journalists in war news coverage. This concept likewise stresses the idea of including a librarian, fully or part-time, in a team of researchers to support this team with expert knowledge on information retrieval, publications, etc. One can see that not only the approach, but also the expectations concerning quality, efficiency, resilience, etc. are


very similar to the approach that we propose for the inclusion of IT specialists into scientific teams. We therefore expect that in the future, scientific teams will more and more turn into conglomerates of scientists working on the core scientific issues and of supporting persons who are not directly involved in the scientific process but strongly support it, not in a subordinate fashion like a student software developer or an assistant lab technician, but as partners with a specific high-level expertise. This immediately leads to the issue of how this important contribution to the scientific process can be sufficiently acknowledged. Without addressing this issue any further, we would like to point to the FAIR4 data principles [18], and mention that systems such as CRediT5 have great potential to be of help here.

10 Need for Partnership

As sketched above, it is most important for application scientists to partner with IT specialists. Scientists from the ICT fields may be good partners at first, but they are usually mostly interested in demonstrating their own research with some sample application. For the sample applications, this might be a very fruitful collaboration. For most other scientists, however, the findings are buried in ICT literature, inaccessible to them in practice. An IT center can help bridge this gap if a trustful partnership exists. We may even argue that in order to follow good scientific practice, this collaboration is mandatory: In order to avoid biased research, application scientists need to adopt modern IT technologies. Yet, applying new technologies without fully understanding them can severely compromise reproducibility. Take for example a climate scientist who wants to evaluate his dataset with the newest deep learning methods. On his own, he might be able to find some exciting features with this new technology. However, he might have unknowingly overtrained his algorithm, leading to dubious and non-reproducible results. For a reliable, reproducible evaluation, the methods need to be understood in detail. It is imperative to understand them well enough to know their potential, but also their limits. We thus argue once more for an institution such as an IT center to understand the methods developed by the ICT scientists and to transfer the knowledge to the application sciences. To go back to the physician analogy: The best physician does not only give advice, but develops a therapy based on a trusting partnership with the patient. The same holds true for IT service providers: The best centers do not only give advice, but build a trusting partnership to find the best methods together with the scientists.

4 FAIR: findable, accessible, interoperable, reusable.
5 https://www.casrai.org/credit.html.


References

1. Hey, T., Tansley, S., Tolle, K.M.: The Fourth Paradigm: Data-Intensive Scientific Discovery, vol. 1. Microsoft Research, Redmond, WA (2009). ISBN 978-0-9825442-0-4
2. Szalay, A., Gray, J.: 2020 computing: science in an exponential world. Nature 440, 413 (2006). https://doi.org/10.1038/440413a
3. Hachinger, S., Nguyen, H., Weber, W.J.: Addressing knowledge and know-how biases in the environmental sciences with modern data and compute services. In: EnviroInfo 2017, From Science to Society: The Bridge Provided by Environmental Informatics (2017)
4. Frank, A., Jamitzky, F., Satzger, H., Kranzlmüller, D.: In need of partnerships—an essay about the collaboration between computational sciences and IT services. Procedia Comput. Sci. 14, 1816–1824 (2014). https://doi.org/10.1016/j.procs.2014.05.166
5. Hazeleger, W.: Annual Report 2017: Enabling Digitally Enhanced Science. https://www.esciencecenter.nl/2017/ (2017)
6. Beckman, P., Nadella, S., Trebon, N., Beschastnikh, I.: SPRUCE: a system for supporting urgent high-performance computing. In: Grid-Based Problem Solving Environments, pp. 295–311. Springer, Boston, MA (2007). https://doi.org/10.1007/978-0-387-73659-4_16
7. Cope, J.M., Trebon, N., Tufo, H.M., Beckman, P.: Robust data placement in urgent computing environments. In: IEEE International Symposium on Parallel & Distributed Processing, pp. 1–13 (2009)
8. Leong, S.-H., Frank, A., Kranzlmüller, D.: Leveraging e-infrastructures for urgent computing. Procedia Comput. Sci. 18, 2177–2186 (2013). https://doi.org/10.1016/j.procs.2013.05.388
9. Weismüller, J., Gentschen Felde, N., Leduc, F.A.: Advancing the understanding and mitigation of hydrological extreme events with high-level IT services. In: EnviroInfo 2017, From Science to Society: The Bridge Provided by Environmental Informatics (2017)
10. Leduc, M., Mailhot, A., Frigon, A., Martel, J.-L., Ludwig, R., Brietzke, G.B., Giguère, M., Brissette, F., Turcotte, R., Braun, M., Scinocca, J.: The ClimEx project: a 50-member ensemble of climate change projections at 12-km resolution over Europe and northeastern North America with the Canadian regional climate model (CRCM5). J. Appl. Meteorol. Climatol. (2019). https://journals.ametsoc.org/doi/full/10.1175/JAMC-D-18-0021.1
11. Willkofer, F., Schmid, F.-J., Komischke, H., Korck, J., Braun, M., Ludwig, R.: The impact of bias correcting regional climate model results on hydrological indicators for Bavarian catchments. J. Hydrol. Reg. Stud. 19, 25–41 (2018). https://doi.org/10.1016/j.ejrh.2018.06.010
12. Reisenbüchler, M., Liepert, T., Nguyen, N.D., Bui, M.D., Rutschmann, P.: Preliminary study on a Bavaria-wide coupled hydrological and hydromorphological model. In: Proceedings of the 32nd EnviroInfo Conference, Garching, Germany, pp. 145–148 (2018). https://doi.org/10.2370/9783844061383
13. Affinito, F.: PRACE Annual Report 2017. http://www.prace-ri.eu/ar-17/ (2017)
14. Stöver, C.: GÉANT Compendium of National Research and Education Networks in Europe. https://compendium.geant.org (2016)
15. Lecarpentier, D., Wittenburg, P., Elbers, W., Michelini, A., Coveney, K.R.P., Baxter, R.: EUDAT: a new cross-disciplinary data infrastructure for science. Int. J. Digit. Curation 8, 279–287 (2013). https://doi.org/10.2218/ijdc.v8i1.260
16. de Sousa, N.T., Hasselbring, W., Weber, T., Kranzlmüller, D.: Designing a generic research data infrastructure architecture with continuous software engineering. In: 3rd Workshop on Continuous Software Engineering (2018)


17. Dewey, B.: The embedded librarian: strategic campus collaborations. Resour. Shar. Inf. Netw. 17(1/2), 219–227 (2004)
18. Wilkinson, M.D., Dumontier, M., Aalbersberg, I.J., Appleton, G., Axton, M., Baak, A., Blomberg, N., Boiten, J.-W., da Silva Santos, L.B., Bourne, P.E., Bouwman, J., Brookes, A.J., Clark, T., Crosas, M., Dillo, I., Dumon, O., Edmunds, S., Evelo, C.T., Finkers, R., Gonzalez-Beltran, A., Gray, A.J.G., Groth, P., Goble, C., Grethe, J.S., Heringa, J., 't Hoen, P.A.C., Hooft, R., Kuhn, T., Kok, R., Kok, J., Lusher, S.J., Martone, M.E., Mons, A., Packer, A.L., Persson, B., Rocca-Serra, P., Roos, M., van Schaik, R., Sansone, S.-A., Schultes, E., Sengstag, T., Slater, T., Strawn, G., Swertz, M.A., Thompson, M., van der Lei, J., van Mulligen, E., Velterop, J., Waagmeester, A., Wittenburg, P., Wolstencroft, K., Zhao, J., Mons, B.: The FAIR guiding principles for scientific data management and stewardship. Sci. Data 3, 160018 (2016). https://doi.org/10.1038/sdata.2016.18

Emission-Based Routing Using the GraphHopper API and OpenStreetMap Martin Engelmann, Paul Schulze and Jochen Wittmann

Abstract Currently available online route planning systems do not provide options for an ecological route search. In this project, an attempt was made to create a route planning program which optimizes the nitrogen oxide (NOx) emissions caused by driving. For this purpose, published emission data for diesel vehicles with EURO 5 and EURO 6 emission standards (HBEFA 3.3) were used. It is expected that the emission factors for nitrogen oxide will influence the calculated routes compared to the fastest route. The GraphHopper program has been extended with weightings based on nitrogen oxide emission factors for diesel vehicles in accordance with the EURO 5 and EURO 6 emission standards. OpenStreetMap data was used to provide the map material. With the modified GraphHopper, three different routes were calculated, and the fastest and the NOx-optimized routes were compared with each other. It could be shown that an ecological route guidance that takes the pollutant NOx into account is possible by modifying GraphHopper. Roads with a maximum permitted speed of ≥100 km/h, such as motorways and country roads, were predominantly avoided in a NOx-optimized route calculation. Low-speed roads, such as roads in residential areas, were also avoided. In all evaluated cases, no NOx-optimized route was found whose estimated travel time was shorter than that of a time-optimized route calculation.

Keywords GraphHopper · Emission-based routing · Car route planning · NOx

M. Engelmann (B) · P. Schulze · J. Wittmann Hochschule für Technik und Wirtschaft Berlin, University of Applied Sciences, Wilhelminenhofstraße 75A, 12459 Berlin, Germany e-mail: [email protected] P. Schulze e-mail: [email protected] J. Wittmann e-mail: [email protected] © Springer Nature Switzerland AG 2020 R. Schaldach et al. (eds.), Advances and New Trends in Environmental Informatics, Progress in IS, https://doi.org/10.1007/978-3-030-30862-9_7


1 Introduction

1.1 Motivation

As a result of ever more efficient combustion engines and progressive improvements in exhaust gas purification in motor vehicles, the specific emissions of pollutants and of the greenhouse gas carbon dioxide per person-kilometer or tonne-kilometer in Germany have fallen significantly since 1995 [1]. Between 1991 and 2016, mileage in Germany increased by 31% for passenger transport and 71% for freight transport [2]. The emission reductions achieved through technical measures are therefore partially offset by the increase in absolute mileage in Germany [3]. Technical improvements to motor vehicles alone cannot reduce their environmental impact sufficiently to make a significant contribution to Germany's climate protection goals [3]. Various measures for the reduction of traffic-related emissions have been discussed by the Federal Environment Agency [4]. The trend in vehicle drive technology is currently moving towards electromobility [5]. This is particularly evident in the German government's current goal of making Germany a lead market for electromobility [6]. To reduce vehicle-specific emissions, the best solution in the long term is to convert vehicle drives to all-electric or fuel cell-based systems [5]. In the short term, emission-optimized route planning would be an alternative way to reduce vehicle-related emissions.

1.2 Current Status

Route Planning. Route planning systems currently available online do not offer the possibility of an ecological route search [7–11]. In most cases, the user is only asked whether they want the fastest or the shortest route, or whether they want to avoid certain road types such as motorways. In the field of vehicle-integrated route planning systems, the possibility of ecological route optimization has been offered for years. The fuel consumption criterion, based on a fuel consumption model, is used to calculate the most ecological route [12, 13]. The "fUel-SAVing trip plannEr" (USAVE) is a model for calculating the most fuel-efficient route based on the vehicle type, driving style and relevant road data such as the road surface [14]. Depending on the track geometry, fuel-efficient routes can be modelled, which tend to favor a uniform course, taking into account turning maneuvers and cornering as well as elevation changes [15]. Emission-Optimized Route Planning. In the field of emission-optimized route planning, a number of papers focusing on carbon dioxide (CO2) have been published. Zeng et al. have demonstrated in detail that CO2 emission models are not absolutely necessary to calculate a CO2 emission-optimized route. Calculation models are based on easy-to-determine factors such as average speed, acceleration


and gradient [16]. In the area of route optimization for commercial vehicles, Figliozzi shows that, in addition to vehicle utilization, other approaches to reducing CO2 emissions should also be considered. As CO2 emissions scale linearly with fuel consumption [17], an emission-optimized route plan according to the CO2 criterion is nothing more than an optimization with regard to fuel consumption. In addition to CO2, the combustion process in the engine emits other undesirable substances such as CO, HC, SOx, particles, NH3 and NOx. With the exception of NOx, these emissions are already negligibly low [18]. The NOx emissions result primarily from the oxidation of atmospheric nitrogen during the combustion process (Zeldovich mechanism) [19]. Consequently, there is no correlation between fuel consumption and NOx emissions [17]. Nevertheless, data exist on NOx emissions in certain driving situations. Emission factors for road vehicles can be found in the Handbook on Emission Factors for Road Traffic, which contains significant amounts of data for different vehicle categories, pollutants, road categories, road inclinations, etc. (see also: http://www.hbefa.net/). The following figure shows the published data on nitrogen oxide emissions from EURO 5 and EURO 6 diesel vehicles from the Handbook on Emission Factors for Road Traffic [20], which can serve as a data basis for emission-optimized route planning. The figure shows that at average speeds of less than 30 km/h and more than 130 km/h, the specific nitrogen oxide emissions of diesel vehicles are significantly higher than at average speeds between 30 and 100 km/h.
Additionally, many approaches for calculating fuel-efficient or CO2-efficient routes already exist and are applied in practice. By taking the expected CO2 emissions and fuel consumption into account during routing, in particular the current traffic situation, it is possible to further reduce CO2 emissions and fuel consumption. What an emission-optimized route looks like in terms of NOx emissions is still unclear. Hypotheses. It is assumed that an ecological route search that takes the pollutant NOx into account is possible. An existing routing engine (Sect. 1.3) can be modified for this purpose. Taking into account the emission data (cf. Emission-Optimized Route Planning) for diesel vehicles of the EURO 5 emission standard, various hypotheses can be formulated. For EURO 5 vehicles, the emitted NOx per kilometer reaches its minimum in the speed range between 50 and 90 km/h. This leads to the assumption that a NOx-optimized route calculation will predominantly avoid roads with a maximum permitted speed of 100 km/h or more, such as motorways and country roads without speed limits. The higher nitrogen oxide emissions of EURO 5 vehicles at low speeds of up to 30 km/h suggest that a NOx-optimized route calculation will also avoid roads in residential areas and traffic-calmed roads.


The two hypotheses above lead to the further assumption that the estimated travel time for the NOx-optimized routes will be longer than for routes calculated according to time optimization.

1.3 Problem

The determination of an emission-optimized route plan can be reduced to the shortest path problem, a minimization problem from graph theory. The road network is modelled as a directed graph whose edges have lengths or costs. The shortest path through the directed graph can be calculated if the lengths between the nodes are known. Optimized routes such as the fastest route or the most fuel-efficient route can be determined by adding a weighting such as the average speed or the average fuel consumption to each edge. The calculation of the optimized route is based on a cost function: each edge is assigned corresponding costs by means of factors, and the optimization is carried out by determining the route with the lowest costs [21, 22]. For emission-optimized route planning, another optimization target must be defined. It is necessary to develop a weight or cost function that maps the vehicle-specific emissions. Existing route planning programs can be used for the calculation of an emission-optimized route plan. Some routing engines for the OpenStreetMap project already take alternative cost functions into account; OSRM and GraphHopper are especially worth mentioning here [23].
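The cost-function idea described above can be sketched in a few lines of self-contained Java: a plain Dijkstra search over a toy road graph with a pluggable edge weighting. The class, the toy network, and the two weightings below (travel time versus an illustrative NOx factor) are our own illustrations, not HBEFA data or GraphHopper code.

```java
import java.util.*;

// Shortest-path search with a pluggable edge cost function, as used for
// emission-optimized routing. All numbers are illustrative.
public class EmissionRouting {

    // one directed edge of the road graph
    record Edge(int to, double lengthKm, double speedKmh) {}

    // a weighting assigns each edge a cost; swapping it changes the optimum
    interface EdgeWeighting { double cost(Edge e); }

    static double[] dijkstra(List<List<Edge>> graph, int source, EdgeWeighting w) {
        double[] dist = new double[graph.size()];
        Arrays.fill(dist, Double.POSITIVE_INFINITY);
        dist[source] = 0.0;
        PriorityQueue<double[]> pq = new PriorityQueue<>(Comparator.comparingDouble(a -> a[0]));
        pq.add(new double[]{0.0, source});
        while (!pq.isEmpty()) {
            double[] top = pq.poll();
            int u = (int) top[1];
            if (top[0] > dist[u]) continue; // stale queue entry
            for (Edge e : graph.get(u)) {
                double nd = dist[u] + w.cost(e);
                if (nd < dist[e.to()]) {
                    dist[e.to()] = nd;
                    pq.add(new double[]{nd, e.to()});
                }
            }
        }
        return dist;
    }

    public static void main(String[] args) {
        // toy network: 0 -> 1 directly via a motorway edge,
        // or 0 -> 2 -> 1 via two slower country-road edges
        List<List<Edge>> g = new ArrayList<>();
        for (int i = 0; i < 3; i++) g.add(new ArrayList<>());
        g.get(0).add(new Edge(1, 10.0, 130.0));
        g.get(0).add(new Edge(2, 6.0, 80.0));
        g.get(2).add(new Edge(1, 6.0, 80.0));

        // fastest weighting: travel time in hours
        EdgeWeighting fastest = e -> e.lengthKm() / e.speedKmh();
        // illustrative NOx weighting: grams = length * factor(speed)
        EdgeWeighting nox = e -> e.lengthKm() * (e.speedKmh() >= 130 ? 1.00 : 0.47);

        System.out.printf("fastest cost 0->1: %f h%n", dijkstra(g, 0, fastest)[1]);
        System.out.printf("NOx cost 0->1: %f g%n", dijkstra(g, 0, nox)[1]);
    }
}
```

With these toy numbers, the time-optimal search takes the direct motorway edge, while the NOx weighting makes the detour via node 2 cheaper, mirroring the avoidance behavior reported in the abstract.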

2 Methodical Approach

2.1 Routing Algorithms

The most common graph algorithms to solve shortest path problems are the Dijkstra algorithm, the A* algorithm, the Bellman-Ford algorithm, and the Floyd-Warshall algorithm [24–26]. In addition, acceleration techniques exist for the graph algorithms mentioned above; the best known are Contraction Hierarchies and Transit Node Routing [24, 25]. In this project, the bi-directional Dijkstra algorithm using contraction hierarchies was chosen. In this procedure, two searches are started at two nodes in opposite directions and run until the two subgraphs intersect.
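The bi-directional search can be sketched as follows, without the contraction-hierarchy preprocessing. The graph encoding and the method names are hypothetical, chosen only to illustrate how the forward and backward searches meet and when the search can stop.

```java
import java.util.*;

// Bi-directional Dijkstra sketch: one search runs forward from the source,
// one backward from the target; the search stops once the sum of the two
// smallest frontier distances can no longer improve the best path found.
// The graph here is treated as undirected for simplicity.
public class BiDijkstra {

    // adj[u] = array of edges {v, cost}
    public static double shortest(double[][][] adj, int s, int t) {
        int n = adj.length;
        double[] distF = fill(n), distB = fill(n);
        distF[s] = 0; distB[t] = 0;
        PriorityQueue<double[]> fwd = newPq(), bwd = newPq();
        fwd.add(new double[]{0, s});
        bwd.add(new double[]{0, t});
        double best = Double.POSITIVE_INFINITY;
        while (!fwd.isEmpty() && !bwd.isEmpty()) {
            // termination: the frontiers have passed each other
            if (fwd.peek()[0] + bwd.peek()[0] >= best) break;
            best = scan(adj, fwd, distF, distB, best);
            best = scan(adj, bwd, distB, distF, best);
        }
        return best;
    }

    // settle one node of one search direction and relax its edges
    private static double scan(double[][][] adj, PriorityQueue<double[]> pq,
                               double[] dist, double[] distOther, double best) {
        if (pq.isEmpty()) return best;
        double[] top = pq.poll();
        int u = (int) top[1];
        if (top[0] > dist[u]) return best; // stale entry
        best = Math.min(best, dist[u] + distOther[u]); // candidate meeting point
        for (double[] e : adj[u]) {
            int v = (int) e[0];
            double nd = dist[u] + e[1];
            if (nd < dist[v]) { dist[v] = nd; pq.add(new double[]{nd, v}); }
        }
        return best;
    }

    private static double[] fill(int n) {
        double[] d = new double[n];
        Arrays.fill(d, Double.POSITIVE_INFINITY);
        return d;
    }

    private static PriorityQueue<double[]> newPq() {
        return new PriorityQueue<>(Comparator.comparingDouble(a -> a[0]));
    }
}
```

In practice, each search only explores roughly half the radius of a single Dijkstra run, which is what makes the bi-directional variant (and, on top of it, contraction hierarchies) attractive for continent-scale road networks.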


2.2 Street Graphs

In the commercial sector, the market is dominated by the map services of Here Maps, Google and TomTom [27]. The current market leader in the provision of road maps is Here Maps [28]; its map data can be retrieved via paid interfaces. With OpenStreetMap, an open-source project exists that provides free geographic data [10]. The map material of OpenStreetMap is generated and maintained by the users, which makes it a cost-effective alternative. In addition, areas that are usually difficult to access are also maintained by the users [29, 30].

2.3 Extension of GraphHopper

GraphHopper is a fast, memory-optimized, Java-based routing engine released under the Apache 2.0 license. By default, OpenStreetMap and General Transit Feed Specification (GTFS) data are used; however, the software allows the user to import other data as well [31]. GraphHopper was chosen because the Apache license provides modifiability and extensibility. In addition, the common pathfinding algorithms A* and Dijkstra are integrated and well documented. The documentation also allows a quick familiarization, and the planned emission-based routing can be carried out thanks to the already integrated map matching. Through extensions, it is possible to consider a terrain model with elevation data [23]. GraphHopper is also one of the standard routing engines for OpenStreetMap. Classes for GraphHopper Core. The implementation was carried out with the help of a dedicated weighting class for the pollutant class EURO 5. The AbstractWeighting class served as the base class for the weighting. The calculations for the weightings were copied from the FastestWeighting class and adjusted. Within the weighting class, the NOx emission values were implemented in an auxiliary method (Fig. 1). The auxiliary method returns the amount of pollutant emitted based on the speed. Since the route calculation partly yields not smoothly rounded velocities but floating-point numbers as velocity, the emissions had to be subdivided into speed ranges. This breakdown is shown in Fig. 1. Outsourcing the emission factors into a separate method provides faster maintainability and a better overview. The calcWeight and calcMinWeight methods were adjusted to include the emission factors in the calculation. Normally, these methods determine the required time for an edge with the help of the speed. Instead of the speed, the emission factor associated with this speed is now taken and multiplied by the factor 0.001 to convert the units correctly.
GraphHopper works internally with the SI unit meter, while the emission factors given in Fig. 1 refer to g/km. Implementation Frontend. The added weightings can be made available in the frontend using the config.yml file. Since the edges and their weightings are partially precalculated when the GraphHopper server is started, the precalculated routes must be deleted whenever changes are made to the weightings.
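The auxiliary lookup method described above can be sketched as follows, mapping a (possibly fractional) speed onto the EURO 5 speed classes of Fig. 1 and applying the 0.001 unit conversion. Class and method names here are our own; in the project itself, this logic lives in a weighting class derived from GraphHopper's AbstractWeighting.

```java
// Sketch of the speed-to-emission-factor lookup with the EURO 5 values
// read off from Fig. 1 (HBEFA). Boundary handling (speed exactly on a
// class limit) is our own choice.
public class Euro5NoxFactors {

    // upper bounds of the speed classes [km/h] from Fig. 1
    private static final double[] UPPER =
        {5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140};

    // EURO 5 NOx emission factors [g/km] per speed class, incl. > 140 km/h
    private static final double[] G_PER_KM =
        {1.62, 1.23, 0.97, 0.71, 0.65, 0.55, 0.50, 0.48,
         0.47, 0.50, 0.54, 0.57, 0.67, 0.83, 1.00, 1.21};

    // returns the NOx emission factor [g/km] for a given speed
    public static double factor(double speedKmh) {
        for (int i = 0; i < UPPER.length; i++)
            if (speedKmh <= UPPER[i]) return G_PER_KM[i];
        return G_PER_KM[G_PER_KM.length - 1]; // speed class > 140 km/h
    }

    // edge weight: GraphHopper distances are in meters, factors in g/km,
    // hence the factor 0.001 (result: grams emitted on this edge)
    public static double calcWeight(double distanceMeters, double speedKmh) {
        return distanceMeters * factor(speedKmh) * 0.001;
    }
}
```

For example, a 1 km edge driven at 75 km/h falls into the 70–80 km/h class and is weighted with 0.47 g of NOx.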


NOx emission factors for diesel vehicles (at 0 % gradient):

speed class [km/h]   EURO 5 (HBEFA 3.2) [g/km]   EURO 6 (HBEFA 3.2) [g/km]
0–5                  1.62                        0.39
5–10                 1.23                        0.35
10–20                0.97                        0.28
20–30                0.71                        0.25
30–40                0.65                        0.22
40–50                0.55                        0.17
50–60                0.50                        0.16
60–70                0.48                        0.14
70–80                0.47                        0.14
80–90                0.50                        0.16
90–100               0.54                        0.19
100–110              0.57                        0.20
110–120              0.67                        0.25
120–130              0.83                        0.33
130–140              1.00                        0.55
> 140                1.21                        0.72

Fig. 1 NOx emission factors read off for EURO 5 and EURO 6 diesel vehicles from [20]

Emission-Based Routing Using the GraphHopper …


3 Implementation Methods The NOx-optimized weightings (cf. Sect. 2.3) and GraphHopper's standard weighting according to travel time (fastest route) were used to carry out the emission-optimized route planning. The modified GraphHopper is available at: https://gitlab.f2.htw-berlin.de/s0540276/emissionbasedrouting. In its default settings, GraphHopper optimizes the travel time (fastest route) with the bi-directional Dijkstra algorithm using contraction hierarchies, without specifying further parameters. The following three routes were selected for the test: • Variant 1: Motorway • Variant 2: Town with bypass road • Variant 3: City centre The first variant was chosen such that both the starting and end points are located near a motorway exit. The selected motorway section can, for the most part, be used without speed limits; next to the motorway runs a largely parallel country road. For the second variant, the starting point was placed before the beginning of a bypass road and the end point after its end. The main part of the bypass can be driven at 100 km/h. As a third variant, a route through an inner city was chosen. Table 1 lists the selected start and end points of the routes described above. For the trial, current map data from the free world map OpenStreetMap for the federal state of Brandenburg (as of 27 February 2019) were obtained from the website geofabrik.de.1 For each of the routes listed in Table 1, a request was sent to the GraphHopper server to calculate one route optimized for NOx and one optimized for travel time. The result of the route calculation can be viewed directly in the web frontend in the form of a map. In addition to the map display, the GraphHopper server can also be addressed directly via an API, which returns a JSON file with the results. This function was used to read out the calculated NOx emissions, which are stored in the JSON file as "weighting". 
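For illustration, the following sketch extracts that value from a server response. The JSON fragment is invented (the numbers loosely follow route 1), the field name follows GraphHopper's /route response, and a real client would use a JSON library rather than a regular expression.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WeightReader {
    // A request for route 1 (Table 1) against a local server might look like:
    // http://localhost:8989/route?point=51.829803,13.971348&point=51.782604,14.05941&vehicle=car
    // Illustrative response fragment; "weight" carries the route's total
    // edge weight (here: NOx in grams under the emission weighting).
    static final String SAMPLE =
        "{\"paths\":[{\"distance\":9640.0,\"time\":597000,\"weight\":5.18}]}";

    /** Pull the first "weight" value out of a JSON response string. */
    static double extractWeight(String json) {
        Matcher m = Pattern.compile("\"weight\"\\s*:\\s*([0-9.]+)").matcher(json);
        if (!m.find()) throw new IllegalArgumentException("no weight field");
        return Double.parseDouble(m.group(1));
    }
}
```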
Table 1 Start and end points for the considered routes

  Route                              Starting point (decimal degrees)   End point (decimal degrees)
  Route 1 (motorway)                 51.829803; 13.971348               51.782604; 14.05941
  Route 2 (town with bypass road)    52.551932; 13.693600               52.560751; 13.764753
  Route 3 (city centre)              52.496528; 13.389115               52.483701; 13.428850

1 https://download.geofabrik.de/europe/germany/.


All route calculations were carried out on a local server and the same map material was used. The GraphHopper server was started with the default settings.

4 Outcomes The following Figs. 2, 3, and 4 and Table 2 show exemplary results of the route calculation.

4.1 Route 1—Route with Motorway Figure 2 shows the results of the calculation for the NOx-optimized (green) and the time-optimized (blue) route with start and end points near a motorway exit. The NOx-optimized route follows the country road parallel to the available motorway and, with a total distance of 9.64 km, is 0.53 km longer than the blue route. For this route, NOx emissions of approx. 5.18 g were determined for the entire route, with an expected travel time of 9:57 min. The time-optimized route leads over the motorway with an estimated driving time of 4:40 min and is 5:17 min (53%) faster than the NOx-optimized route. NOx emissions of 7.34 g were determined for the motorway route; this is 2.16 g or 41.7% more than for the NOx-optimized route.

Fig. 2 Route 1 with motorway (NOx-optimized and fastest route)


Fig. 3 Route 2 through a town with bypass road (NOx-optimized and fastest route)

Fig. 4 Route 3 through an inner city (NOx-optimized and fastest route)


Table 2 Results of the route calculations for the considered variants

  Variant                                       Distance (km)   Time (min)   NOx emissions (g)   Δ emissions (g)
  Variant 1 (motorway), NOx-optimized           9.64            9:57         5.18                −2.16
  Variant 1 (motorway), fastest                 9.11            4:40         7.34
  Variant 2 (town with bypass), NOx-optimized   5.63            6:44         2.83                −0.27
  Variant 2 (town with bypass), fastest         5.90            4:16         3.10
  Variant 3 (city centre), NOx-optimized        3.35            4:31         1.84                0.00
  Variant 3 (city centre), fastest              3.35            4:31         1.84

4.2 Route 2—Town with Surrounding Road Figure 3 shows the calculation for the NOx-optimized (green) and the time-optimized (blue) route, with the starting point before the beginning of the bypass road and the end point after its end. The proposed NOx-optimized route runs over country roads and through the town centre and, with a total distance of 5.63 km, is 0.27 km shorter than the route over the bypass (blue route). For the NOx-optimized route, NOx emissions of 2.83 g were calculated, with an expected travel time of 6:44 min. The blue, time-optimized route has a total distance of 5.90 km; it takes the bypass road with an estimated driving time of 4:16 min and calculated NOx emissions of 3.10 g. The travel time on the NOx-optimized route is thus approx. 57% longer, with a saving of 0.27 g or 8.7% in NOx emissions.

4.3 Route 3—City Centre Figure 4 shows the calculation for the NOx-optimized (green) and the travel-time-optimized (blue) route through an inner city. The same route was determined in both variants: it has a total length of 3.35 km with an estimated driving time of 4:31 min, and in both variants NOx emissions of 1.84 g are calculated.


4.4 Summary Results Table 2 shows the results of the calculated routes for the considered variants motorway, town with bypass, and city centre, for both the NOx-optimized and the travel-time-optimized calculation. In addition, the last column shows the difference in NOx emissions between the NOx-optimized and the fastest route. For variant 1 (motorway) and variant 2 (town with bypass), a route could be determined that emits less NOx. In both variants, the total distances determined are approximately the same, but the estimated driving times differ considerably. In variant 3 (city centre), the NOx-optimized route was identical to the route optimized for driving time.

5 Discussion The modification of GraphHopper demonstrates that ecological route guidance taking the pollutant NOx into account is possible. The weighting for EURO 5 (EU5Weighting) has been integrated into GraphHopper and can be activated via the configuration. The weighting for EURO 6 (EU6Weighting) was also implemented, but the EURO 6 emission factors had no impact on route finding for the considered variants and were therefore not included in the results. Since the EURO 6 emission factors are significantly lower than the EURO 5 factors, lower NOx emissions are calculated for EURO 6 vehicles.
With the emission factors implemented as separate weighting classes (EU5Weighting and EU6Weighting), the calculated quantity of the pollutant emitted (in this case NOx) equals the sum of the weights of the selected edges. For calculations optimized for driving time, however, the edges carry the time rather than the pollutant amount as their weighting, so no conclusions can be drawn about the quantity of pollutants emitted. To perform both calculations simultaneously, GraphHopper's FastestWeighting class would have to be extended so that, in addition to accumulating the time, the emissions derived from the edge speeds are accumulated as well. For this reason, the modified GraphHopper currently does not report NOx emissions for the route optimized for driving time. To determine the NOx emissions of the time-optimized route nevertheless, intermediate points were inserted into a NOx-optimized route calculation until the calculated route corresponded to the time-optimized route.
The route calculation in the first variant, with start and end points close to a motorway exit, shows that the NOx-optimized route avoids the motorway as expected and instead returns a route via a country road. 
The motorway section under consideration predominantly has no speed limit, which, due to the emission factors used (Fig. 1), leads to a higher edge weighting of the motorway section compared with the parallel country road. Furthermore, the estimated travel time of the NOx-optimized route proved significantly longer than that of the time-optimized route in this variant.
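The bookkeeping suggested above (accumulating travel time and emissions in a single pass, so that the fastest route's NOx total is known without a second search) can be sketched as follows; the edge data and the coarse stand-in factors are illustrative only:

```java
/** Illustrative sketch: accumulate travel time and NOx along a route in
 *  one pass. The three-step factor is a coarse stand-in for Fig. 1. */
public class RouteTotals {
    double seconds;   // accumulated travel time
    double noxGrams;  // accumulated NOx emissions

    static double factor(double speedKmh) {     // g/km, illustrative values
        if (speedKmh <= 50)  return 0.55;
        if (speedKmh <= 100) return 0.54;
        return 1.21;                            // unrestricted motorway
    }

    /** Add one edge of the route (length in km, permitted speed in km/h). */
    void addEdge(double lengthKm, double speedKmh) {
        seconds  += lengthKm / speedKmh * 3600.0;
        noxGrams += factor(speedKmh) * lengthKm;
    }
}
```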


In the second variant, a route with a starting point before a bypass road and an end point after its end was considered. The results show that the NOx-optimized route avoided the bypass as expected and instead determined a route through the town centre. The bypass can mainly be driven at 100 km/h, whereas the road through the town is mainly limited to 50 km/h. Since both routes are almost identical in length, the higher emission factors (Fig. 1) resulted in a higher overall weighting of the bypass edges compared with the route through the town. Here, too, the estimated travel time of the NOx-optimized route is significantly longer than that of the time-optimized route. However, it must be noted that stopping and starting at traffic lights or in generally difficult traffic, as well as turning behaviour, were not taken into account in the calculation. The calculated NOx saving of 0.27 g or 8.7% therefore cannot be transferred to a real scenario: acceleration peaks strongly influence fuel consumption and emitted pollutants and are affected by the driver and the traffic flow [32]. The third variant, which leads through an inner city, showed that the NOx-optimized route calculation avoids roads with permissible maximum speeds of ≤30 km/h as expected, for example in residential areas or traffic-calmed areas. In this variant, the NOx-optimized route corresponds exactly to the route optimized for driving time. The route proposed in both cases runs on main inner-city roads that can be driven at 50 km/h. Due to the emission factors used, alternative routes through residential areas lead to a higher edge weighting of the residential streets and are therefore avoided by the algorithm.

6 Conclusions In summary, the prototype confirmed the hypotheses. With NOx-optimized route calculation, roads with a permitted maximum speed of ≥100 km/h, such as motorways and country roads, are predominantly avoided. Likewise, roads with low maximum speed limits, such as those in residential areas and living streets, are avoided. In none of the considered cases could a NOx-optimized route be found whose estimated travel time is shorter than that of the speed-optimized calculation: in the inner-city area, the estimated driving time of the NOx-optimized route was identical to the fastest route, while for the out-of-town routes it was higher than for the fastest variant.
It could not be determined whether the calculated NOx emissions can be transferred to a real scenario. In particular, acceleration processes lead to higher emissions compared with driving at constant speed; further research should take this into account. Current online map services such as Google Maps can display the current traffic situation; with this knowledge base, the routing algorithm could be adapted to reduce probable waiting times. Furthermore, a database of all traffic lights could be obtained so that they could be integrated into the algorithm as an additional weighting factor, and the warm-up phases of different engines should be taken into account as well. This prototype works with data at an aggregated level; further studies should implement the data at a microsimulation level, covering engine warm-up phases, traffic lights, and individual driver behaviour. Thanks to the modularity of the GraphHopper API, implementing these changes is straightforward.

References
1. UBA, Umweltbundesamt der Bundesrepublik Deutschland: Transport Emission Model, Daten- und Rechenmodell TREMOD 5.81 (2018)
2. BMVI, Bundesministerium für Verkehr und digitale Infrastruktur der Bundesrepublik Deutschland: Verkehr in Zahlen 2017/2018, Hamburg (2017)
3. UBA, Umweltbundesamt der Bundesrepublik Deutschland (2018, October 20). https://www.umweltbundesamt.de/daten/verkehr/emissionen-des-verkehrs
4. UBA: CO2-Emissionsminderung im Verkehr in Deutschland. Dessau-Roßlau (2010)
5. Helmers, E.: Bewertung der Umwelteffizienz moderner Autoantriebe—auf dem Weg vom Diesel-Pkw-Boom zu Elektroautos. Umweltwissenschaften und Schadstoff-Forschung, pp. 564–578 (2010)
6. Bundesregierung der Bundesrepublik Deutschland: Pressekonferenz nach dem Spitzengespräch zur Elektromobilität (2012)
7. Google: Google Maps (2018, October 20). https://www.google.de/maps
8. Microsoft (2018, October 22). https://www.bing.com/maps
9. HERE Global B.V. (2018, October 22). https://wego.here.com
10. OpenStreetMap Foundation: OpenStreetMap (2018, October 22). https://www.openstreetmap.org
11. HeiGIT (2018, October 22). https://maps.openrouteservice.org
12. TomTom (2018, October 25). http://download.tomtom.com/open/manuals/new_GO/html/en-us/RoutePlanning.htm
13. Goodwin, A.: Ford adds eco-route option to MyFord Touch navigation. CNET (2010)
14. Arcidiacono: Joint Research Centre (JRC): fUel-SAVing trip plannEr (U-SAVE): Product of the JRC PoC Instrument. Publications Office of the European Union, Luxembourg (2017)
15. Figliozzi, M.: Vehicle routing problem for emissions minimization. Transp. Res. Rec. 2197, 1–7 (2010)
16. Zeng, et al.: Prediction of vehicle CO2 emission and its application to eco-routing navigation. Transp. Res. 194–214 (2016)
17. Schäfer, F. (ed.): Handbuch Verbrennungsmotor. Springer Vieweg, Wiesbaden (2015)
18. Koch, T.: Diesel – eine sachliche Bewertung der aktuellen Debatte – Technische Aspekte und Potenziale zur Emissionsminderung. Springer Vieweg, Wiesbaden (2018)
19. Pucher, E., et al.: Abgasemissionen. In: Handbuch Verbrennungsmotor. Springer Vieweg, Wiesbaden (2015)
20. HBEFA 3.3: Update of emission factors for EURO 4, EURO 5 and EURO 6 diesel passenger cars for the HBEFA version 3.3. Institute for Internal Combustion Engines and Thermodynamics, Graz (2017)
21. Becker, et al.: Graphentheoretische Konzepte und Algorithmen, 2. Auflage. Vieweg + Teubner, Wiesbaden (2009)
22. Ottmann, et al.: Algorithmen und Datenstrukturen, 5. Auflage. Spektrum Akademischer Verlag, Heidelberg (2012)
23. Ramm, F.: Routing engines für OpenStreetMap, FOSSGIS, Passau (2017). Accessed 24 Mar 2017


24. Madkour, A.: A survey of shortest-path algorithms (2017)
25. Güting, et al.: Datenstrukturen und Algorithmen, 4. erweiterte und überarbeitete Auflage. Springer Vieweg, Wiesbaden (2018)
26. Nebel, et al.: Entwurf und Analyse von Algorithmen. Springer Vieweg, Wiesbaden (2018)
27. Honsel, G.: Technology Review: Alles auf einer Karte (2015, August 26). https://www.heise.de/tr/artikel/Alles-auf-einer-Karte-2789336.html. Accessed 9 Nov 2019
28. Wendel, T.: Capital – Wirtschaft ist Gesellschaft: Wie Here Google schlagen will (2018, January 31). https://www.capital.de/wirtschaft-politik/wie-here-google-schlagen-will-autoindustrie-digitalisierung-navigation. Accessed 9 Nov 2018
29. Goodchild, M.F.: Citizens as sensors: the world of volunteered geography. GeoJournal 69, 211–221 (2007)
30. Dallmeyer, J.: Simulation des Straßenverkehrs in der Großstadt. Springer Fachmedien, Wiesbaden (2014)
31. GraphHopper (2018, October 19). https://github.com/graphhopper/graphhopper
32. Mitschke, M., Wallentowitz, H.: Fahrleistungen und Kraftstoffverbrauch. In: Dynamik der Kraftfahrzeuge. Springer Vieweg, Wiesbaden (2014)

Digitally Enabled Sharing and the Circular Economy: Towards a Framework for Sustainability Assessment Maria J. Pouri and Lorenz M. Hilty

Abstract The prevailing patterns of consumption and production are not sustainable because they are based on the increasing extraction of non-renewable resources (such as fossil fuels or scarce metals) from the Earth's crust and the overuse of life-sustaining ecosystem services (such as CO2 assimilation or the water cycle). One strategy to direct consumption onto a sustainable pathway is the circular economy. The goal of the circular economy is to slow down the flow of material resources through the anthroposphere and to return them to nature in a form that is as compatible as possible with ecosystem processes. We focus on the first aspect, which means that each unit of material resource that enters the economic system should satisfy as many human needs as possible before it is considered waste. We ask if and how the emerging "sharing economy" can contribute to this specific goal. We view the phenomenon of the sharing economy as a transformation of sharing practices by means of digital Information and Communication Technology (ICT). The resulting Digital Sharing Economy (DSE) can therefore be considered an important special case of ICT impact on sustainable development. We open up an argument on how sharing in the DSE can be either supportive or counter-productive with regard to circular economy goals. We present a first framework that provides a guideline for the qualitative assessment of new sharing practices with regard to their potential contribution to a circular economy. Keywords Digital sharing economy · Circular economy · Resource consumption · Sustainability assessment

M. J. Pouri (B) · L. M. Hilty Department of Informatics, University of Zurich, Zürich, Switzerland e-mail: [email protected]; [email protected] L. M. Hilty e-mail: [email protected] L. M. Hilty Technology and Society Lab, Empa Materials Science and Technology, Dübendorf, Switzerland © Springer Nature Switzerland AG 2020 R. Schaldach et al. (eds.), Advances and New Trends in Environmental Informatics, Progress in IS, https://doi.org/10.1007/978-3-030-30862-9_8


1 Introduction Human society is heavily dependent on the environment: all material goods are made from natural resources in one way or another. The current rates of resource extraction are already a burden on the environment [1]. The prevailing consumption and production patterns therefore cannot be considered sustainable, as they deplete the Earth's finite resources and degrade the environment [2]. As the global population grows and economies expand, consumption will continue to rise. Encouraging and reaching sustainable consumption is a necessary condition for sustainability as "a situation in which human activity is conducted in a way that conserves the functions of the Earth's ecosystems" [3]. Only then can sustainable development be achieved, which meets "the needs of the present without compromising the ability of future generations to meet their own needs" [4]. Gaining traction as an approach to achieving sustainable development [5], the vision of a circular economy is now widely discussed in academic literature as well as in policy-making and executive domains. The circular economy has positive effects on some aspects of sustainability [6]; its vision and approach ensure that products are used with better utilization over time [2]. The circular economy is "an industrial economy that is restorative or regenerative by intention and design" [7, p. 14]. It refers to the "design and business model strategies [that are] slowing, closing, and narrowing resource loops" [8, p. 309]. With respect to sustainability, the circular economy "… aim(s) to accomplish sustainable development, which implies creating environmental quality, economic prosperity and social equity, to the benefit of current and future generations" [9, p. 
225].1 While such approaches intend to align consumption patterns with sustainability objectives, technological advancement and the application of digital Information and Communication Technology (ICT) in everyday life change the consumption behaviour of individuals and societies. While the transformational power of ICT can make production and consumption patterns more sustainable, the efficiencies brought by technological advancement do not "automatically" contribute to sustainable development [12]. The digital transition of societies and economies has downsides too [13]. As ICT brings efficiency, the demand for its efficiently produced solutions may be stimulated to a degree that partially or even entirely offsets the theoretically possible savings [12]; this phenomenon is called the "rebound effect" and has been widely discussed in studies on ICT-induced efficiencies and systems' responses to them (e.g. [14, 15]).

1 For more on the conceptual relations between the two concepts of circular economy and sustainability, see [10]. For more on the relevance of the circular economy for sustainability and for the implementation of the Sustainable Development Goals (SDGs), see [5, 11].

An important example of the transformative power of ICT on consumption is the digitally enabled sharing economy, or 'Digital Sharing Economy' (DSE) for short. The growth and popularity of the DSE is largely due to its enabling individuals to
connect and develop peer networks ('peer' in the sense of someone having similar needs) and to engage in sharing largely free of the limits of time, place, communication, and coordination, as holds true for many activities in the digital world. This dynamic, collaborative participation ('dynamic' in the sense of active involvement and 'collaborative' in the sense of collective responsibility of the participating population) allows a large number of peers to gain a share of what others already own or offer, and to enjoy the economic and social benefits of sharing depending on their individual needs. From a circular economy perspective, the act of sharing has the potential to increase the number of human needs satisfied per amount of natural resources used (i.e., to increase resource efficiency) and is thus compatible with sustainability objectives. However, unintended effects of digitally enabled sharing must be taken into consideration too. As the authors argue in an earlier article [16], if any product or service in the DSE becomes cheaper, faster or more convenient to access, the usual reaction of a market is an increase in the demand for it, which may ultimately balance out the favorable effects of shared consumption within the same product/service system (direct rebound effect). It may also be the case that the time or money saved due to sharing is spent on consuming other products or on using a service which is less resource-efficient (indirect rebound effect) [16]. Taking this into account, the potential contribution of the DSE to creating and supporting a circular economy through maximized resource efficiency needs to be addressed with more reflection. The implications of extensive sharing practices from a circular economy or a general sustainability perspective have not been much discussed in existing studies. 
The literature lacks an integrated approach to explain the relevance of sharing for sustainability, in particular with respect to circular economy objectives. In this paper, we present a first framework for supporting sustainability assessment of the DSE by looking at the scenarios it creates and evaluating them from the perspective of the circular economy. The remainder of this paper is structured as follows: In Sect. 2, we address the relevance of the sharing practices used in the DSE for the circular economy. Section 3 reviews the definition, system characteristics, and resources in platform-based sharing. Building on the system's specifications and considering the 'circularity' implications of extensive sharing, we will in Sect. 4 conduct an analysis of optimized consumption (as a desirable effect of sharing) and possible induction and rebound effects (as counter-productive effects of sharing from a circular economy perspective) in the context of sustainability. Sections 5 and 6 are devoted to further discussion and the conclusion.

2 The Relevance of Sharing for the Circular Economy To reach a circular economy, Bocken et al. [8] specify the following strategies: slowing resource loops, closing resource loops, and resource efficiency or narrowing resource flows. As Bocken and colleagues further explain, through the design of hard-wearing and long-life products, their utilization period is prolonged, and this extended product life slows the resource loops over time; the slowdown corresponds to a reduction in the rate of resource input flow. Closing loops between the post-use and production phases means recycling products after their end-of-life, hence enabling a circular flow of resources with minimum waste [17]. By narrowing loops, fewer resources are used per product and production process. While much emphasis is put on strategies for product design,2 achieving a slowdown is also influenced by consumer involvement in schemes that increase the utilization of products, such as sharing schemes [21]. Shared consumption is a prime example of, and an important vision for, product reuse [22]. Before the application and proliferation of digital platforms, sharing systems had a low profile in society and on the market [23]. With the expansion of the ICT-enabled, platform-based sharing economy (i.e. the DSE), however, sharing grew in almost every aspect, including the size of networks, scale of performance, diversity of resources and services, complexity of coordination and allocation, viability of transactions, etc. [24]. Platforms now handle a massive number of transactions and operations in the DSE, locally and even worldwide. The relevance of sharing practices performed via digital platforms for the circular economy and product reuse has already been noted in the literature (e.g., [22, 25]). Sharing platforms can potentially help to slow down resource flows by enabling access to existing products [25]. As an important enabler of the circular economy, the DSE can bring about significant improvements in resource efficiency, as sharing can put the highest possible utilization capacity of existing products to more efficient use [22]. 
However, platform-based sharing also tends to use and sustain the current unsustainable infrastructures to serve the market [25] while even promoting unsustainable consumption trends [26]. In the following sections (after describing the DSE in some more detail), we will propose a framework for sustainability assessment of the sharing trend that is evolving in the DSE, focusing here on the sustainability implications of sharing with respect to the circular economy strategies. This framework aims to present the implications of the DSE for sustainability by projecting possible scenarios describing how the DSE can support or hinder a transition towards a circular economy.

3 The Digital Sharing Economy (DSE) "The digital sharing economy (DSE) is a class of resource allocation systems based on sharing practices which are coordinated by digital online platforms and performed by individuals and possibly commercial organizations with the aim to provide temporary access to resources without transfer of ownership while generating monetary or non-monetary value for the participants. DSE systems operate in the space between traditional sharing and formal markets" [24].

2 For more on product design for the circular economy, see [8, 18–20].


In order for sharing to be viable, resources must be sharable across the system. Previous studies have classified sharable resources into 'tangible' versus 'intangible' resources [27], 'material goods' versus 'less-tangible assets' [28], and 'capacity-constrained' versus 'capacity-unconstrained' assets [29]. We draw on the classification into tangibles and intangibles and, for the purpose of this study, focus only on tangible resources because they are based on material resources. Tangible resources are further divided into durables and non-durables (or consumables). The two are distinguished because they are shared in different ways: to be 'sharable', a resource must be 'durable enough' to endure multiple accesses (either in sequence or in parallel). For a non-durable (consumable) resource, there is no repeated access to a given part of it: consumables (such as food, fuel, etc.) are exhausted once consumed. Yet such resources, and the act of consuming them, can be shared (traditionally, as food is consumed in groups). In addition to giving access to sharable resources, instances of the DSE usually have some typical characteristics: Low entry barrier: The DSE provides an open and easy-to-join network for its users; in many cases (if not all), the entry barriers for offering the same services are considerably lower in digital sharing systems than in the formal markets. This is at least true for people who are using digital technologies such as a smartphone anyway. 
Possessing a physical asset or a particular skill to offer as a service, a smartphone, an installed application, and agreement to a platform's regulations normally suffice to become a resource/service provider in the system (in some cases, membership fees may also apply).3 Value co-creation: In contrast to service provision by businesses in the formal market, in which a business is the only entity that creates and offers value to its customers, value in the DSE can also be created by its participants, leading to value co-creation [31]. In this context, platforms act as tools to enable and facilitate value co-creation across the system while being the 'medium' for value exchange. This value co-creation (in both P2P and B2P sharing modes) shifts the responsibility for many tasks normally undertaken by businesses to peers, rather than requiring platform owners to cover all aspects of service provision from their own resources [24]. Lower prices: As a 'cross-market' operating between the two extremes of traditional sharing and the formal market, the DSE's pricing mechanisms allow for a range from free services (an attribute taken from traditional sharing, at or near the lower extreme) to services that enforce a certain type of compensation, including fee-based services (an element of the formal market, approaching but 'never' reaching the upper extreme) [24]. In the DSE, the cost of receiving a service should be lower than its equivalent provided by companies acting in the formal market, if such a market exists. In theory, prices for DSE services should remain below their equivalent in the formal market (always below the upper extreme) [24].

3 Entry barriers have been argued to have created inequality, in that possessing an asset or skill is required to enter the platform-based labor market [30].


4 Conceptualizations of ICT Effects on Sustainability Through the lens of sustainability, ICT can have positive and negative effects on patterns of consumption and production. Existing literature presents various approaches and frameworks to reflect on the role of ICT in society's transition towards sustainability.4 An early framework in this area was proposed by Berkhout and Hertin in 2001 [33], introducing the "three orders of effects" model. This basic model was extended and further developed by many authors, including Dompke et al. [31] and Hilty and Aebischer [12], and has also been combined with early work by Mokhtarian [34] on the relationships between telecommunications and transportation. As a common core of these approaches, we will apply the following three categories of effects to analyze the ambivalent impact of the DSE on the circular economy: optimization effects, induction effects, and rebound effects. Technology has a huge potential to increase efficiency, i.e. the efficient use of resources such as energy, time, and effort, by optimizing consumption and production processes [12]. As a result of optimization, the costs involved in an activity can decrease remarkably (a driver can optimize the route from A to B using a navigation system, thus saving fuel and time). However, with an increase in efficiency and a corresponding decrease in costs, existing consumers can afford more consumption while new consumers are enabled to enter the consumption chain too. When demand rises and, as a consequence, the consumption rate goes beyond the available capacities, the system starts revealing unintended countereffects known as "rebound effects". Rebound effects prevent the reduction of total resource use by converting efficiency improvements into additional consumption, ultimately offsetting the initial efficiencies and positive effects partially or entirely [12]. 
"Efficient technologies can also stimulate the demand for the resource they use efficiently" [12, p. 5]. This means that through an induction effect [12], the consumption of other resources, which are required to use a resource efficiently, can be stimulated as well (for example, the use of electricity increases in order to charge more and more efficient electric devices). Induction effects are, like rebound effects, unintended side effects of introducing an ICT-based solution that lead to higher resource consumption. However, in contrast to rebound effects, they are not specifically explainable as a reaction to higher efficiency. This will become clearer when we apply these concepts in the next section. In the following, we address the above effects for the DSE as a use case of ICT; together, they comprise our framework for a qualitative assessment of the sustainability of the DSE with respect to the circular economy.
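To make the rebound definition above concrete, here is a minimal numerical sketch. All numbers (the 20% efficiency gain, the linear demand response) are hypothetical illustrations, not values from the cited literature:

```python
# Hypothetical illustration of a direct rebound effect: an efficiency
# gain lowers resource use per unit, but the lower effective cost
# stimulates additional demand that eats into the expected savings.

def net_resource_use(baseline_units, resource_per_unit,
                     efficiency_gain, demand_elasticity):
    """Total resource use after an efficiency improvement.

    efficiency_gain: fractional reduction in resource per unit (e.g. 0.2)
    demand_elasticity: fractional demand increase per unit of cost
                       reduction (a crude stand-in for the market response)
    """
    new_resource_per_unit = resource_per_unit * (1 - efficiency_gain)
    new_units = baseline_units * (1 + demand_elasticity * efficiency_gain)
    return new_units * new_resource_per_unit

baseline = 100 * 1.0                  # 100 units at 1.0 resource each
expected_saving = 100 * 1.0 * 0.2     # naive engineering expectation: 20
actual = net_resource_use(100, 1.0, efficiency_gain=0.2,
                          demand_elasticity=0.5)  # demand rises by 10%
rebound = 1 - (baseline - actual) / expected_saving
print(actual)   # 88.0: only 12 of the 20 expected units are saved
print(rebound)  # 0.4: 40% of the expected savings are "taken back"
```

With `demand_elasticity` greater than 5 in this toy model, total resource use would even exceed the baseline, a situation known as backfire.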

4 For an overview of conceptual frameworks for structuring effects of ICT on sustainability, see [32].

Digitally Enabled Sharing and the Circular Economy …


5 Qualitative Sustainability Assessment of Digital Sharing

The DSE is a technological innovation [16] that has changed consumption preferences; it has shifted parts of consumption from an ownership-transfer mode to an access-based mode. The affordable (low-cost or free), accessible (via a smartphone), and convenient (real-time information at one's fingertips) consumption of services through platform-based sharing has made it possible for people to access a vast variety of resources. Consumers are provided with temporary access to a resource that they would otherwise have needed to purchase on the formal market (e.g. borrowing a household tool from a neighbor), or with access to resources that are less expensive or even free compared to their equivalent on the formal market (e.g. booking a room from a host on Airbnb5 instead of booking one from a hotel, or receiving free hospitality services from Couchsurfing6). The coordination of this access is enabled by ICT-based platforms that make it possible to spot the resources available for utilization at any given time and to provide access in seconds. As in any other application domain, the effects of introducing ICT can be analyzed using the concepts introduced in Sect. 4.

Optimization effect: Owners can help slow down resource flows by sharing the unused capacity of the products they own with others, allowing resources to remain in the use phase for a longer time (or, more precisely, making resources provide more utility over their lifetime) by not being stored away most of the time or disposed of at an earlier stage [21, 35, 36]. With each act of sharing, part of the unused capacity of a particular resource comes into use. The underlying idea of sharing is that "whenever a user has some idle resource, she offers it to other users who at that time have unsatisfied needs … Such solutions can improve resource efficiency" [37, p. 1].
This efficient use, facilitated by ICT-enabled platforms, is in fact an optimization of the consumption process. For example, when a person uses a carpooling service, more people (up to the capacity of the car) can ride in the same car, and therefore the number of functional units (here defined as one person-kilometer) created on the trip, and thus the utility created, increases. For car sharing (more people owning the same car, but not at the same time) or ride services (such as Uber7), the car can produce more person-kilometers by transporting more people over its lifetime. This is possible because its idle time (which will nevertheless involve some form of aging) is reduced. Therefore, in the DSE, optimization occurs through the improved utilization of available resources during their lifetime. What is to be optimized is the number of functional units produced throughout the whole use phase (lifetime) of the resource.

Rebound effects: If any product or service becomes faster, cheaper or more convenient to access, a usual reaction of the market to this increased efficiency is an increase in the demand for that good. This may ultimately balance out the favorable effects of shared consumption within the same product/service system (direct rebound effect); or it can happen that the savings (time or money) gained from sharing are spent on other consumption that is even more resource-intensive (indirect rebound effect) [16]. Rebound effects can lead to an additional inflow of natural resources into the system (e.g. for the millions of e-scooters now produced to be used in free-floating sharing systems in cities): a situation that contradicts the notion of circular consumption and the strategies for slowing down and narrowing resource loops. Special attention is to be given to time rebound effects when considering the sustainability of technology-based efficiency gains [38]. Time rebound comes from a decline in the time needed to acquire and consume a service; this reduces the explicit or implicit costs associated with time [13]. Owing to the efficiency of platforms in matching needs with availabilities (e.g. locating the nearest available car), the consumer can save time; this saved time can be spent on consuming more of the same or another service, or on some other activity with high resource intensity. Therefore, through the lower costs associated with resource access and utilization in the DSE, rebound effects can cancel out the efficiencies and savings gained from the sharing system. It has even been suggested that "the overall effects of sharing economy platforms may be small due to rebound effects" [39, p. 8].

Induction effect: Any consumption, including the use of a shared resource, can stimulate the consumption of other resources; one could speak of 'coupled consumption'. Returning to the example of car sharing and ride services: in order for a car to operate, it requires fuel and the temporary occupation of part of the infrastructure (i.e. roads); without these complementary resources, a car could not produce its transportation service. These complementary resources must be taken into account when assessing the overall effect of different sharing schemes.

5 https://www.airbnb.com/.
6 https://www.couchsurfing.com/.
7 https://www.uber.com/ch/en/.
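The idea of functional units produced over a resource's lifetime can be illustrated with a back-of-the-envelope calculation; the lifetime mileage and occupancy figures below are hypothetical, not from the cited sources:

```python
# Hypothetical sketch: person-kilometers (functional units) a car
# delivers over its lifetime. Sharing raises average occupancy and/or
# utilization, so more functional units are produced per unit of
# resource bound in the car.

def lifetime_person_km(lifetime_km, avg_occupancy):
    """Functional units (person-km) produced over the car's lifetime."""
    return lifetime_km * avg_occupancy

private = lifetime_person_km(lifetime_km=200_000, avg_occupancy=1.2)
carpool = lifetime_person_km(lifetime_km=200_000, avg_occupancy=2.5)
print(private, carpool)   # roughly 240,000 vs 500,000 person-km
print(carpool / private)  # over 2x more utility per car produced
```

The same car body (and the resources embodied in it) thus yields roughly twice the utility under the carpooling assumptions, which is exactly the optimization effect described above.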
For example, for carpooling not only the car but also the complementary resources of using the car can be divided by the number of passengers. This is, however, not the case for car sharing or ride services. Studies on the impact of the growth of platform-based ride services in San Francisco [40] and New York City [41] indicate adverse effects on urban traffic. Uber and Lyft are known to account for "more than half of San Francisco's real-world traffic increase" [40]. As Fox [40] further reports, from 2010 to 2016 these ride services increased the time cars spend sitting in the city's traffic by around 70%. The study of New York City echoes similar findings. Schaller [41] identifies platform-based ride services as a leading source of increase in non-personal auto travel in the city, and as "not a sustainable way to serve the growing transportation needs generated by the city's expanding population and economic activity" (p. 26). Since building and maintaining infrastructure is one of the most resource-intensive activities of urban societies, the fact that a sharing system occupies an increased share (!) of this infrastructure per functional unit is clearly in contradiction to the goals of a circular economy.
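A rough way to see why carpooling and ride services differ with respect to coupled consumption is to allocate the complementary resources per person-kilometer. The function and all figures below (fuel use, occupancy, deadhead share) are hypothetical illustrations, not values from the cited studies:

```python
# Hypothetical allocation of coupled consumption (here: fuel) per
# person-kilometer. In carpooling, complementary resources are divided
# among several passengers; in ride-hailing, empty "deadhead" kilometers
# driven between rides add consumption that serves no passenger at all.

def fuel_per_person_km(fuel_per_km, occupancy, deadhead_share=0.0):
    """Fuel attributed to one person-km of delivered service.

    deadhead_share: fraction of driven km with no passenger on board.
    """
    driven_km_per_service_km = 1 / (1 - deadhead_share)
    return fuel_per_km * driven_km_per_service_km / occupancy

carpool = fuel_per_person_km(fuel_per_km=0.07, occupancy=3.0)
ride_hail = fuel_per_person_km(fuel_per_km=0.07, occupancy=1.2,
                               deadhead_share=0.4)
print(carpool, ride_hail)  # carpooling uses far less fuel per person-km
```

Under these assumptions the ride-hailing service consumes several times more fuel per functional unit than the carpool, illustrating why complementary resources must enter the assessment.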


6 Discussion

We have shown that induction and rebound effects have to be taken into account to see whether the optimized consumption achieved by new practices of sharing in the DSE actually contributes to the goals of the circular economy; the final outcome of these effects may be different for each instance of the DSE and requires a closer look at the resource flows and demand dynamics in each case. Besides the 'classical' impact categories of optimization, induction, and rebound effects, there is another aspect we want to introduce because it is closely linked to the vision of the circular economy: asset degradation. Multiple access raises a relevant consideration regarding product lifetime, which may become shorter as a result of the product's faster degradation. As pointed out by Weber [42], shared use of a product degrades it faster. Degradation by intensified wear and tear (as can be observed, e.g., with shared bicycles or e-scooters) counteracts the optimization effect from a life cycle perspective. Faster degradation corresponds to higher replacement costs [42] and, evidently, new resource inflows. Although approaches to this problem can include prices that reflect the cost of sharing and activities that extend the service life of the products (such as maintenance and repairs) [42], product design remains of significant importance if the DSE is to become a viable approach to achieving a circular economy. In addition to strategies for product design and lifetime extension, consumers play an important role in determining products' lifetimes [18, 21, 43]. In the DSE, user behavior is to a large degree both influenced by and influencing the services in the system. With respect to the users' influence, sustainable consumption patterns cannot be established without consumers' attitudes supporting sustainability [44], and the circular economy as a particular approach to it.
With people becoming more willing to make, and more conscious of, 'circular choices', service providers can capture their considerations and preferences to move their services in the direction of a more circular, sustainable economy. It is also important to explore whether it is more important for consumers to practice sustainable sharing, or simply to reap economic benefits from sharing. When the DSE reaches a point where sharing may no longer be sustainable, will people refrain from participating? It remains important to investigate to what extent individual motivations and user behavior incorporate environmental considerations into consumption choices [45]. Regarding the influence of services on user behavior, we may point out the role of service providers in encouraging desirable behaviors. According to Tukker [46], it is highly plausible that people treat products that they do not own with less care; in some cases this may lead to higher environmental impacts (for example by increasing the need for frequent product replacements). To deal with such undesired behaviors, providers can impose rules to be observed by the recipients of their services (for example, Zipcar requires its users to take care of the cars and to return them clean and undamaged as a condition of using its services). Users' engagement can correspond with value co-creation in consuming services. This value co-creation involves consumers in keeping products in better condition (in the sense of better maintenance and utilization) so as to extend the products' useful life. Whether users are motivated towards such engagement might also depend on the business model of the system: Is the business model still based on the idea of the common good, or is the provider a company whose only goal is to grow its profits?

7 Conclusion

The intensified utilization of resources in the DSE motivates new consumption trends while raising discussions about the sustainability implications of digitally enabled sharing and its contribution to the circular economy as an approach to achieving sustainability. We have tried to shed light on the relevance of platform-enabled sharing practices for sustainability and circular economy goals by presenting a qualitative assessment framework. Based on this framework, optimization in consumption, induction effects and rebound effects are the potential implications of the DSE in the context of sustainability and as an approach to supporting circular economy goals. To summarize, although sharing by itself has positive effects on the efficiency of resource use, extensive platform-based sharing practices should be checked for undesired side effects in terms of resource flows:

• The basic objective function in the system should be the number of functional units (also known as 'service units') delivered over the use phase of the resources bound for the purpose of producing the service. This objective function goes up with shared use but is also driven down by the faster degradation of products that can occur as the result of large-scale, relatively anonymous and profit-oriented sharing systems.

• Not only the shared resource itself, but also coupled consumption activities (e.g. the use of complementary products such as fuels or infrastructure capacity) should be taken into account to get the whole picture. If all resources needed are considered, sharing one of them at the cost of the others may not necessarily contribute to sustainability. For example, the travel induced by Couchsurfing should be included in a sustainability assessment of this platform.

• The risk of direct or indirect rebound effects that ultimately lead to increasing resource flows must be taken into account.
Investments in additional assets (to be shared) may indicate that there is a rebound effect (e.g. when apartments are acquired just to 'share' them on Airbnb). Investors, however, will understand increasing demand not as a rebound effect, but as expected growth.


References

1. Swiss Federal Office for the Environment. https://www.bafu.admin.ch/bafu/en/home.html
2. Velenturf, A., Purnell, P., Tregent, M., Ferguson, J., Holmes, A.: Co-producing a vision and approach for the transition towards a circular economy: perspectives from government partners. Sustainability 10(5), 1401 (2018)
3. International Organization for Standardization: ISO 15392:2008 Sustainability in Building Construction—General Principles. ISO, Geneva (2008)
4. WCED, World Commission on Environment and Development: Our Common Future. Oxford University Press, Oxford (1987)
5. Schroeder, P., Anggraeni, K., Weber, U.: The relevance of circular economy practices to the sustainable development goals. J. Ind. Ecol. 23(1), 77–95 (2019)
6. Murray, A., Skene, K., Haynes, K.: The circular economy: an interdisciplinary exploration of the concept and application in a global context. J. Bus. Ethics 140(3), 369–380 (2017)
7. Ellen MacArthur Foundation: Towards the Circular Economy Vol. 2: Opportunities for the Consumer Goods Sector. https://www.ellenmacarthurfoundation.org/assets/downloads/publications/Ellen-MacArthur-Foundation-Towards-the-Circular-Economy-vol.1.pdf
8. Bocken, N.M., de Pauw, I., Bakker, C., van der Grinten, B.: Product design and business model strategies for a circular economy. J. Ind. Prod. Eng. 33(5), 308–320 (2016)
9. Kirchherr, J., Reike, D., Hekkert, M.: Conceptualizing the circular economy: an analysis of 114 definitions. Resour. Conserv. Recycl. 127, 221–232 (2017)
10. Geissdoerfer, M., Savaget, P., Bocken, N.M., Hultink, E.J.: The circular economy–a new sustainability paradigm? J. Clean. Prod. 143, 757–768 (2017)
11. Berg, A., Antikainen, R., Hartikainen, E., Kauppi, S., Kautto, P., Lazarevic, D., Piesik, S., Saikku, L.: Circular Economy for Sustainable Development. Reports of the Finnish Environment Institute 26 (2018)
12. Hilty, L.M., Aebischer, B.: ICT for sustainability: an emerging research field. In: Hilty, L.M., Aebischer, B. (eds.) ICT Innovations for Sustainability, pp. 3–36. Springer International Publishing, Cham, Switzerland (2015)
13. Coroamă, V.C., Mattern, F.: Digital rebound–why digitalization will not redeem us our environmental sins (2019)
14. Hilty, L.M., Köhler, A., von Schéele, F., Zah, R., Ruddy, T.: Rebound effects of progress in information technology. Poiesis Prax. 4, 19–38 (2006)
15. Gossart, C.: Rebound effects and ICT: a review of the literature. In: ICT Innovations for Sustainability, Advances in Intelligent Systems and Computing, pp. 435–448. Springer International Publishing, Cham, Switzerland (2015). ISBN 978-3-319-09227-0
16. Pouri, M.J., Hilty, L.M.: Conceptualizing the digital sharing economy in the context of sustainability. Sustainability 10(12), 4453 (2018)
17. Bourguignon, D.: Closing the loop: new circular economy package. European Parliamentary Research Service, p. 9 (2016)
18. Van den Berg, M.R., Bakker, C.A.: A product design framework for a circular economy. In: Proceedings of the PLATE Conference, Nottingham, UK, 17–19 June 2015, pp. 365–379 (2015)
19. Moreno, M., de los Rios, C., Charnley, F.: Guidelines for circular design: a conceptual framework. Sustainability 8, 937 (2016)
20. den Hollander, M.C., Bakker, C.A., Hultink, E.J.: Product design in a circular economy: development of a typology of key concepts and terms. J. Ind. Ecol. 21, 517–525 (2017)
21. Wastling, T., Charnley, F., Moreno, M.: Design for circular behaviour: considering users in a circular economy. Sustainability 10(6), 174 (2018)
22. Korhonen, J., Honkasalo, A., Seppälä, J.: Circular economy: the concept and its limitations. Ecol. Econ. 143, 37–46 (2018)
23. Mont, O.: Institutionalisation of sustainable consumption patterns based on shared use. Ecol. Econ. 50(1–2), 135–153 (2004)
24. Pouri, M.J., Hilty, L.M.: A theoretical framework for the digital sharing economy. Author manuscript.


25. Konietzko, J., Bocken, N., Hultink, E.J.: Online platforms and the circular economy. In: Innovation for Sustainability, pp. 435–450. Palgrave Macmillan, Cham (2019)
26. Martin, C.J.: The sharing economy: a pathway to sustainability or a nightmarish form of neoliberal capitalism? Ecol. Econ. 121, 149–159 (2016)
27. Pouri, M.J., Hilty, L.M.: ICT-enabled sharing economy and environmental sustainability—a resource-oriented approach. In: Advances and New Trends in Environmental Informatics, pp. 53–65. Springer, Cham (2018)
28. Ranjbari, M., Morales-Alonso, G., Carrasco-Gallego, R.: Conceptualizing the sharing economy through presenting a comprehensive framework. Sustainability 10(7), 2336 (2018)
29. Wirtz, J., So, K.K.F., Mody, M., Liu, S., Chun, H.: Platforms in the peer-to-peer sharing economy (2019)
30. Schor, J.B., Attwood-Charles, W.: The "sharing" economy: labor, inequality, and social connection on for-profit platforms. Sociol. Compass 11, e12493 (2017)
31. Dompke, M., von Geibler, J., Göhring, W., Herget, M., Hilty, L.M., Isenmann, R., Kuhndt, M., Naumann, S., Quack, D., Seifert, E.: Memorandum Nachhaltige Informationsgesellschaft. Fraunhofer IRB, Stuttgart (2004). ISBN 3-8167-6446-0
32. Hilty, L.M., Lohmann, W.: An annotated bibliography of conceptual frameworks in ICT for sustainability. In: ICT4S 2013, pp. 14–16 (2013)
33. Berkhout, F., Hertin, J.: Impacts of information and communication technologies on environmental sustainability: speculations and evidence. Report to the OECD (2001). http://www.oecd.org/dataoecd/4/6/1897156.pdf
34. Mokhtarian, P.L.: A typology of relationships between telecommunications and transportation. Transp. Res. Part A: Gen. 24(3), 231–242 (1989)
35. Wilson, G.T., Smalley, G., Suckling, J.R., Lilley, D., Lee, J., Mawle, R.: The hibernating mobile phone: dead storage as a barrier to efficient electronic waste recovery. Waste Manag. 60, 521–533 (2017)
36. The Waste and Resources Action Programme (WRAP): Switched on to Value: Powering Business Change. WRAP, Oxford, UK (2017)
37. Georgiadis, L., Iosifidis, G., Tassiulas, L.: On the efficiency of sharing economy networks. IEEE Trans. Netw. Sci. Eng. (2019)
38. Binswanger, M.: Technological progress and sustainable development: what about the rebound effect? Ecol. Econ. 36(1), 119–132 (2001)
39. Frenken, K., Schor, J.: Putting the sharing economy into perspective. Environ. Innov. Soc. Trans. 23, 3–10 (2017)
40. Fox, A. (2019). https://www.sciencemag.org/news/2019/05/uber-and-lyft-may-be-making-san-francisco-s-traffic-worse
41. Schaller, B.: Unsustainable? The Growth of App-Based Ride Services and Traffic, Travel and the Future of New York City. Schaller Consulting (2017)
42. Weber, T.A.: The dynamics of asset sharing and private use. In: Proceedings of the 51st Annual Hawaii International Conference on System Sciences (HICSS), Waikoloa Village, HI, USA, 3–6 January 2018, pp. 5205–5211 (2018)
43. Mugge, R., Schoormans, J.P.L., Schifferstein, H.N.J.: Design strategies to postpone consumers' product replacement: the value of a strong person-product relationship. Des. J. 8, 38–49 (2005)
44. Hamari, J., Sjöklint, M., Ukkonen, A.: The sharing economy: why people participate in collaborative consumption. J. Assoc. Inf. Sci. Technol. 67(9), 2047–2059 (2015)
45. Demailly, D., Novel, A.S.: The Sharing Economy: Make It Sustainable, vol. 3. IDDRI, Paris, France (2014)
46. Tukker, A.: Product services for a resource-efficient and circular economy—a review. J. Clean. Prod. 97, 76–91 (2015)

Exploring the System Dynamics of Industrial Symbiosis (IS) with Machine Learning (ML) Techniques—A Framework for a Hybrid-Approach

Anna Lütje, Martina Willenbacher, Martin Engelmann, Christian Kunisch and Volker Wohlgemuth

Abstract Artificial Intelligence (AI) is one of the driving forces of the digital revolution, both in its existing areas of application and in those emerging as potential. Industrial Symbiosis (IS) is one such application field; this paper explores in more detail which role AI can play in the context of IS systems and how AI can support and contribute to their facilitation. A systematic literature review was conducted to identify problem- and improvement-driven fields of action in the context of IS and to present the current state of ICT tools for IS systems with corresponding implications. This led to the selection of suitable Machine Learning (ML) techniques and to a proposed general framework combining Agent-Based Modelling (ABM) and ML for exploring the system dynamics of IS. This hybrid approach opens up the simulation of scenarios with optimally utilized IS systems in terms of system adaptability and resilience.

Keywords Artificial intelligence · Machine learning · Industrial symbiosis · Industrial ecology · Circular economy · Resource efficiency

A. Lütje (B) · M. Willenbacher Institute of Environmental Communication, Leuphana University Lüneburg, Universitätsallee 1, 21335 Lüneburg, Germany e-mail: [email protected] M. Willenbacher e-mail: [email protected] A. Lütje · M. Willenbacher · M. Engelmann · C. Kunisch · V. Wohlgemuth School of Engineering—Technology and Life, HTW Berlin, University of Applied Sciences, Treskowallee 8, 10318 Berlin, Germany e-mail: [email protected] C. Kunisch e-mail: [email protected] V. Wohlgemuth e-mail: [email protected] © Springer Nature Switzerland AG 2020 R. Schaldach et al. (eds.), Advances and New Trends in Environmental Informatics, Progress in IS, https://doi.org/10.1007/978-3-030-30862-9_9


A. Lütje et al.

1 Introduction

Industrial Symbiosis (IS) belongs to the progressive field of Industrial Ecology and provides various approaches, perspectives and methods to tackle (at least partly) today's (sustainability) challenges. It encompasses wide-ranging opportunities to improve resource productivity and efficiency while reducing the environmental burden [1–3]. IS is considered to be a key enabling factor for resource efficiency and circularity, which is taking on an emerging priority on the EU policy agenda [4]. The development of IS systems is triggered by pushing factors (pressures of environmental and health impacts, national regulations concerning environmental standards, waste management, carbon pricing/taxing, etc.) and pulling factors (public funding, incentives, etc.). "Industrial Symbiosis explores ways to establish knowledge webs of novel material, energy and waste exchanges and business core processes to facilitate the development of networks of synergies within and across different companies to support the development of high levels of nearly closed-loop material exchanges and efficiency of energy cascading within and across industrial ecosystems" [5]. The most cited definition is from Chertow [6]: "IS engages traditionally separate industries in a collective approach to competitive advantage involving physical exchange of materials, energy, water and byproducts". According to [7], "IS engages diverse organizations in a network to foster eco innovation and long-term culture change. Creating and sharing knowledge through the network yields mutually profitable transactions for novel sourcing of required inputs, value-added destinations for non-product outputs, and improved business and technical processes". In summary, IS is a systemic and collaborative (business) approach to optimizing cycles of materials and energy [1, 3], sharing knowledge and initiating mutual learning processes [8] while generating ecological, technical, social and economic benefits.
This implies cross-industry and cross-sectoral collaboration within a community through the exchange of material, energy, water and human resources [9]. In this era of increasing digitization, computerization and automation, Artificial Intelligence (AI) is one of the driving forces of the digital revolution, both in its existing areas of application and in those emerging as potential. Industrial Symbiosis (IS) is one such application field; this paper explores in more detail which potential role AI can play in the context of IS systems and how AI can support and contribute to their facilitation. As stated in [10, 11], AI techniques provide promising prospects for the field of IS: "(…) the current trend of growing amount of digitalized knowledge, coupled with machine learning algorithms, collective intelligence can be tapped by aggregating and further analyzing such vast reservoir of untapped knowledge sources to uncover new knowledge to help identify novel, synergistic, and environmentally benign IS process chains" [11].


2 Systematic Literature Review

An extensive systematic literature review, employing a forward and backward snowballing technique, was conducted to analyze a total of 60 publicly and freely accessible papers, including 45 case studies, to extract knowledge on Artificial Intelligence (AI), specifically Machine Learning (ML), and Industrial Symbiosis (IS) in order to synthesize possible answers to the following research questions (RQs):

RQ1: What are the problem- and improvement-driven fields of action in the context of IS?

RQ2: How can ML techniques leverage the technology-enabled fields of action of IS?

The research activities were carried out according to Fig. 1 in order to approach the research questions in a systematic manner. First, this paper briefly introduces the basic ML techniques. Then, based on the systematic literature review, the current state of ICT tools for IS systems and its implications are addressed and technology-enabled fields of action are identified, which, in sum, lead to implications for suitable ML techniques. The implications and discussions of this paper could significantly advance the speed, dynamics and scope of action of IS, triggering further research in this emerging application area. German- and English-language publications were sourced from the following databases: CORE, Semantic Scholar, Directory of Open Access Journals, ResearchGate and Google Scholar. The search queries comprised the terms "Artificial Intelligence", "Machine Learning", "Deep Learning", "Artificial Neural Networks" and "Industrial Symbiosis".

Fig. 1 Research framework


Fig. 2 Overview and differentiation of AI techniques

3 Artificial Intelligence Techniques

3.1 Positioning of Machine Learning as an Artificial Intelligence Technique

AI describes an entire field of knowledge concerned with putting computers/machines in a position to gain the capability of perceiving and processing input information in order to (re-)act in a manner resembling human intelligence [12]. Figure 2 shows an overview and differentiation of AI techniques, in which a distinction is made between general and narrow AI. While general AI demands extensive generalist capabilities, narrow AI focuses on a single subset of cognitive abilities, so that the computer can perform specific tasks with progressive advancements in that spectrum. Machine Learning (ML) is a subfield of AI and a method to solve narrow AI tasks. ML is a generic term for the "artificial" generation of knowledge from experience. ML is a statistical learning method in which each instance in a dataset is assigned a respective set of characteristics or attributes, whereas Deep Learning (DL) is also a statistical learning method but extracts features or attributes out of raw data [13]. DL deploys neural networks (Artificial Neural Networks, ANN) with hidden layers, large amounts of data and powerful computational resources [13].

3.2 Machine Learning

This section addresses the four main types of Machine Learning (ML), providing introductory differentiations between the methods, illustrated in Fig. 3.

Fig. 3 Types of machine learning

Classical ML can be divided into two categories: supervised and unsupervised learning. Supervised learning deals with labeled data, i.e. data that has been pre-categorized by a supervisor/teacher or that carries numerical target values. There are two types of supervised learning: classification and regression. With classification, an object's category is to be predicted; example algorithms are decision trees, Naïve Bayes, k-nearest neighbours and support vector machines. With regression, a specific point on a numeric axis is to be forecasted, for example by linear and polynomial regressions, indicating average correlations.

Unsupervised learning handles unlabeled data, so the machine/computer tries to solve a task by detecting patterns on its own; it can be used for explorative data analysis. Clustering is classification without predefined classes: it divides objects by similarity based on unknown features (example algorithms: k-means clustering, Mean-Shift, Density-Based Spatial Clustering of Applications with Noise (DBSCAN)). Objects that share many similar features are merged into one cluster. Generalization (dimensionality reduction) can find hidden dependencies by assembling specific features into more abstract, high-level ones (example algorithms: Principal Component Analysis (PCA), Singular Value Decomposition (SVD), Latent Dirichlet Allocation (LDA), Latent Semantic Analysis (LSA)). Association learns rules by finding patterns in an ordered sequence in order to predict the next item in the sequence (example algorithms: Apriori, Eclat, FP-Growth).

Ensembles result from the combined application of several algorithms that correct each other's mistakes (example algorithms: Random Forest, Gradient Boosting). This includes, for example, the method of stacking: the outputs of several parallel models, trained with different algorithms on the same dataset, are passed to a final model that compiles the decision.

Reinforcement learning takes note of the rewards received for specific actions and records them in order to adapt the model (example algorithms: Q-Learning, State–action–reward–state–action (SARSA), A3C).
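The supervised/unsupervised distinction can be made concrete with a minimal, self-contained sketch of two algorithms named above: k-nearest neighbours (supervised classification on labeled points) and k-means (unsupervised clustering of unlabeled points). The toy data is illustrative only; a real application would use a library such as scikit-learn:

```python
from collections import Counter
import math

def knn_predict(train, query, k=3):
    """Supervised: labeled points (x, y, label); majority vote of k nearest."""
    nearest = sorted(train, key=lambda p: math.dist(p[:2], query))
    return Counter(label for *_, label in nearest[:k]).most_common(1)[0][0]

def kmeans(points, centers, iters=10):
    """Unsupervised: no labels; clusters emerge from similarity alone."""
    for _ in range(iters):
        clusters = {tuple(c): [] for c in centers}
        for p in points:                      # assign each point to the
            c = min(centers, key=lambda c: math.dist(c, p))   # nearest center
            clusters[tuple(c)].append(p)
        centers = [tuple(sum(x) / len(x) for x in zip(*pts)) if pts else c
                   for c, pts in clusters.items()]  # recompute center means
    return centers

labeled = [(1, 1, "a"), (1, 2, "a"), (8, 8, "b"), (9, 8, "b")]
print(knn_predict(labeled, (2, 1)))   # "a": the nearest neighbours vote
print(kmeans([(1, 1), (1, 2), (8, 8), (9, 8)], [(0, 0), (10, 10)]))
```

Note how the same four points are handled: k-NN needs the labels "a"/"b" to classify a new point, while k-means discovers the two groups without any labels.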
Reinforcement learning is used in cases when the problem is not data-related and all possible eventualities cannot be foreseen in a specific environment (e.g. real environment, virtual city). The principle is rather to minimize error and increase rewards, than to predict all possible actions. Deep Learning utilizes artificial neural networks (ANN) to explore complex and huge datasets (example architectures: Perceptron, Convolutional Network (CNN), Recurrent Networks (RNN)). ANN are mathematical models that are suitable for

122

A. Lütje et al.

Fig. 4 Artificial neural network mechanism

tasks such as the recognition of functional relationships between physical, chemical, economic and ecological as well as environmental parameters [14]. ANN are complex networks that mimic the learning process of the human brain. They consist of a number of computing units (the neurons), which are connected to each other via information channels and are usually arranged in layers, see Fig. 4. ANN are interesting for all applications where explicit (systematic) knowledge about the problem to be solved is absent, incomplete or scarce [15]. Examples are text, image and face recognition, where several hundred thousand to millions of pixels have to be converted into a comparatively small number of permitted results. In a supervised learning network, the input data is typically 'labeled', so engineers provide information about the input. For example, when the task is to visually diagnose a melanoma, the engineers provide the network with characteristics of a melanoma such as edges, colors, size, depth etc. [16]. The first network layer then targets the first characteristic representation, e.g. the edges [16]. After successfully detecting those, the result flows into the second layer, which tries to identify another characteristic feature, e.g. color [16]. After correctly identifying this, the results of the first and second layer are fed into the third layer, and so on [16]. It is thus an iterative procedure with a special focus on predetermined characteristics in each decision loop. Unsupervised neural networks, which imitate human learning more closely, typically process 'unlabeled' data, so they freely analyze and detect objects such as potential melanoma on their own [16]. They efficiently extract relevant information from multiple layers of neural networks according to their defined task [16]. Unsupervised learning is usually applied in deep learning techniques.
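The perceptron, the simplest ANN building block mentioned above, can be sketched with its classic learning rule; the logical-OR training set below is merely an illustrative stand-in for 'labeled' inputs:

```python
def train_perceptron(samples, epochs=20, lr=0.1):
    """Single-neuron perceptron: a weighted sum plus bias, thresholded at 0.
    The learning rule nudges weights and bias by the prediction error."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), label in samples:
            pred = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
            err = label - pred          # -1, 0 or +1
            w[0] += lr * err * x1
            w[1] += lr * err * x2
            b += lr * err
    return w, b

def predict(w, b, x1, x2):
    return 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0

# Linearly separable 'labeled' data: the logical OR function.
or_data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
w, b = train_perceptron(or_data)
```

Because the data is linearly separable, the perceptron convergence theorem guarantees the rule finds a separating weight vector within a few epochs.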
At the University of California, a trained Google deep-learning network was able to detect melanoma with an accuracy rate of 96% [16]. There are already apps such as FirstDerm and SkinVision which allow remote diagnosis by taking a picture of the skin abnormality and sending it to a physician. In the future, these

Exploring the System Dynamics of Industrial Symbiosis (IS) …

123

apps may pre-diagnose skin abnormalities and differentiate between benign moles and different types of carcinomas [16]. Based on the patient's data combined with large databases of treatment options, IBM's Watson AI computer can already support physicians with treatment recommendations, as shown in a trial of 1000 cases [16]; in 99% of the cases, the recommendations were consistent with the original oncologists' treatment plans [16].

4 Fields of Action of Industrial Symbiosis

4.1 Industrial Symbiosis Case Studies

Many companies in the context of the circular economy and Industrial Symbiosis (IS) approach their "waste" streams as new business opportunities or extended business models. For example, waste heat/surplus steam can be forwarded to other companies [17–20], and if a company operates its own (renewable) power plant (e.g. a block heat and power station or a photovoltaic system), it can also sell its surplus energy to other companies, acting as an energy supplier and thus expanding its business model. Based on the conducted case study analysis, the following systematization scheme for the points of contact of IS activities was derived. Besides non-material exchanges such as knowledge and utility sharing as well as joint management of disposal and procurement, there are material exchanges of water, energy and materials. In addition to energy and resource efficiency measures within a single company, such as internal water treatment and recycling or heat recovery, the IS points of contact can be systematized in four categories: (1) waste heat/steam, (2) gaseous waste/aerosols, (3) solid waste and (4) liquid waste (differentiating waste water and sludge/mud). Based on these points of contact, companies can extract valuable substances or forward (reprocessed) output flows to other entities. For example, a smeltery in China took full advantage of by-product utilization by recovering raw materials from gaseous waste/aerosols, sludge/mud and solid waste through filtering, extracting, concentrating and compressing [21]. The Guitang Group in China solved a disposal problem by using their sludge as the calcium carbonate feedstock for a new cement plant, while reducing residual and waste flows [22]. Gaseous waste streams such as fly ash can be used as a cement additive [23–25] or soil additive [26, 27].
Moreover, waste water from a company that processes food such as olives, cereals, fruit and vegetables can be further used for the fertilized irrigation of agricultural land [2, 27] or further processed into a fertilizer product. Organic residual (solid) waste can be converted into animal/fish feed (material utilization) [28, 29], where health and hygiene requirements need to be considered, or into biogas and biofuel (energetic utilization) [28]. The digestate of a biogas plant can in turn be reused as fertilizer [28, 30].


Furthermore, structural IS formations and morphologies have been analyzed by different researchers [31–36]. Each entity in an IS system can act as an originator and/or receiver; the structural formation and composition depend on the geographical and business environment, the number of connections (connected entities) and the tie direction, shaping various structural patterns which have not yet been classified. To sum up, the successful operation of an IS system highly depends on the system's resilience, which is mainly related to the material/energy flows within the IS system and to which entities are connected and how.

4.2 Current State of ICT Tools

Potential IS activities between companies can be identified when inter-organizational communication and information exchange is facilitated [37], which can be significantly enabled by Information and Communication Technologies (ICT) [38, 39]. Grant et al. [40] listed 17 ICT applications for IS in their study, although most of them are either inoperative or not publicly available. They stated that those tools were developed at an early stage of IS development, which may be the reason why most of them are no longer active. Most of the investigated ICT tools for IS concentrate on output-input matching of various resource flows among industrial organizations, functionality and technical opportunities [40], but do not provide comprehensive decision support such as advanced analysis of the economic viability of different IS opportunities [41] and of ecological impacts (reduction). Social aspects such as community and trust building and connecting economic actors by building human and business relationships are not considered either [10, 40, 42]. Maqbool et al. [43] assessed 20 European IS-supporting information technology tools. They found that the improvement potentials identified by Grant et al. [40] are being addressed by the newer IT tools developed in Europe [43]. However, the IT tools still predominantly focus on the identification of IS opportunities; the implementation and management of IS activities are not addressed at all [43]. Maqbool et al. [43] stated that "matchmaking tools and assessment methodologies hold the most promise for innovation for academia, IT tool developers and the facilitators of industrial symbiosis". All investigated tools focus on the as-is analysis of the IS system. Possible future scenarios as well as conceivable transformation paths from the actual state to the desired target vision, however, are not addressed at all.
The following tools are first attempts to simulate the effects and impacts of possible IS activities. Raabe et al. [41] used an agent-based modelling (ABM) approach, called the by-product exchange network (BEN) model, to develop a system architecture for an IS collaboration platform. They applied the model to a case study of food waste in Singapore as a decision support tool for companies to evaluate the economic viability of IS [41]. In the model, entities such as plants or facilities in the IS network are represented by agents that are programmed based on rules


to actively consume and/or produce resources, while resources are represented by agents that passively change their states such as quantities and locations [41]. Yazdanpanah et al. [44] approached IS networks with coordinated game theory and normative multi-agent systems addressing fair and stable benefit allocation among the entities involved. Yazan and Fraccascia [45] developed an IS decision-support tool which is based on an enterprise input-output model providing a cost-benefit analysis. An agent-based simulation was integrated to show how companies share the total economic benefits resulting from IS activities. They proposed an IS model, which allows the exploration of “the space of cooperation, defined as the operationally favourable conditions to operate IS in an economically win-win manner” [45].
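A by-product exchange of the kind these platforms model can be sketched with a minimal greedy output-input matching; the firms, resources and quantities below are entirely hypothetical:

```python
class Firm:
    """Hypothetical IS agent: 'outputs' are by-product streams on offer,
    'demands' are input streams sought, both as resource -> tonnes/year."""
    def __init__(self, name, outputs, demands):
        self.name = name
        self.outputs = dict(outputs)
        self.demands = dict(demands)

def match_exchanges(firms):
    """Greedy output-input matching: route each by-product stream to the
    first firm demanding it; whatever remains counts as disposed waste."""
    exchanges = []
    for src in firms:
        for resource, qty in src.outputs.items():
            for dst in firms:
                if dst is src or qty <= 0:
                    continue
                need = dst.demands.get(resource, 0)
                if need > 0:
                    moved = min(qty, need)
                    exchanges.append((src.name, dst.name, resource, moved))
                    dst.demands[resource] = need - moved
                    qty -= moved
            src.outputs[resource] = qty   # residual goes to disposal
    return exchanges

# Hypothetical network: a smeltery's sludge partly feeds a cement plant.
firms = [Firm('smeltery', {'sludge': 100}, {}),
         Firm('cement_plant', {}, {'sludge': 60})]
exchanges = match_exchanges(firms)
```

A real platform would replace the greedy rule with economic evaluation per exchange, as in the cost-benefit analyses cited above; the sketch only shows the basic matching step.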

5 General Framework for a Hybrid-Approach

Since Agent-Based Modeling (ABM) has already been applied to IS systems in previous scientific work, we propose a hybrid approach of ABM and ML in order to investigate the system dynamics and derive recommendations for action. Applying Machine Learning (ML) is worthwhile when a problem is complex and requires adaptivity. Especially in order to adequately capture the dynamics of Industrial Symbiosis (IS) systems with their complex and numerous features and attributes (e.g. fluctuating material and energy flows, changing composition of the entities involved), powerful decision support tools can be applied to process massive amounts of data, recognize patterns and recommend concrete actions. ABM deals with multiple agents interacting with each other and operating in a defined environment, following basic (learning) rules and decision-making heuristics to regulate the exchanges with other agents (requiring a full ontology) [46]. ABM simulates the agents' actions, revealing their corresponding effects on other agents and the environment. The decisive weak point is that an individual agent does not know the entire problem space and needs to discover appropriate solutions by learning. The learning phase can be accelerated by applying ML techniques such as reinforcement learning and deep learning. The combinatorial approach turns an ABM into an adaptive ABM with intelligent agents. ABM and ML provide mutual benefits: on the one hand, ML can use ABM as an environment and a reward generator; on the other hand, ABM can use ML to refine the internal models of the agents [47]. For example, neural networks can be used as computational emulators of entire ABMs (e.g. as a computational approximation of the non-linear, multivariate time series generated by the ABM) [48].
When applying reinforcement learning in ABM, an industrial entity agent may be given certain targets such as stable material/energy flows, rather than fixed, hand-crafted rules. At the beginning, the agents have little knowledge of the world, but given a reward function modelling the goal, they learn to perform better over time.
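As a toy illustration of such a reward-driven agent (the states, actions and the "stable flow" target are invented, not taken from the cited models), tabular Q-learning on a one-dimensional flow level works as follows:

```python
import random

def q_learning(n_states=5, episodes=500, alpha=0.5, gamma=0.9, seed=1):
    """Tabular Q-learning on a toy 'flow adjustment' chain: states are flow
    levels 0..n_states-1, action 0 lowers and action 1 raises the flow, and
    a reward of 1 is granted only at the target level n_states-1 (a stand-in
    for the 'stable material/energy flows' goal)."""
    rng = random.Random(seed)
    target = n_states - 1
    Q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        for _ in range(50):                  # bounded episode length
            a = rng.randrange(2)             # pure exploration; Q-learning is off-policy
            s2 = max(0, min(target, s + (1 if a == 1 else -1)))
            r = 1.0 if s2 == target else 0.0
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2
            if r:                            # episode ends at the target
                break
    return Q

Q = q_learning()
# Greedy policy read from Q: raise the flow in every non-target state.
policy = [max((0, 1), key=lambda a: Q[s][a]) for s in range(4)]
```

Even though the behaviour policy is random, the greedy policy extracted from the learned Q-table steers toward the rewarded target level, mirroring how an agent with little initial knowledge improves over time.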


Fig. 5 Framework for incorporated ML in ABM (according to Rand [47])

Figure 5 shows a general framework of an adaptive ABM. Once the virtual environment, the agents and their internal models are created, the agents observe the world. These observations are sent to the incorporated ML component, where they are recorded; the internal model is then updated based on the experience gained, and an action is recommended, which is recorded again. The information about the recommended action is sent back to the ABM, where the agents refine their internal models, take the virtual action and learn (iteratively converging to a problem solution) by taking note of the rewards received. This hybrid approach opens up the simulation of scenarios with optimally utilized IS systems in terms of (best) connectivity of the entities, material/energy flows and system resilience. It enables the investigation of the effects of certain IS activities and of disruptive events such as the entry/exit of entities and fluctuating material/energy flows. It can simulate the adaptability range of flows (minimum-maximum flow) required to maintain the IS system without malfunction. Furthermore, it reveals the impacts of the structure and composition of the entities involved in an IS system. The combinatorial approach of ABM and ML can play an essential role in simulating the consequences of possible dynamics in an IS system, mining and synthesizing knowledge for further action and for the prioritization of IS measures. The innovative part is the integration of ML algorithms to simulate and model various transformation pathways with two anchor points: from the actual state of the IS system to a desired future scenario that can be aligned to definable (sustainability) goals (e.g. a "zero waste", "zero emission" or "CO2-neutral" park). It can therefore simulate transformation pathways in which the resilience of the IS system is ensured at the highest level.


6 Discussion and Concluding Impulses for Future Research

Through the digitalization of data collection and processes, companies can track their economic, social and ecological performance with advanced and intelligent Information and Communication Technology (ICT). Once a global infrastructure of information networks is established, the Internet of Things (IoT) allows relevant information from the real world to be automatically captured, linked together and made available in the network for further processing via intelligent, dynamic analysis models and ML techniques. By using ML methods in combination with other techniques such as ABM in the context of IS, previous challenges can be solved to some extent. For example, the complexity and dynamics of IS systems can be captured and used in a targeted and solution-oriented way. The dynamics of the entities as well as significant changes in processes and material flows can be coped with better through the facilitated mapping of resilience and adaptation scenarios under changed framework conditions (e.g. companies entering or leaving the IS system). Thanks to adaptive ABM and ML algorithms, considerably fewer resources (temporal, financial, human capacities, etc.) are required for the identification of IS possibilities and the development of transformation pathways. In addition, better scenarios can be simulated with a correspondingly extensive and adaptive knowledge base in order to deal with uncertainty about the effects or effectiveness of planned/prioritized IS measures. ML techniques alone provide extended solutions for knowledge creation, but to exhaust their potential, hybrid approaches with other methods such as ABM can significantly expand the scope of action to simulating and modelling the impacts of identified/planned/implemented IS activities, hence enabling the development of transformation pathways/scenarios and a comprehensive understanding of the dynamics of IS systems.
Even if technological progress is steadily continuing and converging on some problem solutions, it is noteworthy that any increase in resource efficiency and productivity on its own only delays the negative impact or spreads it over an extended time period; technology alone cannot solve humankind's challenges. So we, as a global society, need to pose the question: what and where are the thresholds of bearable industrial activities and their respective social and environmental impacts?

Acknowledgements This doctoral research work is co-financed by funds from the Berlin Equal Opportunities Programme.

References

1. Chertow, M.R.: Industrial symbiosis. Encycl. Energy 3, 407–415 (2004)
2. Chertow, M.R.: Industrial ecology in a developing context. In: Clini, C., Musu, I., Gullino, M. (eds.) Sustainable Development and Environmental Management, pp. 1–19. Springer (2008)


3. Herczeg, G., Akkerman, R., Hauschild, M.Z.: Supply chain management in industrial symbiosis networks. Ph.D. thesis, Technical University of Denmark, pp. 7–45 (2016)
4. EEA, European Environment Agency: More from less—material resource efficiency in Europe. EEA report No. 10/2016. Technical report, European Environment Agency (2016)
5. Li, X.: Industrial Ecology and Industry Symbiosis for Environmental Sustainability—Definitions, Frameworks and Applications. Palgrave Macmillan (2018). ISBN 978-3-319-67500-8. https://doi.org/10.1007/978-3-319-67501-5
6. Chertow, M.R.: Industrial symbiosis: literature and taxonomy. Ann. Rev. Energy Environ. 25(1), 313–337 (2000)
7. Lombardi, D.R., Laybourn, P.: Redefining industrial symbiosis. J. Ind. Ecol. 16(1), 28–37 (2012)
8. Taddeo, R., Simboli, A., Morgante, A., Erkman, S.: The development of industrial symbiosis in existing contexts. Experiences from three Italian clusters. Ecol. Econ. 139, 55–67 (2017). https://doi.org/10.1016/j.ecolecon.2017.04.006
9. Ruiz-Puente, C., Bayona, E.: Modelling of an industrial symbiosis network as a supply chain. Conference paper (2017)
10. Van Capelleveen, G., Amrit, C., Yazan, D.M.: A literature survey of information systems facilitating the identification of industrial symbiosis. In: Otjacques, B., et al. (eds.) From Science to Society, Progress in IS. Springer International Publishing AG (2018). https://doi.org/10.1007/978-3-319-65687-8_14
11. Yeo, Z., Masi, D., Low, J.S.C., Ng, Y.T., Tan, P.S., Barnes, S.: Tools for promoting industrial symbiosis: a systematic review. J. Ind. Ecol. 1–22 (2019). https://doi.org/10.1111/jiec.12846
12. Gambus, P., Shafer, S.L.: Artificial intelligence for everyone. Anesthesiology 128(3), 431–433 (2018). https://doi.org/10.1097/aln.0000000000001984
13. Shorten, C.: Machine learning vs. deep learning (2018). https://towardsdatascience.com/machine-learning-vs-deep-learning-62137a1c9842. Accessed 18 June 2019
14. Biegler-König, F., Bärmann, F.: A learning algorithm for multilayered neural networks based on linear least squares problems. Neural Netw. 6(1), 127–131 (1993). https://doi.org/10.1016/S0893-6080(05)80077-2
15. Sarle, W.: Neural networks and statistical models. In: Proceedings of the Nineteenth Annual SAS Users Group International Conference, Cary, NC, pp. 1538–1550. SAS Institute (1994). ftp://ftp.sas.com/pub/neural/neural1.ps
16. Meissner, G.: Artificial intelligence-consciousness and conscience (2018). https://doi.org/10.13140/rg.2.2.36626.76488
17. Earley, K.: Industrial symbiosis: harnessing waste energy and materials for mutual benefit. Renew. Energy Focus 16(4), 75–77 (2015). https://doi.org/10.1016/j.ref.2015.09.011
18. Mirata, M.: Experiences from early stages of a national industrial symbiosis programme in the UK: determinants and coordination challenges. J. Clean. Prod. 12, 967–983 (2004)
19. Pakarinen, S., Mattila, T., Melanen, M., Nissinen, A., Sokka, L.: Sustainability and industrial symbiosis—the evolution of a Finnish forest industry complex. Resour. Conserv. Recycl. 54(12), 1393–1404 (2010). https://doi.org/10.1016/j.resconrec.2010.05.015
20. Yu, F., Han, F., Cui, Z.: Evolution of industrial symbiosis in an eco-industrial park in China. J. Clean. Prod. 87, 339–347 (2015). https://doi.org/10.1016/j.jclepro.2014.10.058
21. Yuan, Z., Shi, L.: Improving enterprise competitive advantage with industrial symbiosis: case study of a smeltery in China. J. Clean. Prod. 17, 1295–1302 (2009). https://doi.org/10.1016/j.jclepro.2009.03.016
22. Zhu, Q., Lowe, E.A., Wei, Y., Barnes, D.: Industrial symbiosis in China: a case study of the Guitang Group. J. Ind. Ecol. 11, 31–42 (2008). https://doi.org/10.1162/jiec.2007.929
23. Dong, L., Fujita, T., Zhang, H., et al.: Promoting low-carbon city through industrial symbiosis: a case in China by applying HPIMO model. Energy Policy 61, 864–873 (2013). https://doi.org/10.1016/j.enpol.2013.06.084
24. Cui, H., Liu, C., Côté, R., Liu, W.: Understanding the evolution of industrial symbiosis with a system dynamics model: a case study of Hai Hua industrial symbiosis, China. Sustainability 10(3873) (2018). https://doi.org/10.3390/su10113873


25. Golev, A., Corder, G., Giurcob, D.P.: Industrial symbiosis in Gladstone: a decade of progress and future development. J. Clean. Prod. 84, 421–429 (2014). https://doi.org/10.1016/j.jclepro.2013.06.054
26. Bain, A., Shenoy, M., Ashton, W., Chertow, M.: Industrial symbiosis and waste recovery in an Indian industrial area. Resour. Conserv. Recycl. 54, 1278–1287 (2010)
27. Notarnicola, B., Tassielli, G., Renzulli, P.A.: Industrial symbiosis in the Taranto industrial district: current level, constraints and potential new synergies. J. Clean. Prod. 122, 133–143 (2016). https://doi.org/10.1016/j.jclepro.2016.02.056
28. Alkaya, E., Böğürcü, M., Ulutaş, F.: Industrial symbiosis in Iskenderun Bay: a journey from pilot applications to a national program in Turkey. In: Proceedings of the Conference SYMBIOSIS 2014 (2014)
29. Chertow, M.: Uncovering industrial symbiosis. J. Ind. Ecol. 11(1), 11–30 (2007)
30. Martin, M.: Industrial symbiosis in the biofuel industry: quantification of the environmental performance and identification of synergies. Dissertation No. 1507, Linköping Studies in Science and Technology (2013). ISSN 0345-7524
31. Song, X., Geng, Y., Dong, H., Chen, W.: Social network analysis on industrial symbiosis: a case of Gujiao eco-industrial park. J. Clean. Prod. 193, 414–423 (2018). https://doi.org/10.1016/j.jclepro.2018.05.058
32. Ashton, W.S.: Understanding the organization of industrial ecosystems: a social network approach. J. Ind. Ecol. 12(1), 34–51 (2008)
33. Chopra, S.S., Khanna, V.: Understanding resilience in industrial symbiosis networks: insights from network analysis. J. Environ. Manage. 141, 86–94 (2014)
34. Doménech, T., Davies, M.: Structure and morphology of industrial symbiosis networks: the case of Kalundborg. Procedia Soc. Behav. Sci. 10, 79–89 (2011)
35. Doménech, T., Davies, M.: The social aspects of industrial symbiosis: the application of social network analysis to industrial symbiosis networks. Prog. Ind. Ecol. 6(1), 68–99 (2009)
36. Zhang, Y., Zheng, H., Chen, B., Yang, N.: Social network analysis and network connectedness analysis for industrial symbiotic systems: model development and case study. Front. Earth Sci. 7(2), 169–181 (2013). https://doi.org/10.1007/s11707-012-0349-4
37. Ismail, Y.: Industrial symbiosis at supply chain. Int. J. Bus. Econ. Law 4(1) (2014). ISSN 2289-1552
38. Lütje, A., Willenbacher, M., Möller, A., Wohlgemuth, V.: Enabling the identification of industrial symbiosis (IS) through information communication technology (ICT). In: Proceedings of the 52nd Hawaii International Conference on System Sciences (HICSS), pp. 709–719 (2019). ISBN 978-0-9981331-2-6. http://hdl.handle.net/10125/59511
39. Sakr, D., El-Haggar, S., Huisingh, D.: Critical success and limiting factors for eco-industrial parks: global trends and Egyptian context. J. Clean. Prod. 19, 1158–1169 (2011). https://doi.org/10.1016/j.jclepro.2011.01.001
40. Grant, G.B., Saeger, T.P., Massard, G., Nies, L.: Information and communication technology for industrial symbiosis. J. Ind. Ecol. 14(5), 740–753 (2010)
41. Raabe, B., Low, J.S.C., Juraschek, M., Herrmann, C., Tjandra, T.B., Ng, Y.T., Kurle, D., Cerdas, F., Lueckenga, J., Yeo, Z., Tan, Y.S.: Collaboration platform for enabling industrial symbiosis: application of the by-product exchange network model. In: Procedia CIRP Conference on Life Cycle Engineering, vol. 61, pp. 263–268 (2017)
42. Isenmann, R.: Beitrag betrieblicher Umweltinformatik für die Industrial Ecology—Analyse von BUIS-Software-Werkzeugen zur Unterstützung von Industriesymbiosen. In: Gómez, J.M., Lang, C., Wohlgemuth, V. (eds.) IT-gestütztes Ressourcen- und Energiemanagement. Springer, Berlin (2013). https://doi.org/10.1007/978-3-642-35030-6_37
43. Maqbool, A.S., Mendez Alva, F., Van Eetvelde, G.: An assessment of European information technology tools to support industrial symbiosis. Sustainability 11(131) (2019). https://doi.org/10.3390/su11010131
44. Yazdanpanah, V., Yazan, M., Zijm, H.: Industrial symbiotic networks as coordinated games. In: Dastani, M., Sukthankar, G., André, E., Koenig, S. (eds.) Proceedings of the 17th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2018), Stockholm, Sweden (2018)


45. Yazan, D.M., Fraccascia, L.: Sustainable operations of industrial symbiosis: an enterprise input-output model integrated by agent-based simulation. Int. J. Prod. Res. (2019). https://doi.org/10.1080/00207543.2019.1590660
46. Livet, P., Phan, D., Sanders, L.: Why do we need ontology for agent-based models? Complex. Artif. Mark. 133–145 (2008)
47. Rand, W.: Machine learning meets agent-based modeling: when not to go to a bar (2006). https://ccl.northwestern.edu/papers/agent2006rand.pdf. Accessed 18 June 2019
48. Van Der Hoog, S.: Deep learning in (and of) agent-based models: a prospectus (2017). Preprint arXiv:1706.06302. https://arxiv.org/pdf/1706.06302.pdf. Accessed 18 June 2019

Graph-Grammars to Specify Dynamic Changes in Topology for Spatial-Temporal Processes

Jochen Wittmann

Abstract The present paper first offers an overview of the basic approaches available for the dynamic specification of spatio-temporal geo-objects from the fields of simulation technology and geoscience, and of how these are implemented in corresponding software systems. Based on this analysis, the possibilities of dynamics for the GIS primitives point, line and polygon are classified. Subsequently, the approach of graph grammars from the field of formal languages is transferred to the problem of specifying the dynamics of topologies of spatio-temporal objects, and the algorithmic optimization potential of implementing this approach is pointed out.

Keywords Modeling · Simulation · GIS · Spatio-temporal · Model specification paradigm

1 Motivation: Models with Spatial Reference

The requirements for the analysis, modelling and simulation of dynamic processes have changed fundamentally in recent years. On the data acquisition side, smartphones with automatic position determination via GPS satellites have spread to the point of being standard even in the consumer sector; at the same time, free, convenient and fast access to geographical maps and images has become available, e.g. via the web GIS Google Maps [1] or OpenStreetMap [2]. Approaches to modelling can now be geographically differentiated with high spatial resolution. On the part of computer science, this trend is supported by the concepts of object-oriented programming languages and individual-based modelling techniques, which make it possible to handle a large number of objects or individuals, even spatially differentiated ones, in a relatively simple and descriptive way (see e.g. [3] or, for an introduction to object-oriented programming, [4]). While the collection and storage of space-time data is essentially solved by corresponding object-oriented database concepts, the specification of dynamic models with reference to space and time is difficult, as will be shown in detail in the following.

J. Wittmann (B) Hochschule für Technik und Wirtschaft Berlin, University of Applied Sciences, Wilhelminenhofstraße 75A, 12459 Berlin, Germany e-mail: [email protected]

© Springer Nature Switzerland AG 2020 R. Schaldach et al. (eds.), Advances and New Trends in Environmental Informatics, Progress in IS, https://doi.org/10.1007/978-3-030-30862-9_10

2 State of the Art: Modeling Spatio-Temporal Processes

2.1 Spatial Reference in Simulation

From the point of view of simulation technology, an overview will be given here of whether and how it is possible to depict spatial phenomena using traditional modelling approaches.

2.1.1 Continuous Simulation

This section deals with ordinary and partial differential equations. With common differential equation models, the spatial reference is limited. Two alternatives are common. In the first case, the value of a state variable represents a stock at a fixed place that is not itself specified in the equations (e.g. the population number for Germany): either the location is assumed to be point-like, concentrating the entire population on itself, or the population is assumed to be homogeneously distributed over the area or space of the entity depicted. In the second case, the differential equation describes the movement of an entity (assumed to be point-like), whereby the surrounding space in which this movement takes place can have any dimension. In both cases the space model must be specified by the modeler himself; the modeling paradigm offers neither constraints nor support.
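As a minimal illustration of the first case (a stock at an unspecified fixed place), an ordinary differential equation such as logistic population growth can be integrated with the explicit Euler method; the parameter values are arbitrary:

```python
def simulate_logistic(p0, r, K, dt=0.1, steps=1000):
    """Explicit Euler integration of dP/dt = r * P * (1 - P/K):
    a population stock at a fixed but unspecified location."""
    p = p0
    trajectory = [p]
    for _ in range(steps):
        p += dt * r * p * (1 - p / K)
        trajectory.append(p)
    return trajectory

# Population grows from 10 towards the carrying capacity K = 100;
# note that no spatial information appears anywhere in the model.
traj = simulate_logistic(p0=10.0, r=0.5, K=100.0)
```

The absence of any coordinate in the state equation is exactly the limitation discussed above: where the population lives is entirely outside the modeling paradigm.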

2.1.2 Discrete Simulation

The discrete modeling paradigms offer a completely different view of dynamic systems. Sudden state changes, which take effect without any time delay at a given point in time and are activated by a triggering condition, are the only mechanism for describing system dynamics. Such a mechanism can be used to map state changes for entities with a fixed position in an arbitrarily specified space, but also to map "sudden" movements when the state change described in the event body represents a position change of the entity. In both cases, the discrete modeling paradigm does not prescribe the specification of the space; it is completely at the discretion of the modeler. Here, too, the modeling approach offers complete freedom for the space model used, but on the other hand it does not provide any support for its construction and administration.
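The event mechanism described above can be sketched as a minimal event-list simulator; the event names and the arrival/departure example are invented:

```python
import heapq
import itertools

class Simulator:
    """Minimal discrete-event kernel: events wait in a priority queue and
    are processed in time order; an event's action may schedule further
    events (the 'sudden' state changes of the text)."""
    def __init__(self):
        self.clock = 0.0
        self._queue = []
        self._counter = itertools.count()   # tie-breaker for equal times
        self.log = []

    def schedule(self, delay, name, action=None):
        heapq.heappush(self._queue,
                       (self.clock + delay, next(self._counter), name, action))

    def run(self):
        while self._queue:
            time, _, name, action = heapq.heappop(self._queue)
            self.clock = time               # time jumps event to event
            self.log.append((time, name))
            if action:
                action(self)

# Invented example: an entity arrives at t=1 and leaves two time units later.
sim = Simulator()
sim.schedule(1.0, 'arrival', lambda s: s.schedule(2.0, 'departure'))
sim.schedule(1.5, 'inspection')
sim.run()
```

Note that nothing in the kernel says anything about space: a position change would simply be one more state variable modified inside an event's action, exactly as the text observes.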


2.1.3 Special Case Cellular Automaton

The situation is quite different with a special case of discrete models, the cellular automata. Here, the modeling concept provides a homogeneous grid (usually square) for describing the space, which corresponds to the concept of raster data in GIS. Each raster element carries a state that is clocked and changes, event-driven in the sense of discrete simulation, according to rules that are identical for all raster cells and depend on the states of the neighboring cells. Essentially, all the approaches presented by O'Sullivan and Perry [5] are based on extensions of such a cellular automaton, for example differential variations of the behavior within a cell, random walk approaches for describing motion in the grid, and communication between cells. Due to its simplicity and its direct transferability to surface dynamics, this approach is widely used, and the originally very strict definition of the cellular automaton is adapted to the conditions of the respective object under investigation by countless extensions and adaptations (honeycombs instead of quadratic cells, multidimensional state vectors for each cell, various neighborhood variants, etc.). However, the common basic concept is a tiling of the plane with cells identical in form and behavior. Extension to three-dimensional space with identical, cube-shaped space elements is conceivable; however, because of its similarity to finite element approaches, it has found little distribution.
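The rule mechanism of such a cellular automaton can be sketched with Conway's well-known Game of Life on a toroidal square grid:

```python
def life_step(grid):
    """One clocked update of Conway's Game of Life: every cell applies the
    same rule based on its eight (toroidally wrapped) neighbours."""
    rows, cols = len(grid), len(grid[0])

    def live_neighbours(r, c):
        return sum(grid[(r + dr) % rows][(c + dc) % cols]
                   for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                   if (dr, dc) != (0, 0))

    # Survival with 2-3 neighbours, birth with exactly 3.
    return [[1 if (grid[r][c] and live_neighbours(r, c) in (2, 3))
                  or (not grid[r][c] and live_neighbours(r, c) == 3)
             else 0
             for c in range(cols)]
            for r in range(rows)]

# A 'blinker' oscillates between a horizontal and a vertical bar of three cells.
blinker = [[0, 0, 0, 0, 0],
           [0, 0, 0, 0, 0],
           [0, 1, 1, 1, 0],
           [0, 0, 0, 0, 0],
           [0, 0, 0, 0, 0]]
```

The single rule function applied uniformly to every cell is exactly the "identical rules for all raster cells" property of the paradigm; variants (honeycombs, larger neighborhoods, state vectors) change only the grid and the rule, not this structure.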

2.1.4 Object-Oriented and Individual-Based Approaches

In contrast to the previous approaches, the object-oriented or even individual-based modeling paradigm (for a precise definition and differentiation, only Ortmann [3] is referred to here for reasons of space) usually offers all the expressive possibilities of a general (object-oriented) programming language in connection with a kernel for the runtime system, which regulates the communication and synchronization of the model entities (see [6, 7]). Such approaches ensure the correctness of the temporal behavior. With regard to the spatial reference in which the model objects act, the modeler again has neither restrictions nor assistance.

2.1.5 Evaluation of Approaches from the Field of Modelling Techniques

As already hinted at in the characterization of the concepts, the difficulty lies in the fact that a spatial reference is possible but conceptually little supported. Only the partial differential equation systems and the cellular automata explicitly demand a spatial reference, but they are also the approaches that restrict the freedom of the modeler the most. One approach that pursues an integrative concept is the MARS system, which claims a simple, intuitive GIS coupling and focuses on individual-oriented behavioral description [8]. In its applications, however, the group concentrates more on the individual-oriented concepts and does not consistently pursue the

134

J. Wittmann

ideas of representing 4D dynamics. Although other approaches offer the possibility of an interface to a spatial reference, many applications nevertheless dispense with an explicit representation of geographical space, probably, in the author's subjective opinion, because of the growing complexity of the resulting models. What is certain, however, is that a genuine geographical concept of space has not found its way into any of the modelling paradigms, although work with geodata is becoming increasingly important.

2.2 Dynamics in GIS

In this section the analysis concentrates on the view of geographical information systems and their ability to represent dynamics and to model and simulate dynamic processes.

2.2.1 Visualization Techniques

Irrespective of the type of specification of spatial dynamics, applications in the GIS environment regularly face the difficulty of visualizing the dynamic changes of the displayed objects (features) in the sense of a result representation. In order to see whether the methods used here can also serve for the specification of dynamics, they will be briefly discussed. Essentially, three fundamentally different methods for visualizing dynamics can be used, which are more or less conveniently offered as tools or software features in the corresponding software systems:

– First, the representation of the dynamics in the form of an animation (if necessary with fading technique) of single images (maps) with an interactive slider for the time axis [9].
– Second, in the sense of a static map image, the representation of n individual maps for n selected points in time or periods, as known, for example, from a historical atlas [10].
– Third, the transfer of time to a suitable symbology of the displayed objects, e.g. different colors for the time intervals in which an object is active, or a color coding of the change of an attribute value over time (so-called "change maps", also in [10]).

Irrespective of the advantages and disadvantages of these alternatives, it should be noted that in all cases it is purely a matter of visualizing the result data of a dynamic model or a simulation run, not of specifying the dynamics themselves. Consequently, these methods do not help in the sense of a modelling language for processes in space and time.

Graph-Grammars to Specify Dynamic Changes …

2.2.2 Methodological Approaches

In addition to pure visualization, there are also a few approaches in the field of the geosciences that methodologically address the mapping of dynamic processes. In the following, three of these approaches are presented as examples and discussed with regard to their usefulness for a model specification.

Yattaw: Classification of Spatial-Temporal Geo-Objects

Yattaw [11] provides a comprehensive overview of the classification of entities in space and time. On one axis one finds the basic types point, line, area, and volume; on the other, a characterization of the dynamic behavior with the classes static, continuous, recurring, and erratic. Analogies to the classes of modeling paradigms described in the first chapter are obvious; a new aspect is offered by the periodic phenomena. This compilation resembles an object-oriented modeling approach, but Yattaw does not go beyond the purely descriptive level of classification and offers no suggestions on how to specify the classified dynamic types in the sense of a model description.

openMI Standard as a Data Exchange Format

Another development that goes beyond mere classification is the openMI standard [12], a data exchange format for loosely coupled simulation components. It is assumed that there is no self-contained simulation model for the entire system, but that several specialized submodels advance the simulation time as independent components and exchange the necessary (naturally time-stamped) inputs and outputs via a synchronized data interface. The openMI standard has been designed for this data interface and allows objects with both a time and a space reference. Essentially, it offers a specified data model for the classification made by Yattaw; of course, the basic object classes point, line and area also form the basis here. However, this interface, too, is limited to the purely descriptive representation of a current system state; the description of the inherent dynamics as well as the advancement of the simulation clock is reserved for the proprietary connected submodels. This approach has been very successful in the field of hydrodynamic models, as various studies by Harpham et al. have shown [13].

Yuan: Queries on Events in Spatial-Temporal Datasets

Certainly not an established approach, but one that shows how the conceptual problem of a simultaneous space and time reference is solved in a proprietary way, can be found in Yuan [14]. The main purpose of this paper is to analyze weather events, more precisely storms. These are characterized by the fact that they are defined as temporally related phenomena (the storm with several rain fronts) which lead to effects at different locations in geographical space (the fronts move on during the storm process). In other words, a storm is a geo-object with both spatial and temporal dynamics. The aim of the work is to detect, in a weather database, all observations belonging to a storm object defined in this way in space and time. For this, Yuan defines the storm as a so-called "event", which forms a union of several related "processes". A process in turn is defined as a temporal sequence of individual spatially and temporally fixed states. This makes it possible to query the temporal and spatial course of a storm. Two comments on this approach: First, it should be noted that there is no real specification of the process dynamics. Rather, data already available through measurements in a database is searched, with a "search template" structured by the self-introduced terms "state", "process", and "event", for a "storm event" that manifests itself through spatially and temporally differentiated data. A causal description of the storm dynamics, which would algorithmically describe where and when the storm appears with which rain or wind values, and which could be developed further into a prognosis of the future, is not given. The second remark refers to Yuan's terminology, which is diametrically opposed to the usual use of the terms "process" and "event" in the field of simulation technology; the following section will introduce these terms in detail. At this point it becomes clear how an interdisciplinary approach can constructively bring together the already structured knowledge of the two disciplines involved and how such cooperation can lead to mutually beneficial synergies. In any case, the approach clearly shows the need for what Yuan calls an "object-field-representation for complex phenomena".

3 Coupling Simulation and GIS at Software System Level

Essentially, the approaches of the preceding sections can be summarized under two aspects. Both kinds of systems are highly specialized for their purpose: GIS focus on the precise mapping of spatial objects in relation to the earth's surface and offer little support for the task of specifying dynamic value changes for attributes of geo-objects. Simulation paradigms provide constructs for the description of dynamics, but have neither established concepts for the representation of (geographical) space nor corresponding methods for spatial analysis. For application projects, therefore, the decision is often made not for one of these two software systems, but for an alternative solution that does without a closed methodological basis and is ultimately based solely on a software coupling of the geographical information system and the simulation system. This makes it possible to use the specialized functionalities of both systems, but at the price of a more or less complicated and often slow data interface.


Such a combination or integration of simulation tool and GIS can be discussed, and implemented accordingly, on two different levels:

(a) Functional aspects of the integration. The aim is to combine the functional possibilities of GIS and simulation tools. This quickly leads to two fundamentally different solution alternatives, which have been discussed in the literature for a long time:
– integrate the description of model dynamics into the GIS, or
– integrate GIS functionality into the simulation model or its specification.

(b) Software-technical aspects of the integration. Transferred to the level of software architecture, this likewise results in two fundamentally different approaches, namely
– to combine all functionalities within one universal, spatio-geographical simulation system, or
– to implement a problem-related, individually adapted solution or architecture for the individual case, which must then, however, abandon the claim to universality.

In addition, there are increasingly intermediate solutions for software integration that are based on a dynamic coupling of specialized modules for the execution of necessary functionalities. For example, QGIS offers the possibility of integrating new functionalities via plug-ins, such as a geospatial simulation plugin for point-based simulation models [15]. All of these solutions are found in practice. Since the discussion in these cases concentrates on software-technological aspects and the concepts for the description of spatio-temporal processes are not touched methodologically, the respective arguments are not described here in more detail for reasons of space. However, it is noticeable that in typical applications, especially with different spatial and/or temporal scaling, deficits occur again and again that cannot be treated in a user-oriented way even with an integrated solution.
Therefore, in the following we will basically describe how temporal dynamics for geo-objects can be described and how they can be processed algorithmically in the sense of a simulation.

4 Basic Primitives for Dynamic Changes of Spatio-Temporal Objects

Following the preceding analysis, a constructive proposal shall now be developed for how the dynamics of geo-objects can be described and treated algorithmically. The following section classifies the GIS primitives point, line, polygon, and topology and specifies how motion can express itself for each of them.

138

J. Wittmann

4.1 Dynamics of Geo-Objects

For each of the primitives, the possible dynamic changes and how they are to be described in the model are classified here. The classification of Yattaw as well as the classification of the model description methods are included and combined into a dynamics description for geo-objects. Figure 1 shows an overview of the classification.

4.1.1 Type Point

In the simplest case there is a point feature. Depending on the model's purpose, a point object can either move in space by continuous motion or change its position abruptly. In the first case, the continuous motion can be represented by a motion vector, i.e. the indication of direction and velocity; mathematically, the description can be traced back to a differential equation. In the case of an abrupt position change, one must fall back on the concepts of discrete simulation and interpret the position change as a discrete event, whose execution takes place without any time delay (thus abruptly) and which is triggered by some kind of logical condition. In addition to these forms of motion of a point object existing in the system or in the model at the current time, dynamic changes in the topology are possible, which in this case represent the creation or deletion of a point object. Examples are obvious. For the continuous case: a continuously moving vehicle, or a bird flying continuously in 3D space. For the discrete case: an animal's position measured or observed only at discrete points in time during animal migration, or the movements of a public transport vehicle between stops, where the actual route is not essential for the model purpose but only the information at which stop the vehicle is at which time according to the timetable. The appearance and disappearance of point objects does not need further examples; it can be combined with the continuous case as well as with the discrete case.
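The two forms of point dynamics described above can be sketched as follows; the function names, the velocity, and the jump condition are illustrative assumptions:

```python
# Sketch of the two ways of specifying point-object dynamics:
# (a) continuous: a motion vector (direction and speed), here integrated with
#     a simple Euler step of the underlying differential equation dx/dt = v;
# (b) discrete: an event that changes the position abruptly as soon as a
#     logical condition holds, with no model-time delay.

def move_continuous(pos, velocity, dt):
    """Euler step: new position after time step dt."""
    return (pos[0] + velocity[0] * dt, pos[1] + velocity[1] * dt)

def move_discrete(pos, condition, effect):
    """Fire the event iff the condition holds; otherwise keep the position."""
    return effect(pos) if condition(pos) else pos

# Continuous case: a vehicle moving at 1.5 units per time step in x.
p = (0.0, 0.0)
for _ in range(10):
    p = move_continuous(p, (1.5, 0.0), dt=1.0)
p_reached = p  # position after 10 continuous steps: (15.0, 0.0)

# Discrete case: the object "jumps" to a new position (e.g. the next stop)
# as soon as the condition x >= 15 is satisfied.
p = move_discrete(p, lambda q: q[0] >= 15, lambda q: (20.0, 5.0))
```

Note the asymmetry mentioned in the text: the continuous variant needs many small steps (and hence computing time), while the discrete variant changes the state in a single step but only at event times.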

4.1.2 Type Line

For line objects, the dynamics must be differentiated further: Either the line object moves as a whole, or it changes its shape only in parts, in which case only a subset of the defining points moves while the other grid points retain their position. Both variants are possible in the continuous as well as the discrete case. In addition, there are the changes in the topology, which can be mapped by the elementary methods "add grid point" and "remove grid point".
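For a polyline stored as a list of grid points, the two elementary topology methods named above can be sketched directly (the method names follow the text; the example data is invented):

```python
# Elementary topology methods for a line object represented as a list of
# supporting (grid) points.

def add_grid_point(line, index, point):
    """Insert a new supporting point before position `index`."""
    return line[:index] + [point] + line[index:]

def remove_grid_point(line, index):
    """Delete the supporting point at position `index`."""
    return line[:index] + line[index + 1:]

fence = [(0, 0), (5, 0), (5, 5)]
fence = add_grid_point(fence, 1, (2, 1))   # local detour around an obstacle
fence = remove_grid_point(fence, 2)        # later: straighten the line again
```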

Fig. 1 Dynamics specification for GIS primitives

An example of the continuous case could be the dynamic development of a coastline; the discrete case could be represented, for example, by a barrier or a fence that is adapted once a year to the geographical conditions, say as protection for walkers against the continuous change of a steep coast. If the line dynamics is described by the dynamics of a subset of its supporting points, it is important for the modelling that the semantics and topology of the line as a whole are not violated by the dynamics specified separately for the individual points (e.g. the property of not crossing itself). Accordingly, during the description, or at least during the subsequent processing of the description in the simulation, care must be taken that semantic violations lead to an error message and to the termination of the simulation.
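Such a consistency check can be sketched for the "not crossing itself" condition: after moving a single supporting point, all pairs of non-adjacent segments are tested for a proper crossing, and a violation aborts the operation with an error, as the text demands. The orientation-based intersection test is a standard technique; all names are illustrative:

```python
# Consistency check for a polyline after moving one supporting point:
# reject the move if any two non-adjacent segments would cross.

def _orient(a, b, c):
    """Signed area test: >0 if a,b,c turn left, <0 if right, 0 if collinear."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

def _segments_cross(p1, p2, p3, p4):
    """Proper crossing of segments p1p2 and p3p4 (touching excluded)."""
    d1, d2 = _orient(p3, p4, p1), _orient(p3, p4, p2)
    d3, d4 = _orient(p1, p2, p3), _orient(p1, p2, p4)
    return (d1 * d2 < 0) and (d3 * d4 < 0)

def move_vertex(line, index, new_pos):
    """Move one supporting point; abort if the line would cross itself."""
    candidate = list(line)
    candidate[index] = new_pos
    segs = list(zip(candidate, candidate[1:]))
    for i in range(len(segs)):
        for j in range(i + 2, len(segs)):      # skip adjacent segments
            if _segments_cross(*segs[i], *segs[j]):
                raise ValueError("topology violated: line crosses itself")
    return candidate

coast = [(0, 0), (2, 0), (2, 2), (0, 2)]
coast = move_vertex(coast, 2, (3, 2))   # fine: the shape changes locally
try:
    move_vertex(coast, 3, (1, -1))      # would cross the first segment
except ValueError as e:
    print(e)
```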

4.1.3 Type Polygon

For objects of type polygon, the same remarks apply as for line objects. Polygons can move synchronously with all their grid points, either continuously or abruptly. If only a subset of the polygon's grid points shows dynamics, this can be specified by a motion vector (continuous) or by event-like, abrupt changes of the grid points' positions. Here, too, the individual dynamics of supporting points can violate the topological properties of the polygon; for such cases, appropriate precautions have to be taken in the model description and the simulation. The corresponding topological dynamic alternatives behave analogously to those discussed previously. An example of a discrete movement of a polygon object is the daily moving of a sheep pen, either as a whole or with adaptation of the pen to the geographical conditions. The continuous development of a polygon is exemplified by the development of a settlement area; in this example, the supporting points will have different movement patterns. The continuous movement of a harrow pulled over a field by a continuously driving tractor can be regarded as an example of a continuous but shape-preserving dynamic.

4.1.4 Remarks on the Previous Classification

Remark 1: In principle, modeling is possible by specifying a motion vector or by specifying a discrete event. The motion vector, to be interpreted continuously, is sufficient for the specification of the dynamics and is preferred for physical-scientific models. It enables simulation with time steps that tend towards zero, i.e. a very high temporal resolution. However, this also requires considerable computing time for models described in this way. Therefore, the alternative of discrete models has to be considered, which allow only abrupt changes of the system state but usually require much less computing time. In the case of discrete events, it is necessary to specify a logical condition (when the event takes place) and an "effect" of the event, formulated in some formal language (what happens at the time of the event).


Remark 2: In the tabular overview of the different variants of dynamics specification it is noticeable that a continuous specification of changes is not provided for the topology. This is due to the set-theoretical definition of topology: an element of a set (corner or edge) is either present or not. A "growing" of an edge must therefore be emulated by the movement of the supporting points defining the edge.

Remark 3: When creating new objects (points or grid points), only the dynamic aspect was discussed in the description, and the effect on the topology was described. Of course, all attributes of the newly created objects have to be parameterized sensibly.

Remark 4: In this section, only the specification possibilities for the dynamic relationships of geo-objects were to be classified. The question of how the magnitude and direction of a motion vector are calculated, and how far and where a grid point jumps in an event, depends on various other variables or, in general terms, on the current system state.

5 Graph Grammars as an Approach to Formalize Topological Changes

With the standard methods of continuous and discrete simulation, a number of the cases classified in Fig. 1 can be treated. However, difficulties arise when parts of an object move differently, because the consistency of the object's topology may be violated by this movement (new intersections occur on lines, a polygon splits into two partial polygons, …). Therefore, it makes sense not to bind the specification of dynamic changes to the dynamics of the individual determinants of an object (e.g. individual supporting points), but to specify the dynamics on a higher level for a more complex situation. It will be shown that, in this way, consistency problems can be avoided at the root. For this purpose, an approach by Schneider [16] from the field of formal languages will be discussed in the following, which deals with so-called graph substitution systems or graph grammars. The definitions and the example of the following two sections are taken literally, with only minor adaptations, from the above-mentioned Schneider script, which summarizes the state of research competently and compactly. The transfer to the situation of the geo-objects then takes place in the third section of this chapter.

5.1 Definitions

The definition of a graph grammar is motivated by the definition of the Chomsky grammar working on strings. Therefore, we start by defining:


Definition (Chomsky grammar [17]): A Chomsky grammar (phrase structure grammar) is a quadruple G = (T, N, P, S) where T and N are disjoint finite sets (alphabets), S is a distinguished element of N, and P is a finite subset of L*NL* × L* with L = T ∪ N. The elements of T and N are called terminal symbols and nonterminal symbols, respectively. S is the start symbol or axiom, and P is the set of productions. Usually, the productions are written in Backus-Naur style [2] as u ::= v instead of (u, v).

Definition (Chomsky language): Each Chomsky grammar defines the set of strings that can be derived from the start symbol and that do not contain any nonterminal symbol:

L(G) := {w | w ∈ T* ∧ S ⇒* w}

A set of strings that can be defined by a Chomsky grammar is called a Chomsky language.

So much for the formally precise definition of a grammar. For readers who are not familiar with this formalism, the practical meaning of this definition can be summarized roughly as follows: If a string contains a sequence of characters that corresponds to the left side of a production, this sequence can be removed from the string and replaced by the sequence of characters on the right side of the production. Replacements defined in this way are iterated until the string consists only of terminal symbols, which cannot be resolved further. In this way, an infinite number of "allowed" strings (words of the language) can be generated by only a few rules. In a second step, we can generalize Chomsky's approach to formalize the notion of a graph grammar. The main point is to apply productions until some kind of normal form is reached. For this, Chomsky distinguished terminal symbols from nonterminal ones: productions are applied until the string no longer contains nonterminal symbols. We can easily transfer this idea to graph grammars:

Definition (Graph grammar): A graph grammar is given by a quadruple G = (L, T, P, S) with P being a finite set of graph productions in a category of labeled (hyper-)graphs using L as the labeling alphabet. T ⊆ L is called the terminal alphabet, and S is the starting graph.
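The string case can be made concrete with a tiny grammar for the language { aⁿbⁿ | n ≥ 1 }, i.e. T = {a, b}, N = {S}, and the productions S ::= aSb | ab. The bounded search below (a simplifying assumption, since the language is infinite) derives exactly the "allowed" strings in the sense of the definition:

```python
# Derive all terminal strings of a Chomsky grammar by repeatedly replacing a
# left-hand side with a right-hand side, as described in the text. The search
# is bounded by a maximum string length so that it terminates.

def derive(start, productions, terminals, max_len=8):
    """All terminal strings of length <= max_len derivable from `start`."""
    results = set()
    stack = [start]
    while stack:
        w = stack.pop()
        if all(ch in terminals for ch in w):
            results.add(w)
            continue
        if len(w) > max_len:        # prune: the language is infinite
            continue
        for lhs, rhs in productions:
            i = w.find(lhs)
            while i != -1:          # every occurrence of the left side
                stack.append(w[:i] + rhs + w[i + len(lhs):])
                i = w.find(lhs, i + 1)
    return results

productions = [("S", "aSb"), ("S", "ab")]
words = derive("S", productions, {"a", "b"})
```

Here `words` contains "ab", "aabb", "aaabbb", and "aaaabbbb": each derivation step replaces the single nonterminal S until only terminals remain.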

Please note that L = (L_E, L_V) consists of labeling alphabets for (hyper-)edges and nodes, which may be equal but in general are not. T ⊆ L is to be interpreted componentwise, i.e., T_E ⊆ L_E and T_V ⊆ L_V. Another difference between our definition and Chomsky's is that we allow a starting graph instead of a starting symbol. But this is only a matter of taste: We can add a production whose right-hand object is S and whose left-hand object consists of only one node labeled with a special starting symbol. Alternatively, we can use a graph with exactly one edge labeled with the starting symbol. Our solution of allowing an arbitrary starting graph avoids the problem of deciding whether we start with a nonterminally labeled edge or with a nonterminally labeled node.


Definition (Graph language): If G is a graph grammar, then the set

L(G) := {G | S ⇒* G ∧ l_E^G[E_G] ⊆ T_E ∧ l_V^G[V_G] ⊆ T_V}

is called the language of G.

The language of a grammar includes all the graphs that are derivable from the starting graph but do not contain any nonterminally labeled node or edge. Again, an attempt to summarize the meaning of this definition in simple words: If a graph contains a subgraph that corresponds to the left side of a production, this subgraph can be removed from the entire graph and replaced by the subgraph on the right side of the production. These replacements can again be applied iteratively. However, the situation is considerably more complicated than with character strings: In a character string, the part to be replaced is uniquely delimited by the reading direction, whereas for graphs the fits are much more complicated to specify, because rotations and permutations of the nodes are possible. In addition, graph grammars allow the characterization of nodes and edges by labels, which must also be checked for conformity with the labels in the respective production when replacing. As we will see, this property proves to be very advantageous when applying the formalism to geo-objects, because in this way additional, non-topological attributes of the objects can be taken into account.

5.2 Example for Graph Substitution

Figure 2 shows the situation using the example of the relation "is_mother". The nodes of the given graph are persons of female (f), male (m), or arbitrary (x) gender. The nodes are numbered consecutively for identification with superscript integers. Between the nodes there is the relation "is_mother", also with a superscript identifier. A three-part substitution scheme is specified as the production: On the left side, a section of a graph is described as a parameterized actual situation; in the middle there is the so-called interface graph; and on the right side, the situation after application of the production is indicated. The purpose of the production is to add the relation "sister_of". The left-hand object of the production must ensure not only that both persons have the same mother, but also that the person at the source of the new edge is female. We ensure this by defining w ∈ {mary, joan, jane, dora} and x ∈ {mary, joan, jane, dora, david}. Then we have three ways of applying this production to the given graph (so5, so6, and so7), as depicted in the figure. It should be noted that through the specification of the production all semantic conditions are automatically met: the appropriate gender of the persons involved and the relationship to the common mother. In addition, it already becomes clear at this point that finding suitable fits of the left side in the currently given graph can be a non-trivial task.
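The effect of this production can be sketched directly. The exact edges of Fig. 2 are not reproduced in the text, so the "is_mother" graph below is an assumption reconstructed from the names; the label check on the source node realizes the condition that only female persons become the source of a new "sister_of" edge:

```python
# Sketch of the "sister_of" production: given a set of (mother, child) pairs
# and a gender label per node, add a sister_of edge from every female child
# to every other child of the same mother. The graph data is assumed, not
# taken from Fig. 2 verbatim.

def apply_sister_production(gender, is_mother):
    """Return the set of (source, target) pairs for the new sister_of edges."""
    sister_of = set()
    for (m1, c1) in is_mother:
        for (m2, c2) in is_mother:
            # same mother, different children, and the source must be female
            if m1 == m2 and c1 != c2 and gender[c1] == "f":
                sister_of.add((c1, c2))
    return sister_of

gender = {"mary": "f", "joan": "f", "jane": "f", "dora": "f", "david": "m"}
is_mother = {("mary", "joan"), ("mary", "jane"),
             ("jane", "david"), ("jane", "dora")}
sisters = apply_sister_production(gender, is_mother)
# three applications, matching the "three ways" mentioned in the text:
# joan and jane are mutual sisters; dora is a sister of david, but david,
# being male, is not the source of a sister_of edge.
```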


Fig. 2 Example for graph replacement (adopted from [16])

5.3 Examples for an Application to Geo-Objects

For reasons of space, the potential of this approach for the description of spatio-temporal geo-objects can only be indicated. Three examples should at least illustrate how complex correlations can be given in general form by suitable graph substitution rules. Topologically, such dynamics can be described as growth rules for graphs. Figure 3 shows three simple examples:

– If a predetermined flight route encounters an obstacle, it should be avoided by a local change of the route with an additional grid point.
– If one point-to-point connection in a network is in heavy demand and another is in weak demand, a direct connection shall be established for the heavily demanded route and the weakly demanded route shall be deleted.
– In the third example, the objects can be interpreted both as polygons and as graphs. In any case, the aim is to describe growth processes: a shape as shown to the left of the arrow can change into a shape as shown to the right of the arrow. Interpreted as polygons, the development of a settlement area could be modelled in this way; as a graph, the growth of a supply network could be described.

The basic procedure of working with the graph-grammar formalism can be described very simply in four steps:

1. Create a set of productions that reflect the dynamics of the treated objects.
2. Find, in an existing graph, a representation of the left side of a production and extract this left side as a subgraph.

Fig. 3 Examples of the dynamics of topologies

3. In the subgraph given by the left side, make the change according to the right side of the production.
4. Re-integrate the changed subgraph into its context in the existing graph.

However, when looking at the details, some questions arise which complicate the described procedure:

1. How far must the agreement go when comparing the current situation with the left side of the production? Only topologically? Including all attributes of the nodes involved? … This is essentially a specification problem that can be solved by the formalism of graph grammars (in the example, by the inclusion of the gender of the nodes).
2. How are all fits in the given graph to be found? This is a search problem, which needs closer consideration, but which can be grasped algorithmically through the formalization.
3. How are the rules (productions) to be formulated in order to achieve consistency? The formalism also helps here by making the properties of a production set mathematically derivable (analogous to the properties of Chomsky grammars).
4. What is the time behavior of such a specified dynamic? Here the author proposes an interpretation according to the paradigm of discrete models, in which the execution of a production is treated as the effect of a discrete event.
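The four steps can be sketched for labeled directed graphs. The edge-set representation and the brute-force matcher are simplifying assumptions for illustration, not Schneider's categorical construction; the example implements the second Fig. 3 case, replacing a detour by a direct connection:

```python
# Sketch of the four-step graph-rewriting procedure. A graph is a set of
# (source, label, target) edges; a production maps a small left-hand pattern
# (with variable node names) to a right-hand side. Step 1/2 searches for a
# fit of the left side; steps 3/4 build and re-integrate the rewritten part.

from itertools import permutations

def find_fit(graph, lhs):
    """Steps 1/2: find an assignment of pattern variables to graph nodes
    such that every pattern edge (with its label) occurs in the graph."""
    nodes = {n for (s, _, t) in graph for n in (s, t)}
    variables = sorted({n for (s, _, t) in lhs for n in (s, t)})
    for assignment in permutations(nodes, len(variables)):
        binding = dict(zip(variables, assignment))
        if all((binding[s], lab, binding[t]) in graph for (s, lab, t) in lhs):
            return binding
    return None

def apply_production(graph, lhs, rhs):
    """Steps 3/4: remove the matched left side, insert the bound right side."""
    binding = find_fit(graph, lhs)
    if binding is None:
        return graph                       # production not applicable
    matched = {(binding[s], lab, binding[t]) for (s, lab, t) in lhs}
    added = {(binding[s], lab, binding[t]) for (s, lab, t) in rhs}
    return (graph - matched) | added

# Example: replace a detour a -> x -> b by a direct edge a -> b
# (the heavily demanded connection from Fig. 3). Node names are invented.
graph = {("hub", "route", "stopover"), ("stopover", "route", "city")}
lhs = [("a", "route", "x"), ("x", "route", "b")]
rhs = [("a", "route", "b")]
result = apply_production(graph, lhs, rhs)
```

The brute-force enumeration of node assignments makes question 2 concrete: its cost grows factorially with graph size, which is exactly why the search strategies discussed in the next section matter.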

6 Approaches for the Optimization of the Algorithmic Handling of a Dynamics Specification by Graph Grammars

Here, two starting points for optimization, which result solely from the presented simulation concept, are discussed as topics for further research. An essential difference to the treatment of geo-primitives in GIS lies in the fact that an object does not have to move in the same way with all its defining supporting points; rather, the dynamics can be specified separately for each individual supporting point. This leads to considerable problems with the storage of and access to these objects, because efficient access to the individual grid points also violates the encapsulation of the higher-level, composite objects (e.g. of a polygon). Here, considerations are necessary as to how these accesses can be made efficient. In addition to these aspects of efficient storage, the optimization potential that results from the introduction of graph substitution should also be considered. In the conceptual part, the specification in the form of replacement rules was outlined; the partial steps that an implementation must cover, and which are essentially responsible for the computation times, were not specified. These are the steps:


1. Search for the pattern given by the left side of a replacement rule in the current model state.
2. Generate the set of allowed substitution variants.
3. Vary the position of new nodes in space.

Besides the obviously needed search strategies, methods for the efficient expansion of the search space through the systematic generation of alternatives are needed. For both, effective and efficient data structures have to be found; for locally limited searches, suitable hashing strategies should be considered. Alternatives found must be evaluated and ranked, which requires efficient access to the evaluation model. The evaluation must be secured by appropriate statistical procedures (confidence intervals, …). This in turn suggests accelerating the simulation runs for different parameterizations by parallelization. Since similar problems will occur frequently (e.g. finding an optimal position for a new node), it is likely that self-learning procedures can significantly shorten this process step. Before a set of replacement rules is used, the findings from the theory of graph grammars allow syntactic and semantic tests for the completeness and consistency of the rule set. It is conceivable and desirable to extend these tests by conditions that ensure a correct topology and topography. If such statements can already be derived formally from the rule set, topological and topographic errors are excluded at runtime, and thus the number of alternatives to be compared is reduced from the beginning.

7 Conclusion

The starting point for this paper was the observation that in the interdisciplinary field of modelling and simulation of spatial objects a specification level is missing that is, on the one hand, descriptive enough to represent complex dynamic changes also for non-informatics experts from the respective fields of application and, on the other hand, formal enough to be accessible to an algorithmic treatment in the sense of a simulation algorithm. An analysis of existing modelling paradigms and software systems shows that the usual modelling techniques offer little support for the specification of spatial dynamics. Although the object-oriented, individual-based approaches can also be used to model spatial processes, users are referred to proprietary solutions when it comes to observing spatial consistency conditions. In this situation, the article first analyzes the dynamics for the GIS primitives point, line, and polygon and shows in particular the need to model the complex dynamics of the grid points (type point) in a topologically and topographically consistent way for line and polygon. Based on this analysis, the formalism of graph substitution systems is transferred to the field of spatial models. The benefit, which contrasts with the not inconsiderable effort of formalization, consists above all in the clean algorithmic handling and processing of model dynamics described by graph



productions. Consistency conditions arising from the spatial reference can be handled at the meta-level of the productions, thereby avoiding topologically meaningless dynamics. In addition, the formalism proves advantageous when it comes to making the algorithmic processing of the numerous alternative development paths of objects in space manageable, and to optimizing the complexity and consistency of the solutions. The integration of this approach into a simulation runtime system will be discussed in a separate paper. The idea is to treat the execution of a replacement given by a production as a discrete event in the sense of discrete event simulation. In addition, the practicability of the approach, and especially the suitability of the proposed dynamics description for communication with non-informaticians, will be tested in suitable practical projects from the application field of environmental informatics.
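The proposed event-based treatment of rule applications can be sketched with a minimal event queue in Python. Event times, the toy productions and the stop criterion are invented; the point is only that each scheduled event executes one replacement on the model state.

```python
import heapq

def simulate(events, state, t_end):
    """Minimal discrete-event loop: events is a list of
    (time, production) pairs; production(state) -> new state."""
    queue = list(events)
    heapq.heapify(queue)            # process events in time order
    t = 0.0
    while queue and queue[0][0] <= t_end:
        t, production = heapq.heappop(queue)
        state = production(state)   # execute one replacement step
    return t, state

# Invented productions: grow a toy node set by one node per event.
def add_node(s):
    return s | {max(s) + 1}

events = [(1.0, add_node), (2.5, add_node), (9.0, add_node)]
final_t, final_state = simulate(events, {0}, t_end=5.0)
# Only the events at t=1.0 and t=2.5 fire; t=9.0 lies beyond t_end.
```

In a full runtime system the production would of course be a graph rewrite with the consistency checks discussed above, and executing it could schedule follow-up events.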


Online Anomaly Detection in Microbiological Data Sets

Leonie Hannig, Lukas Weise and Jochen Wittmann

Abstract To prevent health risks caused by waterborne bacteria, significant changes in the bacterial community have to be detected as soon as possible. The aim of this study was to research suitable methods and to implement a prototype of a system that can immediately detect such anomalous data points in microbiological data sets. The method chosen for detecting anomalous cell counts was prediction-based outlier detection: automatically generated models were used to predict the expected number of cells in the next sample, and the real number was compared to the prediction. Significant changes in bacterial communities were identified using cytometric fingerprinting, a method that provides functionalities to compare multivariate distributions and quantify their similarity. The prototype was implemented in R and tested. These tests showed that both methods are capable of detecting anomalies but still have to be optimized and evaluated further.

Keywords Anomaly detection · Water monitoring · Cytometric fingerprinting · Machine learning · Waterborne bacteria

1 Introduction

Pathogenic bacteria can be transmitted via contaminated water and cause rapid and widespread health risks [1]. To prevent such risks, bacteria in water, especially drinking water, need to be monitored continuously. Flow cytometers are capable of rapidly measuring several characteristics of individual cells in water samples at once [2] and have been increasingly used for water monitoring [3], replacing conventional time-consuming and laborious methods like plate counts [4]. The aim of this feasibility study was to use such flow cytometric data of water samples to identify and test suitable methods that can detect "anomalous" samples at the time of their occurrence, and to implement a prototype of an automatic detection system.

Which bacteria, and how many bacterial cells, can be found in a water sample strongly depends on environmental factors like pH, temperature, osmolarity and oxygen [5]. If these environmental factors change, the bacterial community changes too: for some groups the new conditions are less favorable and they grow less, while other groups may grow more, thereby changing their respective shares in the bacterial community. Environmental factors are constantly changing in a natural habitat, however, and so are the bacteria; there can even be regularly occurring changes, such as the increase and decrease of bacterial populations with the seasons [6]. The challenge is to distinguish between changes caused by normal variations in the environment and changes caused by events that are not normal. Events like sudden contamination through damage [4] or the infiltration of river water into groundwater can lead to sudden spikes in the number of cells [7]. Apart from such anomalous cell counts there can also be anomalous changes in the composition of the bacterial community of an ecosystem. Since such anomalies may indicate contamination with pathogenic bacteria, they need to be detected as soon as they occur ("online") to facilitate fast identification and reduction of health risks.

Anomalies are defined as data points that differ significantly from the surrounding data [8]. This means that there are no hard criteria like a maximum normal cell count or a maximum normal percentage of large cells.

L. Hannig (B) · J. Wittmann
Hochschule für Technik und Wirtschaft Berlin, Treskowallee 8, 10318 Berlin, Germany
e-mail: [email protected]
J. Wittmann
e-mail: [email protected]
L. Weise
Berliner Wasserbetriebe, Neue Jüdenstraße 1, 10179 Berlin, Germany
e-mail: [email protected]
© Springer Nature Switzerland AG 2020
R. Schaldach et al. (eds.), Advances and New Trends in Environmental Informatics, Progress in IS, https://doi.org/10.1007/978-3-030-30862-9_11
A further complication was that the available data set contained no examples of anomalous data from which such biological criteria could have been deduced. Since the detection system could not use biological criteria, it had to rely solely on machine learning methods that recognize patterns in data. The anomaly detection prototype is therefore based on the assumption that, given models that accurately describe the relationships between data points, statistical anomalies are indeed biological anomalies, i.e. anomalies with a biological cause.

First, suitable methods for detecting cell count anomalies and anomalous bacterial communities were researched. The prototype was then implemented in R, and preliminary tests were done using data from a river monitoring project conducted in spring 2018 on the river Spree in Berlin, Germany. Every day, except for holidays and weekends, an automatic water sampler was used to collect samples at a sampling site in Berlin. The samples were then taken to a laboratory and analyzed by flow cytometry. The analysis was conducted according to the procedure described by Kötzsch [9], using the fluorescent stain SYBR Green, which binds specifically to nucleic acids (DNA and RNA) [3].



2 Theoretical Background and Methods

2.1 Data Collection—Flow Cytometry

Flow cytometry is a method that makes it possible to characterize individual cells by their fluorescence [10]. The cells are suspended in a liquid, and each cell is rapidly moved through a laser beam, causing the emission of radiation in a way that is characteristic of different cell properties and cell components. The radiation scattered in the forward direction (forward scatter, FSC) depends on the size of a cell, while the radiation scattered to the sides (side scatter, SSC) is indicative of the granularity of a cell. In addition, antibodies or dyes can be added that bind specifically to certain cell structures [10]. For example, the fluorescence emitted by SYBR Green, the fluorescent stain used in this study, is directly proportional to a cell's nucleic acid content [3]. For each single cell, a multivariate data set is created and appended to a file as a new row of fluorescence intensity measurements. The parameters (channels) chosen for the monitoring program were side scatter, forward scatter, red fluorescence and green fluorescence.

The complete data set contained 325 single files, each representing a single point in time during the sampling period between April 3 and May 23. The number of rows in a file corresponds to the number of cells in the sample. The cell counts for each point in time are shown in Fig. 1. The arrows indicate start and end points of the different subsets that were used for the evaluation. The two spikes on April 15 and May 15/16 were caused by rainfall. The gaps in the data set are caused by the days on which no samples were taken.

Fig. 1 Complete river data set from April 3 to May 23, 2018: measurements are indicated by blue circles, the black line depicts the linear imputation

2.2 Data Analysis—Detecting Anomalies in Multivariate Distributions and Time Series

Anomalous Bacterial Communities

Cells with similar properties, e.g. size or nucleic acid content, have similar fluorescence intensities. Such groups of similar cells ("populations") can be visually distinguished using two-dimensional scatter plots [11]. Each point represents a cell and its respective measured fluorescence intensities. Densities are heatmap coded, ranging from low densities in blue to high densities in red. Figure 2 shows two such scatter plots of samples with the same cell number but significantly different bacterial communities: Fig. 2a shows a community that consists of cells with both high and low nucleic acid content, whereas the community shown in Fig. 2b consists mainly of cells with high nucleic acid content. Each water sample has its own unique bacterial community, which in turn is characterized by its unique multivariate distribution of fluorescence intensities. To answer the question how similar or dissimilar two bacterial communities are to each

other, a technique developed by Roederer et al. [12, 13] called Probability Binning can be employed. To compare two distributions, one distribution (the control) is divided into bins that each contain the same number of cells (Fig. 3a). The second distribution is then compared by counting the number of cells that fall into each of these bins, and the counts are analyzed statistically. The method was refined by Rogers et al. [14], who called it Cytometric Fingerprinting; it is available in R via the flowFP package [15]. The fingerprint the method is named after is a visualization of the difference in cell counts between control and test sample: the fraction by which the cell count deviates between sample and control in each bin is log2-transformed and entered in a graph like the example shown in Fig. 3b [14]. The further a point deviates from zero (blue line), the more the cell count differs between template and sample in that bin. The difference between the two distributions can then be quantified by computing the standard deviation of these values, which is also given above the fingerprint plot. For easy analysis of many samples at once, a color code indicates the degree of similarity, ranging from high similarity (low standard deviations, green) to low similarity (high standard deviations, red).

Fig. 2 Scatterplots. a Sample containing cells with high and low nucleic acid content. b Sample containing mainly cells with high nucleic acid content

Cell Count Anomalies

Since the river water samples were taken consecutively at the same site, the measurements are not independent of each other and are highly correlated ("temporal continuity" [8]). Such time series data has to be analyzed with methods that take this correlation into account [16]. These methods can be used to develop models that describe the behavior of the data mathematically, e.g. ARIMA (autoregressive integrated moving average) models. This type of model is based on the assumption that current values are strongly influenced by past values, making it possible to model current values from past ones [16]. Since the number of bacteria at a given point in time depends mainly on the number of bacteria at the previous point in time, their reproduction and their mortality, such models are well suited to microbiological data sets. Anomalies occur when the parameters of the time series, i.e. the underlying mechanisms, change and the observed values can no longer be explained by the model [17].
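The prototype performs the fingerprint comparison with the flowFP package in R. As a language-neutral illustration, the core idea (equal-count bins derived from a control, a per-bin log2 deviation, and the standard deviation of those deviations as similarity score) can be sketched in Python for a single channel. Data, bin count and the +1 smoothing term are invented for this sketch.

```python
import math
from statistics import pstdev

def equal_count_bins(control, n_bins):
    """Divide the sorted control values into bins holding (roughly)
    the same number of cells; returns the inner bin edges."""
    s = sorted(control)
    step = len(s) / n_bins
    return [s[int(i * step)] for i in range(1, n_bins)]

def bin_counts(values, edges):
    """Count how many values fall into each bin defined by the edges."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        i = sum(v > e for e in edges)   # index of the bin containing v
        counts[i] += 1
    return counts

def fingerprint_sd(control, sample, n_bins=8):
    """Per-bin log2 deviation between sample and control; the standard
    deviation of these deviations quantifies the dissimilarity."""
    edges = equal_count_bins(control, n_bins)
    c = bin_counts(control, edges)
    t = bin_counts(sample, edges)
    # +1 in both counts avoids log2(0) for empty bins (a simplification)
    dev = [math.log2((ti + 1) / (ci + 1)) for ci, ti in zip(c, t)]
    return pstdev(dev)

# A sample drawn from the same distribution scores near 0; a shifted
# community scores clearly higher.
control = [i / 100 for i in range(1000)]
similar = [i / 100 for i in range(0, 1000, 2)]
shifted = [5 + i / 200 for i in range(500)]
assert fingerprint_sd(control, similar) < fingerprint_sd(control, shifted)
```

flowFP builds the bins recursively over all channels at once; the one-dimensional version above only conveys the binning-and-deviation principle.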



Identifying anomalies as soon as they occur ("Online Anomaly Detection") can be accomplished by computing a model and predicting the next expected value. This prediction is then compared to the real value, which is classified as normal or anomalous depending on whether it falls within the prediction interval (Fig. 4). This is called "Prediction-based Outlier Detection" [8].

Fig. 3 Fingerprinting. a Template showing the bins. b Fingerprinting deviation plot for one sample

Fig. 4 Classification of real values. a Normal. b Anomaly. Forecasting horizon 2 h; blue: predicted values, gray: prediction intervals (confidence level 0.99)

Model fitting and forecasting can be done automatically using the R package forecast [18], developed by Hyndman and Athanasopoulos [19]. However, the methods and techniques used are only applicable to regular time series, i.e. values measured at equally spaced points in time, so missing values have to be imputed.
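The prototype delegates model fitting to auto.arima from the R forecast package. The classification logic itself can be illustrated with a deliberately simple stand-in model: an AR(1) fitted by least squares with a Gaussian prediction interval. The model choice, the z-value and the data below are illustrative, not taken from the paper.

```python
import math
from statistics import fmean

def fit_ar1(series):
    """Least-squares AR(1): x_t ≈ mu + phi * (x_{t-1} - mu).
    Returns (mu, phi, residual standard deviation)."""
    mu = fmean(series)
    x = [v - mu for v in series]
    num = sum(a * b for a, b in zip(x[:-1], x[1:]))
    den = sum(a * a for a in x[:-1])
    phi = num / den if den else 0.0
    resid = [b - phi * a for a, b in zip(x[:-1], x[1:])]
    sd = math.sqrt(fmean([r * r for r in resid]))
    return mu, phi, sd

def classify_next(series, actual, z=2.576):
    """Predict the next value and classify the real observation:
    normal if it lies inside the prediction interval
    (z = 2.576 roughly corresponds to a 0.99 confidence level)."""
    mu, phi, sd = fit_ar1(series)
    pred = mu + phi * (series[-1] - mu)
    lo, hi = pred - z * sd, pred + z * sd
    return ("normal" if lo <= actual <= hi else "anomaly"), (lo, hi)

# Illustrative cell counts: a stable series, then a sudden spike.
history = [1000, 1020, 990, 1010, 1005, 995, 1015, 1000, 1008, 998]
assert classify_next(history, 1003)[0] == "normal"
assert classify_next(history, 2500)[0] == "anomaly"
```

auto.arima additionally selects the model order and handles differencing; only the interval-based classification step is reproduced here.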

3 Concept and Implementation

Two versions of the script were developed: one for the analysis of artificially created test files with defined properties, and another for the analysis of a test data set containing data obtained at adjacent points in time. Figure 5 shows the workflow of the analysis script for such a continuous data set.

Fig. 5 Workflow of the prototype

The data set is divided into an initial training set and an initial test set. For development and testing purposes, the first 30 files were used as the initial training set; this was done to obtain meaningful results while keeping the already long computation time as short as possible. However, the division can be changed according to the user's needs. The files in the test set are then analyzed consecutively. First, the last four files from the training set, representing the condition of the investigated ecosystem during the 8 h period before the current point in time, are used to compute the fingerprinting template that serves as control. This "sliding window" approach was chosen because bacterial communities always vary within a certain range, and the template needs to represent this normal variability in order to detect changes that exceed it. Gaps caused by days on which no sampling was done were ignored. The test file is then compared to the template and classified using the standard deviation: samples with a standard deviation of 0.5 or above are classified as anomalies, samples with lower standard deviations as normal.

In the next step, the cell count is classified via prediction-based outlier detection: the training set is used to compute an ARIMA model describing the time series via the auto.arima function of the forecast package. This model is used to forecast the next expected cell count, and the real cell count is compared to the expected value. It is classified as normal if it falls within the prediction interval; values that fall outside the prediction interval are classified as anomalous. For this part of the analysis, missing values are imputed before model computation to create a regular time series; for the imputation, the Kalman smoothing provided by the imputeTS package [20] was used. After completion of the analysis, the test file is added to the training set and the process is repeated with the next file.

The script for analyzing artificial test files is slightly different: all test files mimic data taken at the same point in time, therefore they are not added to the training set, and all tests are run using the same model and template. For each file, forecasting plots and fingerprinting plots are created and saved. When all files in the test set have been analyzed, a list of all files, cell counts, standard deviations and classifications is saved as a .txt file.

The script was implemented in R version 3.4.3. For handling the files and for fingerprinting, the packages flowCore (version 3.7) and flowFP (version 1.38.0) were used. Handling of time series data was done with the packages chron (version 2.3-52), zoo (version 1.8-1), imputeTS (version 2.6) and tseries (version 0.10-44). Time series analysis and forecasting was done using the package forecast (version 8.3).
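The sliding-window workflow of Fig. 5 boils down to a short driver loop. The sketch below (Python; the actual prototype is an R script using flowFP and forecast) uses placeholder classifiers for the fingerprint and forecast steps; the window size of four samples and the 0.5 threshold follow the text, everything else is invented.

```python
TEMPLATE_WINDOW = 4   # samples, i.e. the preceding 8 h at 2 h spacing
SD_THRESHOLD = 0.5    # fingerprint standard deviation cut-off

def analyse(training, test_samples, fingerprint_sd, count_outlier):
    """Driver loop: fingerprint_sd(template, sample) -> float,
    count_outlier(history, count) -> bool (prediction-based)."""
    results = []
    for sample in test_samples:
        template = training[-TEMPLATE_WINDOW:]          # sliding window
        community_anomaly = fingerprint_sd(template, sample) >= SD_THRESHOLD
        count_anomaly = count_outlier(
            [s["count"] for s in training], sample["count"])
        results.append((sample["time"], community_anomaly, count_anomaly))
        training.append(sample)       # absorb the sample, window moves on
    return results

# Toy data and toy classifiers, for illustration only.
training = [{"time": t, "count": 1000 + t} for t in range(30)]
tests = [{"time": 30, "count": 1031}, {"time": 31, "count": 5000}]
toy_fp = lambda template, s: 0.0                     # community always normal
toy_out = lambda hist, c: abs(c - hist[-1]) > 100    # naive count rule
res = analyse(training, tests, toy_fp, toy_out)
# res[0] == (30, False, False); res[1] == (31, False, True)
```

For the artificial test files the loop is run without the final `training.append(sample)`, so every file is judged against the same model and template, as described above.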

4 Evaluation

Figure 6 shows the results of a test run using the whole original data set with imputed missing values.

Fig. 6 Anomalies identified using the whole data set with imputation; red dots: cell count anomalies, red circles: anomalous bacterial communities

To test the specificity and sensitivity of the anomaly detection script, tests with different subsets of the river data set were conducted. For each subset (the dates are shown in Fig. 1 and listed in Table 1), the file containing the measurements taken at the last point in time of that subset was used for test data creation with a Python script. The created test files either had changed cell counts (increased or decreased, starting at 5%) but unchanged bacterial communities, or changed bacterial communities (increased or decreased percentage of cells with high or low nucleic acid content, starting at 5%) but unchanged cell counts. The tests were done with forecasting horizons of 1 h or 2 h and a confidence level of 0.99.

Looking at the highest change that is classified as normal and the lowest change that is classified as anomalous gives a range of percentage changes; the threshold, i.e. the exact percentage change that separates normal and anomalous values, lies within that range. The results for the different subsets of the river data set are given in Table 1. The threshold for increased percentages of high nucleic acid cells lies between 10 and 15%, with the exception of subset 5, which has a slightly higher threshold. Increasing the percentage of low nucleic acid cells leads to similar results: for each subset the threshold lies between 10 and 15%. Thresholds for cell count changes lie between 5 and 15%, with the exception of subset 5, whose threshold lies between 25 and 30% for both increased and decreased cell counts.

In addition to the original data, synthetic data sets were created that simulate a regular (continuous) time series without gaps. For Synthetic Data Set 1, the last sample from every sampling was taken and added to a new data set in the original order; this daily data set is shown in Fig. 7a. Tests with this data set were done with a forecasting horizon of 1 day and a confidence level of 0.95. They yielded similar results regarding the changes in high and low nucleic acid cells: the threshold was again in the range between 10 and 15%. The threshold for cell count increases falls into the range from 30 to 40%, while the range for decreased cell counts lies between 5 and 10%. In all tests, all files that were changed by a higher percentage than the threshold were classified as anomalous.

Tests with continuous data sets were also conducted. For these tests, Synthetic Data Set 1 was used but expanded by 5 additional files; the first 10 files were used as the initial training set. Test results are also shown in Table 1.
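Deriving the reported ranges is mechanical: take the highest change still classified as normal and the lowest change classified as anomalous. A short sketch with invented classification results for one subset:

```python
def threshold_range(results):
    """results: list of (percent_change, is_anomaly) pairs.
    Returns (highest change classified as normal,
             lowest change classified as anomalous)."""
    normals = [p for p, anomalous in results if not anomalous]
    anomalies = [p for p, anomalous in results if anomalous]
    return max(normals), min(anomalies)

# Invented test-file classifications for one subset:
results = [(5, False), (10, False), (15, True), (20, True), (25, True)]
lo, hi = threshold_range(results)
assert (lo, hi) == (10, 15)   # i.e. 10 < th < 15, as reported in Table 1
```

The true threshold lies somewhere between the two returned values; finer-grained test files would narrow the interval.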



Table 1 Detection thresholds resulting from tests with test data: changed percentage of high (HNA) or low (LNA) nucleic acid cells with unchanged cell count, or changed cell count with unchanged HNA/LNA percentage. Confidence levels: 0.99 (river data) or 0.95 (Synthetic Data Set 1); h: forecasting horizon (h for hour, d for day); th: anomaly detection threshold. All ranges in %.

Subset                 | h   | Increased HNA | Increased LNA | Increased cell nr | Decreased cell nr
1. 04/03–05/15 12:00   | 2 h | 10 < th < 15  | 10 < th < 15  | 5 < th < 10       | …
2. 04/03–05/15 04:00   | 1 h | 10 < th < 15  | 10 < th < 15  | 10 < th < 15      | …
3. 04/22–05/14         | 1 h | 10 < th < 15  | 10 < th < 15  | 10 < th < 15      | …
4. 04/22–04/24         | 2 h | 10 < th < 15  | 10 < th < 15  | …                 | …
5. 04/22–05/23         | 1 h | 15 < th < 20  | 10 < th < 15  | 25 < th < 30      | 25 < th < 30
Synthetic Data Set 1   | 1 d | 10 < th < 15  | 10 < th < 15  | 30 < th < 40      | 5 < th < 10